23

I am a beginner in Machine Learning. In SVM, the separating hyperplane is defined as $y = w^T x + b$. Why do we say the vector $w$ is orthogonal to the separating hyperplane?

Nitesh
Chong Zheng
  • An answer to a similar question (for neural networks) is [here](http://stackoverflow.com/a/10357067/1361822). – bogatron Jun 09 '15 at 16:39
  • @bogatron - I agree with you completely. But mine is just an **SVM-specific** answer. – untitledprogrammer Jun 10 '15 at 19:43
  • Except it isn't. Your answer is correct, but there is nothing about it that is specific to SVMs (nor should there be). $w^{T}x=b$ is simply a vector equation that defines a hyperplane. – bogatron Jun 10 '15 at 22:01

4 Answers

13

Geometrically, the vector $w$ is directed orthogonally to the hyperplane defined by $w^{T} x = b$. This can be understood as follows:

First take $b = 0$. Then the equation is satisfied by exactly those vectors $x$ whose inner product with $w$ vanishes, i.e. by all vectors orthogonal to $w$.

Now translate the hyperplane away from the origin by a vector $a$. The equation of the plane becomes $(x - a)^{T} w = 0$; expanding gives $x^{T} w = a^{T} w$, so the offset is $b = a^{T} w$, which is (up to a factor of $\lVert w \rVert$) the projection of the vector $a$ onto the vector $w$.

Without loss of generality we may thus choose $a$ perpendicular to the plane, in which case its length is $\lVert a \rVert = \lvert b \rvert / \lVert w \rVert$, the shortest (orthogonal) distance between the origin and the hyperplane.

Hence the vector $w$ is said to be orthogonal to the separating hyperplane.
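
As a quick numerical check of the two claims above (not part of the original argument; the values of $w$ and $b$ below are arbitrary, chosen only for illustration), the following NumPy sketch picks two points on the hyperplane $w^{T} x = b$, confirms that their difference is orthogonal to $w$, and recovers the origin-to-hyperplane distance $\lvert b \rvert / \lVert w \rVert$:

```python
import numpy as np

# Illustrative values, chosen arbitrarily for this check.
w = np.array([3.0, 4.0])
b = -5.0  # hyperplane: w^T x = b

# Two points on the line w^T x = b (second coordinate solved from the first).
x1 = np.array([1.0, (b - 3.0 * 1.0) / 4.0])
x2 = np.array([-2.0, (b - 3.0 * -2.0) / 4.0])

# Their difference lies along the hyperplane, so its inner product with w vanishes.
print(np.dot(w, x1 - x2))          # ~0.0

# Shortest distance from the origin to the hyperplane equals |b| / ||w||.
print(abs(b) / np.linalg.norm(w))  # 1.0
```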

11

Let the decision boundary be defined as $w^Tx + b = 0$. Consider the points $x_a$ and $x_b$, which lie on the decision boundary. This gives us two equations:

\begin{equation} w^Tx_a + b = 0 \\ w^Tx_b + b = 0 \end{equation}

Subtracting these two equations gives us $w^T(x_a - x_b) = 0$. Note that the vector $x_a - x_b$ lies along the decision boundary and is directed from $x_b$ to $x_a$. Since the dot product $w^T(x_a - x_b)$ is zero, $w$ must be orthogonal to $x_a - x_b$, and in turn to the decision boundary. (Equivalently, writing $w \cdot (x_a - x_b) = \lVert w \rVert \, \lVert x_a - x_b \rVert \cos\theta$, a zero dot product means the angle $\theta$ between $w$ and the boundary direction is $90°$.)
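
This subtraction argument can also be verified numerically. The sketch below (my addition; the random $w$, $b$ and the small projection helper are purely illustrative) places two points exactly on a decision boundary in $\mathbb{R}^5$ and checks that their difference makes a $90°$ angle with $w$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative decision boundary w^T x + b = 0 in R^5 (values chosen arbitrarily).
w = rng.normal(size=5)
b = 0.7

def project_to_boundary(x, w, b):
    """Project x onto the hyperplane w^T x + b = 0."""
    return x - (np.dot(w, x) + b) / np.dot(w, w) * w

x_a = project_to_boundary(rng.normal(size=5), w, b)
x_b = project_to_boundary(rng.normal(size=5), w, b)

# Both points satisfy the boundary equation ...
print(np.dot(w, x_a) + b, np.dot(w, x_b) + b)   # both ~0

# ... so their difference is orthogonal to w (cosine of the angle is ~0, i.e. 90 degrees).
d = x_a - x_b
print(np.dot(w, d) / (np.linalg.norm(w) * np.linalg.norm(d)))  # ~0
```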

adityagaydhani
6

The reason $w$ is normal to the hyperplane is that we define it to be that way:

Suppose that we have a (hyper)plane in 3d space. Let $P_0$ be a point on this plane, i.e. $P_0 = (x_0, y_0, z_0)$. Therefore the vector from the origin $(0,0,0)$ to this point is just $\langle x_0, y_0, z_0 \rangle$. Suppose that we have an arbitrary point $P = (x, y, z)$ on the plane. The vector joining $P_0$ to $P$ is then given by: $$ \vec{P} - \vec{P_0} = \langle x-x_0,\, y-y_0,\, z-z_0 \rangle$$ Note that this vector lies in the plane.

Now let $\hat{n}$ be the normal (orthogonal) vector to the plane. Therefore: $$ \hat{n} \bullet (\vec{P}-\vec{P_0}) = 0$$ Therefore: $$\hat{n} \bullet \vec{P}- \hat{n} \bullet \vec{P_0} = 0$$ Note that $-\hat{n} \bullet \vec{P_0}$ is just a number and is equal to $b$ in our case, whereas $\hat{n}$ is just $w$ and $\vec{P}$ is $x$. So by definition, $w$ is orthogonal to the hyperplane.
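
A small numerical sketch of this construction (not from the original answer; the normal $\hat{n}$ and the point $P_0$ are arbitrary illustrative values) builds a second point on the plane and checks both identities:

```python
import numpy as np

# Illustrative normal vector and a point on the plane (values chosen arbitrarily).
n = np.array([1.0, 2.0, 2.0])      # plays the role of w
P0 = np.array([3.0, 0.0, -1.0])    # a fixed point on the plane
b = -np.dot(n, P0)                 # offset so that n . x + b = 0 on the plane

# Build another point P on the plane by moving from P0 along a direction
# orthogonal to n (obtained here via the cross product).
direction = np.cross(n, np.array([0.0, 0.0, 1.0]))
P = P0 + 2.5 * direction

print(np.dot(n, P - P0))   # ~0: P - P0 lies in the plane
print(np.dot(n, P) + b)    # ~0: P satisfies n . x + b = 0
```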

1

Using the algebraic definition of a vector being orthogonal to a hyperplane:

$\forall \ x_1, x_2$ on the separating hyperplane,

$$ w^T(x_1-x_2)=(w^Tx_1 + b)-(w^Tx_2 + b)=0-0=0 \ \small\Box.$$
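
A short symbolic check of this identity (my addition, using SymPy with two-dimensional vectors for brevity):

```python
import sympy as sp

# Symbolic version of the identity above, in two dimensions.
w1, w2, b = sp.symbols('w1 w2 b')
x11, x12, x21, x22 = sp.symbols('x11 x12 x21 x22')

w = sp.Matrix([w1, w2])
x1 = sp.Matrix([x11, x12])
x2 = sp.Matrix([x21, x22])

lhs = (w.T * (x1 - x2))[0]
rhs = ((w.T * x1)[0] + b) - ((w.T * x2)[0] + b)
print(sp.simplify(lhs - rhs))  # 0, confirming the identity
```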

Indominus