4

I'm trying to learn maths behind SVM (hard margin) but due to different forms of mathematical formulations I'm bit confused.

  • Assume we have two sets of points $\text{(i.e. positives, negatives)}$ one on each side of hyperplane $\pi$.

  • So the equation of the margin maximizing plane $\pi$ can be written as, $$\pi:\;W^TX+b = 0$$

    • If $y\in$ $(1,-1)$ then,

$$\pi^+:\; W^TX + b=+1$$ $$\pi^-:\; W^TX + b=-1$$

  • Here $\pi^+$ and $\pi^-$ are parallel to plane $\pi$ and they are also parallel to each other. Now the objective would be to find a hyperplane $\pi$ which maximizes the distance between $\pi^+$ and $\pi^-$.

Here $\pi^+$ and $\pi^-$ are the hyperplanes passing through positive and negative support vectors respectively

  • According to wikipedia about SVM I've found that distance/margin between $\pi^+$ and $\pi^-$ can be written as, $$\hookrightarrow\frac{2}{||w||}$$

  • Now if I put together everything this is the constraint optimization problem we want to solve, $$\text{find}\;w_*,b_* = \underbrace{argmax}_{w,b}\frac{2}{||w||} \rightarrow\text{margin}$$ $$\hookrightarrow \text{s.t}\;\;y_i(w^Tx\;+\;b)\;\ge 1\;\;\;\forall\;x_i$$

Before proceeding to my doubts please do confirm if my understanding above is correct? If you find any mistakes please do correct me.

  • How to derive margin between $\pi^+$ and $\pi^-$ to be $\frac{2}{||w||}?$ I did find a similar question asked here but I couldn't understand the formulations used there? If possible can anyone explain it in the formulation I used above?
  • How can $y_i(w^Tx+b)\ge1\;\;\forall\;x_i$?
Jeeth
  • 911
  • 2
  • 10
  • 18

1 Answers1

7

Your understandings are right.

deriving the margin to be $\frac{2}{|w|}$

we know that $w \cdot x +b = 1$

If we move from point z in $w \cdot x +b = 1$ to the $w \cdot x +b = 0$ we land in a point $\lambda$. This line that we have passed or this margin between the two lines $w \cdot x +b = 1$ and $w \cdot x +b = 0$ is the margin between them which we call $\gamma$

For calculating the margin, we know that we have moved from z, in opposite direction of w to point $\lambda$. Hence this margin $\gamma$ would be equal to $z - margin \cdot \frac{w}{|w|} = z - \gamma \cdot \frac{w}{|w|} =$ (we have moved in the opposite direction of w, we just want the direction so we normalize w to be a unit vector $\frac{w}{|w|}$)

Since this $\lambda$ point lies in the decision boundary we know that it should suit in line $w \cdot x + b = 0$ Hence we set is in this line in place of x:

$$w \cdot x + b = 0$$ $$w \cdot (z - \gamma \cdot \frac{w}{|w|}) + b = 0$$ $$w \cdot z + b - w \cdot \gamma \cdot \frac{w}{|w|}) = 0$$ $$w \cdot z + b = w \cdot \gamma \cdot \frac{w}{|w|}$$ we know that $w \cdot z +b = 1$ (z is the point on $w \cdot x +b = 1)$ $$1 = w \cdot \gamma \cdot \frac{w}{|w|}$$ $$\gamma= \frac{1}{w} \cdot \frac{|w|}{w} $$ we also know that $w \cdot w = |w|^2$, hence: $$\gamma= \frac{1}{|w|}$$ Why is in your formula 2 instead of 1? because I have calculated the margin between the middle line and the upper, not the whole margin.

  • How can $y_i(w^Tx+b)\ge1\;\;\forall\;x_i$?

We want to classify the points in the +1 part as +1 and the points in the -1 part as -1, since $(w^Tx_i+b)$ is the predicted value and $y_i$ is the actual value for each point, if it is classified correctly, then the predicted and actual values should be same so their production $y_i(w^Tx+b)$ should be positive (the term >= 0 is substituded by >= 1 because it is a stronger condition)

The transpose is in order to be able to calculate the dot product. I just wanted to show the logic of dot product hence, didn't write transpose


For calculating the total distance between lines $w \cdot x + b = -1$ and $w \cdot x + b = 1$:

Either you can multiply the calculated margin by 2 Or if you want to directly find it, you can consider a point $\alpha$ in line $w \cdot x + b = -1$. then we know that the distance between these two lines is twice the value of $\gamma$, hence if we want to move from the point z to $\alpha$, the total margin (passed length) would be: $$z - 2 \cdot \gamma \cdot \frac{w}{|w|}$$ then we can calculate the margin from here.

derived from ML course of UCSD by Prof. Sanjoy Dasgupta

Fatemeh Asgarinejad
  • 1,164
  • 1
  • 8
  • 17
  • 2
    "we know that we have moved from z, in direction of w to point λ" Here z lies in ($w^t.x+b=+1$.) we also know that normal(w) to the plane $\pi$ is in direction of positive points right? So aren't we moving from z to point $\lambda$ in opposite direction of w? – Jeeth Jul 12 '19 at 23:50
  • How can I calculate whole margin so that formula becomes $\frac{2}{||w||}$? It will give me an intuition. – Jeeth Jul 13 '19 at 00:15
  • $\frac{2}{w}$ is twice the value I calculated. because what I calculated is the distance from the above line to middle. Hence, the whole distance between $w . x +b = 1$ and $w . x +b = -1$ would be $\frac{2}{w}$ – Fatemeh Asgarinejad Jul 13 '19 at 00:37
  • You are right we are moving to the opposite direction of w, it is why we have that negetive for the unit value of w. I edited my post. – Fatemeh Asgarinejad Jul 13 '19 at 00:37
  • Thanks and you've also mentioned that because we are moving in opposite direction we normalize w to be a unit vector i.e $\frac{w}{||w||}$. I didn't totally get it. Why do we have to normalize? Is it for convenience. – Jeeth Jul 13 '19 at 00:45
  • You're welcome. I didn't mean it. In fact, we normalize because we just want the direction. – Fatemeh Asgarinejad Jul 13 '19 at 00:46
  • Can you also help mention how to calculate it for whole margin so the formula becomes $\frac{2}{||w||}$? – Jeeth Jul 13 '19 at 00:50
  • 1
    I added it to the bottom of my post. – Fatemeh Asgarinejad Jul 13 '19 at 00:58
  • 1
    I really appreciate all the help. Thanks :) – Jeeth Jul 13 '19 at 01:03
  • 1
    I'm happy that I could help. :) – Fatemeh Asgarinejad Jul 13 '19 at 01:03