I'm not sure why you need to multiply by $\frac1{2m}$ in the beginning. I understand that you would have to divide the whole sum by $m$, but why do we have to multiply $m$ by two?

Is it because we have two $\theta$ parameters here in the example?

Simon Larsson
Marton Langa

2 Answers

It is simple. When you take the derivative of the cost function, which is what updates the parameters during gradient descent, the $2$ from the exponent cancels with the $\frac{1}{2}$ multiplier, so the derivative comes out cleaner. Tricks like this are widely used in math "to make the derivations mathematically more convenient". You can simply remove the multiplier, see here for example, and expect the same minimizing parameters: the gradient is only rescaled by a factor of $2$, which can be absorbed into the learning rate.
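
To make the cancellation concrete, here is a sketch that assumes the usual squared-error cost for linear regression with hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$. The question does not write the formula out, so this form and its symbols are an assumption:

$$
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$

$$
\frac{\partial J}{\partial \theta_1} = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
$$

Without the $\frac{1}{2}$, the same derivative would carry a factor of $\frac{2}{m}$ instead of $\frac{1}{m}$. So the $2$ has nothing to do with the number of $\theta$ parameters; it only comes from the square.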

TwinPenguins

It makes the math easier to handle. Including the half or not doesn't actually matter, since multiplying the cost by a positive constant does not change which parameters minimize it.
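
A quick way to convince yourself is to run gradient descent with both scalings. This is only a minimal sketch: the function name `fit`, the toy data, the learning rate, and the step count are all made up for illustration, and it assumes a one-feature linear model $\theta_0 + \theta_1 x$.

```python
def fit(xs, ys, scale, lr=0.05, steps=5000):
    """Plain gradient descent on J = scale * sum((t0 + t1*x - y)**2).

    `scale` is the constant in front of the sum, e.g. 1/(2m) or 1/m.
    """
    t0, t1 = 0.0, 0.0
    for _ in range(steps):
        errs = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        # Derivative of scale * err**2 is 2 * scale * err * d(err)/d(theta).
        g0 = 2 * scale * sum(errs)
        g1 = 2 * scale * sum(e * x for e, x in zip(errs, xs))
        t0 -= lr * g0
        t1 -= lr * g1
    return t0, t1


# Toy data lying exactly on y = 1 + 2x (hypothetical example values).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m = len(xs)

half = fit(xs, ys, scale=1 / (2 * m))  # cost with the 1/(2m) factor
full = fit(xs, ys, scale=1 / m)        # cost with the 1/m factor

print(half)
print(full)
```

Both runs should end up at essentially the same parameters, close to $\theta_0 = 1$ and $\theta_1 = 2$. The only difference is that the $\frac{1}{m}$ version takes steps twice as large for the same learning rate.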

Kane Chua