
Kernel ridge regression associates a regularization parameter $C$ with the weight term ($\beta$):

$$\text{Minimize: } \mathrm{KRR}=\frac{C}{2}\left\|\beta\right\|^{2} + \frac{1}{2}\sum_{i=1}^{\mathcal{N}}\left\|e_i\right\|_2^{2}$$
$$\text{Subject to: } \beta^T\phi_i=y_i - e_i,\quad i=1,2,\dots,\mathcal{N}$$

If we instead associate $C$ with the error term, as follows:

$$\text{Minimize: } \mathrm{KRR}=\frac{1}{2}\left\|\beta\right\|^{2} + \frac{C}{2}\sum_{i=1}^{\mathcal{N}}\left\|e_i\right\|_2^{2}$$
$$\text{Subject to: } \beta^T\phi_i=y_i - e_i,\quad i=1,2,\dots,\mathcal{N}$$

then how is this second formulation different from the first one?

or

Can we associate $C$ with either the weight term or the error term in kernel ridge regression?

Chandan Gautam

1 Answer


Both formulations lead to the same solution, provided $C>0$ and the values of $C$ in the two cost functions are chosen consistently.

If we have the regularized loss

$$J_1=\dfrac{1}{2}\sum_{n=1}^Ne_n^2+\dfrac{1}{2}C\sum_{k=0}^pw_k^2$$

we will have strong regularization for large $C$ and weak regularization for small positive $C$.

If we divide the loss $J_1$ by the positive constant $C$, we obtain the loss

$$J_2= \dfrac{1}{2C}\sum_{n=1}^Ne_n^2+\dfrac{1}{2}\sum_{k=0}^pw_k^2.$$

Since $C>0$, we have only rescaled the loss function $J_1$, so the minimizer does not change. The interpretation of the parameter does change, however, if we replace $1/C$ with its inverse $\tilde{C}=1/C$, as proposed in your question, to obtain the loss

$$J_3= \dfrac{1}{2}\tilde{C}\sum_{n=1}^Ne_n^2+\dfrac{1}{2}\sum_{k=0}^pw_k^2.$$

For very small $\tilde{C}$ we have strong regularization, and for large $\tilde{C}$ we have weak regularization. That is why $\tilde{C}$ is sometimes called the inverse regularization parameter (e.g. in support vector machines).
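To make the equivalence concrete, here is a minimal numerical sketch (not part of the original question or answer): it minimizes $J_1$ and $J_3$ directly for a plain linear feature map on synthetic data and checks that the minimizers coincide when $\tilde{C}=1/C$. The data, the variable names (`J1`, `J3`, `C_tilde`), and the use of `scipy.optimize.minimize` are all illustrative choices; the same argument carries over to any fixed kernel feature map.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic regression data with a linear feature map (rows of X play the role of phi_i).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

C = 10.0          # regularization parameter on the weight term (J_1)
C_tilde = 1.0 / C # inverse parameter on the error term (J_3)

def J1(w):
    # C multiplies the squared weights
    e = y - X @ w
    return 0.5 * C * (w @ w) + 0.5 * (e @ e)

def J3(w):
    # C_tilde multiplies the squared errors
    e = y - X @ w
    return 0.5 * (w @ w) + 0.5 * C_tilde * (e @ e)

w1 = minimize(J1, np.zeros(3)).x
w3 = minimize(J3, np.zeros(3)).x
print(np.allclose(w1, w3, atol=1e-5))   # True: same minimizer

# Closed-form ridge solution (X^T X + C I)^{-1} X^T y for comparison.
w_closed = np.linalg.solve(X.T @ X + C * np.eye(3), X.T @ y)
print(np.allclose(w1, w_closed, atol=1e-5))
```

As a practical aside, scikit-learn's `KernelRidge` follows the first convention (larger `alpha` means stronger regularization), while the `C` of its `SVR` follows the second (larger `C` means weaker regularization).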

MachineLearner