6

I understand the meaning of empirical risk minimization as a separate topic, and I have been reading about structural risk minimization, but it is hard for me to understand the difference between the two.

I read somewhere that the perceptron uses empirical risk minimization whereas SVM uses structural risk minimization.

I know that SVM takes model complexity into consideration, so it is somewhat related to structural risk minimization, but it is still not clear to me what the difference is.

Pluviophile
A.B
  • Would it be possible to provide a link to the reference material that says "perceptron uses ERM and SVM uses structural"? – avimehenwal Jan 22 '20 at 14:16

1 Answer

4

SVMs were developed by Vapnik (1995, 1998), based on the structural risk minimization principle (Vapnik, 1982) from statistical learning theory.

The complexity of the class of functions performing classification or regression and the algorithm's generalizability are related. Vapnik-Chervonenkis (VC) theory provides a general measure of complexity and proves bounds on errors as a function of complexity. Structural risk minimization is the minimization of these bounds, which depend on the empirical risk and the capacity of the function class.

More Details with Math & Paper

More on SRM
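
You can picture SRM as fixing a nested sequence of function classes $S_1 \subset S_2 \subset \dots$ of increasing capacity and picking the class (and model) that minimizes a capacity-penalized training error. Here is a minimal sketch on a toy regression task, where polynomial degree stands in for capacity and the linear penalty `lam * d` stands in for the capacity term of the bound (both are illustrative simplifications, not the actual VC bound):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 30)
y = np.sin(np.pi * X) + 0.2 * rng.standard_normal(30)  # noisy target

def fit_and_risk(degree):
    """Least-squares fit within the class of degree-`degree` polynomials,
    returning the training (empirical) risk of the fitted model."""
    coeffs = np.polyfit(X, y, degree)
    preds = np.polyval(coeffs, X)
    return np.mean((y - preds) ** 2)

lam = 0.01  # strength of the complexity penalty (illustrative value)

# SRM-style selection over the nested classes S_1 ⊂ S_2 ⊂ ...:
# penalize each class by its capacity, crudely proxied by the degree.
scores = {d: fit_and_risk(d) + lam * d for d in range(1, 10)}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Pure ERM over the union of all these classes would simply pick the highest degree (lowest training error); the penalty is what makes the selection favor a simpler class.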


In brief:

Empirical risk minimization (ERM) is a principle in statistical learning theory that defines a family of learning algorithms and is used to give theoretical bounds on their performance. The core idea is that we cannot know exactly how well an algorithm will work in practice (the true "risk") because we don't know the true distribution of data that the algorithm will work on, but we can instead measure its performance on a known set of training data (the "empirical" risk).

$$ R_{\mathrm{emp}}(h)=\frac{1}{m} \sum_{i=1}^{m} L\left(h\left(x_{i}\right), y_{i}\right) $$

Note that $h$ here stands for our classifier (the hypothesis $f$). Minimizing this quantity over a class of hypotheses is called empirical risk minimization (ERM).
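
As a concrete illustration, here is a minimal sketch of computing the empirical risk with a zero-one loss; the data, the threshold classifier `h`, and the helper names are all made up for the example:

```python
import numpy as np

def empirical_risk(h, X, y, loss):
    """Average loss of hypothesis h over the m training pairs (x_i, y_i)."""
    return np.mean([loss(h(x), yi) for x, yi in zip(X, y)])

# Zero-one loss for classification: 1 if the prediction is wrong, else 0.
zero_one = lambda pred, target: float(pred != target)

# Toy data and a hypothetical threshold classifier.
X = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([-1, -1, 1, 1, 1])
h = lambda x: 1 if x > 0 else -1

print(empirical_risk(h, X, y, zero_one))  # 0.0 on this toy set
```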

Structural risk minimization (SRM) is an inductive principle used in machine learning. Commonly, a generalized model must be selected from a finite data set, with the consequent problem of overfitting: the model becomes too strongly tailored to the particularities of the training set and generalizes poorly to new data. The SRM principle addresses this problem by balancing the model's complexity against its success at fitting the training data.

$$ R_{\mathrm{srm}}(f)=\frac{1}{N} \sum_{i=1}^{N} L\left(y_{i}, f\left(x_{i}\right)\right)+\lambda J(f) $$

Minimizing this regularized objective is called structural risk minimization (SRM).

Where,

  • $J(f)$ is a measure of the model's complexity, typically a norm of the parameter vector.
  • $\lambda \geq 0$ is a coefficient controlling the strength of the penalty term.
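
To make the difference concrete, here is a minimal sketch of evaluating the SRM objective for a toy linear model, assuming a squared-error loss and taking $J(f)$ to be the squared L2 norm of the weights (an illustrative choice of complexity measure):

```python
import numpy as np

def structural_risk(f_params, X, y, loss, lam):
    """Empirical risk plus lam times a complexity penalty J(f).
    Here J(f) is the squared L2 norm of the weights (illustrative)."""
    w, b = f_params
    preds = X @ w + b
    emp = np.mean([loss(yi, pi) for yi, pi in zip(y, preds)])
    return emp + lam * np.dot(w, w)

# Squared-error loss for a toy linear regression.
sq_loss = lambda yi, pi: (yi - pi) ** 2

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 0.9, 2.1, 2.9])
w, b = np.array([1.0]), 0.0

print(structural_risk((w, b), X, y, sq_loss, lam=0.1))
```

Setting `lam=0` recovers the plain empirical risk, which is one way to see how the two principles relate: the SRM objective is the ERM objective plus a complexity penalty.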

More info

Pluviophile
  • Is there a lower bound for the empirical risk on a training set? I want to argue that the training error on a given training dataset can never go to zero even if the estimator keeps overfitting. Could you please comment on my hypothesis? – Siddhant Tandon Dec 15 '21 at 08:35
  • Naively it looks like SRM is just ERM with regularization. Is this the right way to think about it? – Rajiv Krishnakumar Sep 02 '22 at 08:06