I read a recent paper by Pedro Domingos, "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine".
I wanted to understand the key idea a little better. Why would neural networks trained by gradient descent behave approximately like kernel machines? And in particular, is there any method to actually derive the corresponding kernel from an existing neural network?
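As a starting point, here is a minimal sketch (assuming PyTorch; the network and inputs are made up for illustration) of the quantity at the heart of the paper's kernel: the tangent kernel K(x, x') = ∇_w f(x) · ∇_w f(x') at fixed weights. The paper's "path kernel" is, roughly, this quantity integrated along the gradient descent trajectory, so computing it at a single weight vector is only the integrand, not the full construction.

```python
# Minimal sketch, assuming PyTorch: compute the tangent kernel
#   K(x, x') = <grad_w f(x), grad_w f(x')>
# at the model's current weights. The paper's path kernel would
# accumulate this quantity over the whole training trajectory.
import torch
import torch.nn as nn


def flat_grad(model, x):
    """Flattened gradient of the scalar output f_w(x) w.r.t. all weights."""
    out = model(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])


def tangent_kernel(model, x1, x2):
    """Dot product of weight gradients at two inputs, at current weights."""
    return flat_grad(model, x1) @ flat_grad(model, x2)


if __name__ == "__main__":
    torch.manual_seed(0)
    # A toy scalar-output network, purely for illustration.
    net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
    x1, x2 = torch.randn(4), torch.randn(4)
    print(float(tangent_kernel(net, x1, x2)))
```

This is not the full equivalence from the paper; it just shows that the kernel's building block is something you can compute from an existing network with ordinary autodiff.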