I read a recent paper by Pedro Domingos, "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine".
I wanted to understand the key idea a little better. Why would neural networks trained by gradient descent behave approximately like kernel machines? And in particular, is there any method to actually derive the corresponding kernel from an existing neural network?
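As a starting point, here is a minimal sketch (assuming PyTorch; the network and inputs are made up for illustration) of the quantity at the heart of the paper's kernel: the tangent kernel K(x, x') = ∇_w f(x) · ∇_w f(x') at fixed weights. The paper's "path kernel" is, roughly, this quantity integrated along the gradient descent trajectory, so computing it at a single weight vector is only the integrand, not the full construction.

```python
# Minimal sketch, assuming PyTorch: compute the tangent kernel
#   K(x, x') = <grad_w f(x), grad_w f(x')>
# at the model's current weights. The paper's path kernel would
# accumulate this quantity over the whole training trajectory.
import torch
import torch.nn as nn


def flat_grad(model, x):
    """Flattened gradient of the scalar output f_w(x) w.r.t. all weights."""
    out = model(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])


def tangent_kernel(model, x1, x2):
    """Dot product of weight gradients at two inputs, at current weights."""
    return flat_grad(model, x1) @ flat_grad(model, x2)


if __name__ == "__main__":
    torch.manual_seed(0)
    # A toy scalar-output network, purely for illustration.
    net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
    x1, x2 = torch.randn(4), torch.randn(4)
    print(float(tangent_kernel(net, x1, x2)))
```

This is not the full equivalence from the paper; it just shows that the kernel's building block is something you can compute from an existing network with ordinary autodiff.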