
In SVM, we have a kernel function that maps the raw input data space into a higher-dimensional feature space.

In CNN, we also have a 'kernel' mask that travels across the raw input data space (the image as a matrix) and maps it to another space.

Given that both of these are called 'kernel', I am wondering what the connection between them is from a mathematical perspective.

My guess is that it might have something to do with functional analysis.

eight3
  • Interesting... it just came to my mind that both SVM and NN use softmax boundaries. Would be keen to see some good answers to your question. – Peter May 20 '19 at 08:26
  • As far as I know, the kernel in the SVM is a function that maps the input feature space to a higher-dimensional space in order to make it linearly separable (it is a trick...). In conv-nets, the kernel is the patch filter used by the 2D convolutional filtering, which generates the feature maps... – ignatius May 20 '19 at 08:58
  • @ignatius yes, you are right. I am aware of that, which leads to my question: is there any conceptual connection between these two approaches? – eight3 May 20 '19 at 09:08

1 Answer


There is no direct relationship between these two concepts. However, we can find some indirect ones.

According to Merriam-Webster,

kernel means a central or essential part

which hints at why both are called "kernel". Specifically, deciding "how to measure point-to-point similarity (a.k.a. the kernel function)" is the central part of kernel methods, and deciding "what array, matrix, or tensor (a.k.a. the kernel matrix) to convolve with a data point" is the central part of convolutional neural networks.

A kernel function receives two data points, implicitly maps them into a higher-dimensional (possibly infinite-dimensional) space, and calculates their inner product there.
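To make this concrete, here is a minimal NumPy sketch (my own illustration, not part of the original answer) using the homogeneous degree-2 polynomial kernel, whose explicit feature map is small enough to write down. The kernel returns the same value as the inner product taken in the explicit feature space, without ever building that space:

```python
# A minimal sketch of the kernel-function view: the kernel computes an inner
# product in a higher-dimensional feature space without constructing it.
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

def explicit_map(x):
    """Explicit feature map for the same kernel (2-D input -> 3-D feature)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

implicit = poly2_kernel(x, y)                        # no feature map built
explicit = np.dot(explicit_map(x), explicit_map(y))  # feature map built by hand
print(implicit, explicit)  # both equal 16.0 (up to floating-point rounding)
```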

A kernel matrix (or array, or tensor) is convolved with one data point to map that data point explicitly into a new, often lower-dimensional, representation. Here, we are ignoring a subtle difference between a filter and a kernel (a filter is composed of one kernel per channel).
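By contrast, here is a minimal NumPy sketch (again my own illustration) of the CNN sense of "kernel": a small hand-picked 3x3 kernel matrix is slid over a toy 5x5 image and explicitly produces a smaller feature map:

```python
# A minimal sketch of the CNN kernel view: a small kernel matrix is slid over
# the input and explicitly produces a new (here smaller) feature map.
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' cross-correlation, as used in CNN layers (no padding, no flipping)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5x5 "image"
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                  # a hand-picked edge detector

feature_map = conv2d_valid(image, kernel)           # explicit 3x3 output
print(feature_map.shape)  # (3, 3): the data point is mapped to a lower dimension
```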

Therefore, these two concepts are indirectly related in that both map data to a new representation. However,

  • Kernel functions map implicitly, but kernel matrices map explicitly,
  • Kernel functions cannot be stacked over each other (shallow representation), but kernel matrices can be, since the input and output (explicit representations) have the same structure (deep representation),
  • The non-linearity of the map is built into kernel functions, but for kernel matrices we must apply a non-linear activation function after the (input, kernel) convolution to achieve a similar non-linearity,
  • Implicit representations cannot be learned for kernel functions: a specific function implies a specific representation. For kernel matrices, however, the representation can be learned by adjusting (learning) the kernel weights, and can be enriched by stacking kernels over each other (see the sketch after this list).
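A small sketch of the last three points, assuming plain NumPy and random weights standing in for learned ones: kernel matrices can be stacked with a ReLU non-linearity in between, and their entries are exactly the parameters a CNN would learn.

```python
# A minimal sketch: kernel matrices stacked with a non-linear activation (ReLU)
# between them; the kernel entries are placeholders for learned CNN weights.
import numpy as np

def conv2d_valid(image, kernel):
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

relu = lambda x: np.maximum(x, 0.0)

rng = np.random.default_rng(0)
image = rng.normal(size=(7, 7))
k1 = rng.normal(size=(3, 3))   # stand-in for a learned first-layer kernel
k2 = rng.normal(size=(3, 3))   # stand-in for a learned second-layer kernel

layer1 = relu(conv2d_valid(image, k1))   # explicit 5x5 representation
layer2 = relu(conv2d_valid(layer1, k2))  # stacked again -> explicit 3x3 representation
print(layer1.shape, layer2.shape)        # (5, 5) (3, 3)
```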
Esmailian