4

When we talk about Perceptrons, we say that they are limited to approximating functions that are linearly separable, while Neural Networks that use non-linear transformations are not limited in this way.

I am having trouble understanding this idea of linear separability. Specifically, does it apply only to binary classification, or does it generalize to N classes? If so, what does a linear decision boundary for, let's say, 4 classes even look like?

Stefan Radonjic

2 Answers

1

1.) The Perceptron is a non-linear transformation!

2.) A linearly separable function is only defined for Boolean functions; see Wikipedia. Therefore, yes, the statement is meant only for binary classification. (A small sketch below illustrates this with AND vs. XOR.)

3.) For general functions, see the universal approximation theorem.
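
To illustrate point 2, here is a minimal sketch (my own, in Python with NumPy; the helper `train_perceptron` is just a name made up for this example): a single perceptron learns the linearly separable Boolean AND function, but no weight setting lets it classify XOR perfectly.

```python
# A minimal sketch (not part of the original answer): a single perceptron,
# trained with the classic perceptron learning rule, learns the linearly
# separable Boolean AND function but cannot reach 100% accuracy on XOR.
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule with a bias term; labels y are in {0, 1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            w += lr * (yi - pred) * xi           # update only on mistakes
    acc = np.mean((Xb @ w > 0).astype(int) == y)
    return w, acc

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

print("AND accuracy:", train_perceptron(X, y_and)[1])  # 1.0
print("XOR accuracy:", train_perceptron(X, y_xor)[1])  # stays below 1.0
```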

Graph4Me Consultant
  • As Debasish Jana points out, using the kernel trick allows one to work in a higher-dimensional space where the binary classification problem becomes linearly separable (see https://en.wikipedia.org/wiki/Kernel_method and https://en.wikipedia.org/wiki/Kernel_perceptron); a small sketch follows these comments. – Graph4Me Consultant Sep 15 '20 at 22:07
  • Note also that using a deep network, the output of the last hidden layer is the input mapped to a higher-dimensional space. Using a sigmoid function on the output layer corresponds to a normal linear logistic regression on the high-dimensional data, per output-node. – Graph4Me Consultant Sep 15 '20 at 22:16
  • given the correct loss is used, of course – Graph4Me Consultant Sep 15 '20 at 22:23
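
A small sketch of the kernel idea from the first comment above (my own example, not part of the thread): XOR is not linearly separable in the original 2D space, but after the explicit feature map $(x_1, x_2) \mapsto (x_1, x_2, x_1 x_2)$ it becomes separable by a hyperplane in 3D, so the same perceptron rule now succeeds. The explicit map stands in for a kernel here.

```python
# XOR becomes linearly separable after mapping (x1, x2) -> (x1, x2, x1*x2).
# This explicit feature map plays the role of the kernel trick mentioned above.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])                             # XOR labels

phi = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])     # 3D feature space
Xb = np.hstack([phi, np.ones((len(phi), 1))])          # append bias input

# ordinary perceptron updates, now in the 3D feature space
w = np.zeros(Xb.shape[1])
for _ in range(50):
    for xi, yi in zip(Xb, y):
        pred = 1 if xi @ w > 0 else 0
        w += (yi - pred) * xi

print("XOR accuracy after the feature map:",
      np.mean((Xb @ w > 0).astype(int) == y))          # 1.0
```
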
-2

I will try to explain this in a classification scenario. Say we begin with two classes of images, dog and cat, and we are asked to classify whether a given test image is a dog or a cat. Imagine the images are $8 \times 8$, so each image is represented by an $8 \times 8 = 64$-dimensional vector, and each training image forms a data point in this 64-dimensional space. After collecting lots of training data, say 1000 cat images and 1000 dog images, we have 2000 such points in the 64-dimensional space.

If, in that space, the points can be separated by a 64-dimensional hyperplane, then the problem is linearly separable in a straightforward way. If not, we can use a kernel, which is a non-linear function of the given image, to map the 64-dimensional space into a higher-dimensional space (more than 64 dimensions) where we can find linear separability. In that sense, every such problem can be made linearly separable in a sufficiently high-dimensional space. For more information, you can read about the linear Support Vector Machine and kernel Support Vector Machines.
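
As a rough illustration of this answer (my own sketch, using scikit-learn and synthetic 64-dimensional "images" drawn from two Gaussians; the data and class labels are made up, not real cat/dog photos):

```python
# Sketch of the answer above with synthetic data: a linear SVM looks for a
# separating hyperplane directly in the 64-dimensional space, while an
# RBF-kernel SVM implicitly maps the points to a higher-dimensional space
# and looks for a hyperplane there (the kernel trick).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 1000 fake "cat" and 1000 fake "dog" 8x8 images, i.e. 64-dimensional vectors
X_cat = rng.normal(loc=0.0, scale=1.0, size=(1000, 64))
X_dog = rng.normal(loc=0.7, scale=1.0, size=(1000, 64))
X = np.vstack([X_cat, X_dog])
y = np.array([0] * 1000 + [1] * 1000)

linear_svm = SVC(kernel="linear").fit(X, y)   # hyperplane in the original space
rbf_svm = SVC(kernel="rbf").fit(X, y)         # hyperplane in the implicit feature space

print("linear SVM training accuracy:", linear_svm.score(X, y))
print("RBF SVM training accuracy:   ", rbf_svm.score(X, y))
```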

  • Hi, thanks for spending the time to answer my question! Unfortunately, it is NOT the answer I was looking for. First of all, you did not address my question: is linear separability a property that applies only to binary classification, or can it be generalized to N classes? Second, I think you are wrong about the hyperplane in your example. The dimensionality of a hyperplane is by definition one less than that of the ambient space, so in your example the hyperplane that represents the decision boundary should be 63-dimensional, not 64. – Stefan Radonjic Jan 24 '20 at 15:41
  • Answer to the first question: Yes. It can be generalized to N classes (see the sketch after these comments). – Debasish Jana Jan 28 '20 at 15:45
  • Second question: You are absolutely right that the hyperplane will be of 63 dimensions. – Debasish Jana Jan 28 '20 at 15:46
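
To tie the comments back to the original question about N classes, here is a small sketch (my own, using scikit-learn's `make_blobs` and multinomial logistic regression, none of which appear in the thread): with 4 classes, a linear model fits one weight vector and one bias per class, and the plane is carved into 4 regions whose boundaries are piecewise linear.

```python
# A 4-class example of "linear" decision boundaries: multinomial logistic
# regression learns one weight vector (a line in 2D) and one bias per class,
# and predicts with argmax over the four linear scores.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# four well-separated clusters in the plane, one per class
X, y = make_blobs(n_samples=400, centers=4, cluster_std=0.8, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)

print("training accuracy:", clf.score(X, y))
print("weights per class:", clf.coef_.shape)      # (4, 2): one 2D vector per class
print("biases per class: ", clf.intercept_.shape) # (4,)
```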