
I understand why k-means can get stuck in local minima. Now I am curious how spectral clustering (spectral k-means) helps to avoid this local-minima problem.

According to the paper "A Tutorial on Spectral Clustering", the spectral algorithm goes the following way (I also sketch these steps in code right after the list):

  1. Represent the data as points in $\mathbb{R}^n$ (a data matrix)

  2. Define an affinity matrix $A$, using a Gaussian kernel $K$ or an adjacency matrix

  3. Construct the graph Laplacian from $A$ (i.e. decide on a normalization)

  4. Solve the eigenvalue problem for this Laplacian

  5. Select the $k$ eigenvectors corresponding to the $k$ lowest (or highest) eigenvalues to define a $k$-dimensional subspace

  6. Form clusters in this subspace using k-means
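To make sure I am reading the steps correctly, here is a rough sketch of the pipeline in Python. This is my own illustration, not code from the paper: it assumes an RBF affinity, the symmetric normalized Laplacian, and optional row normalization of the embedding, and `gamma` is just a placeholder parameter.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, k, gamma=1.0):
    # Step 2: affinity matrix A from a Gaussian (RBF) kernel
    A = np.exp(-gamma * cdist(X, X, metric="sqeuclidean"))
    np.fill_diagonal(A, 0.0)                       # no self-loops

    # Step 3: symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(X)) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]

    # Steps 4-5: eigenvectors of the k smallest eigenvalues span the subspace
    _, eigvecs = np.linalg.eigh(L)                 # eigh sorts eigenvalues ascending
    U = eigvecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # optional row normalization

    # Step 6: plain k-means, but run in the embedded space
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```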

Step 6 uses ordinary k-means. How does the procedure overcome the local-minima problem of k-means? Moreover, what are the benefits of spectral clustering over plain k-means?

A detailed explanation of this would be very helpful.

Amartya

1 Answer


They are quite different approaches. Spectral embedding is a representation of your data: it maps nearby data points next to each other in a new feature space, which gives k-means more separable clusters to work with than the original space. But that alone is not the answer to your question! To answer it, you need to know not only the algorithm of spectral clustering but also how it maps data into a new vector space. Once you read about that, you will understand it easily. Let me give it a try here:

  1. Spectral embedding (on which you apply a simple clustering and get spectral clustering!) is basically a graph embedding method. What a graph is lies outside the scope of this answer and I assume you know it. In a graph, a good clustering puts into one cluster those nodes that have many "within" connections (with each other) and few "between" connections (with the rest of the graph). See the image below.

    [Figure: a graph in which the red nodes are densely connected to each other and attached to the rest of the graph by a single edge]

    The red nodes have several connections to each other, but only a single connection relates them to the rest of the graph. That is a good cluster, right?

    An application example could be suggesting friends on Facebook. Facebook is a graph in which each node is a person and each edge implies friendship on FB. You have many connections within your circle of friends, and other people have their own "cluster of friends" too. If we cluster the FB graph, we can check which people in the same cluster are not yet friends and suggest them to each other, right?!

    Based on the brilliant work of Miroslav Fiedler, "Algebraic Connectivity of Graphs", this graph clustering can be done by finding the edges whose removal splits the data into clusters in the way I described above (the first code sketch after this list makes the idea concrete).

    Spectral clustering is essentially doing this. Now the question is: "How can I benefit from such a graph representation in a clustering problem that is hard for k-means?"

  2. Look at the image below. It is a famous example of non-linearly separable data on which k-means easily fails (the red points lie between any pair of blue points on opposite sides of the blue circle, so k-means gets confused here!)

    [Figure: two concentric circles of points, a red inner circle surrounded by a blue outer circle]

    Now we convert our data into a graph using the rule: each data point is a node, and every node is connected to its k nearest neighbours. What will that graph look like? It will densely connect the nodes of the blue points to each other and the nodes of the red points to each other, with probably very few connections between any blue and red node, since they are rarely in each other's neighbourhood. What we did was convert "non-linearly separable but connected circles in the original space" into "connected bunches of nodes in the graph representation". Now we just need to turn the graph representation into numerical vectors and ta-da! Those strange clusters become two beautifully separable bunches of points, where k-means does not get confused anymore (see the second code sketch after this list).
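To make Fiedler's idea from point 1 concrete, here is a tiny sketch (the toy graph is invented purely for illustration): the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian, the Fiedler vector, changes sign across the weak "bridge" edge, so its sign pattern gives exactly the bipartition we want.

```python
import numpy as np

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the single bridge edge (2,3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Unnormalized graph Laplacian L = D - A
L = np.diag(A.sum(axis=1)) - A

# The eigenvector of the 2nd-smallest eigenvalue is the Fiedler vector
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = eigvecs[:, 1]

# Splitting by the sign of the Fiedler vector cuts only the bridge edge
print(fiedler > 0)   # one triangle gets True, the other False
```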
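And here is a rough end-to-end sketch of the two-circles example from point 2, using scikit-learn. The parameter values (`n_neighbors=10`, the noise level, and so on) are just guesses for illustration: plain k-means mixes the rings, while spectral clustering on a k-nearest-neighbour graph recovers them.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering

# Two concentric rings: one cluster literally inside the other
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

# Plain k-means on the raw coordinates just cuts the plane in half
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering: k-NN graph -> Laplacian eigenvectors -> k-means in that space
sc_labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors",
    n_neighbors=10, assign_labels="kmeans", random_state=0,
).fit_predict(X)

def agreement(labels, truth):
    # fraction of points assigned to the correct ring, up to label swapping
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)

print("k-means:  ", agreement(km_labels, y))   # around 0.5 (confused)
print("spectral: ", agreement(sc_labels, y))   # close to 1.0
```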

DISCLAIMER: There are many simplifications and not entirely accurate uses of terms in my answer. I tried to be intuitive rather than precise (e.g. our example data actually results in two disconnected subgraphs, which needs a numerical treatment to be clustered with spectral clustering, but that does not affect the concept anyway).

Kasra Manshaei
  • I have understood your beautiful answer. After reading many papers and blogs related to this, I have reached this point https://math.stackexchange.com/questions/4421842/what-are-the-benefits-of-using-spectral-k-means-over-simple-k-means-and-how-sp/4423883?noredirect=1#comment9257845_4423883 and the lines written in the bold part are now clear to me. Non-linearity gets solved by spectral clustering through its spectral-embedding magic. But my concern remains: after feeding those points to k-means, converging to a local minimum may still occur. How is that avoided? – Amartya Apr 11 '22 at 14:45
  • Ah ok ... so keep something in mind: k-means never falls into local minima if the clusters are well-separated Gaussians! That is the rule you need to know, and then my answer makes sense. Spectral clustering maps all related points in the original space to a bunch of points next to each other (as I explained). In this new mapping, the clusters are much more Gaussian-like than in the original space. That is the trick! In our example one cluster is literally inside the other, but their spectrally embedded version gives two separated clusters which are much closer to being normally distributed. – Kasra Manshaei Apr 11 '22 at 17:01
  • In other words, spectral embedding helps to handle non-convex clusters, as it follows the minimum-cut (or normalized-cut) approach to the clustering problem – Kasra Manshaei Apr 11 '22 at 17:11
  • I have completely understood the intuition behind this. Thank you very much. Just one question: "clusters are separated Gaussians" means the data points in each cluster are distributed in a bell shape, right? As far as I know, the RSS objective is non-convex, so by default, if we optimize it we can stop at a local minimum... so how does that square with the rule "k-means never falls into local minima if clusters are separated Gaussians"? – Amartya Apr 11 '22 at 17:39
  • I meant "each cluster" is Gaussian, not all the data. I think you mean this too, and honestly that "never" was one of those intuitively used but wrong terms :)) It has been shown that under some conditions on within-cluster and between-cluster connectivities in graph partitions, spectral clustering can find a global solution. – Kasra Manshaei Apr 11 '22 at 19:42
  • Two points: 1) the proof is for the graph representation of the data, so it is hard for me to relate it directly to the objective function of k-means, and 2) spectral clustering can turn many non-bell-shaped clusters into bell-shaped ones, so the probability of finding the global minimum is increased after the spectral mapping. – Kasra Manshaei Apr 11 '22 at 19:43
  • Honestly, I had to read a bit to answer your question as well, so I do not claim 100% accuracy and precision. I hope I gave you a starting point on how to look at the problem, and hopefully you will find more resources and go on from here. I recommend having a look at concepts like "normalized cut" and "ratio cut" in graphs. Two possible starting points are https://www.cis.upenn.edu/~cis515/cis515-15-spectral-clust-chap5.pdf and http://www.sci.utah.edu/~gerig/CS7960-S2010/handouts/Normalized%20Graph%20cuts.pdf – Kasra Manshaei Apr 11 '22 at 19:43
  • Thank you so much, I can't find a better answer than this anywhere on the web. These SE forums are indeed amazing :) – Amartya Apr 12 '22 at 06:57
  • Wow ... that was flattering! Thanks, my friend :) – Kasra Manshaei Apr 12 '22 at 09:07