Why do we choose principal components based on maximum variance explained?

Question

I've seen many people choose # of principal components for PCA based on maximum variance explained. So my question is do we always have to choose principal components based on maximum variance explained? Is it applicable for all scenarios i.e text count vectors(BoW, tfidf..) where number of dimensions are really high.

Does maximum variance means most information about my data in higher dimension is captured into lower dimension?

Usually I'd plot something like this to see the variance explained.

plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Principal Components')
plt.ylabel('Variance ratio')
plt.show()

Mathematically the answer for first question is: **NO**, are several equivalent ways of deriving the principal component mathematically. The simplest way is by finding the projections which maximize the variance so for this reason is widely used(easy to compute). — n1tk, Mar 11 '19 at 22:44

Esmailian · Accepted Answer · 2019-03-09T07:39:04.003

do we always have to choose principal components based on maximum variance explained?

Yes. "Maximum variance explained" is closely related to the main objective as follows.

Our main objective is: for a limited budget K dimensions, what information $\mbox{a}=(a_1,...,a_K)$ to keep from original data $\mbox{x}=(x_1,...,x_D)$ ($D \gg K$) in order to be able to reconstruct $\mbox{x}$ from $\mbox{a}$ as close as possible?

If we only allow rotation and scaling of original data, i.e. $a_k := \mbox{x}.\mbox{v}_k$ for unknown set of vectors $V_K=\{\mbox{v}_k|\mbox{v}_k \in \mathbb{R}^D, 1 \leq k \leq K\}$, and define the reconstruction error as $$loss(\mbox{x},V_K):=\left \| \mbox{x}-\underbrace{\sum_{k=1}^{K}\overbrace{(\mbox{x}.\mbox{v}_k)}^{a_k}\mbox{v}_k}_{\hat{\mbox{x}}} \right \|^2,$$ the solution $V^*_K$ that minimizes this error is PCA. For first dimension, PCA keeps the projection of data on vector $\mbox{v}^*_1$ in the direction of largest data variance, namely $a^*_1$. For second dimension, it keeps the projection on vector $\mbox{v}^*_2$ in the direction of second largest data variance, namely $a^*_2$, and so on.

In other words, when we try to find a K-vector set $V_K$ that minimizes $loss(X,V_K)=\frac{1}{N}\sum_{n=1}^{N}loss(\mbox{x}_n,V_K)$, the solution $V^*_K$ includes $\mbox{v}^*_k$ that is in the direction of $k\mbox{-th}$ largest data variance.

Note that "ratio of variance explained" is a measure from statistics. Using the previous notations, it is defined as:

$$\mbox{R}(X,V_K):=1 - \frac{loss(X,V_K)}{Var(X)}$$

Since variance of original data $Var(X)$ is independent of solution, minimum of $loss(X,V_K)$ is equivalent to maximum of $\mbox{R}(X,V_K)$. For example, if $K=2$, then $V^*_2=\{\mbox{v}^*_1, \mbox{v}^*_2\}$ minimizes $loss(X,V_2)$ and equivalently maximizes $\mbox{R}(X,V_2)$. Ideally, if original data $X$ can be perfectly reconstructed from $V_K$, then $R(X, V_K)$ would be $1$.

Does maximum variance means most information about my data in higher dimension is captured into lower dimension?

Yes. If we agree that "keep as much information as possible" is equivalent to "be able to reconstruct the data as close as possible", then our objective $min_{V_K}loss(X,V_K)$ formalizes "keep as much information as possible", and its solution is "maximum variance".

score 1 · Answer 2 · answered Mar 09 '19 at 00:02

In addition to what has been said:

Why do we choose principal components based on maximum variance explained?

- Because the variance left by rest of the components is in fact the residual you want to minimize when looking for the best representation of your data in less dimensions (the best mean-square linear representation, of course).

do we always have to choose principal components based on maximum variance explained?

- Yes, if dimensionality reduction is what you want.

However, there are applications when the residual components are those who tell the story :-)

Why do we choose principal components based on maximum variance explained?

2 Answers2

Linked