
I am following the Coursera NLP specialization, and in particular the lab "Another explanation about PCA" in Course 1 Week 3.

From the lab, I recovered the following code. It creates two random variables, rotates them to make them dependent and correlated, and then runs PCA on them:

import numpy as np
from sklearn.decomposition import PCA 
import math

std1 = 1     # The desired standard deviation of our first random variable
std2 = 0.333 # The desired standard deviation of our second random variable

x = np.random.normal(0, std1, 1000) # Get 1000 samples from x ~ N(0, std1)
y = np.random.normal(0, std2, 1000)  # Get 1000 samples from y ~ N(0, std2)

# PCA works better if the data is centered
x = x - np.mean(x) # Center x 
y = y - np.mean(y) # Center y

#Define a pair of dependent variables with a desired amount of covariance
n = 1 # Magnitude of covariance. 
angle = np.arctan(1 / n) # Convert the covariance to an angle
print('angle: ',  angle * 180 / math.pi)

# Create a rotation matrix using the given angle
rotationMatrix = np.array([[np.cos(angle), np.sin(angle)],
                           [-np.sin(angle), np.cos(angle)]])

# Create a matrix with columns x and y
xy = np.concatenate(([x], [y]), axis=0).T

# Get covariance matrix of xy
print("Covariance matrix of xy")
covmat = np.cov(xy, rowvar=False)
print(f"{np.sqrt(covmat[0,0]):.3f} = {std1}")
print(f"{np.sqrt(covmat[1,1]):.3f} = {std2}")

# Transform the data using the rotation matrix. It correlates the two variables
data = np.dot(xy, rotationMatrix)

# Get covariance matrix of data
print("Covariance matrix of data")
covmat = np.cov(data, rowvar=False)
print(f"{np.sqrt(covmat[0,0]):.3f} = {std1}")
print(f"{np.sqrt(covmat[1,1]):.3f} = {std2}")
print(f"{covmat[0,1]:.3f} = {n}")

# Apply PCA. 
pcaTr = PCA(n_components=2).fit(data)

# In theory, the Eigenvector matrix must be the 
# inverse of the original rotationMatrix. 
print("** These two matrices should be equal **")
print("Eigenvector matrix")
print(pcaTr.components_)
print("Inverse of original rotation matrix")
print(np.linalg.inv(rotationMatrix))

I get the following output:

angle:  45.0
Covariance matrix of xy
1.031 = 1
0.325 = 0.333
Covariance matrix of data
0.764 = 1
0.765 = 0.333
0.479 = 1
** These two matrices should be equal **
Eigenvector matrix
[[ 0.70632393  0.70788877]
 [ 0.70788877 -0.70632393]]
Inverse of original rotation matrix
[[ 0.70710678  0.70710678]
 [-0.70710678  0.70710678]]
  1. Why does n=1 define the magnitude of the covariance?
  2. Why can we obtain the angle between the variables as angle = np.arctan(1 / n)?
  3. Why don't I obtain this covariance between the variables when I take element (0, 1) of the covariance matrix (the second time I run np.cov, i.e. the line covmat = np.cov(data, rowvar=False))?
  4. Why does the rotation change the variables' variances from the initial std1 and std2?
  5. Why "In theory, the Eigenvector matrix must be the inverse of the original rotationMatrix" ?
  6. Why is this not the case?
  • I have noticed that the signs of the eigenvector matrix (`print(pcaTr.components_)`) are random. Every time I run the script, the signs of the eigenvector matrix change. Why is that? – robertspierre Dec 27 '20 at 13:41
  • I think this question, although it asks multiple things, is focused: all of them are different aspects of the basic question of what PCA means in a practical example. So I think this question should be reopened. – Nikos M. Jan 03 '21 at 08:15

1 Answer


Given two uncorrelated, unit-variance normal variables $x_1 \sim N(0, 1)$ and $x_2 \sim N(0, 1)$, one can correlate them through a linear transformation:

$$A = \begin{bmatrix}a & b \\ c & d\end{bmatrix}$$

$$X' = AX$$

The covariance matrix of the transformed correlated variables $X'$ is given by:

$$\Sigma' = A A^T$$
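As a quick empirical check (a minimal sketch, assuming NumPy and an arbitrary example matrix $A$ chosen purely for illustration), the sample covariance of the transformed data should approach $A A^T$:

import numpy as np

rng = np.random.default_rng(0)

# Two uncorrelated, unit-variance normal variables (one row per variable)
X = rng.standard_normal((2, 100_000))

# An arbitrary example linear transform A (values chosen only for illustration)
A = np.array([[1.0, 0.5],
              [0.2, 0.8]])

X_prime = A @ X           # X' = A X

print(np.cov(X_prime))    # empirical covariance of X'
print(A @ A.T)            # theoretical covariance A A^T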

For a rotation by an angle $\theta$, combined with a scaling that introduces the individual standard deviations, the transform is (with the same sign convention as the rotation matrix in the code above):

$$A = \begin{bmatrix}cos(\theta) & sin(\theta) \\ -sin(\theta) & cos(\theta)\end{bmatrix} \begin{bmatrix}\sigma_1 & 0 \\ 0 & \sigma_2\end{bmatrix}$$

This produces the covariance matrix $\Sigma'$ as:

$$\Sigma' = \begin{bmatrix}\sigma_1cos(\theta) & \sigma_2sin(\theta) \\ -\sigma_1sin(\theta) & \sigma_2cos(\theta)\end{bmatrix} \begin{bmatrix}\sigma_1cos(\theta) & -\sigma_1sin(\theta) \\ \sigma_2sin(\theta) & \sigma_2cos(\theta)\end{bmatrix}$$

The off-diagonal (cross-covariance) component is:

$$\Sigma'_{12} = (\sigma_2^2-\sigma_1^2)cos(\theta)sin(\theta)$$

This gives the dependence of the cross-covariance on the angle $\theta$. Setting $\theta = 45^\circ$, so that $cos(\theta)sin(\theta) = 1/2$, we get:

$$\Sigma'_{12} = \frac{\sigma_2^2-\sigma_1^2}{2}$$

Setting $\sigma_1 = 0.333$ and $\sigma_2 = 1$ gives $\Sigma'_{12} \approx 0.445$. (With the sample standard deviations actually realized in the run above, 1.031 and 0.325, the same formula gives $(1.031^2 - 0.325^2)/2 \approx 0.479$, which matches the off-diagonal value printed for data; the difference from 0.445 is just sampling noise.)

Note that $n = \frac{1}{tan(\theta)}$ is not equal to $\Sigma'_{12}$: as the analysis above shows, it does not itself represent the covariance component.
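To make the dependence on $\theta$ concrete, here is a small sketch (using the same scale-then-rotate convention and the $\sigma_1 = 0.333$, $\sigma_2 = 1$ values from above) that compares the off-diagonal entry of $A A^T$ with the closed-form expression:

import numpy as np

theta = np.pi / 4            # 45 degrees
s1, s2 = 0.333, 1.0          # sigma_1 and sigma_2 as above

R = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
S = np.diag([s1, s2])

A = R @ S                    # scale first, then rotate
Sigma = A @ A.T              # covariance of X' = A X

print(Sigma[0, 1])                                        # ~0.445
print((s2**2 - s1**2) * np.cos(theta) * np.sin(theta))    # same closed form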

PCA decomposes the covariance matrix into uncorrelated components. In other words, it undoes the rotation (the scaling remains), since the rotation is what correlated the components in the first place.

Explicitly, PCA decomposes the covariance matrix as:

$$\Sigma' = U S U^T$$

where $U$ is an orthogonal matrix of eigen-vectors and $S$ is a diagonal matrix holding the individual variances (the eigenvalues).
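A sketch of this decomposition (assuming, for concreteness, the covariance matrix produced by the 45° example above) comparing sklearn's PCA with a direct eigendecomposition of the empirical covariance matrix:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Sigma' from the 45-degree example above (sigma_1 = 0.333, sigma_2 = 1)
Sigma = np.array([[0.556, 0.445],
                  [0.445, 0.556]])
data = rng.multivariate_normal(mean=[0, 0], cov=Sigma, size=100_000)

pca = PCA(n_components=2).fit(data)

# Direct eigendecomposition of the empirical covariance matrix
evals, evecs = np.linalg.eigh(np.cov(data, rowvar=False))

print(pca.components_)           # rows are eigenvectors, largest variance first
print(evecs[:, ::-1].T)          # eigh sorts ascending; reverse and transpose to compare
print(pca.explained_variance_)   # ~ the diagonal of S (the eigenvalues)
print(evals[::-1])

The two eigenvector prints may differ row-by-row in sign: eigenvector signs are arbitrary, which is also why the signs of pcaTr.components_ flip between runs, as noted in the comment under the question.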

Going back to $\Sigma' = A A^T$ and writing the rotation as $R$, we have:

$$\Sigma' = R \begin{bmatrix}\sigma_1^2 & 0 \\ 0 & \sigma_2^2\end{bmatrix} R^T$$

Thus the orthogonal matrix of eigen-vectors $U$ corresponds to the (orthogonal) rotation matrix $R$.

So the eigen-vectors should (exactly in theory, approximately in practice) correspond to the rotation matrix, up to sign factors and a permutation, which remain arbitrary.
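One way to verify this numerically while hedging against the arbitrary signs and ordering is to check that $|U R^T|$ is close to the identity (or a permutation matrix). A sketch reusing the question's setup, with a fixed seed and more samples:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

std1, std2 = 1.0, 0.333
angle = np.pi / 4
R = np.array([[ np.cos(angle), np.sin(angle)],
              [-np.sin(angle), np.cos(angle)]])

# Same construction as in the question
xy = np.column_stack([rng.normal(0, std1, 100_000),
                      rng.normal(0, std2, 100_000)])
data = xy @ R

U = PCA(n_components=2).fit(data).components_

# Rows of U should match rows of R up to sign, so |U R^T| should be ~ identity
print(np.abs(U @ R.T))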

For further info see:

  1. Bivariate normal distribution
  2. Understanding the Covariance Matrix
  3. Interesting Properties of the Covariance Matrix
Nikos M.