Questions tagged [pca]

Principal component analysis, a technique for dimensionality reduction.

Principal component analysis (PCA) is a statistical technique for dimension reduction often used in clustering or factor analysis. Given a set of possibly correlated variables, PCA constructs new variables (principal components), each a linear combination of the originals, ordered by how much of the variation in the data they explain. It is this ordering that allows PCA to be used for dimension reduction, i.e. to keep only the first few components, which capture most of the variation, in place of a large set of possible influences.

Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables (vectors) into uncorrelated variables called principal component vectors.
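
As a rough illustration of the orthogonal transformation described above, here is a minimal sketch in Python with NumPy. It assumes the eigendecomposition-of-the-covariance-matrix formulation; the function name pca and the toy data are hypothetical and only serve the example. Library implementations such as scikit-learn typically use an SVD instead.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    Hypothetical helper for illustration; not taken from any library.
    """
    # Center each column so the covariance is computed about the mean
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the features (columns)
    cov = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the largest-variance direction comes first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]        # orthonormal directions, one per column
    scores = Xc @ components           # uncorrelated principal component scores
    explained_ratio = eigvals[:k] / eigvals.sum()
    return scores, components, explained_ratio

# Toy data (an assumption for the example): three strongly correlated features
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(200, 3))

scores, components, ratio = pca(X, k=2)
print(scores.shape)   # shape of the reduced data
print(ratio)          # fraction of total variance explained by each component
```

Because the three toy features are driven by one latent variable, the first component should account for nearly all of the variance, and the resulting scores are uncorrelated with each other.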

336 questions
13
votes
4 answers

How do I make an interactive PCA scatterplot in Python?

The matplotlib library is very capable but lacks interactiveness, especially inside Jupyter Notebook. I would like a good offline plotting tool like plot.ly.
scottlittle
  • 330
  • 2
  • 13
13
votes
2 answers

How many dimensions to reduce to when doing PCA?

How to choose K for PCA? K is the number of dimensions to project down to. The only requirement is to not lose too much information. I understand it depends on the data, but I'm looking more for a simple general overview about what characteristics…
pr338
  • 385
  • 2
  • 7
12
votes
4 answers

Is PCA considered a machine learning algorithm?

I've understood that principal component analysis is a dimensionality reduction technique, i.e. given 10 input features, it will produce a smaller number of independent features that are orthogonal, linear transformations of the original features. Is…
Victor
  • 591
  • 3
  • 7
  • 19
10
votes
4 answers

Classify multivariate time series

I have a set of data composed of time series (8 points) with about 40 dimensions (so each time series is 8 by 40). The corresponding output (the possible outcomes for the categories) is either 0 or 1. What would be the best approach to design a…
AugBar
  • 203
  • 1
  • 2
  • 8
8
votes
1 answer

Is it OK to try to find the best PCA k parameter as we do with other hyperparameters?

Principal Component Analysis (PCA) is used to reduce n-dimensional data to k-dimensional data to speed things up in machine learning. After PCA is applied, one can check how much of the variance of the original dataset remains in the resulting…
J. Doe
  • 81
  • 1
  • 2
8
votes
2 answers

Does it make sense to combine PCA with an artificial neural network?

I have a Dataset of around 200 features. Most of them are categorical and only a few are numerical. It seems that an artificial neural network with an Autoencoder has some problems with that kind and amount of features. Therefore, I thought to use…
8
votes
3 answers

Why does PCA assume Gaussian Distribution?

From Jon Shlens's A Tutorial on Principal Component Analysis - version 1, page 7, section 4.5, II: The formalism of sufficient statistics captures the notion that the mean and the variance entirely describe a probability distribution. The only…
Math J
  • 127
  • 1
  • 1
  • 4
8
votes
2 answers

Understanding how distributed PCA works

As part of a big data analysis project I'm working on, I need to perform PCA on some data using a cloud computing system. In my case, I'm using Amazon EMR for the job and Spark in particular. Leaving the "How-to-perform-PCA-in-spark" question aside, I…
Adiel
  • 183
  • 3
7
votes
1 answer

Interpreting the results of randomized PCA in scikit-learn

I'm using scikit-learn to do a genome-wide association study with a feature vector of about 100K SNPs. My goal is to tell the biologists which SNPs are "interesting". RandomizedPCA really improved my models, but I'm having trouble interpreting the…
6
votes
3 answers

Should I use keras or sklearn for PCA?

Recently I saw that there is some basic overlap in functionality between keras and sklearn regarding data preprocessing. So I am a bit confused about whether I should introduce a dependency on another library like sklearn for basic data…
6
votes
1 answer

Sklearn PCA with zero components example

I'm simply trying to repeat a benchmark from the sklearn docs. The unclear part is: n_components = np.arange(0, n_features, 5). They are applying a PCA transform with 0 components! Can somebody please explain what's the mathematical meaning of…
5
votes
1 answer

Theoretical differences between KPCA and t-SNE?

I (think I) understand the underlying principles of most dimensionality reduction methods (MDS, IsoMap, t-SNE, Spectral Embedding, Diffusion maps, etc...). Some of the algorithms I use the most are Kernel PCA (with a gaussian kernel) and t-SNE. My…
5
votes
1 answer

How do I interpret my result of clustering?

I am working on a clustering problem. I have 11 features. My complete data frame has 70-80% zeros. The data had outliers that I capped at 0.5 and 0.95 percentile. However, I tried k-means (python) on data and received a very unusual cluster that…
Akash Dubey
  • 676
  • 2
  • 5
  • 16
5
votes
1 answer

Intuition behind PCA eigenvectors

For undergraduate students who understand the definition of eigenvectors and eigenvalues, $$A v = \lambda v \;,$$ what is the intuition behind why the eigenvectors of the covariance (or correlation) matrix correspond to the axes of maximal…
5
votes
1 answer

How to use PCA in CNN for image recognition using Keras?

I created a CNN model for image classification and I want to use Principal Component Analysis (PCA), but when I run pca.fit(), the code keeps running for hours and the RAM fills up. So, I want to know how to use PCA in CNN for image…
N.IT
  • 1,975
  • 4
  • 17
  • 35