Questions tagged [pca]

Principal component analysis, a technique for dimensionality reduction.

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used alongside clustering or factor analysis. Given any number of explanatory or causal variables, PCA builds linear combinations of them (the principal components) and ranks these by how much of the variation in the data each one explains. It is this property that allows PCA to be used for dimension reduction, i.e. to keep the few components that account for most of the variation from amongst a large set of possible influences.
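
As a rough illustration of ranking by explained variation, the sketch below computes the share of total variance captured by each component using NumPy's SVD. The toy data, the helper name `explained_variance_ratio`, and the 95% threshold are assumptions made for this example, not part of the tag description.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                  # centre each column
    # Singular values come back in descending order; the variance along
    # component i is s_i**2 / (n_samples - 1).
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / (X.shape[0] - 1)
    return var / var.sum()

rng = np.random.default_rng(0)
# Hypothetical toy data: the second column is nearly a copy of the first,
# so two components should carry almost all of the variance.
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500), rng.normal(size=500)])

ratio = explained_variance_ratio(X)
# Smallest number of components whose cumulative share reaches 95%
# (the threshold is an arbitrary choice for illustration).
k = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)
print(ratio, k)
```

Here `ratio` is sorted from the most to the least informative component, and `k` is one simple way to decide how many components to keep.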

Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables (vectors) into uncorrelated variables called principal component vectors.
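
A minimal sketch of that statement, assuming a small synthetic dataset of two correlated variables: the eigenvectors of the covariance matrix form an orthogonal matrix, and projecting the centred data onto them gives variables whose covariance matrix is diagonal. The variable names (`W`, `Z`) are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical toy data: two strongly correlated variables.
x = rng.normal(size=1000)
X = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=1000)])

Xc = X - X.mean(axis=0)                 # centre the data
cov = np.cov(Xc, rowvar=False)          # covariance of the original variables

# eigh is meant for symmetric matrices and returns ascending eigenvalues,
# so reorder to put the largest-variance direction first.
eigvals, W = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

Z = Xc @ W                              # data expressed in the new basis
cov_Z = np.cov(Z, rowvar=False)         # covariance of the transformed variables

print(np.round(W.T @ W, 6))             # approximately the identity: W is orthogonal
print(np.round(cov_Z, 6))               # approximately diagonal: columns of Z are uncorrelated
```

The diagonal entries of `cov_Z` are the eigenvalues, i.e. the variance along each principal direction.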

336 questions

4 answers

How do I make an interactive PCA scatterplot in Python?

The matplotlib library is very capable but lacks interactivity, especially inside Jupyter Notebook. I would like a good offline plotting tool like plot.ly.

asked May 28 '16 at 00:55

scottlittle

2 answers

How many dimensions to reduce to when doing PCA?

How to choose K for PCA? K is the number of dimensions to project down to. The only requirement is to not lose too much information. I understand it depends on the data, but I'm looking more for a simple general overview about what characteristics…

pca

asked Mar 16 '16 at 04:28

pr338

4 answers

Is PCA considered a machine learning algorithm?

I understand that principal component analysis is a dimensionality reduction technique, i.e. given 10 input features, it will produce a smaller number of independent features that are orthogonal, linear transformations of the original features. Is…

machine-learning pca

asked Jan 16 '18 at 20:42

Victor

4 answers

Classify multivariate time series

I have a set of data composed of time series (8 points) with about 40 dimensions (so each time series is 8 by 40). The corresponding output (the possible outcomes for the categories) is either 0 or 1. What would be the best approach to design a…

classification time-series pca

asked May 09 '17 at 08:33

AugBar

1 answer

Is it OK to try to find the best PCA k parameter as we do with other hyperparameters?

Principal Component Analysis (PCA) is used to reduce n-dimensional data to k-dimensional data to speed things up in machine learning. After PCA is applied, one can check how much of the variance of the original dataset remains in the resulting…

machine-learning pca hyperparameter

asked Mar 27 '19 at 18:58

J. Doe

2 answers

Does it make sense to combine PCA with an artificial neural network?

I have a Dataset of around 200 features. Most of them are categorical and only a few are numerical. It seems that an artificial neural network with an Autoencoder has some problems with that kind and amount of features. Therefore, I thought to use…

machine-learning neural-network deep-learning predictive-modeling pca

asked Jan 16 '18 at 10:03

Rene B.

3 answers

Why does PCA assume Gaussian Distribution?

From Jon Shlens's A Tutorial on Principal Component Analysis - version 1, page 7, section 4.5, II: The formalism of sufficient statistics captures the notion that the mean and the variance entirely describe a probability distribution. The only…

pca gaussian

asked Dec 19 '17 at 06:08

Math J

2 answers

Understanding how distributed PCA works

As part of a big data analysis project I'm working on, I need to perform PCA on some data using a cloud computing system. In my case, I'm using Amazon EMR for the job, and Spark in particular. Leaving the "How-to-perform-PCA-in-spark" question aside, I…

data-mining bigdata apache-spark pca distributed

asked Apr 19 '17 at 08:58

Adiel

1 answer

Interpreting the results of randomized PCA in scikit-learn

I'm using scikit-learn to do a genome-wide association study with a feature vector of about 100K SNPs. My goal is to tell the biologists which SNPs are "interesting". RandomizedPCA really improved my models, but I'm having trouble interpreting the…

feature-selection scikit-learn pca randomized-algorithms

asked Mar 05 '16 at 19:07

retsreg

3 answers

Should I use keras or sklearn for PCA?

Recently I saw that there is some basic overlap in functionality between keras and sklearn regarding data preprocessing. So I am a bit confused about whether I should introduce a dependency on another library like sklearn for basic data…

deep-learning keras scikit-learn feature-engineering pca

asked Jun 19 '20 at 05:34

Shrijit Basak

1 answer

Sklearn PCA with zero components example

I'm simply trying to repeat a benchmark from the sklearn docs. The unclear part is: n_components = np.arange(0, n_features, 5). They are applying a PCA transform with 0 components! Can somebody please explain what the mathematical meaning of…

scikit-learn pca

asked May 18 '18 at 11:07

Ladenkov Vladislav

1 answer

Theoretical differences between KPCA and t-SNE?

I (think I) understand the underlying principles of most dimensionality reduction methods (MDS, IsoMap, t-SNE, Spectral Embedding, Diffusion maps, etc...). Some of the algorithms I use the most are Kernel PCA (with a gaussian kernel) and t-SNE. My…

visualization pca dimensionality-reduction kernel tsne

asked Jul 29 '21 at 15:43

Rayamon

1 answer

How do I interpret my result of clustering?

I am working on a clustering problem. I have 11 features. My complete data frame has 70-80% zeros. The data had outliers that I capped at 0.5 and 0.95 percentile. However, I tried k-means (python) on data and received a very unusual cluster that…

data-mining clustering unsupervised-learning k-means pca

asked Jan 24 '20 at 20:17

Akash Dubey

1 answer

Intuition behind PCA eigenvectors

For undergraduate students who understand the definition of eigenvectors and eigenvalues, $$A v = \lambda v \;,$$ what is the intuition behind why the eigenvectors of the covariance (or correlation) matrix correspond to the axes of maximal…

pca

asked Oct 22 '19 at 23:52

Joseph O'Rourke

1 answer

How to use PCA in CNN for image recognition using Keras?

I created a CNN model for image classification and I want to use Principal Component Analysis (PCA), but when I run the pca.fit() code, it keeps running for hours and the RAM fills up. So, I want to know how to use PCA in CNN for image…

deep-learning keras tensorflow pca convolutional-neural-network

asked Jul 31 '19 at 19:41

N.IT
