Questions tagged [dimensionality-reduction]

Dimensionality reduction refers to techniques for reducing many variables to a smaller number while keeping as much information as possible. One prominent method is principal component analysis (PCA).

297 questions
What is dimensionality reduction? What is the difference between feature selection and extraction?
70 votes · 11 answers · asked by alvas
From Wikipedia: dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. What is the difference between feature…

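The distinction this question asks about can be made concrete with a minimal scikit-learn sketch (the iris data is just a stand-in): selection keeps a subset of the original columns, while extraction builds new columns from combinations of all of them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Feature selection: keep 2 of the original 4 columns (still interpretable).
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 new columns as linear combinations of all 4.
X_ext = PCA(n_components=2).fit_transform(X)

print(X_sel.shape, X_ext.shape)  # (150, 2) (150, 2)
```
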
How to do SVD and PCA with big data?
36 votes · 6 answers · asked by David S.
I have a large set of data (about 8 GB). I would like to use machine learning to analyze it, so I think I should use SVD and then PCA to reduce the dimensionality of the data for efficiency. However, MATLAB and Octave cannot load such a large…

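For data too large to load at once, one common workaround (not necessarily what the answers recommend) is incremental PCA, which consumes the matrix in mini-batches. A minimal scikit-learn sketch, with random chunks standing in for blocks streamed from disk:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=10)

# Feed the data in chunks so the full matrix never has to fit in memory;
# each chunk here stands in for a block read from disk.
for _ in range(20):
    chunk = rng.standard_normal((1000, 100))
    ipca.partial_fit(chunk)

reduced = ipca.transform(rng.standard_normal((5, 100)))
print(reduced.shape)  # (5, 10)
```
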
Purpose of visualizing high dimensional data?
30 votes · 8 answers · asked by hlin117
There are many techniques for visualizing high-dimensional datasets, such as t-SNE, Isomap, PCA, supervised PCA, etc. We go through the motions of projecting the data down to a 2D or 3D space so we have "pretty pictures". Some of these…

Machine learning techniques for estimating users' age based on Facebook sites they like
28 votes · 6 answers
I have a database from my Facebook application and I am trying to use machine learning to estimate users' age based on which Facebook sites they like. There are three crucial characteristics of my database: the age distribution in my training set…

Improving the speed of the t-SNE implementation in Python for huge data
28 votes · 5 answers · asked by chmodsss
I would like to do dimensionality reduction on nearly 1 million vectors, each with 200 dimensions (doc2vec). I am using the TSNE implementation from the sklearn.manifold module, and the major problem is time complexity. Even with method="barnes_hut",…

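A widely used speed-up at this scale, sketched below with synthetic stand-in data, is to first reduce the vectors to about 50 dimensions with PCA and then run Barnes-Hut t-SNE on the result; the sizes and parameters here are illustrative, not tuned:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 200))  # stand-in for the doc2vec vectors

# Step 1: PCA down to 50 dimensions; this alone removes much of the cost.
X50 = PCA(n_components=50).fit_transform(X)

# Step 2: Barnes-Hut t-SNE on the reduced vectors.
emb = TSNE(n_components=2, method="barnes_hut", init="pca",
           random_state=0).fit_transform(X50)
print(emb.shape)  # (500, 2)
```
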
Nearest neighbors search for very high dimensional data
23 votes · 3 answers
I have a big sparse matrix of users and the items they like (on the order of 1M users and 100K items, with a very low level of sparsity). I'm exploring ways in which I could perform kNN search on it. Given the size of my dataset and some initial tests, I…

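For a quick exact baseline on sparse user-item data, scikit-learn's brute-force NearestNeighbors with cosine distance operates directly on CSR matrices; a minimal sketch with a small random sparse matrix standing in for the real one (at 1M x 100K scale, approximate methods such as LSH would be the next step):

```python
import scipy.sparse as sp
from sklearn.neighbors import NearestNeighbors

# Random sparse user-item matrix standing in for the real 1M x 100K one.
X = sp.random(1000, 500, density=0.01, format="csr", random_state=0)

# Brute-force search with cosine distance works directly on CSR input.
nn = NearestNeighbors(n_neighbors=5, metric="cosine", algorithm="brute").fit(X)
distances, indices = nn.kneighbors(X[:3])
print(indices.shape)  # (3, 5)
```
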
Dimensionality and Manifold
22 votes · 4 answers · asked by alvas
A commonly heard sentence in unsupervised machine learning is: "High dimensional inputs typically live on or near a low dimensional manifold." What is a dimension? What is a manifold? What is the difference? Can you give an example to describe…

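The swiss roll is the standard concrete example of the quoted sentence: each point has 3 coordinates (the dimension of the ambient space), but all points lie on a rolled-up 2D sheet (the manifold). A minimal sketch that recovers the sheet's two coordinates with Isomap:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Each point has 3 coordinates, but all points lie on a rolled-up 2D sheet.
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Isomap recovers 2 coordinates that parameterize the sheet itself.
X2 = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(X.shape, X2.shape)  # (1000, 3) (1000, 2)
```
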
Feature selection vs. feature extraction: which to use when?
21 votes · 5 answers
Feature extraction and feature selection both essentially reduce the dimensionality of the data, but feature extraction also makes the data more separable, if I am right. Which technique would be preferred over the other, and when? I was thinking,…

Why are autoencoders for dimension reduction symmetrical?
20 votes · 3 answers · asked by dcl
I'm not an expert in autoencoders or neural networks by any means, so forgive me if this is a silly question. For the purpose of dimension reduction, or of visualizing clusters in high-dimensional data, we can use an autoencoder to create a (lossy) 2…

Are t-SNE dimensions meaningful?
20 votes · 1 answer · asked by Nitro
Are there any meanings for the dimensions of a t-SNE embedding? With PCA we have this sense of linearly transformed variance maximization, but for t-SNE is there intuition besides just the space we define for mapping and minimization of the…

One-hot encoding alternatives for large categorical values
18 votes · 4 answers
I have a data frame with large categorical values (over 1600 categories). Is there any way I can find alternatives so that I don't end up with over 1600 columns? I found this interesting link, but they are converting to class/object, which I don't want. I…

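One standard alternative to 1600+ one-hot columns, sketched below with hypothetical category strings, is the hashing trick: each categorical value is hashed into a fixed-width vector, trading exact column identity for bounded width:

```python
from sklearn.feature_extraction import FeatureHasher

# Hash each categorical value into a fixed-width vector instead of
# one column per category; the "city=..." strings here are hypothetical.
hasher = FeatureHasher(n_features=32, input_type="string")
rows = [["city=London", "color=red"], ["city=Paris", "color=blue"]]
X = hasher.transform(rows)
print(X.shape)  # (2, 32)
```
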
High-dimensional data: What are useful techniques to know?
17 votes · 2 answers · asked by ASX
Due to the various curses of dimensionality, the accuracy and speed of many common predictive techniques degrade on high-dimensional data. What are some of the most useful techniques/tricks/heuristics that help deal with high-dimensional data…

Can closer points be considered more similar in t-SNE visualization?
15 votes · 1 answer · asked by Javierfdr
I understand from Hinton's paper that t-SNE does a good job of keeping local similarities and a decent job of preserving global structure (clusterization). However, I'm not clear on whether points appearing closer in a 2D t-SNE visualization can be assumed…

Efficient dimensionality reduction for large dataset
14 votes · 2 answers · asked by timleathart
I have a dataset with ~1M rows and ~500K sparse features. I want to reduce the dimensionality to somewhere on the order of 1K-5K dense features. sklearn.decomposition.PCA doesn't work on sparse data, and I've tried using…

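scikit-learn's TruncatedSVD is the usual substitute for PCA on sparse input, since it never centers or densifies the matrix; a minimal sketch with a small random CSR matrix standing in for the 1M x 500K one:

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# Small random CSR matrix standing in for the 1M x 500K sparse features.
X = sp.random(2000, 1000, density=0.001, format="csr", random_state=0)

# TruncatedSVD accepts sparse input and returns dense reduced features.
svd = TruncatedSVD(n_components=50, random_state=0)
X_dense = svd.fit_transform(X)
print(X_dense.shape)  # (2000, 50)
```
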
Reducing the dimensionality of word embeddings
10 votes · 2 answers · asked by Franck Dernoncourt
I trained word embeddings with 300 dimensions. Now I would like to have word embeddings with 50 dimensions: is it better to retrain the word embeddings with 50 dimensions, or can I use some dimensionality reduction method to scale the word…

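If retraining is off the table, a simple baseline (not necessarily the best-performing option) is to run PCA on the embedding matrix itself; a minimal sketch with a random matrix standing in for the trained vocabulary x 300 embeddings:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
emb300 = rng.standard_normal((5000, 300))  # stand-in: vocab x 300 embeddings

# Project every word vector from 300 down to 50 dimensions.
emb50 = PCA(n_components=50).fit_transform(emb300)
print(emb50.shape)  # (5000, 50)
```
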