Highest Voted Questions - Data Science Stack Exchange

39

votes

1 answer

Pearson vs Spearman vs Kendall

What are the characteristics of the three correlation coefficients, how do they compare, and what assumptions does each make? Can somebody kindly take me through the concepts?

correlation pearsons-correlation-coefficient spearmans-rank-correlation kendalls-tau-coefficient

asked Dec 05 '19 at 11:33

user86099

39

votes

4 answers

Applications and differences for Jaccard similarity and Cosine Similarity

Jaccard similarity and cosine similarity are two very common measures for comparing item similarities. However, I am not clear on which one is preferable in which situation. Can somebody help clarify the differences of…

similarity

asked Feb 12 '15 at 07:08

shihpeng

39

votes

7 answers

Using TensorFlow with Intel GPU

Is there any way now to use TensorFlow with Intel GPUs? If yes, please point me in the right direction. If not, please let me know which framework, if any (Keras, Theano, etc.), I can use for my Intel Corporation Xeon E3-1200 v3/4th Gen Core…

tensorflow keras theano gpu

asked Mar 14 '17 at 17:42

James Bond

39

votes

1 answer

How does Keras calculate accuracy?

How does Keras calculate accuracy from the classwise probabilities? Say, for example, we have 100 samples in the test set which can belong to one of two classes. We also have a list of the classwise probabilities. What threshold does Keras use to…

neural-network deep-learning keras

asked Oct 07 '16 at 08:10

pseudomonas

38

votes

5 answers

Data normalization before or after train-test split?

Which is the right approach to data normalization: before or after the train-test split?
Normalization before split:
from sklearn.preprocessing import StandardScaler
normalized_X_features = pd.DataFrame( …

normalization

asked Jul 02 '19 at 12:21

Tauno

38

votes

5 answers

Best practices to store Python machine learning models

What are the best practices to save, store, and share machine learning models? In Python, we generally store the binary representation of the model, using pickle or joblib. Models, in my case, can be ~100 MB in size. Also, joblib can save one model to…

python databases binary

asked Jun 18 '17 at 09:03

Antoine Dusséaux

38

votes

2 answers

Why use both validation set and test set?

Consider a neural network. For a given set of data, we divide it into training, validation, and test sets. Suppose we do it in the classic 60:20:20 ratio; we then prevent overfitting by checking the network on the validation set. Then…

machine-learning neural-network cross-validation

asked Apr 13 '17 at 19:33

user1825567

38

votes

4 answers

What is the meaning of "The number of units in the LSTM cell"?

From the TensorFlow code for RnnCell: "num_units: int, The number of units in the LSTM cell." I can't understand what this means. What are the units of an LSTM cell? Input, output, and forget gates? Does this mean "the number of units in the…

neural-network tensorflow rnn

asked Jul 24 '16 at 10:17

Brans Ds

38

votes

3 answers

Calculation and Visualization of Correlation Matrix with Pandas

I have a pandas data frame with several entries, and I want to calculate the correlation between the incomes of certain types of stores. There are a number of stores with income data, classification of area of activity (theater, clothing stores, food ...)…

python statistics visualization pandas

asked Mar 01 '16 at 05:56

gdlm

37

votes

4 answers

Meaning of latent features?

I am learning about matrix factorization for recommender systems and I keep seeing the term "latent features", but I am unable to understand what it means. I know what a feature is, but I don't understand the idea of latent…

machine-learning data-mining recommender-system

asked Jul 16 '14 at 09:24

Jack Twain

37

votes

5 answers

Are decision tree algorithms linear or nonlinear?

Recently, in an interview, a friend of mine was asked whether decision tree algorithms are linear or nonlinear. I tried to look for answers to this question but couldn't find any satisfactory explanation. Can anyone answer and explain the…

machine-learning classification decision-trees algorithms pac-learning

asked Aug 13 '15 at 13:59

user2966197

37

votes

7 answers

How to get sentence embedding using BERT?

How to get sentence embedding using BERT?
from transformers import BertTokenizer
tokenizer=BertTokenizer.from_pretrained('bert-base-uncased')
sentence='I really enjoyed this movie a lot.'
#1.Tokenize the…

tensorflow nlp pytorch bert

asked Nov 04 '19 at 15:22

star

37

votes

5 answers

In the context of Deep Learning, what are training warmup steps?

I found the term "training warmup steps" in some of the papers. What exactly does this term mean? Has it got anything to do with "learning rate"? If so, how does it affect it?

machine-learning deep-learning training

asked Jul 19 '19 at 10:10

Ashwin Geet D'Sa

37

votes

13 answers

What do you think of Data Science certifications?

I've now seen two data science certification programs: the Johns Hopkins one available on Coursera and the Cloudera one. I'm sure there are others out there. The Johns Hopkins set of classes is focused on R as a toolset, but covers a range of…

education

asked Jun 12 '14 at 10:52

Steve Kallestad

37

votes

2 answers

Keras difference between val_loss and loss during training

What is the difference between val_loss and loss during training in Keras? E.g.
Epoch 1/20
1000/1000 [==============================] - 1s - loss: 0.1760, val_loss: 0.2032
On some sites I read that dropout is not applied during validation.

machine-learning deep-learning keras

asked Nov 30 '17 at 19:33

Vladimir Shebuniayeu
