Highest Voted Questions - Data Science Stack Exchange

33

votes

5 answers

How can I get a measure of the semantic similarity of words?

What is the best way to figure out the semantic similarity of words? Word2Vec is okay, but not ideal: # Using the 840B word Common Crawl GloVe vectors with gensim: # 'hot' is closer to 'cold' than 'warm' In [7]: model.similarity('hot',…

nlp word-embeddings word2vec nltk

asked Jul 19 '16 at 21:54

Thomas Johnson

665
1
7
11

32

votes

3 answers

Is it necessary to normalize data for XGBoost?

MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…

decision-trees xgboost normalization

asked Sep 28 '19 at 13:35

user781486

1,305
2
16
18

32

votes

6 answers

In a Transformer model, why does one sum positional encoding to the embedding rather than concatenate it?

While reviewing the Transformer architecture, I realized something I didn't expect: the positional encoding is summed with the word embeddings rather than concatenated to…

nlp encoding transformer attention-mechanism

asked Jul 18 '19 at 08:34

FremyCompany

433
4
7

32

votes

4 answers

Role of the derivative of the sigmoid function in neural networks

I am trying to understand the role of the derivative of the sigmoid function in neural networks. First I plot the sigmoid function and its derivative at every point, computed from the definition, using Python. What exactly is the role of this derivative? import numpy as np import…

machine-learning neural-network

asked Apr 23 '18 at 09:38

lukassz

467
1
5
10
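As a hedged illustration of the sigmoid-derivative question above (not the asker's code, which is truncated in the excerpt), the sketch below shows where the derivative enters: it is the middle factor of the chain rule when computing a weight gradient. The single-neuron setup, the squared-error loss, and all function names are assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Closed-form derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central finite difference, used only to check the closed form."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def backprop_single_neuron(w: float, x: float, target: float) -> float:
    """Gradient of 0.5 * (sigmoid(w * x) - target) ** 2 with respect to w.

    Hypothetical one-weight neuron, chosen to keep the chain rule visible.
    """
    z = w * x
    error = sigmoid(z) - target
    # Chain rule: dL/dw = dL/da * da/dz * dz/dw
    return error * sigmoid_derivative(z) * x


if __name__ == "__main__":
    # The derivative peaks at x = 0 and shrinks toward 0 for large |x|.
    for x in (-4.0, 0.0, 4.0):
        print(x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
    print(backprop_single_neuron(w=0.5, x=2.0, target=1.0))
```

Because the derivative is small for large positive or negative inputs, the weight gradient shrinks there too, which is one common explanation for slow learning in saturated sigmoid units.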

32

votes

4 answers

When to use cosine similarity over Euclidean similarity

In NLP, people tend to use cosine similarity to measure document/text distances. I would like to hear what people think of the following two scenarios: which should I pick, cosine similarity or Euclidean? Overview of the task set: The task is to compute…

machine-learning nlp clustering similarity

asked Feb 12 '18 at 13:31

Logan

443
1
4
8
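For the cosine-versus-Euclidean question above, a small sketch of the main practical difference may help: cosine similarity ignores vector length, while Euclidean distance does not. The term-count vectors below are made-up data for illustration only, and the function names are this example's own.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical term counts over a three-word vocabulary.
short_doc = [1.0, 2.0, 0.0]
long_doc = [10.0, 20.0, 0.0]   # same word proportions, ten times longer
other_doc = [2.0, 1.0, 0.0]    # similar length, different proportions

if __name__ == "__main__":
    print("cosine(short, long): ", cosine_similarity(short_doc, long_doc))
    print("cosine(short, other):", cosine_similarity(short_doc, other_doc))
    print("euclid(short, long): ", euclidean_distance(short_doc, long_doc))
    print("euclid(short, other):", euclidean_distance(short_doc, other_doc))
```

With these vectors, cosine similarity treats the short and long documents as pointing in the same direction, whereas Euclidean distance places the short document closer to the one with different word proportions. Which behaviour is preferable depends on whether document length should count as a difference.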

32

votes

2 answers

Are there any rules for choosing the size of a mini-batch?

When training neural networks, one hyperparameter is the size of a mini-batch. Common choices are 32, 64, and 128 elements per mini-batch. Are there any rules/guidelines on how big a mini-batch should be? Or any publications which investigate the…

deep-learning neural-network convolutional-neural-network optimization

asked Apr 17 '17 at 16:18

Martin Thoma

18,630
31
92
167

31

votes

4 answers

What algorithms should I use to perform job classification based on resume data?

Note that I am doing everything in R. The problem is as follows: I have a list of resumes (CVs). Some candidates have previous work experience and some don't. The goal is this: based on the text of their CVs, I want to classify…

machine-learning classification nlp text-mining

asked Jul 03 '14 at 16:11

user1769197

431
1
5
5

31

votes

4 answers

Gumbel-Softmax trick vs Softmax with temperature

From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning). Many papers and articles describe it as a way…

neural-network deep-learning attention-mechanism softmax

asked Aug 29 '19 at 10:30

4-bit

411
1
4
3
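To make the contrast in the Gumbel-Softmax question above concrete, here is a minimal pure-Python sketch under stated assumptions: softmax with temperature is a deterministic function of the logits, while Gumbel-Softmax adds Gumbel(0, 1) noise to the logits before applying the same function, so each call yields a random, relaxed sample. Function names and the example logits are illustrative, and gradients are not shown since this sketch does not use an autodiff framework.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Deterministic softmax of logits / temperature (numerically stabilised)."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng: random.Random) -> float:
    """Draw from Gumbel(0, 1) as -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))


def gumbel_softmax(logits: Sequence[float], temperature: float, rng: random.Random) -> List[float]:
    """Random relaxed sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = [value + sample_gumbel(rng) for value in logits]
    return softmax_with_temperature(noisy, temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 2.0, 3.0]
    print("softmax, T=1.0:  ", softmax_with_temperature(logits, 1.0))
    print("softmax, T=0.1:  ", softmax_with_temperature(logits, 0.1))
    print("gumbel-softmax 1:", gumbel_softmax(logits, 0.1, rng))
    print("gumbel-softmax 2:", gumbel_softmax(logits, 0.1, rng))
```

Lowering the temperature sharpens both outputs, but only the Gumbel-Softmax output changes from call to call. Over many draws, its largest entry falls on each class with frequency close to the ordinary softmax probabilities of the logits.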

31

votes

3 answers

General approach to extract key text from sentence (nlp)

Given a sentence like: Complimentary gym access for two for the length of stay ($12 value per person per day) What general approach can I take to identify the word gym or gym access?

machine-learning nlp text-mining data-cleaning

asked Mar 13 '15 at 16:41

William Falcon

421
1
6
7

31

votes

3 answers

What's the difference between Attention and Self-Attention? What problems does each solve that the other can't?

As stated in the question above, is there a difference between the attention and self-attention mechanisms? Additionally, can anybody share tips and tricks on how a self-attention mechanism can be implemented in a CNN?

cnn attention-mechanism

asked Apr 17 '19 at 10:39

Pratik.S

443
1
4
9

31

votes

5 answers

Why is underfitting called high bias and overfitting called high variance?

I have been using terms like underfitting/overfitting and bias-variance tradeoff for quite a while in data science discussions, and I understand that underfitting is associated with high bias and overfitting is associated with high variance. But…

variance bias

asked Feb 14 '19 at 14:33

Vaibhav Thakur

2,333
3
11
9

31

votes

6 answers

Validation loss is not decreasing

I am trying to train an LSTM model. Is this model suffering from overfitting? Here is the training and validation loss graph:

machine-learning neural-network regression lstm rnn

asked Dec 27 '18 at 08:23

DukeLover

561
1
6
14

31

votes

3 answers

How can I check the correlation between features and target variable?

I am trying to build a regression model and I am looking for a way to check whether there is any correlation between the features and the target variable. This is my sample dataset: Loan_ID Gender Married Dependents Education Self_Employed…

machine-learning scikit-learn regression linear-regression

asked Oct 03 '18 at 18:43

Jeeth

911
2
10
18

31

votes

3 answers

Keras Callback example for saving a model after every epoch?

Can someone please post a straightforward example of Keras using a callback to save a model after every epoch? I can find examples of saving weights, but I want to be able to save a completely functioning model after every training epoch.

python keras

asked Feb 22 '18 at 21:32

I_Play_With_Data

2,079
2
16
39

31

votes

2 answers

How to calculate the fold number (k-fold) in cross validation?

I am confused about how to choose the number of folds (in k-fold CV) when I apply cross validation to check the model. Does it depend on the data size or on other parameters?

machine-learning python scikit-learn cross-validation

asked Feb 22 '18 at 05:23

Taimur Islam

901
4
11
17
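As background for the k-fold question above, this sketch shows what the number of folds controls: with k folds, each model is trained on roughly (k - 1)/k of the data and evaluated on the remaining 1/k. It is a plain-Python illustration with no shuffling, not the scikit-learn implementation, and `k_fold_indices` is a hypothetical helper name.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


if __name__ == "__main__":
    for k in (2, 5, 10):
        train_idx, test_idx = k_fold_indices(100, k)[0]
        print(f"k={k}: train on {len(train_idx)} samples, test on {len(test_idx)}")
```

A larger k leaves more data for training in each round but requires more model fits and gives smaller test sets, so the choice is a trade-off that depends on dataset size and training cost.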
