Most Popular

1500 questions
33
votes
5 answers

How can I get a measure of the semantic similarity of words?

What is the best way to figure out the semantic similarity of words? Word2Vec is okay, but not ideal: # Using the 840B word Common Crawl GloVe vectors with gensim: # 'hot' is closer to 'cold' than 'warm' In [7]: model.similarity('hot',…
Thomas Johnson
32
votes
3 answers

Is it necessary to normalize data for XGBoost?

MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…
user781486
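The claim that decision trees do not need normalization can be illustrated with a minimal sketch. It covers a single regression split in pure Python, not XGBoost itself, and the data and helper names (`min_max_scale`, `best_split_partition`) are hypothetical. Min-max scaling preserves the ordering of feature values, so the best split sends the same samples left and right before and after scaling.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def sse(ys, indices):
    """Sum of squared errors around the mean of the selected targets."""
    selected = [ys[i] for i in indices]
    mean = sum(selected) / len(selected)
    return sum((y - mean) ** 2 for y in selected)


def best_split_partition(xs, ys):
    """Return the indices sent left by the split with the lowest total SSE."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_err, best_left = float("inf"), None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        if xs[left[-1]] == xs[right[0]]:
            continue  # no threshold can separate equal feature values
        err = sse(ys, left) + sse(ys, right)
        if err < best_err:
            best_err, best_left = err, frozenset(left)
    return best_left


# Hypothetical feature with a wide range, and a target to predict.
xs = [10.0, 200.0, 35.0, 4000.0, 120.0]
ys = [1.0, 3.0, 1.2, 3.1, 2.9]

raw_partition = best_split_partition(xs, ys)
scaled_partition = best_split_partition(min_max_scale(xs), ys)
print(raw_partition == scaled_partition)
```

Only the numeric threshold changes under scaling. The grouping of samples, and therefore the fitted values of the split, stays the same.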
32
votes
6 answers

In a Transformer model, why is the positional encoding added to the embedding rather than concatenated?

While reviewing the Transformer architecture, I realized something I didn't expect: the positional encoding is added to the word embeddings rather than concatenated to…
FremyCompany
32
votes
4 answers

Role of the derivative of the sigmoid function in neural networks

I am trying to understand the role of the derivative of the sigmoid function in neural networks. First, I plot the sigmoid function and its derivative at all points, computed from the definition, using Python. What is the role of this derivative exactly? import numpy as np import…
lukassz
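As a rough illustration of the quantity this question asks about, here is a minimal sketch. It is not the asker's truncated code, and the function names are assumptions. It computes the sigmoid derivative in closed form, s(x) * (1 - s(x)), and checks it against a central finite difference. During backpropagation the upstream gradient is multiplied by this value, which peaks at 0.25 at x = 0 and shrinks toward zero for large |x|.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Closed-form derivative: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-4.0, 0.0, 4.0):
    # The closed-form and numerical values should agree closely.
    print(x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```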
32
votes
4 answers

When to use cosine similarity over Euclidean similarity

In NLP, people tend to use cosine similarity to measure document/text distances. I want to hear what people think of the following two scenarios: which should one pick, cosine similarity or Euclidean? Overview of the task set: The task is to compute…
Logan
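The practical difference between the two measures can be sketched with a pair of toy vectors. The vectors and function names below are hypothetical and are not taken from the question's task. Cosine similarity depends only on direction, while Euclidean distance also reflects magnitude, for example document length in a bag-of-words representation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical term-count vectors: the second is the first repeated three times.
doc = [1.0, 2.0, 0.0]
doc_repeated = [3.0, 6.0, 0.0]

print(cosine_similarity(doc, doc_repeated))   # direction is identical
print(euclidean_distance(doc, doc_repeated))  # magnitudes differ
```

Here the cosine similarity is 1 up to rounding even though the Euclidean distance is clearly nonzero, which is one reason cosine similarity is common when text length should not matter.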
32
votes
2 answers

Are there any rules for choosing the size of a mini-batch?

When training neural networks, one hyperparameter is the size of a mini-batch. Common choices are 32, 64, and 128 elements per mini-batch. Are there any rules/guidelines on how big a mini-batch should be? Or any publications which investigate the…
31
votes
4 answers

What algorithms should I use to perform job classification based on resume data?

Note that I am doing everything in R. The problem is as follows: I have a list of resumes (CVs). Some candidates have prior work experience and some don't. The goal is, based on the text of their CVs, to classify…
user1769197
31
votes
4 answers

Gumbel-Softmax trick vs Softmax with temperature

From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning). Many papers and articles describe it as a way…
4-bit
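To make the comparison concrete, here is a minimal pure-Python sketch under stated assumptions. The function names are hypothetical and no deep learning framework is used, so nothing here is differentiated. Softmax with temperature is a deterministic transformation of the logits, whereas a Gumbel-Softmax sample first perturbs each logit with Gumbel(0, 1) noise, so repeated calls return different relaxed samples.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Deterministic softmax of logits divided by a temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """One relaxed sample: add Gumbel(0, 1) noise, then apply softmax."""
    noisy = []
    for l in logits:
        u = max(rng.random(), 1e-12)  # avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append(l + gumbel)
    return softmax(noisy, temperature)


logits = [1.0, 2.0, 3.0]
rng = random.Random(0)  # seeded only to make the run repeatable

print(softmax(logits, temperature=0.5))  # same output on every call
print(gumbel_softmax_sample(logits, temperature=0.5, rng=rng))
print(gumbel_softmax_sample(logits, temperature=0.5, rng=rng))
```

Lowering the temperature pushes both outputs toward a one-hot vector, but only the Gumbel version varies between calls.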
31
votes
3 answers

General approach to extract key text from sentence (nlp)

Given a sentence like: Complimentary gym access for two for the length of stay ($12 value per person per day) What general approach can I take to identify the word gym or gym access?
William Falcon
31
votes
3 answers

What's the difference between attention and self-attention? What problems does each solve that the other can't?

As stated in the question above, is there a difference between the attention and self-attention mechanisms? Additionally, can anybody share tips and tricks on how a self-attention mechanism can be implemented in a CNN?
Pratik.S
31
votes
5 answers

Why is underfitting called high bias and overfitting called high variance?

I have been using terms like underfitting/overfitting and bias-variance tradeoff for quite a while in data science discussions, and I understand that underfitting is associated with high bias and overfitting is associated with high variance. But…
Vaibhav Thakur
31
votes
6 answers

Validation loss is not decreasing

I am trying to train an LSTM model. Is this model suffering from overfitting? Here is the training and validation loss graph:
DukeLover
31
votes
3 answers

How can I check the correlation between features and target variable?

I am trying to build a regression model, and I am looking for a way to check whether there is any correlation between the features and the target variable. This is my sample dataset: Loan_ID Gender Married Dependents Education Self_Employed…
Jeeth
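One common starting point for numeric features is the Pearson correlation coefficient. It is sketched below in pure Python with hypothetical column names and values, since the question's dataset is truncated. Categorical columns such as Gender or Married would need encoding or a different measure, which this sketch does not cover.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numeric feature and target, for illustration only.
applicant_income = [2500.0, 4000.0, 5200.0, 6100.0, 8000.0]
loan_amount = [90.0, 120.0, 150.0, 160.0, 210.0]

print(pearson(applicant_income, loan_amount))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship. Nonlinear dependence can still be present in that case.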
31
votes
3 answers

Keras Callback example for saving a model after every epoch?

Can someone please post a straightforward example of Keras using a callback to save a model after every epoch? I can find examples of saving weights, but I want to be able to save a completely functioning model after every training epoch.
I_Play_With_Data
31
votes
2 answers

How to calculate the fold number (k-fold) in cross validation?

I am confused about how to choose the number of folds (in k-fold CV) when I apply cross-validation to check the model. Does it depend on the data size or on other parameters?
Taimur Islam