Word2Vec algorithms (Skip-Gram and CBOW) treat each word equally,
because their goal is to compute word embeddings. The distinction
becomes important when you need to work with sentence or document
embeddings: not all words represent the meaning of a particular
sentence equally well. That is where different weighting strategies
come in, and TF-IDF is one of the more successful ones.
At times it does improve the quality of inference, so the combination
is worth a shot.
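To make the combination concrete, here is a minimal sketch of one common way to do it: average the Word2Vec vectors of a sentence's words, weighting each vector by that word's TF-IDF score. The toy corpus, model settings, and the `sentence_vector` helper are illustrative assumptions, not code from this answer or the linked resources.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, purely for illustration.
corpus = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "the stock market fell sharply today",
]
tokenized = [doc.split() for doc in corpus]

# Train a small Word2Vec model (hypothetical toy settings).
w2v = Word2Vec(sentences=tokenized, vector_size=50, min_count=1, seed=1)

# Fit TF-IDF on the same corpus; X holds one row of word weights per document.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)
vocab = tfidf.vocabulary_

def sentence_vector(tokens, doc_idx):
    """TF-IDF-weighted average of the word vectors in one document."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in w2v.wv and tok in vocab:
            weight = X[doc_idx, vocab[tok]]
            if weight > 0:
                vecs.append(w2v.wv[tok])
                weights.append(weight)
    if not vecs:
        return np.zeros(w2v.wv.vector_size)
    return np.average(vecs, axis=0, weights=weights)

doc_embeddings = np.vstack(
    [sentence_vector(toks, i) for i, toks in enumerate(tokenized)]
)
print(doc_embeddings.shape)  # (3, 50)
```

The plain unweighted mean of word vectors is the usual baseline; weighting by TF-IDF simply down-weights frequent, uninformative words before averaging.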
GloVe is a Stanford baby, which has often proved to perform better. You
can read more about GloVe versus Word2Vec here, among the many other
resources available online.
When combining them, do I have to perform Word2Vec and then TF-IDF? I do not know how that should work; the output of Word2Vec is a numeric matrix, so how should TF-IDF handle that?
– abdoulsn Jan 30 '20 at 13:48
Create a matrix of features first! You can use Sklearn's TfidfVectorizer (https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html). Here is an example in a Kaggle kernel (just googled): https://www.kaggle.com/reiinakano/basic-nlp-bag-of-words-tf-idf-word2vec-lstm
– Random Nerd Jan 30 '20 at 13:55
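To illustrate just the first step suggested in that comment (the corpus and variable names here are assumptions): TfidfVectorizer turns raw text into a documents-by-vocabulary weight matrix, and those per-word weights are what get multiplied into the Word2Vec vectors before averaging, as in the sketch above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat on the mat", "dogs and cats are popular pets"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)       # shape: (n_documents, n_vocab_terms)

print(vectorizer.get_feature_names_out())  # vocabulary, one column per word
print(X.toarray())                         # TF-IDF weight of each word in each document
```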