Questions tagged [gensim]

Gensim is a Python library for topic modelling. It computes multi-dimensional vector representations of words or sentences that preserve semantic meaning, via its word2vec and doc2vec models.

103 questions
35 votes • 6 answers

How do I load FastText pretrained model with Gensim?

I tried to load fastText pretrained model from here Fasttext model. I am using wiki.simple.en from gensim.models.keyedvectors import KeyedVectors word_vectors = KeyedVectors.load_word2vec_format('wiki.simple.bin', binary=True) But, it shows the…
Sabbiu Shah • 733
21 votes • 4 answers

How to initialize a new word2vec model with pre-trained model weights?

I am using Gensim Library in python for using and training word2vector model. Recently, I was looking at initializing my model weights with some pre-trained word2vec model such as (GoogleNewDataset pretrained model). I have been struggling with it…
Nomiluks • 461
17 votes • 3 answers

Word2Vec how to choose the embedding size parameter

I'm running word2vec over collection of documents. I understand that the size of the model is the number of dimensions of the vector space that the word is embedded into. And that different dimensions are somewhat related to different, independent…
Neil • 257
16 votes • 5 answers

Number of epochs in Gensim Word2Vec implementation

There's an iter parameter in the gensim Word2Vec implementation class gensim.models.word2vec.Word2Vec(sentences=None, size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=0, seed=1, workers=1, min_alpha=0.0001, sg=1, hs=1,…
alvas • 2,340
16 votes • 3 answers

Doc2vec(gensim) - How can I infer unseen sentences’ label?

https://radimrehurek.com/gensim/models/doc2vec.html For example, if we have trained doc2vec with "aaaaaAAAAAaaaaaa" - "label 1" “bbbbbbBBBBBbbbb" - "label 2" can we infer “aaaaAAAAaaaaAA” is label 1 using Doc2vec? I know Doc2vec can train word…
Seongho • 163
8 votes • 1 answer

Difference between Gensim word2vec and keras Embedding layer

I used the gensim word2vec package and Keras Embedding layer for various different projects. Then I realize they seem to do the same thing, they all try to convert a word into a feature vector. Am I understanding this properly? What exactly is the…
Edamame • 2,705
8 votes • 1 answer

Gensim LDA model: return keywords based on relevance (λ - lambda) value

I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model. With the help of the pyLDAvis library I visualized the results. When I print the words with the highest probability on…
5 votes • 2 answers

Why is averaging the vectors required in word2vec?

While implementing word2vec using gensim by following few tutorials online, one thing that I couldn't understand is the reason why word vectors are averaged once the model is trained. Few example links…
mockash • 163
5 votes • 1 answer

How to choose threshold for gensim Phrases when generating bigrams?

I'm generating bigrams with from gensim.models.phrases, which I'll use downstream with TF-IDF and/or gensim.LDA from gensim.models.phrases import Phrases, Phraser # 7k documents, ~500-1k tokens each. Already ran cleanup, stop_words, lemmatization,…
lefnire • 151
5 votes • 4 answers

How to train an existing word2vec gensim model on new words?

According to gensim docs, you can take an existing word2vec model and further train it on new words. The training is streamed, meaning sentences can be a generator, reading input data from disk on the fly, without loading the entire corpus into…
tim_xyz • 177
5 votes • 2 answers

can I use public pretrained word2vec, and continue train it for domain specific text?

I have a set of reviews from the apparel domain, about 100K reviews (2M words), and I want to train word2vec to do some cool NLP stuff with it. However, that size is not enough to create an adequate word2vec model, which requires billions of words. So the…
5 votes • 1 answer

Doc2vec to calculate cosine similarity - absolutely inaccurate

I'm trying to modify the Doc2vec tutorial to calculate cosine similarity and take Pandas dataframes instead of .txt documents. I want to find the most similar sentence to a new sentence I put in from my data. However, after training, even if I give…
lte__ • 1,310
4 votes • 2 answers

Does spaCy support multiple GPUs?

I was wondering if spaCy supports multi-GPU via mpi4py? I am currently using spaCy's nlp.pipe for Named Entity Recognition on a high-performance-computing cluster that supports the MPI protocol and has many GPUs. It says here that I would need to…
Jinhua Wang • 163
4 votes • 1 answer

Predicting the missing word using fasttext pretrained word embedding models (CBOW vs skipgram)

I am trying to implement a simple word prediction algorithm for filling a gap in a sentence by choosing from several options: Driving a ---- is not fun in London streets. Apple Car Book King With the right model in place: Question 1. What…
Kingstar • 53
4 votes • 1 answer

word2vec word embeddings creates very distant vectors, closest cosine similarity is still very far, only 0.7

I started using gensim's FastText to create word embeddings on a large corpus of a specialized domain (after finding that existing open source embeddings are not performing well on this domain), although I'm not using its character level n-grams, so…