Highest Voted 'doc2vec' Questions - Data Science Stack Exchange

3

votes

2 answers

Gensim doc2vec error: KeyError: "word 'senseless' not in vocabulary"

I am new to machine learning and tried doc2vec on the Quora duplicate dataset. new_dfx has columns 'question1' and 'question2', which hold preprocessed questions in each row. Following is the tagged document sample: input: q_arr =…

asked Jan 13 '23 at 12:02

Ankit Rohilla

31
2
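
Editor's note: a KeyError of this kind typically means a vector was looked up for a token that the trained model never kept in its vocabulary, for example because the word fell below a minimum-frequency cutoff. A minimal sketch of the usual guard follows: filter tokens against the vocabulary before any lookup. The dict `word_vectors` is a hypothetical stand-in for a trained model's word-to-vector table, and `known_tokens` and `mean_vector` are illustrative helpers, not gensim API.

```python
# Hypothetical stand-in for a trained model's word -> vector table.
word_vectors = {
    "good": [0.1, 0.3],
    "movie": [0.2, 0.1],
}

def known_tokens(tokens, vocab):
    """Return only the tokens present in vocab, preserving order."""
    return [t for t in tokens if t in vocab]

def mean_vector(tokens, vocab):
    """Average the vectors of in-vocabulary tokens; None if none survive."""
    kept = known_tokens(tokens, vocab)
    if not kept:
        return None
    dim = len(next(iter(vocab.values())))
    return [sum(vocab[t][i] for t in kept) / len(kept) for i in range(dim)]

question = ["senseless", "good", "movie"]  # "senseless" is out of vocabulary
print(known_tokens(question, word_vectors))
print(mean_vector(question, word_vectors))
```

Returning None when no token survives keeps the "nothing known" case explicit rather than silently producing a zero vector.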

2

votes

2 answers

classification of similar text input features with text output label

I hope somebody can provide guidance/input/advice on my project, where I believe AI can help. I have a general understanding of AI, but I lack formal training. I've never built a neural net from scratch on my own. Task: Build a classification model…

keras nlp text-classification gensim doc2vec

asked Jun 09 '21 at 16:05

andrea

73
6

2

votes

0 answers

Preprocessing for Document Similarity Using Doc2Vec

I'm trying to determine document similarity using Doc2Vec on a large series of legal opinions, which can contain some highly jargonistic language and phrases (e.g. en banc, de novo, etc.). I'm wondering if anyone has any thoughts about the criteria…

similar-documents doc2vec

asked Jun 01 '21 at 19:18

user118648

21
1

2

votes

0 answers

What is the meaning of, or explanation for, having multiple tags in a Doc2Vec model's TaggedDocuments?

I've tried reading the other answers on this topic but I'm unsure if I understand completely. For my dataset, I have a series of tagged documents, "good" or "bad." Each document belongs to an entity, and each entity has a different number of…

python nlp word2vec doc2vec document-understanding

asked Mar 08 '21 at 16:06

Jayke

21
1

2

votes

1 answer

Word2Vec vs. Doc2Vec Word Vectors

I am doing some analysis on document similarity and was also interested in word similarity. I know that doc2vec inherits from word2vec and by default trains using word vectors which we can access. My question is: Should we expect these word vectors…

nlp word2vec doc2vec

asked Feb 02 '21 at 15:47

Tylerr

146
3

2

votes

1 answer

DBSCAN on textual and numerical columns

I have a dataset which has two columns, title and price (rows: sentence1, 12; sentence2, 13). I have used doc2vec to convert the sentences into vectors of size 100 as below: LabeledSentence1 = gensim.models.doc2vec.TaggedDocument; all_content = []; j=0; for…

clustering word-embeddings categorical-data dbscan doc2vec

asked Nov 05 '20 at 20:33

Jazz

420
1
5
15

2

votes

1 answer

How to implement LSTM using Doc2Vec vectors to get representation?

Hi all. I'm a newbie in ML. I found a paper, A Multi-Level Plagiarism Detection System Based on Deep Learning Algorithms, and want to implement its model, but I can't find a step-by-step guide to building it. How LSTM can make…

machine-learning lstm nlp text doc2vec

asked Apr 02 '20 at 21:21

Omasaka Opacha Revok

21
3

2

votes

1 answer

Approach to semantic similarity between documents

I was wondering what approach people would take to this challenge I have set myself, or whether someone could point me in the right direction. I am pretty new at this; I have covered some ground but want to expand my skillset. Say you have an abstract from a research…

nlp similarity cosine-distance doc2vec

asked Jan 08 '20 at 13:39

user5067291

151
2
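
Editor's note: this question is tagged cosine-distance, so a small worked example of cosine similarity between two document vectors may help. This is a sketch on plain Python lists, assuming the document vectors have already been produced by some embedding model; the function name `cosine_similarity` is illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Parallel vectors score close to 1, orthogonal vectors score 0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Cosine distance is then 1 minus this value.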

2

votes

2 answers

How to examine if a Doc2Vec model is sufficiently trained?

I started experimenting with gensim's Doc2Vec for sentiment analysis. For the training of the embedding itself, I have seen examples using a reduced learning rate with a few tens or even a few hundred epochs. However, there does not seem to be a…

word-embeddings word2vec gensim doc2vec

asked Nov 08 '21 at 05:01

Shan Dou

131
2
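
Editor's note: one sanity check sometimes used for this question is to re-infer a vector for each training document and see whether its nearest neighbour among the trained vectors is the document itself. The sketch below shows only the ranking arithmetic on plain lists. `trained` and `inferred` are hypothetical stand-ins for vectors that would come from a model, and the helper names are not gensim API.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def self_rank_rate(trained, inferred):
    """Fraction of documents whose re-inferred vector is closest
    to that same document's trained vector."""
    hits = 0
    for i, vec in enumerate(inferred):
        sims = [cosine(vec, t) for t in trained]
        best = max(range(len(trained)), key=lambda j: sims[j])
        if best == i:
            hits += 1
    return hits / len(inferred)

# Toy data: the second document's inferred vector lands nearer document 0.
trained = [[1.0, 0.0], [0.0, 1.0]]
inferred = [[0.9, 0.1], [0.8, 0.2]]
print(self_rank_rate(trained, inferred))
```

A rate well below 1 on real data would suggest the model is not reproducing its own training documents consistently. It is a rough indicator, not a substitute for evaluating on the downstream task.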

1

vote

1 answer

Embedding from Transformer-based model from paragraph or document (like Doc2Vec)

I have a set of data that contains sequences of different lengths. On average the sequence length is 600. The dataset is like this: S1 = ['Walk','Eat','Going school','Eat','Watching movie','Walk'......,'Sleep'] S2 = ['Eat','Eat','Going…

nlp bert transformer embeddings doc2vec

asked Apr 22 '21 at 18:47

Bloodstone Programmer

300
2
3
9

1

vote

1 answer

Clustering using both text and numerical features

I have a dataset that contains 2 types of features: one is generated from doc2vec and the other is a numerical feature. I would like to perform clustering analysis on them. However, due to the size of doc2vec features, if I simply combine them into one…

machine-learning clustering feature-engineering unsupervised-learning doc2vec

asked Jan 21 '21 at 12:25

E.TTT

11
1
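
Editor's note: when a wide embedding block is concatenated with a single numeric column, a common way to keep one block from dominating distance-based clustering is to rescale each block first. The sketch below is one such scheme under stated assumptions: each document vector is L2-normalized, the numeric column is z-scored, and a hypothetical `numeric_weight` knob sets its relative influence. None of this comes from the question itself, and the right weighting depends on the data.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def zscore(values):
    """Standardize a column to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std else 0.0 for v in values]

def combine(doc_vectors, numeric, numeric_weight=1.0):
    """Append the weighted, standardized numeric value to each unit-length doc vector."""
    scaled = zscore(numeric)
    return [l2_normalize(d) + [numeric_weight * s]
            for d, s in zip(doc_vectors, scaled)]

# Toy data: two 2-dimensional "doc vectors" and one numeric column.
rows = combine([[3.0, 4.0], [0.0, 2.0]], [12.0, 13.0])
for row in rows:
    print(row)
```

The combined rows can then be passed to whichever clustering algorithm is in use. Raising or lowering `numeric_weight` trades off how much the numeric feature matters relative to the text.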

1

vote

0 answers

doc2vec - paragraph or article as document

I'm trying to train a doc2vec model on the German wiki corpus. While looking for the best practice I've found different possibilities on how to create the training data. Should I split every Wikipedia article by each natural paragraph into several…

nlp gensim doc2vec wikipedia

asked Jan 09 '21 at 13:46

jonas

143
4

1

vote

0 answers

Document Similarity to List of Words in Sentiment Analysis

How would you go about finding document similarity to a list of words in Sentiment Analysis? I'm looking to find document similarity to multiple lists of words in sentiment analysis. I had been working on this with my intern but he is sorting by sentiment…

nlp similar-documents doc2vec

asked Aug 17 '20 at 17:15

JohnT

111
5

1

vote

1 answer

Topic alignment / topic modelling

What is the most efficient method for detecting whether an article is mostly about a specific topic, but without lots of data for training? My task is to determine how much a document is e.g. about the weather or holidays or several other specific…

word2vec topic-model tfidf lda doc2vec

asked Apr 23 '20 at 23:12

piernik

51
2

1

vote

0 answers

T-SNE good clustering but SVM classification poor

I am trying to classify paragraph embedding vectors computed with doc2vec into 4 different classes, using a non-linear SVM over them. When I visualize the embeddings using TensorBoard t-SNE I can see that they are clustered quite well as in the…

scikit-learn clustering svm word2vec doc2vec

asked Mar 26 '20 at 15:46

Luca Massarelli

11
1
