
I am doing some analysis on document similarity and am also interested in word similarity. I know that doc2vec builds on word2vec and, by default, trains word vectors that we can access.

My question is:

Should we expect these word vectors, and by association any of the methods such as most_similar, to be 'better' than word2vec's, or are they essentially going to be the same? If in the future I only want word similarity, should I just default to word2vec?
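For context, here is a minimal sketch of what I mean by accessing both sets of word vectors. This assumes gensim 4.x; the corpus and hyperparameters are placeholders, not a real setup:

```python
from gensim.models import Word2Vec, Doc2Vec
from gensim.models.doc2vec import TaggedDocument

# Placeholder corpus: a list of tokenised documents.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "lay", "on", "the", "rug"],
]

# Plain word2vec trained directly on the token lists.
w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=40)

# doc2vec needs TaggedDocument inputs. With dm=1 (PV-DM) it trains word
# vectors alongside the document vectors by default; with dm=0 (PV-DBOW)
# you would need dbow_words=1 to get trained word vectors at all.
tagged = [TaggedDocument(words=doc, tags=[i]) for i, doc in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=100, window=5, min_count=1, dm=1, epochs=40)

# Both models expose the same KeyedVectors interface for word similarity.
print(w2v.wv.most_similar("cat", topn=3))
print(d2v.wv.most_similar("cat", topn=3))
```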

– Tylerr

1 Answer

If you only care about word similarity, then apply Occam's razor and use word2vec. There is no need to add model complexity that you are not going to use.

Also, the quality of the embeddings is driven primarily by the size and diversity of the training corpus; the choice of algorithm has a much smaller effect.
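If you want to check this empirically on your own corpus, one rough test is to compare the nearest-neighbour lists the two models produce for their shared vocabulary. This is a hypothetical helper, not an established metric, and it reuses the `w2v` and `d2v` models from the sketch in the question:

```python
def neighbour_overlap(w2v, d2v, word, topn=10):
    # Fraction of top-n neighbours the two models agree on for one word.
    a = {w for w, _ in w2v.wv.most_similar(word, topn=topn)}
    b = {w for w, _ in d2v.wv.most_similar(word, topn=topn)}
    return len(a & b) / topn  # 1.0 = identical neighbour sets

# Average agreement across words known to both models. On a realistically
# sized corpus, high overlap would support using whichever model is simpler.
shared = set(w2v.wv.index_to_key) & set(d2v.wv.index_to_key)
scores = [neighbour_overlap(w2v, d2v, w) for w in shared]
print(sum(scores) / len(scores))
```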

– Brian Spiering
  • I care about both; my main question was whether doc2vec's word vectors would be better than, or basically the same as, word2vec's. Based on an answer on another forum, they should be basically the same aside from some random-initialization fuzziness. – Tylerr May 21 '21 at 15:18