
I am doing some analysis on document similarity and am also interested in word similarity. I know that doc2vec builds on word2vec and, by default, trains word vectors that we can access.

My question is:

Should we expect these word vectors, and by association any of the methods such as most_similar, to be 'better' than word2vec's, or are they essentially going to be the same? If in the future I only want word similarity, should I just default to word2vec?
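For context, here is a minimal sketch of what I mean by accessing both sets of word vectors. This assumes gensim 4.x; the corpus and hyperparameters are placeholders, not a real setup:

```python
from gensim.models import Word2Vec, Doc2Vec
from gensim.models.doc2vec import TaggedDocument

# Placeholder corpus: a list of tokenised documents.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "lay", "on", "the", "rug"],
]

# Plain word2vec trained directly on the token lists.
w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=40)

# doc2vec needs TaggedDocument inputs. With dm=1 (PV-DM) it trains word
# vectors alongside the document vectors by default; with dm=0 (PV-DBOW)
# you would need dbow_words=1 to get trained word vectors at all.
tagged = [TaggedDocument(words=doc, tags=[i]) for i, doc in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=100, window=5, min_count=1, dm=1, epochs=40)

# Both models expose the same KeyedVectors interface for word similarity.
print(w2v.wv.most_similar("cat", topn=3))
print(d2v.wv.most_similar("cat", topn=3))
```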

– Tylerr

1 Answer

If you only care about word similarity, then apply Occam's razor and use word2vec. There is no need to add model complexity that you are not going to use.

Also, the quality of the embeddings is driven primarily by the size and diversity of the training corpus; the choice of algorithm has a much smaller effect.
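If you want to check this empirically on your own corpus, one rough test is to compare the nearest-neighbour lists the two models produce for their shared vocabulary. This is a hypothetical helper, not an established metric, and it reuses the `w2v` and `d2v` models from the sketch in the question:

```python
def neighbour_overlap(w2v, d2v, word, topn=10):
    # Fraction of top-n neighbours the two models agree on for one word.
    a = {w for w, _ in w2v.wv.most_similar(word, topn=topn)}
    b = {w for w, _ in d2v.wv.most_similar(word, topn=topn)}
    return len(a & b) / topn  # 1.0 = identical neighbour sets

# Average agreement across words known to both models. On a realistically
# sized corpus, high overlap would support using whichever model is simpler.
shared = set(w2v.wv.index_to_key) & set(d2v.wv.index_to_key)
scores = [neighbour_overlap(w2v, d2v, w) for w in shared]
print(sum(scores) / len(scores))
```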

– Brian Spiering
  • I care about both; my main question was whether doc2vec's word vectors would be better than, or basically the same as, word2vec's. Based on an answer on another forum, they should be basically the same aside from some random-initialization fuzziness. – Tylerr May 21 '21 at 15:18