
I'm trying to train a doc2vec model on the German Wikipedia corpus. While looking for best practices, I've found different approaches to creating the training data.

Should I split every Wikipedia article into several documents, one per natural paragraph, or use each whole article as a single document to train my model? (See the sketch below for what I mean by the two options.)
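
For concreteness, here is a minimal sketch of the two options using gensim. The `articles` dict is a placeholder, and I'm assuming the wiki dump has already been extracted to plain text (e.g. with WikiExtractor or gensim's `WikiCorpus`):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

# Placeholder: {article title: plain text with blank-line paragraph breaks}
articles = {
    "Berlin": "Berlin ist die Hauptstadt von Deutschland.\n\nDie Stadt liegt ...",
    # ... more articles
}

# Option 1: one document per article, tagged by title
docs_article = [
    TaggedDocument(words=simple_preprocess(text), tags=[title])
    for title, text in articles.items()
]

# Option 2: one document per paragraph (split on blank lines),
# tagged by title plus paragraph index
docs_paragraph = [
    TaggedDocument(words=simple_preprocess(par), tags=[f"{title}_{i}"])
    for title, text in articles.items()
    for i, par in enumerate(text.split("\n\n"))
    if par.strip()
]

# Training is the same either way; swap in docs_paragraph for option 2
model = Doc2Vec(vector_size=300, min_count=5, epochs=10, workers=4)
model.build_vocab(docs_article)
model.train(docs_article, total_examples=model.corpus_count, epochs=model.epochs)
```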

EDIT: Is there a rough guideline for how many words per document work well with doc2vec?
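
In case it helps to answer, this is how I'd check where my corpus falls, reusing the hypothetical `docs_article` list from the sketch above:

```python
import statistics

# Word-count distribution over the training documents
lengths = [len(doc.words) for doc in docs_article]
print("mean:", statistics.mean(lengths))
print("median:", statistics.median(lengths))
print("max:", max(lengths))
```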

