
I'm trying to train a doc2vec model on the German Wikipedia corpus. While looking for best practices, I've found different approaches to creating the training data.

Should I split every Wikipedia article into several documents, one per natural paragraph, or use each whole article as a single document to train my model? (See the sketch below for what I mean by the two options.)
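
For concreteness, here is a minimal sketch of the two options using gensim. The `articles` dict is a placeholder, and I'm assuming the wiki dump has already been extracted to plain text (e.g. with WikiExtractor or gensim's `WikiCorpus`):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

# Placeholder: {article title: plain text with blank-line paragraph breaks}
articles = {
    "Berlin": "Berlin ist die Hauptstadt von Deutschland.\n\nDie Stadt liegt ...",
    # ... more articles
}

# Option 1: one document per article, tagged by title
docs_article = [
    TaggedDocument(words=simple_preprocess(text), tags=[title])
    for title, text in articles.items()
]

# Option 2: one document per paragraph (split on blank lines),
# tagged by title plus paragraph index
docs_paragraph = [
    TaggedDocument(words=simple_preprocess(par), tags=[f"{title}_{i}"])
    for title, text in articles.items()
    for i, par in enumerate(text.split("\n\n"))
    if par.strip()
]

# Training is the same either way; swap in docs_paragraph for option 2
model = Doc2Vec(vector_size=300, min_count=5, epochs=10, workers=4)
model.build_vocab(docs_article)
model.train(docs_article, total_examples=model.corpus_count, epochs=model.epochs)
```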

EDIT: Is there a rough guideline for how many words per document work well with doc2vec?
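
In case it helps to answer, this is how I'd check where my corpus falls, reusing the hypothetical `docs_article` list from the sketch above:

```python
import statistics

# Word-count distribution over the training documents
lengths = [len(doc.words) for doc in docs_article]
print("mean:", statistics.mean(lengths))
print("median:", statistics.median(lengths))
print("max:", max(lengths))
```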

