I have applied fuzzy clustering for topic modelling and obtained the topics shown in the demo below.
There are 50 topics in total (0 to 49), and each topic consists of 30 words, each with a probability used as a multiplicative factor. How do I turn this into input for a classifier? My final goal is document classification.

Demo

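# Shell: install the dependencies first
pip install octis
pip install FuzzyTM

# Python: load the DBLP corpus from OCTIS
from octis.dataset.dataset import Dataset
from FuzzyTM import FLSA_W

dataset = Dataset()
dataset.fetch_dataset('DBLP')
data = dataset._Dataset__corpus  # name-mangled private attribute holding the tokenized documents
print(data[0:5])

# Assumed model definition: FLSA-W with 50 topics of 30 words each, as described in the question
flsaW1 = FLSA_W(input_file=data, num_topics=50, num_words=30)
pwgt, ptgd = flsaW1.get_matrices()  # word-given-topic and topic-given-document probabilities
topics = flsaW1.show_topics()
topics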
Amartya

1 Answer

  1. Prepare an evaluation dataset of at least 100 documents.
  2. It is important to train with the right data: garbage in, garbage out. Manually verify the results of the topic modelling.
  3. Prepare word vectors from the documents. Gensim's embedding algorithms capture context better than count vectors or TF-IDF.
  4. Try Naive Bayes or a neural network and keep the most promising model. Decision trees do not work well for text classification.
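As a rough illustration of step 4, here is a minimal multinomial Naive Bayes sketch written with the standard library only. It works on plain bag-of-words counts rather than Gensim vectors, and the toy documents, the labels `"ml"` and `"db"`, and the function names `train_naive_bayes` and `predict` are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {
        "vocab": vocab,
        "alpha": alpha,  # Laplace smoothing strength
        "word_counts": word_counts,
        "log_prior": {},
        "total": {},
    }
    for label, n_label_docs in Counter(labels).items():
        model["log_prior"][label] = math.log(n_label_docs / len(docs))
        model["total"][label] = sum(word_counts[label].values())
    return model


def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        denom = model["total"][label] + alpha * vocab_size
        for word in tokens:
            if word in model["vocab"]:  # words never seen in training are skipped
                count = model["word_counts"][label][word]
                score += math.log((count + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, only to show the call pattern.
train_docs = [
    ["neural", "network", "training"],
    ["deep", "neural", "model"],
    ["database", "query", "index"],
    ["sql", "database", "transaction"],
]
train_labels = ["ml", "ml", "db", "db"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, ["neural", "model", "query"])
print(prediction)
```

In practice a library implementation such as scikit-learn's `MultinomialNB` would likely be the more convenient choice; the hand-written version is only meant to make the scoring explicit.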
amol goel
  • Okay, Amol. Thanks! But I had something more specific in mind: take a document D1 and compute, for each topic in range(0, 50), a vector like [|D1 ∩ T0|/|T0|, |D1 ∩ T1|/|T1|, |D1 ∩ T2|/|T2|, ..., |D1 ∩ T49|/|T49|]. With 50 documents, my matrix will be 50 × 50, since I have chosen 50 topics. What I cannot work out is how to code this. – Amartya Aug 17 '22 at 15:42
  • What is D1? How will you find D1 ∩ T1 for new documents? – amol goel Aug 18 '22 at 02:38
  • You will always be given the set of documents. For a new set of documents, apply the trained model; it will work the same way as it does now! – Amartya Aug 18 '22 at 07:05
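The overlap vector described in the first comment could be coded roughly as follows. This is a sketch under assumptions: each topic is taken to be already available as a plain list of its words (extracting those lists from the FuzzyTM output is not shown), each document is a list of tokens, and the function name `topic_overlap_features` and the toy data are invented for illustration.

```python
def topic_overlap_features(documents, topics):
    """Return one row per document holding |D ∩ T_k| / |T_k| for every topic k."""
    topic_sets = [set(words) for words in topics]
    features = []
    for tokens in documents:
        doc_set = set(tokens)
        # Fraction of each topic's words that occur in this document.
        row = [len(doc_set & t) / len(t) if t else 0.0 for t in topic_sets]
        features.append(row)
    return features


# Hypothetical toy data: 2 topics and 2 documents instead of 50 and 50.
topics = [
    ["neural", "network", "learning", "model"],
    ["database", "query", "index", "sql"],
]
documents = [
    ["neural", "network", "for", "query", "optimization"],
    ["sql", "index", "tuning"],
]

X = topic_overlap_features(documents, topics)
for row in X:
    print(row)
```

With 50 documents and 50 topics, `X` would have 50 rows of 50 values each and could be passed to a classifier as the feature matrix. New documents need only be tokenized the same way and scored against the same topic word lists.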