I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model, and visualized the results with the pyLDAvis library. When I print the highest-probability words for each topic with `pprint(lda_model.print_topics())`, I get results for the first topic similar to:

$0.066*\text{car} + 0.032*\text{gas} + 0.031*\text{model} + 0.031*\text{top} + 0.024*\text{CO2} \ + \ ... \ + \ 0.012*\text{investment}$

The results are good, as they are indicative of the topic. But when I adjust the relevance parameter ($\lambda$) provided by pyLDAvis, I get results that are more specific to the topic. For example, setting $\lambda = 0.2$, the top 5 words are:

car, horsepower, torque, speed, V8

My question: is there any function or parameter in gensim that can return the (probability, word) pairs for a specific lambda value?

Ethan

1 Answer

According to this SO post, the relevance ranking can be rebuilt from the `topic_info` table of the prepared pyLDAvis data:

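```
import pandas as pd

# LDAvis_prepared is the object returned by pyLDAvis's prepare call, e.g.
# pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)
lambd = 0.6  # relevance weight: 1 ranks by probability only, 0 by lift only

all_topics = {}
num_topics = lda_model.num_topics
num_terms = 10  # number of terms to keep per topic

# pyLDAvis labels its topics 'Topic1' ... 'TopicN', so the range starts at 1
for i in range(1, num_topics + 1):
    topic = LDAvis_prepared.topic_info[LDAvis_prepared.topic_info.Category == 'Topic' + str(i)].copy()
    # relevance = lambda * log p(w|t) + (1 - lambda) * log lift
    topic['relevance'] = topic['loglift'] * (1 - lambd) + topic['logprob'] * lambd
    all_topics['Topic ' + str(i)] = topic.sort_values(by='relevance', ascending=False).Term[:num_terms].values

pd.DataFrame(all_topics)
```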
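If you want (word, score) pairs without going through the pyLDAvis table, the relevance score can also be computed by hand. As far as I know gensim has no built-in lambda parameter, so this is only a minimal sketch of the definition $r(w, t \mid \lambda) = \lambda \log p(w \mid t) + (1 - \lambda) \log \frac{p(w \mid t)}{p(w)}$. The function name `top_relevant_terms` and the toy numbers are made up for illustration. With a real model, the rows of $p(w \mid t)$ could presumably be taken from `lda_model.get_topics()`, while the marginal $p(w)$ would have to be estimated separately, for example from corpus term counts.

```python
import math


def top_relevant_terms(topic_term_probs, term_probs, vocab, lam=0.6, top_n=10):
    """Rank the terms of each topic by relevance (hypothetical helper).

    topic_term_probs: one row per topic, each row holding p(w | t)
    term_probs:       marginal probability p(w) of every term
    vocab:            term strings, aligned with the columns above
    Returns one list of (term, relevance) pairs per topic, best first.
    """
    ranked = []
    for row in topic_term_probs:
        scored = []
        for term, p_wt, p_w in zip(vocab, row, term_probs):
            if p_wt <= 0 or p_w <= 0:
                continue  # the logarithm is undefined, so skip the term
            relevance = lam * math.log(p_wt) + (1 - lam) * math.log(p_wt / p_w)
            scored.append((term, relevance))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranked.append(scored[:top_n])
    return ranked


# Toy numbers, invented purely for illustration
vocab = ["car", "gas", "torque", "model"]
topic_term_probs = [
    [0.40, 0.30, 0.20, 0.10],  # topic 0
    [0.30, 0.10, 0.01, 0.59],  # topic 1
]
term_probs = [0.35, 0.20, 0.105, 0.345]

# lam=1 ranks by probability alone, lam=0 by lift alone
by_probability = top_relevant_terms(topic_term_probs, term_probs, vocab, lam=1.0, top_n=2)
by_lift = top_relevant_terms(topic_term_probs, term_probs, vocab, lam=0.0, top_n=2)
```

Lowering `lam` pushes terms that are frequent everywhere (here "car") down the list and promotes terms that are concentrated in the topic (here "torque"), which is the effect described in the question. Note that the rows follow whatever topic order you pass in, whereas pyLDAvis by default renumbers topics by prevalence, so topic numbers may not line up between the two.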
scipio1465