Gensim LDA model: return keywords based on relevance (λ - lambda) value

Question

I am using the gensim library for topic modeling, specifically LDA. I created my corpus, my dictionary, and my LDA model, and visualized the results with the pyLDAvis library. When I print the highest-probability words for each topic with pprint(lda_model.print_topics()), the first topic looks similar to this:

$0.066*\text{car} + 0.032*\text{gas} + 0.031*\text{model} + 0.031*\text{top} + 0.024*\text{CO2} \ + \ ... \ + \ 0.012*\text{investment}$

These results are good because they are indicative of the topic, but when I adjust the relevance parameter ($\lambda$) provided by pyLDAvis, I get results that are more specific to the topic. For example, with $\lambda=0.2$ the top 5 words are:

car, horsepower, torque, speed, V8

My question: is there any function or parameter in gensim that returns the (probability, word) pairs for a given lambda value?

score 0 · Answer 1 · answered Feb 06 '23 at 17:32

According to this SO post, this is the way:

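```python
import pandas as pd

# LDAvis_prepared is the prepared pyLDAvis object, e.g. the result of
# pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)
lambd = 0.6      # relevance weight: 1.0 ranks by probability only, 0.0 by lift only
num_terms = 10   # number of top terms to keep per topic
num_topics = lda_model.num_topics

all_topics = {}
# pyLDAvis labels its topics 'Topic1' ... 'TopicN', so the range starts at 1
for i in range(1, num_topics + 1):
    topic = LDAvis_prepared.topic_info[LDAvis_prepared.topic_info.Category == 'Topic' + str(i)].copy()
    # relevance = lambda * log p(w|t) + (1 - lambda) * log lift
    topic['relevance'] = topic['loglift'] * (1 - lambd) + topic['logprob'] * lambd
    all_topics['Topic ' + str(i)] = topic.sort_values(by='relevance', ascending=False).Term[:num_terms].values

pd.DataFrame(all_topics)
```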
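The score computed from `topic_info` is the pyLDAvis relevance, $\lambda \log p(w \mid t) + (1-\lambda)\log\frac{p(w \mid t)}{p(w)}$. As a rough sketch of the same ranking without going through pyLDAvis, the function below works on a plain topic-term probability matrix and a marginal term distribution. The function name `top_relevant_terms` and its arguments are hypothetical, not part of gensim or pyLDAvis, and the toy numbers are made up purely for illustration.

```python
import math

def top_relevant_terms(topic_term, term_prob, vocab, lambd=0.6, num_terms=10):
    """Return, for each topic, a list of (term, relevance) pairs sorted by relevance.

    topic_term: one row per topic, holding p(w|t) for every term in vocab
    term_prob:  marginal probability p(w) of every term in vocab
    """
    ranked = []
    for row in topic_term:
        scored = []
        for term, p_wt, p_w in zip(vocab, row, term_prob):
            if p_wt <= 0 or p_w <= 0:
                continue  # log is undefined for zero probabilities, so skip the term
            relevance = lambd * math.log(p_wt) + (1 - lambd) * math.log(p_wt / p_w)
            scored.append((term, relevance))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranked.append(scored[:num_terms])
    return ranked

# Toy data (invented for illustration): 2 topics over a 3-word vocabulary
vocab = ["car", "gas", "torque"]
topic_term = [[0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1]]
term_prob = [0.4, 0.5, 0.1]

for lambd in (1.0, 0.0):
    top = top_relevant_terms(topic_term, term_prob, vocab, lambd=lambd, num_terms=1)
    print(lambd, [terms[0][0] for terms in top])
```

With a trained gensim model, `topic_term` could plausibly come from `lda_model.get_topics()` and `term_prob` from normalized corpus counts such as `dictionary.cfs`. That is an assumption about how to wire it up, and because pyLDAvis estimates term frequencies in its own way, the ranking may not match the visualization exactly.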
