I am trying to understand perplexity as a metric in Natural Language Processing more fully, and I am doing so by building manual examples to understand all of the component parts. Is the following correctly understood?
Given a list $W$ of words $w_1 \ldots w_N$, where we know the probability of each individual word, a model will still have to compute the intersections (joint probabilities) between words for the following formula to be useful:
$$ P(W)=P\left(w_1\right) P\left(w_2 \mid w_1\right) P\left(w_3 \mid w_2, w_1\right) \ldots P\left(w_N \mid w_{N-1}, \ldots, w_1\right) $$
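For concreteness, here is the kind of manual example I have been working through, with entirely made-up conditional probabilities for a three-word sequence (the specific numbers mean nothing, they are just assumptions for illustration):

```python
# Made-up probabilities for the three-word sequence "the cat sat"
# (every number here is assumed, purely for illustration)
p_w1 = 0.2              # P("the")
p_w2_given_w1 = 0.1     # P("cat" | "the")
p_w3_given_w1_w2 = 0.3  # P("sat" | "the", "cat")

# Chain rule: joint probability of the whole sequence
p_W = p_w1 * p_w2_given_w1 * p_w3_given_w1_w2

# Perplexity of the sequence: P(W) ** (-1/N) for N words
N = 3
perplexity = p_W ** (-1 / N)

print(f"P(W) = {p_W:.4f}")               # 0.0060
print(f"Perplexity = {perplexity:.2f}")  # ~5.50
```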
Since the formula for conditional probability is given by:
$P(A \mid B)=\frac{P(A \cap B)}{P(B)}$
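As a sanity check on that definition, here is the same idea with made-up numbers, reusing the $P(\text{cat} \mid \text{the}) = 0.1$ value from the example above:

```python
# Tiny numeric check of the conditional-probability definition
# A = "cat" occurring right after B = "the"
p_A_and_B = 0.02  # P("the cat") as a joint probability (assumed)
p_B = 0.2         # P("the") (assumed)

p_A_given_B = p_A_and_B / p_B
print(p_A_given_B)  # ~0.1, the value I used for P("cat" | "the") above
```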
And the intersection $P(A \cap B)$ would, in the NLP setting, be calculated by the model via its cross-entropy loss.
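This last part is the piece I am least sure about and would most like confirmed or corrected. My current (possibly mistaken) picture of how cross-entropy and perplexity fit together numerically is the sketch below, using the same made-up per-token probabilities as above:

```python
import math

# Per-token probabilities the model assigns to each observed next word.
# In a real model these would come from the softmax output; here they are
# the same made-up numbers as in the chain-rule example above.
token_probs = [0.2, 0.1, 0.3]

# Cross-entropy loss = average negative log-probability of the observed tokens
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity is exp(cross-entropy), which works out to the same value as
# P(W) ** (-1/N) from the chain-rule product above
perplexity = math.exp(cross_entropy)
print(round(cross_entropy, 3), round(perplexity, 2))  # ~1.705, ~5.5
```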