How does the value of the cross-entropy loss function vary with the number of classes being predicted?
Formally, if the loss function is $$ L = - \sum_{x \in X} P^*(x) \log P(x) $$ where $P^*(\cdot)$ is the true distribution, $P(\cdot)$ is the predicted distribution, and $X$ is the distribution's support, I want to know how $L$ changes for sets $X$ of different sizes. Intuitively, it doesn't seem like there should be a useful analytical formula for this, since the change depends on both $P^*(\cdot)$ and $P(\cdot)$, but I wonder if there are any heuristics out there.
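For concreteness, here is a quick numerical sketch of one special case (my own illustration, not a general answer): if the true distribution is one-hot and the predicted distribution is uniform over $|X| = K$ classes, the cross-entropy works out to exactly $\log K$, so this "uninformative baseline" grows logarithmically with the number of classes.

```python
import numpy as np

def cross_entropy(p_true, p_pred):
    """L = -sum_x P*(x) log P(x)."""
    return -np.sum(p_true * np.log(p_pred))

for k in [2, 10, 100, 1000]:
    p_true = np.zeros(k)
    p_true[0] = 1.0               # one-hot true distribution P*
    p_pred = np.full(k, 1.0 / k)  # uninformative uniform prediction P
    print(f"K={k:5d}  L={cross_entropy(p_true, p_pred):.4f}  log K={np.log(k):.4f}")
```

Of course, this only pins down one point of comparison; for a non-uniform $P(\cdot)$ the loss can be anywhere from $0$ upward regardless of $K$, which is why I suspect only heuristics, not an exact formula, are possible.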