
I'm reading a paper about model uncertainty quantification. Specifically, it says that epistemic uncertainty is the kind of uncertainty that arises from a lack of knowledge about a particular region of the input space.

Also, it gives a mathematical characterization of uncertainty. Denoting by $x$ and $y$ the input and output points, and by ${\cal D}$ the training data, the total uncertainty ${\cal U}$ of a model with respect to the input $x$ can be measured by the predictive entropy:

${\cal U}(x)={\cal H}(P(y|x,{\cal D}))=-\sum_y P(y|x,{\cal D})\log P(y|x,{\cal D})$
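
To make sure I read this correctly, here is how I would estimate it numerically by averaging predictions over samples from the posterior; the numbers and variable names are my own toy example, not from the paper:

```python
import numpy as np
from scipy.stats import entropy

# Minimal sketch (toy numbers, not from the paper): approximate P(y|x, D)
# by averaging the class probabilities predicted by S parameter samples
# theta_1, ..., theta_S drawn from the posterior P(theta|D), then take the
# entropy of that averaged distribution.

# probs[s, c] = P(y = c | x, theta_s); here S = 3 samples, C = 2 classes
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.1, 0.9]])

p_pred = probs.mean(axis=0)            # Monte Carlo estimate of P(y|x, D)
total_uncertainty = entropy(p_pred)    # U(x) = H(P(y|x, D)), natural log
print(total_uncertainty)               # ~0.693: the averaged prediction is near uniform
```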

Denoting by $P(\theta|{\cal D})$ the posterior distribution over the model parameters $\theta$, it also decomposes the uncertainty ${\cal U}$ into two terms:

${\cal U}(x)=\left({\cal H}(P(y|x,{\cal D}))-\mathbb{E}_{P(\theta|{\cal D})}[{\cal H}(P(y|x,\theta))]\right)+\mathbb{E}_{P(\theta|{\cal D})}[{\cal H}(P(y|x,\theta))]$
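
For concreteness, this is how I would compute the two terms on the same toy posterior samples as above (again, the naming is mine, not the paper's), with the first term obtained as the gap between the total and the expected entropy:

```python
import numpy as np
from scipy.stats import entropy

# Same toy samples as above, now split into the two terms of the decomposition.
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.1, 0.9]])              # probs[s, c] = P(y = c | x, theta_s)

total = entropy(probs.mean(axis=0))         # H(P(y|x, D))
expected = entropy(probs, axis=1).mean()    # E_{P(theta|D)}[H(P(y|x, theta))]
first_term = total - expected               # the quantity my question is about
print(total, expected, first_term)          # ~0.693, ~0.448, ~0.245
```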

My question is: why is the first term, which the paper calls the mutual information between $\theta$ and $y$, able to express the epistemic uncertainty?

paper reference: https://proceedings.neurips.cc/paper/2021/hash/06fe1c234519f6812fc4c1baae25d6af-Abstract.html
