Partition of the log model evidence into accuracy and complexity

Index: The Book of Statistical Proofs ▷ Model Selection ▷ Bayesian model selection ▷ Model evidence ▷ Partition into accuracy and complexity

Theorem: The log model evidence can be partitioned into accuracy and complexity

\[\label{eq:LME} \mathrm{LME}(m) = \mathrm{Acc}(m) - \mathrm{Com}(m)\]

where the accuracy term is the posterior expectation of the log-likelihood function

\[\label{eq:Acc} \mathrm{Acc}(m) = \left\langle \log p(y|\theta,m) \right\rangle_{p(\theta|y,m)}\]

and the complexity penalty is the Kullback-Leibler divergence of the posterior from the prior

\[\label{eq:Com} \mathrm{Com}(m) = \mathrm{KL} \left[ p(\theta|y,m) \, || \, p(\theta|m) \right] \; .\]

Proof: We consider Bayesian inference on data $y$ using model $m$ with parameters $\theta$. Bayes’ theorem then gives the posterior distribution, i.e. the probability density of the parameters given the data and the model:

\[\label{eq:AnC-s1} p(\theta|y,m) = \frac{p(y|\theta,m) \, p(\theta|m)}{p(y|m)} \; .\]

Rearranging this for the model evidence, we have:

\[\label{eq:AnC-s2} p(y|m) = \frac{p(y|\theta,m) \, p(\theta|m)}{p(\theta|y,m)} \; .\]

Taking the logarithm of both sides of the equation, we obtain:

\[\label{eq:AnC-s3} \log p(y|m) = \log p(y|\theta,m) - \log \frac{p(\theta|y,m)}{p(\theta|m)} \; .\]
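The identity above holds for every value of $\theta$, not only on average. As a rough numerical illustration, the following sketch assumes a beta-binomial model, which is not part of the original proof: $k$ successes in $n$ trials, a $\mathrm{Beta}(a,b)$ prior and hence a $\mathrm{Beta}(a+k, b+n-k)$ posterior. All function names and the example values are hypothetical choices made for illustration.

```python
from math import comb, lgamma, log


def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_beta_pdf(theta, a, b):
    """Log density of a Beta(a, b) distribution at theta in (0, 1)."""
    return (a - 1) * log(theta) + (b - 1) * log(1 - theta) - log_beta_fn(a, b)


def log_binom_lik(k, n, theta):
    """Log-likelihood log p(y|theta,m) for k successes in n trials."""
    return log(comb(n, k)) + k * log(theta) + (n - k) * log(1 - theta)


def log_evidence(k, n, a, b):
    """Closed-form log model evidence log p(y|m) of the beta-binomial model."""
    return log(comb(n, k)) + log_beta_fn(a + k, b + n - k) - log_beta_fn(a, b)


def rhs_at(theta, k, n, a, b):
    """Right-hand side of the log identity, evaluated at a single theta."""
    log_ratio = log_beta_pdf(theta, a + k, b + n - k) - log_beta_pdf(theta, a, b)
    return log_binom_lik(k, n, theta) - log_ratio


if __name__ == "__main__":
    # Assumed example values: 7 successes in 10 trials, Beta(2, 3) prior.
    k, n, a, b = 7, 10, 2.0, 3.0
    print("log evidence:", log_evidence(k, n, a, b))
    for theta in (0.1, 0.5, 0.9):
        print("theta =", theta, "right-hand side:", rhs_at(theta, k, n, a, b))
```

Under these assumptions, the right-hand side should agree with the log evidence up to floating-point error for any $\theta$ strictly between 0 and 1.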

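This equation holds for every value of $\theta$, so we can take the expectation of both sides with respect to the posterior distribution $p(\theta|y,m)$. Because the left-hand side does not depend on $\theta$ and the posterior density integrates to one, it remains unchanged, which yields: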

\[\label{eq:AnC-s4} \log p(y|m) = \int p(\theta|y,m) \log p(y|\theta,m) \, \mathrm{d}\theta - \int p(\theta|y,m) \log \frac{p(\theta|y,m)}{p(\theta|m)} \, \mathrm{d}\theta \; .\]

By definition, the left-hand side is the log model evidence, and the two terms on the right-hand side are the posterior expectation of the log-likelihood function and the Kullback-Leibler divergence of the posterior from the prior:

\[\label{eq:LME-AnC} \mathrm{LME}(m) = \left\langle \log p(y|\theta,m) \right\rangle_{p(\theta|y,m)} - \mathrm{KL} \left[ p(\theta|y,m) \, || \, p(\theta|m) \right]\]

which proves the partition given by \eqref{eq:LME}.

∎
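The partition can be checked numerically in a model where all three quantities have closed forms. The sketch below assumes a conjugate normal model that does not appear in the proof itself: observations $y_i \sim \mathcal{N}(\theta, \sigma^2)$ with known variance $\sigma^2$ and prior $\theta \sim \mathcal{N}(\mu_0, \tau_0^2)$, so that the posterior is $\mathcal{N}(\mu_n, \tau_n^2)$. The function names, variable names and data values are hypothetical.

```python
from math import log, pi


def normal_posterior(y, sigma2, mu0, tau02):
    """Posterior mean and variance of theta under the conjugate normal model."""
    n = len(y)
    taun2 = 1.0 / (1.0 / tau02 + n / sigma2)
    mun = taun2 * (mu0 / tau02 + sum(y) / sigma2)
    return mun, taun2


def accuracy(y, sigma2, mun, taun2):
    """Acc(m): posterior expectation of the log-likelihood."""
    n = len(y)
    # E[(y_i - theta)^2] = (y_i - mun)^2 + taun2 under the posterior
    expected_sq = sum((yi - mun) ** 2 for yi in y) + n * taun2
    return -0.5 * n * log(2 * pi * sigma2) - expected_sq / (2 * sigma2)


def complexity(mun, taun2, mu0, tau02):
    """Com(m): KL divergence of the normal posterior from the normal prior."""
    return (0.5 * log(tau02 / taun2)
            + (taun2 + (mun - mu0) ** 2) / (2 * tau02)
            - 0.5)


def log_model_evidence(y, sigma2, mu0, tau02):
    """LME(m): log density of y under N(mu0 * 1, sigma2 * I + tau02 * 1 1^T)."""
    n = len(y)
    d = [yi - mu0 for yi in y]
    quad = sum(di ** 2 for di in d) - tau02 / (sigma2 + n * tau02) * sum(d) ** 2
    return (-0.5 * n * log(2 * pi * sigma2)
            - 0.5 * log(1 + n * tau02 / sigma2)
            - quad / (2 * sigma2))


if __name__ == "__main__":
    # Assumed example data and hyperparameters.
    y = [0.3, -1.2, 2.5, 0.8]
    sigma2, mu0, tau02 = 1.5, 0.5, 2.0
    mun, taun2 = normal_posterior(y, sigma2, mu0, tau02)
    acc = accuracy(y, sigma2, mun, taun2)
    com = complexity(mun, taun2, mu0, tau02)
    lme = log_model_evidence(y, sigma2, mu0, tau02)
    print("Acc - Com:", acc - com)
    print("LME:      ", lme)
```

The evidence here is computed directly from the marginal distribution of $y$, independently of the accuracy and complexity terms, so agreement of the two printed values is a genuine check of the partition rather than a restatement of it.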

Sources:

Beal & Ghahramani (2003): "The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures"; in: Bayesian Statistics, vol. 7; URL: https://mlg.eng.cam.ac.uk/zoubin/papers/valencia02.pdf.
Penny et al. (2007): "Bayesian Comparison of Spatially Regularised General Linear Models"; in: Human Brain Mapping, vol. 28, pp. 275–293; URL: https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.20327; DOI: 10.1002/hbm.20327.
Ostwald et al. (2014): "A tutorial on variational Bayes for latent linear stochastic time-series models"; in: Journal of Mathematical Psychology, vol. 60, pp. 1–19; URL: https://www.sciencedirect.com/science/article/abs/pii/S0022249614000352; DOI: 10.1016/j.jmp.2014.04.003.
Soch et al. (2016): "How to avoid mismodelling in GLM-based fMRI data analysis: cross-validated Bayesian model selection"; in: NeuroImage, vol. 141, pp. 469–489; URL: https://www.sciencedirect.com/science/article/pii/S1053811916303615; DOI: 10.1016/j.neuroimage.2016.07.047.

Metadata: ID: P3 | shortcut: lme-anc | author: JoramSoch | date: 2019-09-27, 16:13.