Proof: Partition of the log model evidence into accuracy and complexity
Theorem: The log model evidence can be partitioned into accuracy and complexity
\[\label{eq:LME} \mathrm{LME}(m) = \mathrm{Acc}(m) - \mathrm{Com}(m)\]where the accuracy term is the posterior expectation of the log-likelihood function
\[\label{eq:Acc} \mathrm{Acc}(m) = \left\langle \log p(y|\theta,m) \right\rangle_{p(\theta|y,m)}\]and the complexity penalty is the Kullback-Leibler divergence of posterior from prior
\[\label{eq:Com} \mathrm{Com}(m) = \mathrm{KL} \left[ p(\theta|y,m) \, || \, p(\theta|m) \right] \; .\]Proof: We consider Bayesian inference on data $y$ using model $m$ with parameters $\theta$. Then, Bayes’ theorem makes a statement about the posterior distribution, i.e. the probability of parameters, given the data and the model:
\[\label{eq:AnC-s1} p(\theta|y,m) = \frac{p(y|\theta,m) \, p(\theta|m)}{p(y|m)} \; .\]Rearranging this for the model evidence, we have:
\[\label{eq:AnC-s2} p(y|m) = \frac{p(y|\theta,m) \, p(\theta|m)}{p(\theta|y,m)} \; .\]Taking the logarithm of both sides of the equation, we obtain:
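\[\label{eq:AnC-s3} \log p(y|m) = \log p(y|\theta,m) - \log \frac{p(\theta|y,m)}{p(\theta|m)} \; .\]Since the left-hand side does not depend on $\theta$, taking the expectation of both sides with respect to the posterior distribution $p(\theta|y,m)$ leaves it unchanged and yields: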
\[\label{eq:AnC-s4} \log p(y|m) = \int p(\theta|y,m) \log p(y|\theta,m) \, \mathrm{d}\theta - \int p(\theta|y,m) \log \frac{p(\theta|y,m)}{p(\theta|m)} \, \mathrm{d}\theta \; .\]By definition, the left-hand side is the log model evidence, and the two terms on the right-hand side are the posterior expectation of the log-likelihood function and the Kullback-Leibler divergence of posterior from prior, respectively:
\[\label{eq:LME-AnC} \mathrm{LME}(m) = \left\langle \log p(y|\theta,m) \right\rangle_{p(\theta|y,m)} - \mathrm{KL} \left[ p(\theta|y,m) \, || \, p(\theta|m) \right]\]which proves the partition given by \eqref{eq:LME}.
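The partition can be checked numerically on a simple conjugate model. The following is a minimal sketch, not part of the original proof. It assumes a hypothetical toy setting: data $y_i \sim \mathcal{N}(\theta, \sigma^2)$ with known variance and a Gaussian prior $\theta \sim \mathcal{N}(\mu_0, \tau_0^2)$, so that the posterior, accuracy, complexity and log model evidence are all available in closed form. The data values, hyperparameters and all variable and function names (`log_normal_pdf`, `lme_pointwise`, `acc`, `com`, `lme`) are illustrative assumptions.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of the univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical toy data and hyperparameters (illustrative values only)
y = [0.8, 1.9, 1.1, 2.4, 1.5]
sigma2 = 1.0            # known noise variance
mu0, tau02 = 0.0, 4.0   # prior mean and prior variance of theta
n = len(y)

# Conjugate posterior p(theta|y,m) = N(mu_n, tau_n2)
tau_n2 = 1.0 / (1.0 / tau02 + n / sigma2)
mu_n = tau_n2 * (mu0 / tau02 + sum(y) / sigma2)

def log_lik(theta):
    """Log-likelihood log p(y|theta,m) for i.i.d. Gaussian observations."""
    return sum(log_normal_pdf(yi, theta, sigma2) for yi in y)

def lme_pointwise(theta):
    """Right-hand side of eq. AnC-s3, evaluated at a single value of theta."""
    log_ratio = log_normal_pdf(theta, mu_n, tau_n2) - log_normal_pdf(theta, mu0, tau02)
    return log_lik(theta) - log_ratio

# Accuracy: posterior expectation of the log-likelihood, using
# E[(y_i - theta)^2] = (y_i - mu_n)^2 + tau_n2 under the posterior
acc = (-0.5 * n * math.log(2.0 * math.pi * sigma2)
       - (sum((yi - mu_n) ** 2 for yi in y) + n * tau_n2) / (2.0 * sigma2))

# Complexity: KL divergence of posterior N(mu_n, tau_n2) from prior N(mu0, tau02)
com = 0.5 * (math.log(tau02 / tau_n2) + (tau_n2 + (mu_n - mu0) ** 2) / tau02 - 1.0)

# Log model evidence computed independently from the marginal distribution
# y ~ N(mu0 * 1, sigma2 * I + tau02 * 1 1^T), using the closed-form
# determinant and inverse of this rank-one-updated covariance matrix
r = [yi - mu0 for yi in y]
log_det = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * tau02)
quad = (sum(ri ** 2 for ri in r) - tau02 * sum(r) ** 2 / (sigma2 + n * tau02)) / sigma2
lme = -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

print("LME (marginal)       :", lme)
print("Acc - Com            :", acc - com)
print("AnC-s3 at theta = -1 :", lme_pointwise(-1.0))
print("AnC-s3 at theta =  2 :", lme_pointwise(2.0))
```

Under these assumptions, the right-hand side of \eqref{eq:AnC-s3} should return the same value for every $\theta$, which is why taking the posterior expectation leaves the left-hand side unchanged, and `acc - com` should agree with `lme` up to floating-point error.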
- Beal & Ghahramani (2003): "The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures"; in: Bayesian Statistics, vol. 7; URL: https://mlg.eng.cam.ac.uk/zoubin/papers/valencia02.pdf.
- Penny et al. (2007): "Bayesian Comparison of Spatially Regularised General Linear Models"; in: Human Brain Mapping, vol. 28, pp. 275–293; URL: https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.20327; DOI: 10.1002/hbm.20327.
- Soch et al. (2016): "How to avoid mismodelling in GLM-based fMRI data analysis: cross-validated Bayesian model selection"; in: NeuroImage, vol. 141, pp. 469–489; URL: https://www.sciencedirect.com/science/article/pii/S1053811916303615; DOI: 10.1016/j.neuroimage.2016.07.047.
Metadata: ID: P3 | shortcut: lme-anc | author: JoramSoch | date: 2019-09-27, 16:13.