Proof: The variational free energy is a lower bound on the log model evidence
Theorem: Let $m$ be a generative model with likelihood function $p(y \vert \theta,m) = p(y \vert \theta)$ and prior distribution $p(\theta \vert m) = p(\theta)$. Then, under a variational Bayesian treatment using the approximate posterior distribution $q(\theta \vert m) = q(\theta) \approx p(\theta \vert y)$, the variational free energy is a lower bound on the log model evidence:
\[\label{eq:vb-fe-lme} \mathrm{F}[q(\theta)] \leq \log p(y) = \log \int_{\Theta} p(y,\theta \vert m) \, \mathrm{d}\theta \; .\]

Proof: Using a decomposition of the variational free energy, it can be shown that the free energy equals the difference between the log model evidence and the Kullback-Leibler divergence of the approximate from the true posterior distribution:
\[\label{eq:vb-fe-dec} \mathrm{F}[q(\theta)] = \log p(y) - \mathrm{KL}[q(\theta) \vert\vert p(\theta \vert y)] \; .\]

Since the KL divergence is zero or positive for any two distributions
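For completeness, this decomposition can be sketched directly from the standard definition of the free energy as the expected log joint under $q$ plus the entropy of $q$ (a sketch, using only Bayes' theorem $p(y,\theta) = p(\theta \vert y) \, p(y)$):

\begin{align*}
\mathrm{F}[q(\theta)]
&= \int_{\Theta} q(\theta) \log \frac{p(y,\theta)}{q(\theta)} \, \mathrm{d}\theta \\
&= \int_{\Theta} q(\theta) \log \frac{p(\theta \vert y) \, p(y)}{q(\theta)} \, \mathrm{d}\theta \\
&= \log p(y) - \int_{\Theta} q(\theta) \log \frac{q(\theta)}{p(\theta \vert y)} \, \mathrm{d}\theta \\
&= \log p(y) - \mathrm{KL}[q(\theta) \vert\vert p(\theta \vert y)] \; ,
\end{align*}

where the second-to-last step uses $\int_{\Theta} q(\theta) \, \mathrm{d}\theta = 1$ to pull $\log p(y)$ out of the integral.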
\[\label{eq:kl-nonneg} \mathrm{KL}[P||Q] \geq 0 \; ,\]

the free energy must be smaller than or equal to the log model evidence:
\[\label{eq:vb-fe-lme-qed} \mathrm{F}[q(\theta)] \leq \log p(y) \; .\]

Sources:
- Zeidman et al. (2023): "A primer on Variational Laplace (VL)"; in: NeuroImage, vol. 279, art. 120310, pp. 4-5, eqs. 9-11; URL: https://www.sciencedirect.com/science/article/pii/S1053811923004615; DOI: 10.1016/j.neuroimage.2023.120310.
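The bound can also be checked numerically. The sketch below (an illustration, not part of the proof) uses a simple conjugate model, prior $\theta \sim \mathcal{N}(0,1)$ and likelihood $y \vert \theta \sim \mathcal{N}(\theta,1)$, for which the log model evidence $\log p(y) = \log \mathcal{N}(y; 0, 2)$ is available in closed form; the function names `free_energy` and `log_evidence` are illustrative choices. For a Gaussian $q(\theta) = \mathcal{N}(m, s^2)$, the free energy decomposes into expected log likelihood, expected log prior, and entropy of $q$, each of which is analytic:

```python
import math

def free_energy(y, m, s2):
    # F[q] = E_q[log p(y|theta)] + E_q[log p(theta)] + H[q]
    # for prior N(0,1), likelihood N(theta,1), q(theta) = N(m, s2)
    e_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s2)
    e_pri = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    ent = 0.5 * math.log(2 * math.pi * math.e * s2)
    return e_lik + e_pri + ent

def log_evidence(y):
    # integrating out theta gives p(y) = N(y; 0, 2)
    return -0.5 * math.log(2 * math.pi * 2) - y ** 2 / 4

y = 1.7
# F[q] <= log p(y) for any choice of q(theta)
for m, s2 in [(0.0, 1.0), (1.0, 0.3), (y / 2, 0.5)]:
    assert free_energy(y, m, s2) <= log_evidence(y) + 1e-12
```

The last candidate, $q(\theta) = \mathcal{N}(y/2, 1/2)$, is the exact posterior of this model, at which the KL term vanishes and the bound holds with equality.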
Metadata: ID: P517 | shortcut: fren-lme | author: JoramSoch | date: 2025-09-25, 11:24.