Proof: Log model evidence for Bayesian linear regression with known covariance
Theorem: Let
\[\label{eq:GLM} m: y = X \beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \Sigma)\]be a linear regression model with measured $n \times 1$ data vector $y$, known $n \times p$ design matrix $X$ and known $n \times n$ covariance matrix $\Sigma$ as well as unknown $p \times 1$ regression coefficients $\beta$. Moreover, assume a multivariate normal distribution over the model parameter $\beta$:
\[\label{eq:GLM-N-prior} p(\beta) = \mathcal{N}(\beta; \mu_0, \Sigma_0) \; .\]Then, the log model evidence for this model is
\[\label{eq:GLM-N-LME} \begin{split} \log p(y|m) = &- \frac{1}{2} e_y^\mathrm{T} \Sigma^{-1} e_y - \frac{1}{2} \log |\Sigma| - \frac{n}{2} \log (2 \pi) \\ &- \frac{1}{2} e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta - \frac{1}{2} \log |\Sigma_0| + \frac{1}{2} \log |\Sigma_n| \; . \end{split}\]with the “prediction error” and “parameter error” terms
\[\label{eq:GLM-N-err} \begin{split} e_y &= y - X \mu_n \\ e_\beta &= \mu_0 - \mu_n \end{split}\]where the posterior hyperparameters are given by
\[\label{eq:GLM-N-post-par} \begin{split} \mu_n &= \Sigma_n (X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0) \\ \Sigma_n &= \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right)^{-1} \; . \end{split}\]
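The theorem translates directly into a few lines of linear algebra. Below is a minimal NumPy sketch of eqs. \eqref{eq:GLM-N-post-par}, \eqref{eq:GLM-N-err} and \eqref{eq:GLM-N-LME}; the function name, variable names and the toy data in the usage example are illustrative assumptions, not part of the theorem.

```python
import numpy as np

def lme_blr_known_cov(y, X, Sigma, mu0, Sigma0):
    """Log model evidence for y = X beta + eps, eps ~ N(0, Sigma),
    under the prior beta ~ N(mu0, Sigma0), following eqs.
    (GLM-N-post-par), (GLM-N-err) and (GLM-N-LME)."""
    n, p = X.shape
    P  = np.linalg.inv(Sigma)    # error precision matrix
    P0 = np.linalg.inv(Sigma0)   # prior precision matrix
    # posterior hyperparameters, eq. (GLM-N-post-par)
    Sigma_n = np.linalg.inv(X.T @ P @ X + P0)
    mu_n    = Sigma_n @ (X.T @ P @ y + P0 @ mu0)
    # "prediction error" and "parameter error", eq. (GLM-N-err)
    e_y    = y - X @ mu_n
    e_beta = mu0 - mu_n
    # log model evidence, eq. (GLM-N-LME)
    _, logdet_S  = np.linalg.slogdet(Sigma)
    _, logdet_S0 = np.linalg.slogdet(Sigma0)
    _, logdet_Sn = np.linalg.slogdet(Sigma_n)
    return (- 0.5 * e_y @ P @ e_y        - 0.5 * logdet_S  - 0.5 * n * np.log(2 * np.pi)
            - 0.5 * e_beta @ P0 @ e_beta - 0.5 * logdet_S0 + 0.5 * logdet_Sn)

# toy usage example (dimensions and hyperparameters chosen arbitrarily)
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Sigma, mu0, Sigma0 = np.eye(n), np.zeros(p), 5.0 * np.eye(p)
beta = rng.multivariate_normal(mu0, Sigma0)                  # beta ~ N(mu0, Sigma0)
y = X @ beta + rng.multivariate_normal(np.zeros(n), Sigma)   # eps ~ N(0, Sigma)
print(lme_blr_known_cov(y, X, Sigma, mu0, Sigma0))
```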
Proof: According to the law of marginal probability, the model evidence for this model is:
\[\label{eq:GLM-N-ME-s1} p(y|m) = \int p(y|\beta) \, p(\beta) \, \mathrm{d}\beta \; .\]According to the law of conditional probability, the integrand is equivalent to the joint likelihood:
\[\label{eq:GLM-N-ME-s2} p(y|m) = \int p(y,\beta) \, \mathrm{d}\beta \; .\]Equation \eqref{eq:GLM} implies the following likelihood function:
\[\label{eq:GLM-LF} p(y|\beta) = \mathcal{N}(y; X \beta, \Sigma) = \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \exp\left[ -\frac{1}{2} (y-X\beta)^\mathrm{T} \Sigma^{-1} (y-X\beta) \right] \; .\]When deriving the posterior distribution $p(\beta \vert y)$, the joint likelihood $p(y,\beta)$ is obtained as
\[\label{eq:GLM-N-LME-s1} \begin{split} p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^{n+p} |\Sigma| |\Sigma_0|}} \cdot \\ & \exp\left[ -\frac{1}{2} \left( (\beta-\mu_n)^\mathrm{T} \Sigma_n^{-1} (\beta-\mu_n) + (y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n) \right) \right] \; . \end{split}\]Using the probability density function of the multivariate normal distribution, we can rewrite this as
\[\label{eq:GLM-N-LME-s2} \begin{split} p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \sqrt{\frac{1}{(2 \pi)^p |\Sigma_0|}} \, \sqrt{\frac{(2 \pi)^p |\Sigma_n|}{1}} \cdot \mathcal{N}(\beta; \mu_n, \Sigma_n) \cdot \\ & \exp\left[ -\frac{1}{2} \left( y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n \right) \right] \; . \end{split}\]With that, $\beta$ can be integrated out easily:
\[\label{eq:GLM-N-LME-s3} \int p(y,\beta) \, \mathrm{d}\beta = \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \sqrt{\frac{|\Sigma_n|}{|\Sigma_0|}} \cdot \exp\left[ -\frac{1}{2} \left( y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n \right) \right] \; .\]
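As a sanity check on this marginalization step, the closed-form expression in eq. \eqref{eq:GLM-N-LME-s3} can be compared against a direct numerical integration of $p(y|\beta) \, p(\beta)$ over $\beta$ in the special case $p = 1$. The following Python sketch does this under arbitrary toy assumptions (all numbers and names are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

# toy setup with a single regression coefficient (p = 1),
# so that the integral over beta is one-dimensional
rng = np.random.default_rng(1)
n = 5
X = rng.standard_normal((n, 1))
Sigma = 0.5 * np.eye(n)
mu0, Sigma0 = np.array([0.3]), np.array([[2.0]])
y = X @ np.array([1.5]) + rng.multivariate_normal(np.zeros(n), Sigma)

# closed form of eq. (GLM-N-LME-s3)
P, P0 = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)
Sigma_n = np.linalg.inv(X.T @ P @ X + P0)
mu_n = Sigma_n @ (X.T @ P @ y + P0 @ mu0)
expo = y @ P @ y + mu0 @ P0 @ mu0 - mu_n @ np.linalg.inv(Sigma_n) @ mu_n
closed_form = (np.sqrt(1.0 / ((2 * np.pi) ** n * np.linalg.det(Sigma)))
               * np.sqrt(np.linalg.det(Sigma_n) / np.linalg.det(Sigma0))
               * np.exp(-0.5 * expo))

# direct numerical integration of p(y|beta) * p(beta) over beta;
# the integrand is negligible outside a window around the posterior mean
def integrand(b):
    return (multivariate_normal.pdf(y, mean=X[:, 0] * b, cov=Sigma)
            * norm.pdf(b, loc=mu0[0], scale=np.sqrt(Sigma0[0, 0])))

numerical, _ = quad(integrand, float(mu_n[0]) - 10.0, float(mu_n[0]) + 10.0)
print(closed_form, numerical)   # the two values should agree closely
```

Both values equal the marginal likelihood $p(y \vert m)$ up to numerical error; since this quantity is typically very small, comparisons on the log scale are numerically safer for larger $n$.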
Now we turn to the intra-exponent term
\[\label{eq:GLM-N-LME-s4a} y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n\]and plug in the posterior covariance
\[\label{eq:GLM-N-post-par-Sigma} \Sigma_n = \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right)^{-1} \; .\]Note that, by eq. \eqref{eq:GLM-N-post-par}, the posterior mean satisfies $\Sigma_n^{-1} \mu_n = X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0$, which is what permits completing the squares below. This gives
\[\label{eq:GLM-N-LME-s4b} \begin{split} & \; y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n \\ = & \; y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right) \mu_n \\ = & \; y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} X^\mathrm{T} \Sigma^{-1} X \mu_n - \mu_n^\mathrm{T} \Sigma_0^{-1} \mu_n \\ = & \; (y - X \mu_n)^\mathrm{T} \Sigma^{-1} (y - X \mu_n) + (\mu_0 - \mu_n)^\mathrm{T} \Sigma_0^{-1} (\mu_0 - \mu_n) \\ \overset{\eqref{eq:GLM-N-err}}{=} & \; e_y^\mathrm{T} \Sigma^{-1} e_y + e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta \; . \end{split}\]Thus, the marginal likelihood becomes
\[\label{eq:GLM-N-LME-s5} p(y|m) = \int p(y,\beta) \, \mathrm{d}\beta \overset{\eqref{eq:GLM-N-LME-s3}}{=} \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \sqrt{\frac{|\Sigma_n|}{|\Sigma_0|}} \cdot \exp\left[ -\frac{1}{2} \left( e_y^\mathrm{T} \Sigma^{-1} e_y + e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta \right) \right]\]and the log model evidence of this model is given by
\[\label{eq:GLM-N-LME-s6} \begin{split} \log p(y|m) = &- \frac{1}{2} e_y^\mathrm{T} \Sigma^{-1} e_y - \frac{1}{2} \log |\Sigma| - \frac{n}{2} \log (2 \pi) \\ &- \frac{1}{2} e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta - \frac{1}{2} \log |\Sigma_0| + \frac{1}{2} \log |\Sigma_n| \; . \end{split}\]
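To verify this result numerically, one can check the identity from eq. \eqref{eq:GLM-N-LME-s4b} and compare the log model evidence of eq. \eqref{eq:GLM-N-LME-s6} with the log-density of the marginal distribution $y \sim \mathcal{N}(X \mu_0, X \Sigma_0 X^\mathrm{T} + \Sigma)$, a standard equivalent form for this model which is not derived on this page. A Python sketch under arbitrary toy assumptions (all numbers and names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

# toy problem (dimensions and hyperparameters chosen arbitrarily)
rng = np.random.default_rng(2)
n, p = 12, 2
X = rng.standard_normal((n, p))
Sigma = 0.8 * np.eye(n)
mu0, Sigma0 = np.array([0.5, -0.5]), np.diag([1.0, 2.0])
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), Sigma)

# posterior hyperparameters and error terms, eqs. (GLM-N-post-par) and (GLM-N-err)
P, P0 = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)
Sigma_n = np.linalg.inv(X.T @ P @ X + P0)
mu_n = Sigma_n @ (X.T @ P @ y + P0 @ mu0)
e_y, e_beta = y - X @ mu_n, mu0 - mu_n

# identity of eq. (GLM-N-LME-s4b): both sides should coincide
lhs = y @ P @ y + mu0 @ P0 @ mu0 - mu_n @ np.linalg.inv(Sigma_n) @ mu_n
rhs = e_y @ P @ e_y + e_beta @ P0 @ e_beta
print(np.isclose(lhs, rhs))

# log model evidence of eq. (GLM-N-LME-s6) ...
lme = (- 0.5 * e_y @ P @ e_y - 0.5 * np.linalg.slogdet(Sigma)[1] - 0.5 * n * np.log(2 * np.pi)
       - 0.5 * e_beta @ P0 @ e_beta - 0.5 * np.linalg.slogdet(Sigma0)[1]
       + 0.5 * np.linalg.slogdet(Sigma_n)[1])
# ... compared against log N(y; X mu0, X Sigma0 X^T + Sigma)
lme_check = multivariate_normal.logpdf(y, mean=X @ mu0, cov=X @ Sigma0 @ X.T + Sigma)
print(np.isclose(lme, lme_check))
```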
- Penny WD (2012): "Comparing Dynamic Causal Models using AIC, BIC and Free Energy"; in: NeuroImage, vol. 59, iss. 2, pp. 319-330, eqs. 19-23; URL: https://www.sciencedirect.com/science/article/pii/S1053811911008160; DOI: 10.1016/j.neuroimage.2011.07.039.
- Bishop CM (2006): "Bayesian linear regression"; in: Pattern Recognition and Machine Learning, pp. 152-161; URL: https://www.springer.com/gp/book/9780387310732.
Metadata: ID: P434 | shortcut: blrkc-lme | author: JoramSoch | date: 2024-01-19, 08:54.