Index: The Book of Statistical Proofs ▷ Statistical Models ▷ Univariate normal data ▷ Bayesian linear regression with known covariance ▷ Log model evidence

Theorem: Let

$\label{eq:GLM} m: y = X \beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \Sigma)$

be a linear regression model with measured $n \times 1$ data vector $y$, known $n \times p$ design matrix $X$ and known $n \times n$ covariance matrix $\Sigma$ as well as unknown $p \times 1$ regression coefficients $\beta$. Moreover, assume a multivariate normal distribution over the model parameter $\beta$:

$\label{eq:GLM-N-prior} p(\beta) = \mathcal{N}(\beta; \mu_0, \Sigma_0) \; .$

Then, the log model evidence for this model is

$\label{eq:GLM-N-LME} \begin{split} \log p(y|m) = &- \frac{1}{2} e_y^\mathrm{T} \Sigma^{-1} e_y - \frac{1}{2} \log |\Sigma| - \frac{n}{2} \log (2 \pi) \\ &- \frac{1}{2} e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta - \frac{1}{2} \log |\Sigma_0| + \frac{1}{2} \log |\Sigma_n| \end{split}$

with the “prediction error” and “parameter error” terms

$\label{eq:GLM-N-err} \begin{split} e_y &= y - X \mu_n \\ e_\beta &= \mu_0 - \mu_n \end{split}$

where the posterior hyperparameters are given by

$\label{eq:GLM-N-post-par} \begin{split} \mu_n &= \Sigma_n (X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0) \\ \Sigma_n &= \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right)^{-1} \; . \end{split}$
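Before turning to the proof, the theorem lends itself to a quick numerical sanity check: marginally, $y \sim \mathcal{N}(X \mu_0, \Sigma + X \Sigma_0 X^\mathrm{T})$, so the log model evidence above must equal the log density of this marginal at the observed $y$. A minimal Python sketch with randomly generated quantities (variable names are illustrative; NumPy and SciPy are assumed available):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, p = 5, 2
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n)); Sigma  = A @ A.T + n * np.eye(n)   # known n x n covariance
B = rng.standard_normal((p, p)); Sigma0 = B @ B.T + p * np.eye(p)   # prior covariance
mu0 = rng.standard_normal(p)                                        # prior mean
y = rng.standard_normal(n)                                          # measured data

# posterior hyperparameters (eq. GLM-N-post-par)
Sigma_n = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X) + np.linalg.inv(Sigma0))
mu_n = Sigma_n @ (X.T @ np.linalg.solve(Sigma, y) + np.linalg.solve(Sigma0, mu0))

# error terms and log model evidence (eqs. GLM-N-err, GLM-N-LME)
e_y = y - X @ mu_n
e_b = mu0 - mu_n
_, ld_S  = np.linalg.slogdet(Sigma)
_, ld_S0 = np.linalg.slogdet(Sigma0)
_, ld_Sn = np.linalg.slogdet(Sigma_n)
lme = (-0.5 * e_y @ np.linalg.solve(Sigma, e_y) - 0.5 * ld_S - 0.5 * n * np.log(2 * np.pi)
       - 0.5 * e_b @ np.linalg.solve(Sigma0, e_b) - 0.5 * ld_S0 + 0.5 * ld_Sn)

# cross-check against the marginal distribution y ~ N(X mu0, Sigma + X Sigma0 X^T)
lme_direct = multivariate_normal(X @ mu0, Sigma + X @ Sigma0 @ X.T).logpdf(y)
print(np.isclose(lme, lme_direct))
```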

Proof: According to the law of marginal probability, the model evidence for this model is:

$\label{eq:GLM-N-ME-s1} p(y|m) = \int p(y|\beta) \, p(\beta) \, \mathrm{d}\beta \; .$

According to the law of conditional probability, the integrand is equal to the joint likelihood:

$\label{eq:GLM-N-ME-s2} p(y|m) = \int p(y,\beta) \, \mathrm{d}\beta \; .$

Equation \eqref{eq:GLM} implies the following likelihood function:

$\label{eq:GLM-LF} p(y|\beta) = \mathcal{N}(y; X \beta, \Sigma) = \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \exp\left[ -\frac{1}{2} (y-X\beta)^\mathrm{T} \Sigma^{-1} (y-X\beta) \right] \; .$

Combining the likelihood \eqref{eq:GLM-LF} with the prior \eqref{eq:GLM-N-prior} and completing the square over $\beta$, as when deriving the posterior distribution $p(\beta \vert y)$, the joint likelihood $p(y,\beta)$ is obtained as

$\label{eq:GLM-N-LME-s1} \begin{split} p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^{n+p} |\Sigma| |\Sigma_0|}} \cdot \\ & \exp\left[ -\frac{1}{2} \left( (\beta-\mu_n)^\mathrm{T} \Sigma_n^{-1} (\beta-\mu_n) + (y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n) \right) \right] \; . \end{split}$

Using the probability density function of the multivariate normal distribution, we can rewrite this as

$\label{eq:GLM-N-LME-s2} \begin{split} p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \sqrt{\frac{1}{(2 \pi)^p |\Sigma_0|}} \, \sqrt{\frac{(2 \pi)^p |\Sigma_n|}{1}} \cdot \mathcal{N}(\beta; \mu_n, \Sigma_n) \cdot \\ & \exp\left[ -\frac{1}{2} \left( y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n \right) \right] \; . \end{split}$
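As a sanity check (not part of the original proof), this factorization can be evaluated at an arbitrary point $\beta$: the logarithm of the right-hand side must equal $\log p(y|\beta) + \log p(\beta)$. A Python sketch with illustrative names, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(2)
n, p = 4, 2
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n)); Sigma  = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); Sigma0 = B @ B.T + p * np.eye(p)
mu0  = rng.standard_normal(p)
y    = rng.standard_normal(n)
beta = rng.standard_normal(p)   # arbitrary evaluation point

# posterior hyperparameters (eq. GLM-N-post-par)
Sigma_n = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X) + np.linalg.inv(Sigma0))
mu_n = Sigma_n @ (X.T @ np.linalg.solve(Sigma, y) + np.linalg.solve(Sigma0, mu0))

# left-hand side: log p(y|beta) + log p(beta)
lhs = mvn(X @ beta, Sigma).logpdf(y) + mvn(mu0, Sigma0).logpdf(beta)

# right-hand side: logarithm of eq. GLM-N-LME-s2
_, ld_S  = np.linalg.slogdet(Sigma)
_, ld_S0 = np.linalg.slogdet(Sigma0)
_, ld_Sn = np.linalg.slogdet(Sigma_n)
qterm = (y @ np.linalg.solve(Sigma, y) + mu0 @ np.linalg.solve(Sigma0, mu0)
         - mu_n @ np.linalg.solve(Sigma_n, mu_n))
rhs = (-0.5 * (n * np.log(2 * np.pi) + ld_S)
       - 0.5 * (p * np.log(2 * np.pi) + ld_S0)
       + 0.5 * (p * np.log(2 * np.pi) + ld_Sn)
       + mvn(mu_n, Sigma_n).logpdf(beta) - 0.5 * qterm)
print(np.isclose(lhs, rhs))
```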

With that, $\beta$ can be integrated out easily:

$\label{eq:GLM-N-LME-s3} \int p(y,\beta) \, \mathrm{d}\beta = \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \sqrt{\frac{|\Sigma_n|}{|\Sigma_0|}} \cdot \exp\left[ -\frac{1}{2} \left( y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n \right) \right] \; .$
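Because the multivariate normal density integrates to one, only the prefactors and the residual exponential remain. For the scalar case $p = 1$, this closed form can be checked against brute-force quadrature; the following Python sketch (illustrative names, assuming SciPy) does so:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(1)
n = 3
x = rng.standard_normal(n)        # design "matrix" X for p = 1
A = rng.standard_normal((n, n)); Sigma = A @ A.T + n * np.eye(n)
s0, m0 = 1.5, 0.3                 # scalar prior variance Sigma_0 and mean mu_0
y = rng.standard_normal(n)
Si = np.linalg.inv(Sigma)

# numerical integration of the joint p(y, beta) = p(y|beta) p(beta) over beta
def joint(b):
    return (multivariate_normal(x * b, Sigma).pdf(y)
            * norm(m0, np.sqrt(s0)).pdf(b))
num, _ = quad(joint, -20.0, 20.0)

# closed form eq. GLM-N-LME-s3 (for p = 1: |Sigma_n| = Sn, |Sigma_0| = s0)
Sn = 1.0 / (x @ Si @ x + 1.0 / s0)
mn = Sn * (x @ Si @ y + m0 / s0)
qterm = y @ Si @ y + m0**2 / s0 - mn**2 / Sn
closed = (np.sqrt(1.0 / ((2 * np.pi)**n * np.linalg.det(Sigma)))
          * np.sqrt(Sn / s0) * np.exp(-0.5 * qterm))
print(np.isclose(num, closed))
```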

Now we turn to the intra-exponent term

$\label{eq:GLM-N-LME-s4a} y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n$

and plug in the posterior covariance

$\label{eq:GLM-N-post-par-Sigma} \Sigma_n = \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right)^{-1} \; .$

This gives the following chain of equalities, in which the second-to-last step completes the square using the identity $\Sigma_n^{-1} \mu_n = X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0$ implied by the posterior mean in \eqref{eq:GLM-N-post-par}:

$\label{eq:GLM-N-LME-s4b} \begin{split} & \; y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n \\ = & \; y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right) \mu_n \\ = & \; y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} X^\mathrm{T} \Sigma^{-1} X \mu_n - \mu_n^\mathrm{T} \Sigma_0^{-1} \mu_n \\ = & \; (y - X \mu_n)^\mathrm{T} \Sigma^{-1} (y - X \mu_n) + (\mu_0 - \mu_n)^\mathrm{T} \Sigma_0^{-1} (\mu_0 - \mu_n) \\ \overset{\eqref{eq:GLM-N-err}}{=} & \; e_y^\mathrm{T} \Sigma^{-1} e_y + e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta \; . \end{split}$
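The chain of equalities above is straightforward to confirm with random matrices; a small Python sketch (illustrative names, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, n)); Sigma  = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); Sigma0 = B @ B.T + p * np.eye(p)
mu0 = rng.standard_normal(p)
y = rng.standard_normal(n)

# posterior hyperparameters (eq. GLM-N-post-par)
Sigma_n = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X) + np.linalg.inv(Sigma0))
mu_n = Sigma_n @ (X.T @ np.linalg.solve(Sigma, y) + np.linalg.solve(Sigma0, mu0))

# intra-exponent term, before and after completing the square
lhs = (y @ np.linalg.solve(Sigma, y) + mu0 @ np.linalg.solve(Sigma0, mu0)
       - mu_n @ np.linalg.solve(Sigma_n, mu_n))
e_y, e_b = y - X @ mu_n, mu0 - mu_n
rhs = e_y @ np.linalg.solve(Sigma, e_y) + e_b @ np.linalg.solve(Sigma0, e_b)
print(np.isclose(lhs, rhs))
```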

Thus, the marginal likelihood becomes

$\label{eq:GLM-N-LME-s5} p(y|m) = \int p(y,\beta) \, \mathrm{d}\beta \overset{\eqref{eq:GLM-N-LME-s3}}{=} \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \sqrt{\frac{|\Sigma_n|}{|\Sigma_0|}} \cdot \exp\left[ -\frac{1}{2} \left( e_y^\mathrm{T} \Sigma^{-1} e_y + e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta \right) \right]$

and the log model evidence of this model is given by

$\label{eq:GLM-N-LME-s6} \begin{split} \log p(y|m) = &- \frac{1}{2} e_y^\mathrm{T} \Sigma^{-1} e_y - \frac{1}{2} \log |\Sigma| - \frac{n}{2} \log (2 \pi) \\ &- \frac{1}{2} e_\beta^\mathrm{T} \Sigma_0^{-1} e_\beta - \frac{1}{2} \log |\Sigma_0| + \frac{1}{2} \log |\Sigma_n| \; . \end{split}$

Metadata: ID: P434 | shortcut: blrkc-lme | author: JoramSoch | date: 2024-01-19, 08:54.