Index: The Book of Statistical Proofs ▷ Statistical Models ▷ Univariate normal data ▷ Univariate Gaussian ▷ Accuracy and complexity

Theorem: Let

\[\label{eq:ug} m: \; y = \left\lbrace y_1, \ldots, y_n \right\rbrace, \quad y_i \sim \mathcal{N}(\mu, \sigma^2), \quad i = 1, \ldots, n\]

be a univariate Gaussian data set with unknown mean $\mu$ and unknown variance $\sigma^2$. Moreover, assume a normal-gamma prior distribution over the model parameters $\mu$ and $\tau = 1/\sigma^2$:

\[\label{eq:UG-NG-prior} p(\mu,\tau) = \mathcal{N}(\mu; \mu_0, (\tau \lambda_0)^{-1}) \cdot \mathrm{Gam}(\tau; a_0, b_0) \; .\]

Then, the accuracy and complexity of this model are

\[\label{eq:UG-NG-AnC} \begin{split} \mathrm{Acc}(m) &= - \frac{1}{2} \frac{a_n}{b_n} \left( y^\mathrm{T} y - 2 n \bar{y} \mu_n + n \mu_n^2 \right) - \frac{1}{2} n \lambda_n^{-1} + \frac{n}{2} \left(\psi(a_n) - \log(b_n)\right) - \frac{n}{2} \log (2 \pi) \\ \mathrm{Com}(m) &= \frac{1}{2} \frac{a_n}{b_n} \left[ \lambda_0 (\mu_0 - \mu_n)^2 - 2 (b_n - b_0) \right] + \frac{1}{2} \frac{\lambda_0}{\lambda_n} - \frac{1}{2} \log \frac{\lambda_0}{\lambda_n} - \frac{1}{2} \\ &+ a_0 \cdot \log \frac{b_n}{b_0} - \log \frac{\Gamma(a_n)}{\Gamma(a_0)} + (a_n - a_0) \cdot \psi(a_n) \end{split}\]

where $\mu_n$ and $\lambda_n$ as well as $a_n$ and $b_n$ are the posterior hyperparameters for the univariate Gaussian and $\bar{y}$ is the sample mean.
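As a numerical illustration, both quantities can be computed directly from data and prior hyperparameters. The sketch below assumes the standard normal-gamma posterior updates for the univariate Gaussian ($\lambda_n = \lambda_0 + n$, $\mu_n = (\lambda_0 \mu_0 + n \bar{y})/\lambda_n$, $a_n = a_0 + n/2$, $b_n = b_0 + \frac{1}{2}(y^\mathrm{T} y + \lambda_0 \mu_0^2 - \lambda_n \mu_n^2)$), which are derived elsewhere; the helper name `ug_anc` is illustrative.

```python
import numpy as np
from scipy.special import digamma, gammaln

def ug_anc(y, mu0, lam0, a0, b0):
    """Accuracy Acc(m) and complexity Com(m) of the univariate Gaussian
    with normal-gamma prior (illustrative helper, not from the source)."""
    y = np.asarray(y, dtype=float)
    n, y_bar, yy = y.size, y.mean(), y @ y
    # posterior hyperparameters (standard normal-gamma updates, assumed)
    lam_n = lam0 + n
    mu_n  = (lam0 * mu0 + n * y_bar) / lam_n
    a_n   = a0 + n / 2
    b_n   = b0 + (yy + lam0 * mu0**2 - lam_n * mu_n**2) / 2
    # accuracy: posterior expectation of the log-likelihood
    acc = (-(a_n / b_n) * (yy - 2 * n * y_bar * mu_n + n * mu_n**2) / 2
           - n / (2 * lam_n)
           + n / 2 * (digamma(a_n) - np.log(b_n))
           - n / 2 * np.log(2 * np.pi))
    # complexity: KL divergence of the posterior from the prior
    com = ((a_n / b_n) * (lam0 * (mu0 - mu_n)**2 - 2 * (b_n - b0)) / 2
           + lam0 / (2 * lam_n) - np.log(lam0 / lam_n) / 2 - 1 / 2
           + a0 * np.log(b_n / b0) - (gammaln(a_n) - gammaln(a0))
           + (a_n - a0) * digamma(a_n))
    return acc, com
```

Since the complexity is a Kullback-Leibler divergence, the returned `com` is always non-negative.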

Proof: Model accuracy and complexity are defined in terms of the log model evidence $\mathrm{LME}(m)$ as

\[\label{eq:lme-anc} \begin{split} \mathrm{LME}(m) &= \mathrm{Acc}(m) - \mathrm{Com}(m) \\ \mathrm{Acc}(m) &= \left\langle \log p(y|\mu,\tau,m) \right\rangle_{p(\mu,\tau|y,m)} \\ \mathrm{Com}(m) &= \mathrm{KL} \left[ p(\mu,\tau|y,m) \, || \, p(\mu,\tau|m) \right] \; . \end{split}\]


The accuracy term is the expectation of the log-likelihood function $\log p(y|\mu,\tau)$ with respect to the posterior distribution $p(\mu,\tau|y)$. Using the log-likelihood function and the posterior distribution of the univariate Gaussian, the model accuracy of $m$ evaluates to:

\[\label{eq:UG-NG-Acc} \begin{split} \mathrm{Acc}(m) &= \left\langle \log p(y|\mu,\tau) \right\rangle_{p(\mu,\tau|y)} \\ &= \left\langle \left\langle \log p(y|\mu,\tau) \right\rangle_{p(\mu|\tau,y)} \right\rangle_{p(\tau|y)} \\ &= \left\langle \left\langle \frac{n}{2} \log (\tau) - \frac{n}{2} \log (2 \pi) - \frac{\tau}{2} \left( y^\mathrm{T} y - 2 n \bar{y} \mu + n \mu^2 \right) \right\rangle_{\mathcal{N}(\mu_n, (\tau \lambda_n)^{-1})} \right\rangle_{\mathrm{Gam}(a_n, b_n)} \\ &= \left\langle \frac{n}{2} \log (\tau) - \frac{n}{2} \log (2 \pi) - \frac{\tau}{2} \left( y^\mathrm{T} y - 2 n \bar{y} \mu_n + n \mu_n^2 \right) - \frac{1}{2} n \lambda_n^{-1} \right\rangle_{\mathrm{Gam}(a_n, b_n)} \\ &= \frac{n}{2} \left(\psi(a_n) - \log(b_n)\right) - \frac{n}{2} \log (2 \pi) - \frac{1}{2} \frac{a_n}{b_n} \left( y^\mathrm{T} y - 2 n \bar{y} \mu_n + n \mu_n^2 \right) - \frac{1}{2} n \lambda_n^{-1} \\ &= - \frac{1}{2} \frac{a_n}{b_n} \left( y^\mathrm{T} y - 2 n \bar{y} \mu_n + n \mu_n^2 \right) - \frac{1}{2} n \lambda_n^{-1} + \frac{n}{2} \left(\psi(a_n) - \log(b_n)\right) - \frac{n}{2} \log (2 \pi) \\ \end{split}\]
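This iterated expectation can be checked by Monte Carlo: draw $\tau$ from the marginal posterior $\mathrm{Gam}(a_n, b_n)$, draw $\mu \mid \tau$ from $\mathcal{N}(\mu_n, (\tau \lambda_n)^{-1})$, and average the log-likelihood over the draws. A sketch with assumed prior values and the standard normal-gamma posterior updates (not derived here):

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)
y = rng.normal(1.0, 2.0, size=50)          # simulated data set
n, y_bar, yy = y.size, y.mean(), y @ y
mu0, lam0, a0, b0 = 0.0, 1.0, 2.0, 1.0     # assumed prior values
lam_n = lam0 + n                           # standard NG posterior updates
mu_n  = (lam0 * mu0 + n * y_bar) / lam_n
a_n   = a0 + n / 2
b_n   = b0 + (yy + lam0 * mu0**2 - lam_n * mu_n**2) / 2

# Monte Carlo: sample (mu, tau) from the posterior, average the log-likelihood
S = 200_000
tau = rng.gamma(a_n, 1 / b_n, size=S)      # Gam(a_n, b_n): rate b_n -> scale 1/b_n
mu  = rng.normal(mu_n, 1 / np.sqrt(tau * lam_n))
acc_mc = np.mean(n / 2 * np.log(tau) - n / 2 * np.log(2 * np.pi)
                 - tau / 2 * (yy - 2 * n * y_bar * mu + n * mu**2))

# closed-form accuracy from the theorem
acc = (-(a_n / b_n) * (yy - 2 * n * y_bar * mu_n + n * mu_n**2) / 2
       - n / (2 * lam_n)
       + n / 2 * (digamma(a_n) - np.log(b_n))
       - n / 2 * np.log(2 * np.pi))
```

With this many samples, `acc_mc` and `acc` agree to within Monte Carlo precision.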


The complexity penalty is the Kullback-Leibler divergence of the posterior distribution $p(\mu,\tau|y)$ from the prior distribution $p(\mu,\tau)$. With the prior distribution given by \eqref{eq:UG-NG-prior}, the posterior distribution of the univariate Gaussian, and the Kullback-Leibler divergence of the normal-gamma distribution, the model complexity of $m$ evaluates to:

\[\label{eq:UG-NG-Com} \begin{split} \mathrm{Com}(m) &= \mathrm{KL} \left[ p(\mu,\tau|y) \, || \, p(\mu,\tau) \right] \\ &= \mathrm{KL} \left[ \mathrm{NG}(\mu_n, \lambda_n^{-1}, a_n, b_n) \, || \, \mathrm{NG}(\mu_0, \lambda_0^{-1}, a_0, b_0) \right] \\ &= \frac{1}{2} \frac{a_n}{b_n} \left[ \lambda_0 (\mu_0 - \mu_n)^2 \right] + \frac{1}{2} \frac{\lambda_0}{\lambda_n} - \frac{1}{2} \log \frac{\lambda_0}{\lambda_n} - \frac{1}{2} \\ &+ a_0 \cdot \log \frac{b_n}{b_0} - \log \frac{\Gamma(a_n)}{\Gamma(a_0)} + (a_n - a_0) \cdot \psi(a_n) - (b_n - b_0) \cdot \frac{a_n}{b_n} \\ &= \frac{1}{2} \frac{a_n}{b_n} \left[ \lambda_0 (\mu_0 - \mu_n)^2 - 2 (b_n - b_0) \right] + \frac{1}{2} \frac{\lambda_0}{\lambda_n} - \frac{1}{2} \log \frac{\lambda_0}{\lambda_n} - \frac{1}{2} \\ &+ a_0 \cdot \log \frac{b_n}{b_0} - \log \frac{\Gamma(a_n)}{\Gamma(a_0)} + (a_n - a_0) \cdot \psi(a_n) \; . \end{split}\]
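The KL divergence can likewise be verified by Monte Carlo, averaging the log density ratio of posterior to prior over posterior samples. The sketch below factorizes the normal-gamma density as $\mathcal{N}(\mu; m, (\tau \lambda)^{-1}) \cdot \mathrm{Gam}(\tau; a, b)$ and uses assumed prior values; the posterior updates are the standard normal-gamma updates, imported as an assumption:

```python
import numpy as np
from scipy import stats
from scipy.special import digamma, gammaln

rng = np.random.default_rng(2)
y = rng.normal(1.0, 2.0, size=50)          # simulated data set
n, y_bar, yy = y.size, y.mean(), y @ y
mu0, lam0, a0, b0 = 0.0, 1.0, 2.0, 1.0     # assumed prior values
lam_n = lam0 + n                           # standard NG posterior updates
mu_n  = (lam0 * mu0 + n * y_bar) / lam_n
a_n   = a0 + n / 2
b_n   = b0 + (yy + lam0 * mu0**2 - lam_n * mu_n**2) / 2

# Monte Carlo KL: E_q[log q - log p] over posterior samples (mu, tau)
S = 200_000
tau = rng.gamma(a_n, 1 / b_n, size=S)      # rate b_n -> scale 1/b_n
mu  = rng.normal(mu_n, 1 / np.sqrt(tau * lam_n))
log_q = (stats.norm.logpdf(mu, mu_n, 1 / np.sqrt(tau * lam_n))
         + stats.gamma.logpdf(tau, a_n, scale=1 / b_n))
log_p = (stats.norm.logpdf(mu, mu0, 1 / np.sqrt(tau * lam0))
         + stats.gamma.logpdf(tau, a0, scale=1 / b0))
com_mc = np.mean(log_q - log_p)

# closed-form complexity from the theorem
com = ((a_n / b_n) * (lam0 * (mu0 - mu_n)**2 - 2 * (b_n - b0)) / 2
       + lam0 / (2 * lam_n) - np.log(lam0 / lam_n) / 2 - 1 / 2
       + a0 * np.log(b_n / b0) - (gammaln(a_n) - gammaln(a0))
       + (a_n - a0) * digamma(a_n))
```

Note that SciPy's `gamma` distribution is parametrized by shape and scale, so the rate parameters $b_n$ and $b_0$ enter as `scale=1/b_n` and `scale=1/b_0`.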

A control calculation, using the relations $a_n = a_0 + n/2$ and $\lambda_n = \lambda_0 + n$ between prior and posterior hyperparameters, confirms that

\[\label{eq:UG-NG-AnC-LME} \mathrm{Acc}(m) - \mathrm{Com}(m) = \mathrm{LME}(m)\]

where $\mathrm{LME}(m)$ is the log model evidence for the univariate Gaussian.
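This identity can also be checked numerically against the closed-form log model evidence of the univariate Gaussian with normal-gamma prior, $\mathrm{LME}(m) = \frac{1}{2} \log \frac{\lambda_0}{\lambda_n} + a_0 \log b_0 - a_n \log b_n + \log \Gamma(a_n) - \log \Gamma(a_0) - \frac{n}{2} \log (2 \pi)$, taken as given here since it is derived in the corresponding LME proof. A sketch with assumed prior values:

```python
import numpy as np
from scipy.special import digamma, gammaln

rng = np.random.default_rng(3)
y = rng.normal(1.0, 2.0, size=50)          # simulated data set
n, y_bar, yy = y.size, y.mean(), y @ y
mu0, lam0, a0, b0 = 0.0, 1.0, 2.0, 1.0     # assumed prior values
lam_n = lam0 + n                           # standard NG posterior updates
mu_n  = (lam0 * mu0 + n * y_bar) / lam_n
a_n   = a0 + n / 2
b_n   = b0 + (yy + lam0 * mu0**2 - lam_n * mu_n**2) / 2

# accuracy and complexity from the theorem
acc = (-(a_n / b_n) * (yy - 2 * n * y_bar * mu_n + n * mu_n**2) / 2
       - n / (2 * lam_n)
       + n / 2 * (digamma(a_n) - np.log(b_n))
       - n / 2 * np.log(2 * np.pi))
com = ((a_n / b_n) * (lam0 * (mu0 - mu_n)**2 - 2 * (b_n - b0)) / 2
       + lam0 / (2 * lam_n) - np.log(lam0 / lam_n) / 2 - 1 / 2
       + a0 * np.log(b_n / b0) - (gammaln(a_n) - gammaln(a0))
       + (a_n - a0) * digamma(a_n))

# closed-form log model evidence (assumed from the LME proof)
lme = (np.log(lam0 / lam_n) / 2 + a0 * np.log(b0) - a_n * np.log(b_n)
       + gammaln(a_n) - gammaln(a0) - n / 2 * np.log(2 * np.pi))
# Acc(m) - Com(m) reproduces LME(m) up to floating-point error
```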

Metadata: ID: P240 | shortcut: ug-anc | author: JoramSoch | date: 2021-07-14, 08:26.