Index: The Book of Statistical Proofs ▷ Statistical Models ▷ Univariate normal data ▷ Univariate Gaussian with known variance ▷ Expectation of cross-validated log Bayes factor

Theorem: Let

\[\label{eq:ugkv} y = \left\lbrace y_1, \ldots, y_n \right\rbrace, \quad y_i \sim \mathcal{N}(\mu, \sigma^2), \quad i = 1, \ldots, n\]

be a univariate Gaussian data set with unknown mean $\mu$ and known variance $\sigma^2$. Moreover, assume two statistical models, one assuming that $\mu$ is zero (null model), the other imposing a normal distribution as the prior distribution on the model parameter $\mu$ (alternative):

\[\label{eq:UGkv-m01} \begin{split} m_0&: \; y_i \sim \mathcal{N}(\mu, \sigma^2), \; \mu = 0 \\ m_1&: \; y_i \sim \mathcal{N}(\mu, \sigma^2), \; \mu \sim \mathcal{N}(\mu_0, \lambda_0^{-1}) \; . \end{split}\]

Then, the expectation of the cross-validated log Bayes factor (cvLBF) in favor of $m_1$ against $m_0$ is

\[\label{eq:UGkv-cvLBF} \left\langle \mathrm{cvLBF}_{10} \right\rangle = \frac{S}{2} \log \left( \frac{S-1}{S} \right) + \frac{1}{2} \left[ \tau n \mu^2 \right]\]

where $\tau = 1/\sigma^2$ is the inverse variance or precision and $S$ is the number of data subsets into which the $n$ observations are partitioned for cross-validation.
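
To make the statement concrete, the following minimal sketch evaluates the right-hand side of \eqref{eq:UGkv-cvLBF} numerically. The function name `expected_cvlbf` and the example values are illustrative assumptions and not part of the theorem.

```python
import math


def expected_cvlbf(n: int, S: int, mu: float, sigma2: float) -> float:
    """Expected cvLBF in favor of m1 against m0, as stated in the theorem."""
    if S < 2:
        raise ValueError("S must be at least 2 so that log((S-1)/S) is defined")
    tau = 1.0 / sigma2  # precision (inverse variance)
    return 0.5 * S * math.log((S - 1) / S) + 0.5 * tau * n * mu**2


# Hypothetical values: n = 20 observations, S = 2 subsets, mu = 0.5, sigma^2 = 1
print(round(expected_cvlbf(20, 2, 0.5, 1.0), 4))  # 1.8069
# With mu = 0, only the negative logarithmic term remains
print(expected_cvlbf(20, 2, 0.0, 1.0) < 0)  # True
```

For $\mu = 0$ the second term vanishes and the expectation is negative, since $\log((S-1)/S) < 0$ for every $S \geq 2$. For $\mu \neq 0$ the second term grows linearly with $n$.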

Proof: The cross-validated log Bayes factor for the univariate Gaussian with known variance is

\[\label{eq:UGkv-cvLBF-m10-s1} \mathrm{cvLBF}_{10} = \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{\tau}{2} \sum_{i=1}^S \left( \frac{\left(n_1 \bar{y}_1^{(i)}\right)^2}{n_1} - \frac{(n \bar{y})^2}{n} \right)\]

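Here, $\bar{y}$ is the mean of all $n$ observations and $\bar{y}_1^{(i)}$ is the mean of the $n_1$ training observations in the $i$-th cross-validation fold. From \eqref{eq:ugkv}, we know that the data are distributed as $y_i \sim \mathcal{N}(\mu, \sigma^2)$, so that $\left\langle y_i^2 \right\rangle = \mu^2 + \sigma^2$ and, for independent observations, $\left\langle y_i y_j \right\rangle = \mu^2$ for $i \neq j$. The expectation of $(n \bar{y})^2$ can then be derived as follows, and the same argument with $n_1$ in place of $n$ applies to $\left(n_1 \bar{y}_1^{(i)}\right)^2$: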

\[\label{eq:UGkv-E(ny2)} \begin{split} \left\langle (n \bar{y})^2 \right\rangle = \left\langle \sum_{i=1}^n \sum_{j=1}^n y_i y_j \right\rangle &= \left\langle n y_i^2 + (n^2-n) [y_i y_j]_{i \neq j} \right\rangle \\ &= n (\mu^2 + \sigma^2) + (n^2 - n) \mu^2 \\ &= n^2 \mu^2 + n \sigma^2 \; . \end{split}\]
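
As an informal numerical check of \eqref{eq:UGkv-E(ny2)}, the sketch below compares a Monte Carlo estimate of $\left\langle (n \bar{y})^2 \right\rangle$ with $n^2 \mu^2 + n \sigma^2$. The helper name `mc_second_moment_of_sum`, the parameter values, the number of repetitions and the seed are all illustrative assumptions. The simulation draws independent observations, as the derivation assumes.

```python
import random


def mc_second_moment_of_sum(n, mu, sigma, reps=100_000, seed=0):
    """Monte Carlo estimate of <(n * ybar)^2> for i.i.d. y_i ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s = sum(rng.gauss(mu, sigma) for _ in range(n))  # s = n * ybar
        total += s * s
    return total / reps


n, mu, sigma = 5, 1.0, 2.0               # hypothetical values
analytic = n**2 * mu**2 + n * sigma**2   # n^2 mu^2 + n sigma^2 = 45.0
estimate = mc_second_moment_of_sum(n, mu, sigma)
print(analytic, estimate)                # the two numbers should be close
```

The estimate fluctuates around the analytic value and only agrees with it up to Monte Carlo error, which shrinks as `reps` grows.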

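Applying this expected value, together with its analogue $\left\langle \left(n_1 \bar{y}_1^{(i)}\right)^2 \right\rangle = n_1^2 \mu^2 + n_1 \sigma^2$ for the training data, to \eqref{eq:UGkv-cvLBF-m10-s1}, the expected cvLBF emerges as: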

\[\label{eq:UGkv-cvLBF-m10-s2} \begin{split} \left\langle \mathrm{cvLBF}_{10} \right\rangle &= \left\langle \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{\tau}{2} \sum_{i=1}^S \left( \frac{\left(n_1 \bar{y}_1^{(i)}\right)^2}{n_1} - \frac{(n \bar{y})^2}{n} \right) \right\rangle \\ &= \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{\tau}{2} \sum_{i=1}^S \left( \frac{\left\langle \left(n_1 \bar{y}_1^{(i)}\right)^2 \right\rangle}{n_1} - \frac{\left\langle (n \bar{y})^2 \right\rangle}{n} \right) \\ &\overset{\eqref{eq:UGkv-E(ny2)}}{=} \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{\tau}{2} \sum_{i=1}^S \left( \frac{n_1^2 \mu^2 + n_1 \sigma^2}{n_1} - \frac{n^2 \mu^2 + n \sigma^2}{n} \right) \\ &= \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{\tau}{2} \sum_{i=1}^S \left( [n_1 \mu^2 + \sigma^2] - [n \mu^2 + \sigma^2] \right) \\ &= \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{\tau}{2} \sum_{i=1}^S (n_1 - n) \mu^2 \end{split}\]

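Because $n_1 + n_2 = n$ and $n_2 = n/S$, each summand equals $(n_1 - n) \mu^2 = -n_2 \mu^2$, and summing over the $S$ subsets gives $-S n_2 \mu^2 = -n \mu^2$, such that we finally have: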

\[\label{eq:UGkv-cvLBF-m10-s3} \begin{split} \left\langle \mathrm{cvLBF}_{10} \right\rangle &= \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{\tau}{2} \sum_{i=1}^S (-n_2) \mu^2 \\ &= \frac{S}{2} \log \left( \frac{S-1}{S} \right) + \frac{1}{2} \left[ \tau n \mu^2 \right] \; . \end{split}\]
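
The result can be checked informally by simulation. The sketch below evaluates the closed-form cvLBF from \eqref{eq:UGkv-cvLBF-m10-s1} on simulated data sets and compares the average with \eqref{eq:UGkv-cvLBF}. It assumes that $n$ is divisible by $S$ and that fold $i$ holds out the $i$-th contiguous block of $n/S$ observations. This fold assignment, the function names `cv_lbf` and `simulate_mean_cvlbf`, and all numerical values are illustrative choices rather than part of the proof.

```python
import math
import random


def cv_lbf(y, S, sigma2):
    """cvLBF_10 via the closed form used at the start of the proof.

    Assumes len(y) is divisible by S; fold i holds out the i-th
    contiguous block of n/S observations (an illustrative choice).
    """
    n = len(y)
    n2 = n // S            # test set size per fold
    n1 = n - n2            # training set size per fold
    tau = 1.0 / sigma2     # precision
    total = sum(y)         # n * ybar
    acc = 0.0
    for i in range(S):
        held_out = sum(y[i * n2:(i + 1) * n2])
        train_sum = total - held_out   # n1 * ybar_1^(i)
        acc += train_sum**2 / n1 - total**2 / n
    return 0.5 * S * math.log((S - 1) / S) - 0.5 * tau * acc


def simulate_mean_cvlbf(n, S, mu, sigma2, reps=40_000, seed=1):
    """Average cvLBF over simulated data sets y_i ~ N(mu, sigma2), i.i.d."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    vals = (cv_lbf([rng.gauss(mu, sigma) for _ in range(n)], S, sigma2)
            for _ in range(reps))
    return sum(vals) / reps


n, S, mu, sigma2 = 12, 3, 0.5, 1.0   # hypothetical values
theory = 0.5 * S * math.log((S - 1) / S) + 0.5 * (1.0 / sigma2) * n * mu**2
estimate = simulate_mean_cvlbf(n, S, mu, sigma2)
print(theory, estimate)              # should agree up to Monte Carlo error
```

Any single simulated cvLBF can deviate substantially from the theoretical value. Only the average over many data sets is expected to approach it.
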

Metadata: ID: P219 | shortcut: ugkv-cvlbfmean | author: JoramSoch | date: 2021-03-24, 12:27.