Proof: Cross-validated log Bayes factor for the univariate Gaussian
Theorem: Let
\[\label{eq:UG} y = \left\lbrace y_1, \ldots, y_n \right\rbrace, \quad y_i \sim \mathcal{N}(\mu, \sigma^2), \quad i = 1, \ldots, n\]be a univariate Gaussian data set with unknown mean $\mu$ and unknown variance $\sigma^2$. Moreover, assume two statistical models, one assuming that $\mu$ is zero (null model), the other imposing a normal distribution as the prior distribution on the mean parameter $\mu$ (alternative model), and both imposing a gamma distribution on the precision parameter $\tau = 1/\sigma^2$:
\[\label{eq:UG-m01} \begin{split} m_0 &: \; y_i \sim \mathcal{N}(\mu, \tau^{-1}), \; \mu = 0, \; \tau \sim \mathrm{Gam}(a_0, b_0) \\ m_1 &: \; y_i \sim \mathcal{N}(\mu, \tau^{-1}), \; \mu|\tau \sim \mathcal{N}(\mu_0, (\tau \lambda_0)^{-1}), \; \tau \sim \mathrm{Gam}(a_0, b_0) \end{split}\]Then, the cross-validated log Bayes factor in favor of $m_1$ against $m_0$ is
\[\label{eq:UG-cvLBF} \mathrm{cvLBF}_{10} = \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{S \cdot n}{2} \left[ \log \left( 1 - \frac{n {\bar{y}}^2}{y^\mathrm{T} y} \right) \right] + \frac{n_1}{2} \sum_{i=1}^S \left[ \log \left( 1 - \frac{ n_1 \left( \bar{y}_1^{(i)} \right)^2 }{ {y_1^{(i)}}^\mathrm{T} y_1^{(i)} } \right) \right]\]where $\bar{y}$ is the sample mean of all data, $y_1^{(i)}$ are the training data in the $i$-th cross-validation fold with $n_1$ data points, $\bar{y}_1^{(i)}$ is the sample mean of those training data and $S$ is the number of data subsets.
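As a numerical illustration, the closed form above can be evaluated directly from a data vector. The following is a minimal sketch, not a reference implementation. The function name `cvlbf_ug` is hypothetical. The sketch assumes that the $S$ subsets are consecutive blocks of equal size, so that $n$ is divisible by $S$ and $n_1 = n - n/S$. The training-set mean enters squared, in parallel to the $n \bar{y}^2$ term for the full data.

```python
import numpy as np

def cvlbf_ug(y, S):
    """Cross-validated log Bayes factor (m1 against m0) for univariate Gaussian data.

    Assumption of this sketch: the S subsets are consecutive, equally sized
    blocks of y, so n must be divisible by S.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if S < 2 or n % S != 0:
        raise ValueError("this sketch assumes S >= 2 and n divisible by S")
    n2 = n // S          # number of test data points per fold
    n1 = n - n2          # number of training data points per fold

    # term based on all data: log(1 - n * ybar^2 / (y'y))
    full_term = np.log(1.0 - n * y.mean() ** 2 / (y @ y))

    # sum over folds of log(1 - n1 * ybar_1^2 / (y_1'y_1)) for the training data
    train_sum = 0.0
    for i in range(S):
        test_idx = np.arange(i * n2, (i + 1) * n2)
        y1 = np.delete(y, test_idx)   # training data of fold i
        train_sum += np.log(1.0 - n1 * y1.mean() ** 2 / (y1 @ y1))

    return (S / 2) * np.log((S - 1) / S) - (S * n / 2) * full_term + (n1 / 2) * train_sum
```

If the full data and every training set have a sample mean of exactly zero, both logarithmic data terms vanish and the result reduces to $\frac{S}{2} \log \left( \frac{S-1}{S} \right)$, which is negative and thus favors the null model.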
Proof: The relationship between log Bayes factor and log model evidences also holds for the cross-validated log Bayes factor (cvLBF) and the cross-validated log model evidences (cvLME):
\[\label{eq:cvLBF-cvLME} \mathrm{cvLBF}_{12} = \mathrm{cvLME}(m_1) - \mathrm{cvLME}(m_2) \; .\]The cross-validated log model evidences of $m_0$ and $m_1$ are given by
\[\label{eq:UG-cvLME-m01} \begin{split} \mathrm{cvLME}(m_0) = & - \frac{n}{2} \log (2 \pi) + S \cdot \log \Gamma \left( \frac{n}{2} \right) - S \cdot \log \Gamma \left( \frac{1}{2} \frac{S-1}{S} n \right) \\ &- \frac{S \cdot n}{2} \log \left[ \frac{1}{2} \left( y^\mathrm{T} y \right) \right] + \frac{n_1}{2} \sum_{i=1}^S \log \left[ \frac{1}{2} \left( {y_1^{(i)}}^\mathrm{T} y_1^{(i)} \right) \right] \\ \mathrm{cvLME}(m_1) = & - \frac{n}{2} \log (2 \pi) + \frac{S}{2} \log \left( \frac{S-1}{S} \right) + S \cdot \log \Gamma \left( \frac{n}{2} \right) - S \cdot \log \Gamma \left( \frac{1}{2} \frac{S-1}{S} n \right) \\ &- \frac{S \cdot n}{2} \log \left[ \frac{1}{2} \left( y^\mathrm{T} y - n {\bar{y}}^2 \right) \right] + \frac{n_1}{2} \sum_{i=1}^S \log \left[ \frac{1}{2} \left( {y_1^{(i)}}^\mathrm{T} y_1^{(i)} - n_1 \left( \bar{y}_1^{(i)} \right)^2 \right) \right] \; . \end{split}\]Subtracting the two cvLMEs from each other, the cvLBF emerges as
\[\label{eq:UG-cvLBF-qed} \begin{split} \mathrm{cvLBF}_{10} = &\; \mathrm{cvLME}(m_1) - \mathrm{cvLME}(m_0) \\ = &\; \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{S \cdot n}{2} \left[ \log \left( \frac{1}{2} \left( y^\mathrm{T} y - n {\bar{y}}^2 \right) \right) - \log \left( \frac{1}{2} \left( y^\mathrm{T} y \right) \right) \right] \\ &+ \frac{n_1}{2} \sum_{i=1}^S \left[ \log \left( \frac{1}{2} \left( {y_1^{(i)}}^\mathrm{T} y_1^{(i)} - n_1 \left( \bar{y}_1^{(i)} \right)^2 \right) \right) - \log \left( \frac{1}{2} \left( {y_1^{(i)}}^\mathrm{T} y_1^{(i)} \right) \right) \right] \\ = &\; \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{S \cdot n}{2} \left[ \log \left( \frac{y^\mathrm{T} y - n {\bar{y}}^2}{y^\mathrm{T} y} \right) \right] + \frac{n_1}{2} \sum_{i=1}^S \left[ \log \left( \frac{ {y_1^{(i)}}^\mathrm{T} y_1^{(i)} - n_1 \left( \bar{y}_1^{(i)} \right)^2 }{ {y_1^{(i)}}^\mathrm{T} y_1^{(i)} } \right) \right] \\ = &\; \frac{S}{2} \log \left( \frac{S-1}{S} \right) - \frac{S \cdot n}{2} \left[ \log \left( 1 - \frac{n {\bar{y}}^2}{y^\mathrm{T} y} \right) \right] + \frac{n_1}{2} \sum_{i=1}^S \left[ \log \left( 1 - \frac{ n_1 \left( \bar{y}_1^{(i)} \right)^2 }{ {y_1^{(i)}}^\mathrm{T} y_1^{(i)} } \right) \right] \; . \end{split}\]Metadata: ID: P491 | shortcut: ug-cvlbf | author: JoramSoch | date: 2025-03-07, 01:14.