Proof: F-test for multiple linear regression using contrast-based inference
Theorem: Consider a linear regression model
\[\label{eq:mlr} y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V)\]and an F-contrast on the model parameters
\[\label{eq:fcon} \gamma = C^\mathrm{T} \beta \quad \text{where} \quad C \in \mathbb{R}^{p \times q} \; .\]Then, the test statistic
\[\label{eq:mlr-f} F = \hat{\beta}^\mathrm{T} C \left( \hat{\sigma}^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} C^\mathrm{T} \hat{\beta} / q\]with the parameter estimates
\[\label{eq:mlr-est} \begin{split} \hat{\beta} &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y \\ \hat{\sigma}^2 &= \frac{1}{n-p} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta}) \end{split}\]follows an F-distribution
\[\label{eq:mlr-f-dist} F \sim \mathrm{F}(q, n-p)\]under the null hypothesis
\[\label{eq:mlr-f-h0} \begin{split} H_0: &\; \gamma_1 = 0 \wedge \ldots \wedge \gamma_q = 0 \\ H_1: &\; \gamma_1 \neq 0 \vee \ldots \vee \gamma_q \neq 0 \; . \end{split}\]
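As a quick numerical illustration (not part of the theorem), the following Python sketch computes $\hat{\beta}$, $\hat{\sigma}^2$ and $F$ exactly as in \eqref{eq:mlr-f} and \eqref{eq:mlr-est}; the design matrix, contrast matrix and sample sizes are arbitrary toy choices, and NumPy/SciPy are assumed to be available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, q = 50, 3, 2                          # sample size, parameters, contrast columns
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
C = np.eye(p)[:, 1:1 + q]                   # contrast: test the two non-intercept coefficients
V = np.eye(n)                               # spherical covariance, for simplicity
beta_true, sigma2 = np.array([1.0, 0.0, 0.0]), 1.5
y = X @ beta_true + np.sqrt(sigma2) * rng.standard_normal(n)   # V = I, so scalar noise suffices

P = np.linalg.inv(V)                        # V^{-1}
cov_b = np.linalg.inv(X.T @ P @ X)          # (X^T V^{-1} X)^{-1}
b_hat = cov_b @ X.T @ P @ y                 # beta estimate, eq. (mlr-est)
resid = y - X @ b_hat
s2_hat = resid @ P @ resid / (n - p)        # sigma^2 estimate, eq. (mlr-est)

g_hat = C.T @ b_hat                         # estimated contrast vector, eq. (fcon)
F = g_hat @ np.linalg.inv(s2_hat * C.T @ cov_b @ C) @ g_hat / q   # eq. (mlr-f)
p_val = stats.f.sf(F, q, n - p)             # upper tail of F(q, n-p), eq. (mlr-f-dist)
print(F, p_val)
```

Proof: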
1) We know that the estimated regression coefficients in linear regression follow a multivariate normal distribution:
\[\label{eq:b-est-dist} \hat{\beta} \sim \mathcal{N}\left( \beta, \, \sigma^2 (X^\mathrm{T} V^{-1} X)^{-1} \right) \; .\]Thus, the estimated contrast vector $\hat{\gamma} = C^\mathrm{T} \hat{\beta}$, being a linear transformation of $\hat{\beta}$, also follows a multivariate normal distribution:
\[\label{eq:g-est-dist-cond} \hat{\gamma} \sim \mathcal{N}\left( C^\mathrm{T} \beta, \, \sigma^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right) \; .\]Replacing the noise variance $\sigma^2$ with the noise precision $\tau = 1/\sigma^2$, we can equivalently express this as a conditional distribution:
\[\label{eq:g-est-tau-dist-cond} \hat{\gamma} \vert \tau \sim \mathcal{N}\left( C^\mathrm{T} \beta, (\tau Q)^{-1} \right) \quad \text{with} \quad Q = \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \; .\]
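The distribution \eqref{eq:g-est-dist-cond} can be checked by simulation; a minimal sketch, assuming the same kind of toy design as above and $V = I_n$, compares the empirical covariance of $\hat{\gamma}$ across repeated data sets to $\sigma^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, sigma2 = 50, 3, 2, 1.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
C = np.eye(p)[:, 1:1 + q]                   # contrast on the non-intercept coefficients
beta = np.array([1.0, 0.5, -0.5])
cov_b = np.linalg.inv(X.T @ X)              # (X^T V^{-1} X)^{-1} with V = I

# simulate the sampling distribution of gamma_hat = C^T beta_hat
g_hats = []
for _ in range(20000):
    y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
    g_hats.append(C.T @ (cov_b @ X.T @ y))
g_hats = np.array(g_hats)

print(np.cov(g_hats, rowvar=False))         # empirical covariance of gamma_hat
print(sigma2 * C.T @ cov_b @ C)             # theoretical covariance, eq. (g-est-dist-cond)
```

2) We also know that the residual sum of squares, divided by the true error variance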
\[\label{eq:mlr-rss} \frac{1}{\sigma^2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \frac{1}{\sigma^2} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})\]follows a chi-squared distribution:
\[\label{eq:mlr-rss-dist} \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \tau \, \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} \sim \chi^2(n-p) \; .\]The chi-squared distribution is a special case of the gamma distribution
\[\label{eq:chi2-gam} X \sim \chi^2(k) \quad \Rightarrow \quad X \sim \mathrm{Gam}\left( \frac{k}{2}, \frac{1}{2} \right)\]and the gamma distribution changes under multiplication in the following way:
\[\label{eq:gam-scal} X \sim \mathrm{Gam}\left( a, b \right) \quad \Rightarrow \quad cX \sim \mathrm{Gam}\left( a, \frac{b}{c} \right) \; .\]Thus, combining \eqref{eq:chi2-gam} and \eqref{eq:gam-scal} with \eqref{eq:mlr-rss-dist}, we obtain the marginal distribution of $\tau$ as:
\[\label{eq:tau-dist} \frac{1}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} \left( \tau \, \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} \right) = \tau \sim \mathrm{Gam}\left( \frac{n-p}{2}, \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{2} \right) \; .\]
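The chi-squared claim \eqref{eq:mlr-rss-dist} underlying this step can also be verified empirically; a minimal sketch, under the same toy assumptions ($V = I_n$), checks that simulated values of $\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} / \sigma^2$ are compatible with $\chi^2(n-p)$ via a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, sigma2 = 50, 3, 1.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 0.5, -0.5])
R = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-forming matrix (V = I)

# simulate the residual sum of squares divided by the true error variance
rss = []
for _ in range(20000):
    e = R @ (X @ beta + np.sqrt(sigma2) * rng.standard_normal(n))
    rss.append(e @ e / sigma2)

# Kolmogorov-Smirnov test against chi^2(n-p): should not reject
print(stats.kstest(rss, stats.chi2(df=n - p).cdf))
```

3) Combining \eqref{eq:g-est-tau-dist-cond} and \eqref{eq:tau-dist}, the joint distribution of $\hat{\gamma}$ and $\tau$ is, by definition, a normal-gamma distribution: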
\[\label{eq:g-est-tau-dist-joint} \hat{\gamma}, \tau \sim \mathrm{NG}\left( C^\mathrm{T} \beta, Q, \frac{n-p}{2}, \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{2} \right) \; .\]The marginal distribution of a normal-gamma distribution with respect to the normally distributed random variable is a multivariate t-distribution: \[\label{eq:ng-mvt} X, Y \sim \mathrm{NG}(\mu, \Lambda, a, b) \quad \Rightarrow \quad X \sim \mathrm{t}\left( \mu, \left( \frac{a}{b} \Lambda\right)^{-1}, 2a \right) \; .\]Thus, the marginal distribution of $\hat{\gamma}$ is:
\[\label{eq:g-est-dist-marg} \hat{\gamma} \sim \mathrm{t}\left( C^\mathrm{T} \beta, \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right)^{-1}, n-p \right) \; .\]
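The normal-gamma-to-t marginalization \eqref{eq:ng-mvt} can likewise be checked numerically; in the sketch below, the parameters $\mu$, $\Lambda$, $a$, $b$ are arbitrary example values, and samples drawn hierarchically from the normal-gamma distribution are compared against direct draws from the corresponding multivariate t-distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu = np.array([0.5, -0.5])                  # example normal-gamma parameters (arbitrary)
Lam = np.array([[2.0, 0.3], [0.3, 1.0]])
a, b = 5.0, 3.0

# hierarchical sampling: tau ~ Gam(a, b), then x | tau ~ N(mu, (tau * Lam)^{-1})
taus = rng.gamma(shape=a, scale=1.0 / b, size=20000)
xs = np.array([rng.multivariate_normal(mu, np.linalg.inv(t * Lam)) for t in taus])

# eq. (ng-mvt): marginally, x ~ t(mu, ((a/b) * Lam)^{-1}, 2a)
t_dist = stats.multivariate_t(loc=mu, shape=(b / a) * np.linalg.inv(Lam), df=2 * a, seed=4)
ts = t_dist.rvs(size=20000)

# compare first coordinates of both samples (two-sample KS test should not reject)
print(stats.ks_2samp(xs[:, 0], ts[:, 0]))
```

4) Because of the following relationship between the multivariate t-distribution and the F-distribution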
\[\label{eq:mvt-f} X \sim \mathrm{t}(\mu, \Sigma, \nu), \; X \in \mathbb{R}^k \quad \Rightarrow \quad (X-\mu)^\mathrm{T} \, \Sigma^{-1} (X-\mu)/k \sim \mathrm{F}(k, \nu) \; ,\]the following quantity is F-distributed
\[\label{eq:mlr-f-s1} F = \left( \hat{\gamma} - C^\mathrm{T} \beta \right)^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \left( \hat{\gamma} - C^\mathrm{T} \beta \right) / q \sim \mathrm{F}(q, n-p)\]and under the null hypothesis \eqref{eq:mlr-f-h0}, it can be evaluated as:
\[\label{eq:mlr-f-s2} \begin{split} F &\overset{\eqref{eq:mlr-f-s1}}{=} \left( \hat{\gamma} - C^\mathrm{T} \beta \right)^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \left( \hat{\gamma} - C^\mathrm{T} \beta \right) / q \\ &\overset{\eqref{eq:mlr-f-h0}}{=} \hat{\gamma}^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \hat{\gamma} / q \\ &\overset{\eqref{eq:fcon}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) C^\mathrm{T} \hat{\beta} / q \\ &\overset{\eqref{eq:g-est-tau-dist-cond}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\ &\overset{\eqref{eq:mlr-rss}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{(y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\ &\overset{\eqref{eq:mlr-est}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{1}{\hat{\sigma}^2} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\ &= \hat{\beta}^\mathrm{T} C \left( \hat{\sigma}^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} C^\mathrm{T} \hat{\beta} / q \; . \end{split}\]This means that the null hypothesis in \eqref{eq:mlr-f-h0} can be rejected when $F$ from \eqref{eq:mlr-f-s2} exceeds the critical value obtained from Fisher's F-distribution with $q$ numerator and $n-p$ denominator degrees of freedom at significance level $\alpha$.
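To close the loop, the full result \eqref{eq:mlr-f-dist} can be checked by Monte Carlo simulation under $H_0$; the sketch below again uses an arbitrary toy design with $V = I_n$ and tests the simulated F-statistics against $\mathrm{F}(q, n-p)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p, q, sigma2 = 50, 3, 2, 1.5
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
C = np.eye(p)[:, 1:1 + q]
beta = np.array([1.0, 0.0, 0.0])            # H0 is true: C^T beta = 0
cov_b = np.linalg.inv(X.T @ X)              # (X^T V^{-1} X)^{-1} with V = I

Fs = []
for _ in range(20000):
    y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
    b_hat = cov_b @ X.T @ y                 # eq. (mlr-est)
    e = y - X @ b_hat
    s2_hat = e @ e / (n - p)                # eq. (mlr-est)
    g_hat = C.T @ b_hat
    Fs.append(g_hat @ np.linalg.inv(s2_hat * C.T @ cov_b @ C) @ g_hat / q)   # eq. (mlr-f)

# Kolmogorov-Smirnov test against F(q, n-p): should not reject under H0
print(stats.kstest(Fs, stats.f(q, n - p).cdf))
```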
- Stephan, Klaas Enno (2010): "Classical (frequentist) inference"; in: Methods and models for fMRI data analysis in neuroeconomics, Lecture 4, Slides 23/25; URL: http://www.socialbehavior.uzh.ch/teaching/methodsspring10.html.
- Koch, Karl-Rudolf (2007): "Multivariate Distributions"; in: Introduction to Bayesian Statistics, Springer, Berlin/Heidelberg, 2007, ch. 2.5, eqs. 2.202, 2.213, 2.211; URL: https://www.springer.com/de/book/9783540727231; DOI: 10.1007/978-3-540-72726-2.
- jld (2018): "Understanding t-test for linear regression"; in: StackExchange CrossValidated, retrieved on 2022-12-13; URL: https://stats.stackexchange.com/a/344008.
- Penny, William (2006): "Comparing nested GLMs"; in: Mathematics for Brain Imaging, ch. 2.3, pp. 51-52, eq. 2.9; URL: https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf.
Metadata: ID: P392 | shortcut: mlr-f | author: JoramSoch | date: 2022-12-13, 12:36.