Proof: Statistical significance test for the coefficient of determination based on an omnibus F-test
Theorem: Consider a linear regression model with known design matrix $X$, known covariance structure $V$, unknown regression parameters $\beta$ and unknown noise variance $\sigma^2$:
\[\label{eq:mlr} y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \; .\]Further assume that $X$ contains a constant regressor. Then, the coefficient of determination can be used to calculate a test statistic
\[\label{eq:f-rsq} F = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}\]where $n$ and $p$ are the dimensions of the design matrix $X$, and this test statistic follows an F-distribution
\[\label{eq:f-rsq-dist} F \sim \mathrm{F}(p-1, n-p)\]under the null hypothesis that the true coefficient of determination is zero
\[\label{eq:rsq-test-h0} H_0: \; R^2 = 0 \; .\]Proof: Consider two linear regression models for the same measured data $y$, with design matrices $X = X_0 \in \mathbb{R}^{n \times p_0}$ and $X = \left[ X_0, X_1 \right] \in \mathbb{R}^{n \times p}$ as well as regression coefficients $\beta = \beta_0 \in \mathbb{R}^{p_0 \times 1}$ and $\beta = \left[ \beta_0^\mathrm{T}, \beta_1^\mathrm{T} \right]^\mathrm{T} \in \mathbb{R}^{p \times 1}$.
Then, under the null hypothesis that all regression coefficients $\beta_1$ associated with $X_1$ are zero
\[\label{eq:mlr-fomnibus-h0} H_0: \; \beta_1 = 0_{p-p_0} \quad \Leftrightarrow \quad \beta_j = 0 \quad \text{for all} \quad j = p_0+1,\ldots,p \; ,\]the omnibus F-statistic follows an F-distribution
\[\label{eq:mlr-fomnibus} F = \frac{(\mathrm{RSS}_0-\mathrm{RSS})/(p-p_0)}{\mathrm{RSS}/(n-p)} \sim \mathrm{F}(p-p_0, n-p)\]where $\mathrm{RSS}_0$ and $\mathrm{RSS}$ are the residual sums of squares of the null model with $X_0$ and the full model with $X_0$ nested in $X$, after regression coefficients have been estimated with weighted least squares or maximum likelihood.
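As an aside, this equivalence can be checked numerically before it is derived. The following is a minimal sketch, assuming NumPy and simulated data with $V = I_n$ (all variable names and values are illustrative): it fits the null and full models by ordinary least squares and confirms that \eqref{eq:mlr-fomnibus} with $p_0 = 1$ matches \eqref{eq:f-rsq}.

```python
# Numerical plausibility check (not part of the proof): with V = I_n,
# the nested-model omnibus F-statistic equals the R^2-based F-statistic.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4                                   # sample size, number of regressors
X = np.column_stack([np.ones(n),               # constant regressor
                     rng.standard_normal((n, p - 1))])
y = X @ np.array([2.0, 0.5, 0.0, 0.0]) + rng.standard_normal(n)

X0 = X[:, :1]                                  # null model: constant only (p0 = 1)
b0 = np.linalg.lstsq(X0, y, rcond=None)[0]     # OLS estimates
b  = np.linalg.lstsq(X,  y, rcond=None)[0]
RSS0 = np.sum((y - X0 @ b0) ** 2)              # equals TSS here
RSS  = np.sum((y - X  @ b)  ** 2)

F_nested = ((RSS0 - RSS) / (p - 1)) / (RSS / (n - p))
R2 = 1 - RSS / RSS0
F_rsq = (R2 / (p - 1)) / ((1 - R2) / (n - p))
print(np.isclose(F_nested, F_rsq))             # True
```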
Since, by the requirements of our theorem, $X$ contains a constant regressor, we can assume the following design matrices without loss of generality:
\[\label{eq:X-X0} X_0 = 1_n \quad \text{and} \quad X = \left[ 1_n, X_1 \right] \in \mathbb{R}^{n \times p} \; ,\]where $1_n$ is an $n \times 1$ vector of ones.
Thus, since a model containing only a constant regressor estimates the mean of the data, and given the definition of the total sum of squares $\mathrm{TSS}$, we have in our case:
\[\label{eq:rss0-p0} \mathrm{RSS}_0 = \mathrm{TSS} \quad \text{and} \quad p_0 = 1 \; .\]
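To make this step explicit, a short derivation, assuming $V = I_n$ for simplicity so that parameter estimation reduces to ordinary least squares: the constant-only model estimates the sample mean,
\[\hat{\beta}_0 = \operatorname*{arg\,min}_{\beta_0} \sum_{i=1}^n (y_i - \beta_0)^2 = \bar{y} \quad \Rightarrow \quad \mathrm{RSS}_0 = \sum_{i=1}^n (y_i - \bar{y})^2 = \mathrm{TSS} \; .\]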
The coefficient of determination is given by
\[\label{eq:rsq} R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}\]which, since the explained sum of squares is $\mathrm{ESS} = \mathrm{TSS} - \mathrm{RSS}$, can also be written as
\[\label{eq:rsq-ess} R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} \; .\]If all regression coefficients $\beta_1$ associated with $X_1$ are zero, then no variance is explained beyond the constant regressor: the explained sum of squares $\mathrm{ESS}$ is zero, the residual sum of squares $\mathrm{RSS}$ equals the total sum of squares $\mathrm{TSS}$, and thus the true $R^2$ is zero.
Then, by virtue of \eqref{eq:mlr-fomnibus} and \eqref{eq:rss0-p0}, we get the following F-statistic:
\[\label{eq:f-rsq-qed} F = \frac{(\mathrm{TSS}-\mathrm{RSS})/(p-1)}{\mathrm{RSS}/(n-p)} = \frac{\left( 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \right)/(p-1)}{\frac{\mathrm{RSS}}{\mathrm{TSS}}/(n-p)} = \frac{R^2/(p-1)}{(1-R^2)/(n-p)} \sim \mathrm{F}(p-1, n-p) \; ,\]where the second equality divides numerator and denominator by $\mathrm{TSS}$ and the third follows from \eqref{eq:rsq}.
This means that the null hypothesis can be rejected when $F$, as a function of $R^2$, is as extreme as or more extreme than the critical value obtained from the F-distribution with $p-1$ numerator and $n-p$ denominator degrees of freedom at significance level $\alpha$.
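For illustration, a minimal sketch of this decision rule in Python, assuming SciPy is available; the values of $n$, $p$, $R^2$ and $\alpha$ below are made-up examples, not results from the proof:

```python
# Usage sketch: carrying out the R^2-based F-test.
from scipy import stats

n, p, R2, alpha = 50, 4, 0.25, 0.05            # assumed example values
F = (R2 / (p - 1)) / ((1 - R2) / (n - p))      # test statistic, eq. (f-rsq)
F_crit = stats.f.ppf(1 - alpha, p - 1, n - p)  # critical value at level alpha
p_val  = stats.f.sf(F, p - 1, n - p)           # right-tail p-value
print(F > F_crit, p_val)                       # reject H0 if True
```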
- Alecos Papadopoulos (2014): "What is the distribution of R² in linear regression under the null hypothesis?"; in: StackExchange CrossValidated, retrieved on 2024-03-15; URL: https://stats.stackexchange.com/a/130082.
Metadata: ID: P441 | shortcut: rsq-test | author: JoramSoch | date: 2024-03-08, 12:03.