Proof: Distribution of residual sum of squares in multiple linear regression with weighted least squares
Theorem: Assume a linear regression model with correlated observations
\[\label{eq:mlr} y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V)\]and consider estimation using weighted least squares. Then, the residual sum of squares $\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}$ of the whitened model constructed in the proof (equivalently, the weighted residual sum of squares $(y - X\hat{\beta})^\mathrm{T} V^{-1} (y - X\hat{\beta})$), divided by the true error variance $\sigma^2$, follows a chi-squared distribution with $n-p$ degrees of freedom
\[\label{eq:mlr-rss-dist} \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} \sim \chi^2(n-p)\]where $n$ and $p$ are the dimensions of the $n \times p$ design matrix $X$.
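Before proceeding to the proof, the theorem can be illustrated by simulation. The following Python sketch is a hypothetical illustration, not part of the original sources; all variable names (e.g. `Xt` for $\tilde{X}$) and the choice of $V$ are our own. It generates data from \eqref{eq:mlr}, estimates $\beta$ by weighted least squares via whitening, and compares the scaled residual sum of squares against $\chi^2(n-p)$:

```python
# Monte Carlo check: RSS / sigma^2 of the whitened model should follow chi^2(n-p).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma2 = 20, 3, 1.5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)

# an arbitrary positive-definite covariance structure V
B = rng.standard_normal((n, n))
V = B @ B.T + n * np.eye(n)

L = np.linalg.cholesky(V)            # V = L L^T
W = np.linalg.inv(L)                 # whitening matrix, W V W^T = I_n
Xt = W @ X                           # X-tilde

rss = []
for _ in range(10000):
    eps = np.sqrt(sigma2) * (L @ rng.standard_normal(n))  # eps ~ N(0, sigma^2 V)
    yt = W @ (X @ beta + eps)                             # y-tilde
    beta_hat = np.linalg.lstsq(Xt, yt, rcond=None)[0]     # WLS estimate of beta
    res = yt - Xt @ beta_hat                              # whitened residuals
    rss.append(res @ res / sigma2)

# chi^2(n-p) has mean n-p and variance 2(n-p)
print(np.mean(rss), n - p)           # both approximately 17
print(np.var(rss), 2 * (n - p))      # both approximately 34
print(stats.kstest(rss, 'chi2', args=(n - p,)).pvalue)    # non-small if the law holds
```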
Proof: Consider an $n \times n$ matrix $W$ such that
\[\label{eq:W-def} W V W^\mathrm{T} = I_n \; .\]Such a matrix exists whenever $V$ is positive definite; for example, $W = L^{-1}$ where $V = L L^\mathrm{T}$ is the Cholesky decomposition of $V$.
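A minimal sketch of constructing such a $W$ in Python (assuming `numpy`; the Cholesky-based choice is one of several valid whitening matrices):

```python
# Construct W with W V W^T = I_n from the Cholesky factor of V.
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
V = B @ B.T + n * np.eye(n)        # some positive-definite covariance

L = np.linalg.cholesky(V)          # V = L L^T
W = np.linalg.inv(L)               # candidate whitening matrix

# check: W V W^T = L^{-1} (L L^T) L^{-T} = I_n
print(np.allclose(W @ V @ W.T, np.eye(n)))   # True
```

Then, left-multiplying the regression model in \eqref{eq:mlr} with $W$ gives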
\[\label{eq:mlr-W-s1} Wy = WX\beta + W\varepsilon, \; W\varepsilon \sim \mathcal{N}(0, \sigma^2 W V W^\mathrm{T})\]which can be rewritten as
\[\label{eq:mlr-W-s2} \tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}, \; \tilde{\varepsilon} \sim \mathcal{N}(0, \sigma^2 I_n)\]where $\tilde{y} = Wy$, $\tilde{X} = WX$ and $\tilde{\varepsilon} = W\varepsilon$. This implies the distribution
\[\label{eq:y-tilde-dist} \tilde{y} \sim \mathcal{N}(\tilde{X} \beta, \sigma^2 I_n) \; .\]With that, we have obtained a linear regression model with independent observations. Cochran’s theorem for multivariate normal variables states that, for an $n \times 1$ normal random vector whose covariance matrix is a scalar multiple of the identity matrix, the quadratic form induced by a symmetric and idempotent matrix $A$ follows a non-central chi-squared distribution whose degrees of freedom and non-centrality parameter are determined by $A$:
\[\label{eq:mvn-cochran} x \sim \mathcal{N}(\mu, \sigma^2 I_n) \quad \Rightarrow \quad y = x^\mathrm{T} A x / \sigma^2 \sim \chi^2\left( \mathrm{tr}(A), \, \mu^\mathrm{T} A \mu / \sigma^2 \right) \; .\]
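This relation can be checked by simulation; the following sketch (hypothetical, assuming `numpy` and `scipy`; the projection matrix used for $A$ is an arbitrary symmetric and idempotent example) draws from $\mathcal{N}(\mu, \sigma^2 I_n)$ and compares the quadratic form against the stated non-central chi-squared distribution:

```python
# Check: x ~ N(mu, sigma^2 I), A symmetric and idempotent
#   =>  x^T A x / sigma^2  ~  chi^2(tr(A), mu^T A mu / sigma^2)
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, sigma2 = 8, 2.0
mu = rng.standard_normal(n)

M = rng.standard_normal((n, 3))
A = M @ np.linalg.inv(M.T @ M) @ M.T       # projection matrix: symmetric, idempotent

x = mu + np.sqrt(sigma2) * rng.standard_normal((100000, n))
y = np.einsum('ij,jk,ik->i', x, A, x) / sigma2   # row-wise quadratic forms

df, nc = np.trace(A), mu @ A @ mu / sigma2
print(stats.kstest(y, 'ncx2', args=(df, nc)).pvalue)   # non-small if the law holds
```

First, we formulate the residuals in terms of the transformed measurements $\tilde{y}$: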
\[\label{eq:rss-y-s1} \begin{array}{rlcl} \hat{\varepsilon} & = \tilde{y} - \tilde{X} \hat{\beta} & \quad \text{where} \quad & \hat{\beta} = (\tilde{X}^\mathrm{T} \tilde{X})^{-1} \tilde{X}^\mathrm{T} \tilde{y} \\ & = (I_n - \tilde{P}) \tilde{y} & \quad \text{where} \quad & \tilde{P} = \tilde{X} (\tilde{X}^\mathrm{T} \tilde{X})^{-1} \tilde{X}^\mathrm{T} \\ & = \tilde{R} \tilde{y} & \quad \text{where} \quad & \tilde{R} = I_n - \tilde{P} \; . \end{array}\]
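This hat-matrix algebra is easy to verify numerically; the sketch below (hypothetical names `Xt`, `yt` standing for $\tilde{X}$, $\tilde{y}$) confirms that $\tilde{R} \tilde{y}$ reproduces the residuals, and that $\tilde{R}$ is symmetric and idempotent, which is used in the next step:

```python
# epsilon-hat = yt - Xt @ beta_hat equals R @ yt, with R = I - P.
import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 3
Xt = rng.standard_normal((n, p))
yt = rng.standard_normal(n)

P = Xt @ np.linalg.inv(Xt.T @ Xt) @ Xt.T   # projection matrix P-tilde
R = np.eye(n) - P                          # residual-forming matrix R-tilde

beta_hat = np.linalg.inv(Xt.T @ Xt) @ Xt.T @ yt
print(np.allclose(yt - Xt @ beta_hat, R @ yt))      # True
print(np.allclose(R, R.T), np.allclose(R @ R, R))   # True True: symmetric, idempotent
```

Next, we observe that the residual sum of squares can be represented as a quadratic form: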
\[\label{eq:rss-y-s2} \frac{1}{\sigma^2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \tilde{y}^\mathrm{T} \tilde{R}^\mathrm{T} \tilde{R} \tilde{y} / \sigma^2 \; .\]Because the residual-forming matrix $\tilde{R}$ is symmetric and idempotent (being the orthogonal projection onto the complement of the column space of $\tilde{X}$), we have $\tilde{R}^\mathrm{T} = \tilde{R}$ and $\tilde{R}^2 = \tilde{R}$, such that:
\[\label{eq:rss-y-s3} \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \tilde{y}^\mathrm{T} \tilde{R} \tilde{y} / \sigma^2 \; .\]With that, we can apply Cochran’s theorem \eqref{eq:mvn-cochran} with $x = \tilde{y}$, $\mu = \tilde{X} \beta$ and $A = \tilde{R}$, which yields
\[\label{eq:rss-dist} \begin{split} \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} &\sim \chi^2\left( \mathrm{tr}(I_n - \tilde{P}), \, \beta^\mathrm{T} \tilde{X}^\mathrm{T} \tilde{R} \tilde{X} \beta / \sigma^2 \right) \\ &\sim \chi^2\left( \mathrm{tr}(I_n) - \mathrm{tr}( \tilde{P} ), \, \beta^\mathrm{T} \tilde{X}^\mathrm{T} (I_n - \tilde{P}) \tilde{X} \beta / \sigma^2 \right) \\ &\sim \chi^2\left( \mathrm{tr}(I_n) - \mathrm{tr}( \tilde{X} (\tilde{X}^\mathrm{T} \tilde{X})^{-1} \tilde{X}^\mathrm{T} ), \, \beta^\mathrm{T} (\tilde{X}^\mathrm{T} \tilde{X} - \tilde{X}^\mathrm{T} \tilde{X} (\tilde{X}^\mathrm{T} \tilde{X})^{-1} \tilde{X}^\mathrm{T} \tilde{X}) \beta / \sigma^2 \right) \\ &\sim \chi^2\left( \mathrm{tr}(I_n) - \mathrm{tr}( \tilde{X}^\mathrm{T} \tilde{X} (\tilde{X}^\mathrm{T} \tilde{X})^{-1} ), \, \beta^\mathrm{T} (\tilde{X}^\mathrm{T} \tilde{X} - \tilde{X}^\mathrm{T} \tilde{X}) \beta / \sigma^2 \right) \\ &\sim \chi^2\left( \mathrm{tr}(I_n) - \mathrm{tr}(I_p), \, \beta^\mathrm{T} \, 0_{p \times p} \, \beta / \sigma^2 \right) \\ &\sim \chi^2\left( n - p, \, 0 \right) \end{split}\]where the step from the third to the fourth line uses the cyclic property of the trace, $\mathrm{tr}(ABC) = \mathrm{tr}(CAB)$.
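The trace reduction and the vanishing non-centrality can likewise be confirmed numerically (a hypothetical sketch, reusing the notation introduced above):

```python
# tr(I - P) = n - p and R @ Xt = 0, so the non-centrality vanishes.
import numpy as np

rng = np.random.default_rng(4)
n, p = 10, 3
Xt = rng.standard_normal((n, p))
P = Xt @ np.linalg.inv(Xt.T @ Xt) @ Xt.T
R = np.eye(n) - P

print(np.isclose(np.trace(R), n - p))           # True: degrees of freedom
print(np.allclose(R @ Xt, np.zeros((n, p))))    # True: beta^T Xt^T R Xt beta = 0
```

Because a non-central chi-squared distribution with a non-centrality parameter of zero reduces to the central chi-squared distribution, we obtain our final result: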
\[\label{eq:rss-dist-qed} \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} \sim \chi^2(n-p) \; .\]Sources:
- Koch, Karl-Rudolf (2007): "Estimation of the Variance Factor in Traditional Statistics"; in: Introduction to Bayesian Statistics, Springer, Berlin/Heidelberg, 2007, ch. 4.2.3, eq. 4.37; URL: https://www.springer.com/de/book/9783540727231; DOI: 10.1007/978-3-540-72726-2.
- Penny, William (2006): "Estimating error variance"; in: Mathematics for Brain Imaging, ch. 2.2, pp. 49-51, eqs. 2.4-2.8; URL: https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf.
- Wikipedia (2022): "Ordinary least squares"; in: Wikipedia, the free encyclopedia, retrieved on 2022-12-13; URL: https://en.wikipedia.org/wiki/Ordinary_least_squares#Estimation.
- ocram (2022): "Why is RSS distributed chi square times n-p?"; in: StackExchange CrossValidated, retrieved on 2022-12-21; URL: https://stats.stackexchange.com/a/20230.
Metadata: ID: P390 | shortcut: mlr-rssdist | author: JoramSoch | date: 2022-12-13, 07:08.