Proof: Maximum likelihood estimator of variance in multiple linear regression is biased
Theorem: Consider a linear regression model with known design matrix $X$, known covariance structure $V$, unknown regression parameters $\beta$ and unknown noise variance $\sigma^2$:
\[\label{eq:mlr} y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \; .\]Then,
1) the maximum likelihood estimator of $\sigma^2$ is
\[\label{eq:sigma-mle} \hat{\sigma}^2 = \frac{1}{n} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})\]where
\[\label{eq:beta-mle} \hat{\beta} = (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y\]2) and $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$
\[\label{eq:resvar-var} \mathrm{E}\left[ \hat{\sigma}^2 \right] \neq \sigma^2 \; ,\]more precisely:
\[\label{eq:resvar-biasp} \mathrm{E}\left[ \hat{\sigma}^2 \right] = \frac{n-p}{n} \sigma^2 \; .\]
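Before the proof, here is a minimal numerical sketch of the two estimators in \eqref{eq:beta-mle} and \eqref{eq:sigma-mle}; the design matrix, covariance structure and ground-truth parameter values below are illustrative choices, not part of the theorem:

```python
import numpy as np

# Illustrative problem size and (hypothetical) ground-truth parameters
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # known design matrix
V = np.eye(n)                                                   # known covariance structure
beta = np.array([2.0, -1.0, 0.5])                               # unknown in practice
sigma2 = 4.0                                                    # unknown in practice

# Draw y ~ N(X beta, sigma^2 V) as in the model equation
y = X @ beta + rng.multivariate_normal(np.zeros(n), sigma2 * V)

# beta_hat = (X' V^-1 X)^-1 X' V^-1 y, cf. \eqref{eq:beta-mle}
Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# sigma2_hat = (1/n) (y - X beta_hat)' V^-1 (y - X beta_hat), cf. \eqref{eq:sigma-mle}
r = y - X @ beta_hat
sigma2_hat = (r @ Vinv @ r) / n
print(beta_hat, sigma2_hat)
```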
Proof:
1) This follows from maximum likelihood estimation for multiple linear regression and is a special case of maximum likelihood estimation for the general linear model in which $Y = y$, $B = \beta$ and $\Sigma = \sigma^2$:
\[\label{eq:sigma-mle-qed} \begin{split} \hat{\sigma}^2 &= \frac{1}{n} (Y-X\hat{B})^\mathrm{T} V^{-1} (Y-X\hat{B}) \\ &= \frac{1}{n} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta}) \; . \end{split}\]2) We know that the weighted residual sum of squares, divided by the true noise variance, follows a chi-squared distribution:
\[\label{eq:rss-dist} \begin{split} \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} &\sim \chi^2(n-p) \\ \text{where} \quad \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} &= (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta}) \; . \end{split}\]Thus, since $n \hat{\sigma}^2 = \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}$ by \eqref{eq:sigma-mle-qed}, it follows from \eqref{eq:rss-dist} that:
\[\label{eq:resvar-bias-s1} \frac{n \hat{\sigma}^2}{\sigma^2} \sim \chi^2(n-p) \; .\]
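A Monte Carlo sketch of \eqref{eq:resvar-bias-s1}, assuming $V = I_n$ for simplicity (so that the estimator from \eqref{eq:beta-mle} reduces to ordinary least squares); the sample mean should be close to $n-p$, the mean of $\chi^2(n-p)$:

```python
import numpy as np
from scipy import stats

# Simulate n * sigma2_hat / sigma2 repeatedly; all values are illustrative.
rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta, sigma2 = np.array([2.0, -1.0, 0.5]), 4.0

def weighted_rss(y, X):
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # GLS = OLS since V = I
    r = y - X @ beta_hat
    return r @ r                                     # = n * sigma2_hat

stat = np.array([weighted_rss(X @ beta + rng.normal(0.0, np.sqrt(sigma2), n), X) / sigma2
                 for _ in range(10_000)])
print(stat.mean(), n - p)                                # sample mean close to n - p
print(stats.kstest(stat, stats.chi2(n - p).cdf).pvalue)  # consistent with chi2(n-p)
```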
Using the relationship between the chi-squared distribution and the gamma distribution
\[\label{eq:chi2-gam} X \sim \chi^2(k) \quad \Rightarrow \quad cX \sim \mathrm{Gam}\left( \frac{k}{2}, \frac{1}{2c} \right) \; ,\]we can deduce from \eqref{eq:resvar-bias-s1}, setting $c = \sigma^2/n$, that
\[\label{eq:resvar-bias-s2} \hat{\sigma}^2 = \frac{\sigma^2}{n} \cdot \frac{n \hat{\sigma}^2}{\sigma^2} \sim \mathrm{Gam}\left( \frac{n-p}{2}, \frac{n}{2\sigma^2} \right) \; .\]
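As a quick numerical sanity check of this gamma form (a sketch only; note that scipy parametrizes the gamma distribution by shape and scale, where scale is the reciprocal of the rate $b$):

```python
import numpy as np
from scipy import stats

# With c = sigma^2 / n, the CDF of c*X for X ~ chi2(n-p) should coincide
# with the CDF of Gam((n-p)/2, n/(2 sigma^2)); values are illustrative.
n, p, sigma2 = 50, 3, 4.0
c = sigma2 / n
x = np.linspace(0.1, 3.0 * sigma2, 7)
cdf_scaled_chi2 = stats.chi2(n - p).cdf(x / c)              # P(cX <= x) = P(X <= x/c)
cdf_gamma = stats.gamma(a=(n - p) / 2, scale=2 * c).cdf(x)  # scale 2c = 1 / rate
print(np.allclose(cdf_scaled_chi2, cdf_gamma))              # True
```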
Using the expected value of the gamma distribution
\[\label{eq:gam-mean} X \sim \mathrm{Gam}(a,b) \quad \Rightarrow \quad \mathrm{E}(X) = \frac{a}{b} \; ,\]we can deduce from \eqref{eq:resvar-bias-s2} that
\[\label{eq:resvar-bias-s3} \mathrm{E}\left[ \hat{\sigma}^2 \right] = \frac{\frac{n-p}{2}}{\frac{n}{2\sigma^2}} = \frac{n-p}{n} \sigma^2\]which proves the relationship given by \eqref{eq:resvar-biasp}.
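The bias can also be checked by simulation; the following minimal sketch (with arbitrary illustrative values for $n$, $p$, $\sigma^2$, $X$ and $\beta$, and $V = I_n$) averages $\hat{\sigma}^2$ over many simulated data sets:

```python
import numpy as np

# Average the ML estimator over repeated draws and compare with (n-p)/n * sigma2.
rng = np.random.default_rng(3)
n, p, sigma2 = 20, 4, 2.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = rng.normal(size=p)

est = np.empty(50_000)
for i in range(est.size):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # residuals (V = I)
    est[i] = (r @ r) / n                              # sigma2_hat from \eqref{eq:sigma-mle}

print(est.mean())             # close to (n - p)/n * sigma2 = 2.0, not sigma2 = 2.5
print((n - p) / n * sigma2)
```

Consistent with \eqref{eq:resvar-biasp}, the simulated mean falls below $\sigma^2$; rescaling $\hat{\sigma}^2$ by $n/(n-p)$ removes this bias, recovering the usual unbiased variance estimator.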
- ocram (2022): "Why is RSS distributed chi square times n-p?"; in: StackExchange CrossValidated, retrieved on 2022-12-21; URL: https://stats.stackexchange.com/a/20230.
Metadata: ID: P398 | shortcut: resvar-biasp | author: JoramSoch | date: 2022-12-21, 14:15.