Proof: Ordinary least squares for multiple linear regression
Metadata: ID: P2 | shortcut: mlr-ols | author: JoramSoch | date: 2019-09-27, 07:18.
Theorem: Given a linear regression model with independent observations
\[\label{eq:MLR} y = X\beta + \varepsilon, \; \varepsilon_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \; ,\]where the design matrix $X$ has full column rank, the parameters minimizing the residual sum of squares are given by
\[\label{eq:OLS} \hat{\beta} = (X^\mathrm{T} X)^{-1} X^\mathrm{T} y \; .\]Proof: Let $\hat{\beta}$ be the ordinary least squares (OLS) solution and let $\hat{\varepsilon} = y - X\hat{\beta}$ be the resulting vector of residuals. This residual vector must be orthogonal to the columns of the design matrix,
\[\label{eq:X-e-orth} X^\mathrm{T} \hat{\varepsilon} = 0 \; ,\]because if it were not, there would exist another estimate $\tilde{\beta}$ whose residual vector $\tilde{\varepsilon} = y - X\tilde{\beta}$ had a smaller residual sum of squares.
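The orthogonality condition can also be made precise by calculus: the residual sum of squares $\mathrm{RSS}(\beta) = (y - X\beta)^\mathrm{T} (y - X\beta)$ is a convex function of $\beta$, so its minimum is attained where the gradient vanishes; a brief sketch of that step:

\[\begin{split} \frac{\partial \, \mathrm{RSS}(\beta)}{\partial \beta} &= -2 X^\mathrm{T} (y - X\beta) \overset{!}{=} 0 \\ \Rightarrow \quad X^\mathrm{T} (y - X\hat{\beta}) &= X^\mathrm{T} \hat{\varepsilon} = 0 \; . \end{split}\]From \eqref{eq:X-e-orth}, the OLS formula can be derived directly: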
\[\label{eq:OLS-qed} \begin{split} X^\mathrm{T} \hat{\varepsilon} &= 0 \\ X^\mathrm{T} \left( y - X\hat{\beta} \right) &= 0 \\ X^\mathrm{T} y - X^\mathrm{T} X\hat{\beta} &= 0 \\ X^\mathrm{T} X\hat{\beta} &= X^\mathrm{T} y \\ \hat{\beta} &= (X^\mathrm{T} X)^{-1} X^\mathrm{T} y \; , \end{split}\]where the last step uses the invertibility of $X^\mathrm{T} X$, which is guaranteed by the full column rank of $X$. ∎
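As a numerical sanity check, the closed-form estimator \eqref{eq:OLS} can be compared against a generic least-squares solver. The following is a minimal sketch using NumPy; the sample size, design matrix, and true coefficients are illustrative assumptions, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup (assumed, not from the proof): n observations, p regressors
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # full column rank
beta_true = np.array([1.0, 2.0, -0.5])                          # assumed coefficients
y = X @ beta_true + rng.normal(scale=0.3, size=n)               # y = X beta + epsilon

# beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's generic least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)

# Orthogonality of residuals to the design matrix: X^T (y - X beta_hat) = 0
residuals = y - X @ beta_hat
assert np.allclose(X.T @ residuals, 0.0, atol=1e-8)
print("beta_hat:", beta_hat)
```

Note that the sketch solves the normal equations with `np.linalg.solve` rather than forming $(X^\mathrm{T} X)^{-1}$ explicitly; this is algebraically equivalent to \eqref{eq:OLS} but numerically more stable.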
Sources: - Stephan, Klaas Enno (2010): "The General Linear Model (GLM)"; in: Methods and models for fMRI data analysis in neuroeconomics, Lecture 3, Slides 10/11; URL: http://www.socialbehavior.uzh.ch/teaching/methodsspring10.html.