Proof: Akaike information criterion for multiple linear regression
Theorem: Consider a linear regression model $m$
\[\label{eq:mlr} m: \; y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \; .\]Then, the Akaike information criterion for this model is
\[\label{eq:mlr-aic} \mathrm{AIC}(m) = n \log\left( \frac{\mathrm{wRSS}}{n} \right) + n \left[ 1 + \log(2\pi) \right] + \log|V| + 2 (p + 1)\]where $\mathrm{wRSS} = (y - X\hat{\beta})^\mathrm{T} V^{-1} (y - X\hat{\beta})$ is the weighted residual sum of squares evaluated at the weighted least squares estimate $\hat{\beta} = (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y$, $p$ is the number of regressors in the design matrix $X$ and $n$ is the number of observations in the data vector $y$.
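For illustration, the closed-form expression \eqref{eq:mlr-aic} can be evaluated numerically. The following is a minimal Python/NumPy sketch (the function name `aic_mlr` and the use of NumPy are assumptions for illustration, not part of the source):

```python
import numpy as np

def aic_mlr(y, X, V):
    """AIC of the model y = X beta + eps, eps ~ N(0, sigma^2 V),
    via the closed-form expression from the theorem above."""
    n, p = X.shape
    V_inv = np.linalg.inv(V)
    # weighted least squares estimate: beta_hat = (X' V^-1 X)^-1 X' V^-1 y
    beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    # weighted residual sum of squares: wRSS = (y - X beta_hat)' V^-1 (y - X beta_hat)
    e = y - X @ beta_hat
    wRSS = float(e.T @ V_inv @ e)
    # AIC(m) = n log(wRSS/n) + n [1 + log(2 pi)] + log|V| + 2 (p + 1)
    logdet_V = np.linalg.slogdet(V)[1]
    return n * np.log(wRSS / n) + n * (1 + np.log(2 * np.pi)) + logdet_V + 2 * (p + 1)
```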
Proof: The Akaike information criterion is defined as
\[\label{eq:aic} \mathrm{AIC}(m) = -2 \, \mathrm{MLL}(m) + 2 \, k\]where $\mathrm{MLL}(m)$ is the maximum log-likelihood and $k$ is the number of free parameters in $m$.
The maximum log-likelihood for multiple linear regression is given by
\[\label{eq:mlr-mll} \mathrm{MLL}(m) = - \frac{n}{2} \log\left( \frac{\mathrm{wRSS}}{n} \right) - \frac{n}{2} \left[ 1 + \log(2\pi) \right] - \frac{1}{2} \log|V|\]and the number of free parameters in multiple linear regression is $k = p + 1$, i.e. one parameter for each of the $p$ regressors in the design matrix $X$, plus one for the noise variance $\sigma^2$.
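As a sanity check, the closed form \eqref{eq:mlr-mll} can be compared against the multivariate normal log-density of $y$ evaluated at the maximum likelihood estimates $\hat{\beta}$ and $\hat{\sigma}^2 = \mathrm{wRSS}/n$. A minimal sketch (the simulated data and the use of SciPy are assumptions for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mll_mlr(y, X, V):
    """Maximum log-likelihood of y = X beta + eps, eps ~ N(0, sigma^2 V),
    via the closed-form expression above."""
    n, p = X.shape
    V_inv = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)   # weighted least squares
    wRSS = float((y - X @ beta_hat).T @ V_inv @ (y - X @ beta_hat))
    logdet_V = np.linalg.slogdet(V)[1]
    return -n/2 * np.log(wRSS/n) - n/2 * (1 + np.log(2*np.pi)) - logdet_V/2

# simulated example data (illustrative assumption, not part of the proof)
rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
V = np.eye(n)                                   # i.i.d. noise as a special case
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# MLEs of beta and sigma^2, then the log-density of y at the MLEs
V_inv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
sigma2_hat = float((y - X @ beta_hat).T @ V_inv @ (y - X @ beta_hat)) / n
logp = multivariate_normal.logpdf(y, mean=X @ beta_hat, cov=sigma2_hat * V)

print(np.isclose(mll_mlr(y, X, V), logp))       # True: both give the same value
```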
Thus, the AIC of $m$ follows from \eqref{eq:aic} and \eqref{eq:mlr-mll} as
\[\label{eq:mlr-aic-qed} \mathrm{AIC}(m) = n \log\left( \frac{\mathrm{wRSS}}{n} \right) + n \left[ 1 + \log(2\pi) \right] + \log|V| + 2 (p + 1) \; .\]
Sources:
- Claeskens G, Hjort NL (2008): "Akaike's information criterion"; in: Model Selection and Model Averaging, ex. 2.2, p. 66; URL: https://www.cambridge.org/core/books/model-selection-and-model-averaging/E6F1EC77279D1223423BB64FC3A12C37; DOI: 10.1017/CBO9780511790485.
Metadata: ID: P307 | shortcut: mlr-aic | author: JoramSoch | date: 2022-02-11, 06:26.