Proof: Distributions of estimated parameters, fitted signal and residuals in multiple linear regression upon weighted least squares
Theorem: Assume a linear regression model with correlated observations
\[\label{eq:mlr} y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V)\]and consider estimation using weighted least squares. Then, the estimated parameters, fitted signal and residuals are distributed as
\[\label{eq:mlr-dist} \begin{split} \hat{\beta} &\sim \mathcal{N}\left( \beta, \sigma^2 (X^\mathrm{T} V^{-1} X)^{-1} \right) \\ \hat{y} &\sim \mathcal{N}\left( X \beta, \sigma^2 (PV) \right) \\ \hat{\varepsilon} &\sim \mathcal{N}\left( 0, \sigma^2 (I_n - P) V \right) \end{split}\]where $P$ is the projection matrix for weighted least squares
\[\label{eq:mlr-pmat} P = X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} \; .\]Proof: We will use the linear transformation theorem for the multivariate normal distribution:
\[\label{eq:mvn-ltt} x \sim \mathcal{N}(\mu, \Sigma) \quad \Rightarrow \quad y = Ax + b \sim \mathcal{N}(A\mu + b, A \Sigma A^\mathrm{T}) \; .\]Applying \eqref{eq:mvn-ltt} to \eqref{eq:mlr}, the measured data are distributed as
\[\label{eq:y-dist} y \sim \mathcal{N}\left( X \beta, \sigma^2 V \right) \; .\]1) The parameter estimates from weighted least squares are given by
\[\label{eq:b-est} \hat{\beta} = (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y\]and thus, by applying \eqref{eq:mvn-ltt} to \eqref{eq:b-est}, they are distributed as
\[\label{eq:b-est-dist} \begin{split} \hat{\beta} &\sim \mathcal{N}\left( \left[ (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} \right] X \beta, \, \sigma^2 \left[ (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} \right] V \left[ V^{-1} X (X^\mathrm{T} V^{-1} X)^{-1} \right] \right) \\ &\sim \mathcal{N}\left( \beta, \, \sigma^2 (X^\mathrm{T} V^{-1} X)^{-1} \right) \; . \end{split}\]2) The fitted signal in multiple linear regression is given by
\[\label{eq:y-est} \hat{y} = X \hat{\beta} = X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y = P y\]and thus, by applying \eqref{eq:mvn-ltt} to \eqref{eq:y-est}, it is distributed as
\[\label{eq:y-est-dist} \begin{split} \hat{y} &\sim \mathcal{N}\left( P X \beta, \, \sigma^2 P V P^\mathrm{T} \right) \\ &\sim \mathcal{N}\left( X \beta, \, \sigma^2 X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} \right) \\ &\sim \mathcal{N}\left( X \beta, \, \sigma^2 (PV) \right) \; . \end{split}\]where $P X = X$ and $P V P^\mathrm{T} = P V$ follow from the definition of $P$ in \eqref{eq:mlr-pmat}. 3) The residuals of the linear regression model are given by
\[\label{eq:e-est} \hat{\varepsilon} = y - X \hat{\beta} = \left( I_n - X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} \right) y = \left( I_n - P \right) y\]and thus, by applying \eqref{eq:mvn-ltt} to \eqref{eq:e-est}, they are distributed as
\[\label{eq:e-est-dist} \begin{split} \hat{\varepsilon} &\sim \mathcal{N}\left( \left[ I_n - X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} \right] X \beta, \, \sigma^2 \left[ I_n - P \right] V \left[ I_n - P \right]^\mathrm{T} \right) \\ &\sim \mathcal{N}\left( X \beta - X \beta, \, \sigma^2 \left[ V - V P^\mathrm{T} - P V + P V P^\mathrm{T} \right] \right) \\ &\sim \mathcal{N}\left( 0, \, \sigma^2 \left[ V - V V^{-1} X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} - X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} V + P V P^\mathrm{T} \right] \right) \\ &\sim \mathcal{N}\left( 0, \, \sigma^2 \left[ V - 2 P V + X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} V V^{-1} X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} \right] \right) \\ &\sim \mathcal{N}\left( 0, \, \sigma^2 \left[ V - 2 P V + P V \right] \right) \\ &\sim \mathcal{N}\left( 0, \, \sigma^2 \left[ V - P V \right] \right) \\ &\sim \mathcal{N}\left( 0, \, \sigma^2 \left[ I_n - P \right] V \right) \; . \end{split}\]- Koch, Karl-Rudolf (2007): "Linear Model"; in: Introduction to Bayesian Statistics, Springer, Berlin/Heidelberg, 2007, ch. 4, eqs. 4.2, 4.30; URL: https://www.springer.com/de/book/9783540727231; DOI: 10.1007/978-3-540-72726-2.
- Penny, William (2006): "Multiple Regression"; in: Mathematics for Brain Imaging, ch. 1.5, pp. 39-41, eqs. 1.106-1.110; URL: https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf.
- Soch J, Allefeld C, Haynes JD (2020): "Inverse transformed encoding models – a solution to the problem of correlated trial-by-trial parameter estimates in fMRI decoding"; in: NeuroImage, vol. 209, art. 116449, eq. A.10; URL: https://www.sciencedirect.com/science/article/pii/S1053811919310407; DOI: 10.1016/j.neuroimage.2019.116449.
- Soch J, Meyer AP, Allefeld C, Haynes JD (2017): "How to improve parameter estimates in GLM-based fMRI data analysis: cross-validated Bayesian model averaging"; in: NeuroImage, vol. 158, pp. 186-195, eq. A.2; URL: https://www.sciencedirect.com/science/article/pii/S105381191730527X; DOI: 10.1016/j.neuroimage.2017.06.056.
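The covariance identities used in steps 1)–3) can also be checked numerically. The following sketch (not part of the proof; it assumes a randomly generated design matrix $X$ and a random positive-definite $V$) verifies that the three covariance matrices in \eqref{eq:mlr-dist} follow from the linear transformation theorem:

```python
import numpy as np

# Numerical sanity check of the covariance identities in the proof,
# using a random design matrix X and a random positive-definite V.
rng = np.random.default_rng(1)
n, p, s2 = 8, 3, 2.5                 # n observations, p regressors, sigma^2

X = rng.standard_normal((n, p))
A0 = rng.standard_normal((n, n))
V = A0 @ A0.T + n * np.eye(n)        # symmetric positive definite
Vi = np.linalg.inv(V)

C = np.linalg.inv(X.T @ Vi @ X)      # (X^T V^-1 X)^-1
A = C @ X.T @ Vi                     # WLS estimator: beta-hat = A y
P = X @ A                            # projection matrix, eq. (mlr-pmat)
I = np.eye(n)

# E[beta-hat] = A X beta = beta, since A X = I_p
assert np.allclose(A @ X, np.eye(p))
# Cov[beta-hat] = A (s2 V) A^T = s2 (X^T V^-1 X)^-1
assert np.allclose(A @ (s2 * V) @ A.T, s2 * C)
# Cov[y-hat] = P (s2 V) P^T = s2 (P V)
assert np.allclose(P @ (s2 * V) @ P.T, s2 * P @ V)
# Cov[eps-hat] = (I - P) (s2 V) (I - P)^T = s2 (I - P) V
assert np.allclose((I - P) @ (s2 * V) @ (I - P).T, s2 * (I - P) @ V)
```

The check relies only on the definitions of $A$ and $P$; all four assertions reduce to the matrix simplifications carried out in the derivation.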
Metadata: ID: P389 | shortcut: mlr-wlsdist | author: JoramSoch | date: 2022-12-13, 05:13.