Proof: Maximum likelihood estimation for the general linear model
Theorem: Given a general linear model with matrix-normally distributed errors
\[\label{eq:GLM} Y = X B + E, \; E \sim \mathcal{MN}(0, V, \Sigma) \; ,\]maximum likelihood estimates for the unknown parameters $B$ and $\Sigma$ are given by
\[\label{eq:GLM-MLE} \begin{split} \hat{B} &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} Y \\ \hat{\Sigma} &= \frac{1}{n} (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \; . \end{split}\]Proof: In \eqref{eq:GLM}, $Y$ is an $n \times v$ matrix of measurements ($n$ observations, $v$ dependent variables), $X$ is an $n \times p$ design matrix ($n$ observations, $p$ independent variables), $V$ is an $n \times n$ covariance matrix across observations and $\Sigma$ is a $v \times v$ covariance matrix across dependent variables. This multivariate GLM implies the following likelihood function
\[\label{eq:GLM-LF} \begin{split} p(Y|B,\Sigma) &= \mathcal{MN}(Y; XB, V, \Sigma) \\ &= \sqrt{\frac{1}{(2\pi)^{nv} |\Sigma|^n |V|^v}} \cdot \exp\left[ -\frac{1}{2} \, \mathrm{tr}\left( \Sigma^{-1} (Y - XB)^\mathrm{T} V^{-1} (Y - XB) \right) \right] \end{split}\]and the log-likelihood function
\[\label{eq:GLM-LL1} \begin{split} \mathrm{LL}(B,\Sigma) = &\log p(Y|B,\Sigma) \\ = &- \frac{nv}{2} \log(2\pi) - \frac{n}{2} \log |\Sigma| - \frac{v}{2} \log |V| \\ &- \frac{1}{2} \, \mathrm{tr}\left[ \Sigma^{-1} (Y - XB)^\mathrm{T} V^{-1} (Y - XB) \right] \; . \end{split}\]Substituting $V^{-1}$ by the precision matrix $P$ to ease notation, we have:
\[\label{eq:GLM-LL2} \begin{split} \mathrm{LL}(B,\Sigma) = &- \frac{nv}{2} \log(2\pi) - \frac{n}{2} \log |\Sigma| + \frac{v}{2} \log |P| \\ &- \frac{1}{2} \, \mathrm{tr}\left[ \Sigma^{-1} \left( Y^\mathrm{T} P Y - Y^\mathrm{T} P X B - B^\mathrm{T} X^\mathrm{T} P Y + B^\mathrm{T} X^\mathrm{T} P X B \right) \right] \; . \end{split}\]
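To make the log-likelihood concrete, the following sketch evaluates \eqref{eq:GLM-LL1} numerically with NumPy and cross-checks it against SciPy's matrix normal density; the function name `glm_log_likelihood`, the example dimensions and the simulated inputs are illustrative assumptions, not part of the proof.

```python
import numpy as np
from scipy import stats

def glm_log_likelihood(Y, X, B, Sigma, V):
    """Evaluate LL(B, Sigma) as in eq. (GLM-LL1) for the matrix-normal GLM."""
    n, v = Y.shape
    R = Y - X @ B                                 # residual matrix (n x v)
    P = np.linalg.inv(V)                          # precision matrix P = V^-1
    _, logdet_Sigma = np.linalg.slogdet(Sigma)    # log|Sigma|
    _, logdet_V = np.linalg.slogdet(V)            # log|V|
    quad = np.trace(np.linalg.solve(Sigma, R.T @ P @ R))
    return (-0.5 * n * v * np.log(2 * np.pi)
            - 0.5 * n * logdet_Sigma
            - 0.5 * v * logdet_V
            - 0.5 * quad)

# illustrative data and parameters with random positive definite Sigma and V
rng = np.random.default_rng(0)
n, p, v = 20, 3, 2
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, v))
A = rng.standard_normal((v, v))
Sigma = A @ A.T + v * np.eye(v)
C = rng.standard_normal((n, n))
V = C @ C.T / n + np.eye(n)
Y = X @ B + rng.standard_normal((n, v))

# cross-check the formula against SciPy's matrix normal density
ll_formula = glm_log_likelihood(Y, X, B, Sigma, V)
ll_scipy = stats.matrix_normal.logpdf(Y, mean=X @ B, rowcov=V, colcov=Sigma)
assert np.isclose(ll_formula, ll_scipy)
```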
Using standard matrix derivative identities (cf. Petersen & Pedersen, 2012), the derivative of the log-likelihood function \eqref{eq:GLM-LL2} with respect to $B$ is
\[\label{eq:dLL-dB} \begin{split} \frac{\mathrm{d}\mathrm{LL}(B,\Sigma)}{\mathrm{d}B} &= \frac{\mathrm{d}}{\mathrm{d}B} \left( - \frac{1}{2} \, \mathrm{tr}\left[ \Sigma^{-1} \left( Y^\mathrm{T} P Y - Y^\mathrm{T} P X B - B^\mathrm{T} X^\mathrm{T} P Y + B^\mathrm{T} X^\mathrm{T} P X B \right) \right] \right) \\ &= - \frac{1}{2} \left( - 2 \, X^\mathrm{T} P Y \Sigma^{-1} + 2 \, X^\mathrm{T} P X B \, \Sigma^{-1} \right) \\ &= X^\mathrm{T} P Y \Sigma^{-1} - X^\mathrm{T} P X B \, \Sigma^{-1} \end{split}\]
and setting this derivative to zero gives the MLE for $B$:
\[\label{eq:B-MLE} \begin{split} \frac{\mathrm{d}\mathrm{LL}(\hat{B},\Sigma)}{\mathrm{d}B} &= 0 \\ 0 &= X^\mathrm{T} P Y \Sigma^{-1} - X^\mathrm{T} P X \hat{B} \Sigma^{-1} \\ 0 &= X^\mathrm{T} P Y - X^\mathrm{T} P X \hat{B} \\ X^\mathrm{T} P X \hat{B} &= X^\mathrm{T} P Y \\ \hat{B} &= \left( X^\mathrm{T} P X \right)^{-1} X^\mathrm{T} P Y \; . \end{split}\]
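A minimal NumPy sketch of \eqref{eq:B-MLE} (the helper name `glm_mle_B` is illustrative, not part of the proof); it solves the normal equations $X^\mathrm{T} P X \hat{B} = X^\mathrm{T} P Y$ directly instead of forming the inverse, and uses the fact that with $V = I_n$ the estimator reduces to ordinary least squares as a check:

```python
import numpy as np

def glm_mle_B(Y, X, V):
    """MLE of B as in eq. (B-MLE), solving (X' P X) B = X' P Y by a linear solve."""
    P = np.linalg.inv(V)                          # precision matrix P = V^-1
    XtP = X.T @ P
    return np.linalg.solve(XtP @ X, XtP @ Y)

# with V = I_n, the estimator coincides with ordinary least squares
rng = np.random.default_rng(1)
n, p, v = 30, 4, 2
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, v))
B_hat = glm_mle_B(Y, X, np.eye(n))
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(B_hat, B_ols)
```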
Similarly, the derivative of the log-likelihood function \eqref{eq:GLM-LL1} at $\hat{B}$ with respect to $\Sigma$ is
\[\label{eq:dLL-dS} \begin{split} \frac{\mathrm{d}\mathrm{LL}(\hat{B},\Sigma)}{\mathrm{d}\Sigma} &= \frac{\mathrm{d}}{\mathrm{d}\Sigma} \left( - \frac{n}{2} \log |\Sigma| - \frac{1}{2} \, \mathrm{tr}\left[ \Sigma^{-1} (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \right] \right) \\ &= - \frac{n}{2} \, \Sigma^{-1} + \frac{1}{2} \, \Sigma^{-1} (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \, \Sigma^{-1} \end{split}\]
and setting this derivative to zero gives the MLE for $\Sigma$:
\[\label{eq:S-MLE} \begin{split} \frac{\mathrm{d}\mathrm{LL}(\hat{B},\hat{\Sigma})}{\mathrm{d}\Sigma} &= 0 \\ 0 &= - \frac{n}{2} \, \hat{\Sigma}^{-1} + \frac{1}{2} \, \hat{\Sigma}^{-1} (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \, \hat{\Sigma}^{-1} \\ \frac{n}{2} \, \hat{\Sigma}^{-1} &= \frac{1}{2} \, \hat{\Sigma}^{-1} (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \, \hat{\Sigma}^{-1} \\ \hat{\Sigma}^{-1} &= \frac{1}{n} \, \hat{\Sigma}^{-1} (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \, \hat{\Sigma}^{-1} \\ I_v &= \frac{1}{n} \, (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \, \hat{\Sigma}^{-1} \\ \hat{\Sigma} &= \frac{1}{n} \, (Y - X\hat{B})^\mathrm{T} V^{-1} (Y - X\hat{B}) \; . \end{split}\]
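Analogously, \eqref{eq:S-MLE} translates into a few lines of NumPy (again, `glm_mle_Sigma` is an illustrative name); the product $V^{-1}(Y - X\hat{B})$ is obtained by a linear solve rather than an explicit inverse:

```python
import numpy as np

def glm_mle_Sigma(Y, X, B_hat, V):
    """MLE of Sigma as in eq. (S-MLE): (1/n) R' V^-1 R with residuals R = Y - X B_hat."""
    n = Y.shape[0]
    R = Y - X @ B_hat                             # residual matrix (n x v)
    return (R.T @ np.linalg.solve(V, R)) / n      # V^-1 R via a linear solve
```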
Together, \eqref{eq:B-MLE} and \eqref{eq:S-MLE} constitute the maximum likelihood estimates for the GLM stated in \eqref{eq:GLM-MLE}.
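The joint result can also be checked numerically: the sketch below simulates data from \eqref{eq:GLM}, computes $\hat{B}$ and $\hat{\Sigma}$ via \eqref{eq:B-MLE} and \eqref{eq:S-MLE}, and verifies that small perturbations of either estimate do not increase the log-likelihood. SciPy's `matrix_normal` serves as an independent implementation of the density; the variable names and simulation settings are illustrative assumptions, not part of the proof.

```python
import numpy as np
from scipy import stats

# simulate a GLM with known V, compute both MLEs, and check numerically that
# small perturbations of (B_hat, Sigma_hat) do not increase the log-likelihood
rng = np.random.default_rng(3)
n, p, v = 50, 3, 2
X = rng.standard_normal((n, p))
V = np.eye(n)
Y = X @ rng.standard_normal((p, v)) + rng.standard_normal((n, v))

P = np.linalg.inv(V)
B_hat = np.linalg.solve(X.T @ P @ X, X.T @ P @ Y)             # eq. (B-MLE)
R = Y - X @ B_hat
Sigma_hat = (R.T @ P @ R) / n                                 # eq. (S-MLE)

def ll(B, Sigma):
    return stats.matrix_normal.logpdf(Y, mean=X @ B, rowcov=V, colcov=Sigma)

ll_max = ll(B_hat, Sigma_hat)
for _ in range(100):
    dB = 0.01 * rng.standard_normal(B_hat.shape)
    dS = 0.01 * rng.standard_normal(Sigma_hat.shape)
    dS = (dS + dS.T) / 2                                      # symmetric perturbation
    assert ll(B_hat + dB, Sigma_hat) <= ll_max
    assert ll(B_hat, Sigma_hat + dS) <= ll_max
```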
- Petersen, Kaare Brandt; Pedersen, Michael Syskind (2012): "Derivatives"; in: The Matrix Cookbook, Section 2, eqs. (100), (117), (57), (124); URL: https://www2.imm.dtu.dk/pubdb/pubs/3274-full.html.
Metadata: ID: P7 | shortcut: glm-mle | author: JoramSoch | date: 2019-12-06, 10:40.