Proof: Maximum likelihood estimation for the univariate Gaussian
Theorem: Let there be a univariate Gaussian data set $y = \left\lbrace y_1, \ldots, y_n \right\rbrace$:
\[\label{eq:ug} y_i \sim \mathcal{N}(\mu, \sigma^2), \quad i = 1, \ldots, n \; .\]Then, the maximum likelihood estimates for mean $\mu$ and variance $\sigma^2$ are given by
\[\label{eq:ug-MLE} \begin{split} \hat{\mu} &= \frac{1}{n} \sum_{i=1}^n y_i \\ \hat{\sigma}^2 &= \frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2 \; . \end{split}\]Proof: The likelihood function for each observation is given by the probability density function of the normal distribution
\[\label{eq:ug-yi} p(y_i|\mu,\sigma^2) = \mathcal{N}(y_i; \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp \left[ -\frac{1}{2} \left( \frac{y_i-\mu}{\sigma} \right)^2 \right]\]and because the observations are independent, the likelihood function for all observations is the product of the individual ones:
\[\label{eq:ug-LF-s1} p(y|\mu,\sigma^2) = \prod_{i=1}^n p(y_i|\mu,\sigma^2) = \sqrt{ \frac{1}{(2 \pi \sigma^2)^n} } \cdot \exp \left[ -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{y_i-\mu}{\sigma} \right)^2 \right] \; .\]This can be developed into
\[\label{eq:ug-LF-s2} \begin{split} p(y|\mu,\sigma^2) &= \left( \frac{1}{2 \pi \sigma^2} \right)^{n/2} \cdot \exp \left[ -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{y_i^2 - 2 y_i \mu + \mu^2}{\sigma^2} \right) \right] \\ &= \left( \frac{1}{2 \pi \sigma^2} \right)^{n/2} \cdot \exp \left[ -\frac{1}{2 \sigma^2} \left( y^\mathrm{T} y - 2 n \bar{y} \mu + n \mu^2 \right) \right] \end{split}\]where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ is the mean of data points and $y^\mathrm{T} y = \sum_{i=1}^{n} y_i^2$ is the sum of squared data points.
Thus, the log-likelihood function is
\[\label{eq:ug-LL} \mathrm{LL}(\mu,\sigma^2) = \log p(y|\mu,\sigma^2) = -\frac{n}{2} \log (2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \left( y^\mathrm{T} y - 2 n \bar{y} \mu + n \mu^2 \right) \; .\]
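As a quick numerical sanity check of the rearrangement in \eqref{eq:ug-LF-s2} and \eqref{eq:ug-LL} (an illustrative sketch only; the data values and parameter values below are made up, and SciPy's `norm.logpdf` is assumed available), one can compare the rearranged log-likelihood with the direct sum of Gaussian log-densities:

```python
import numpy as np
from scipy.stats import norm

# made-up sample, assumed to come from a univariate Gaussian
y = np.array([2.1, 3.4, 1.8, 2.9, 3.7, 2.5])
n, y_bar, yTy = len(y), y.mean(), (y ** 2).sum()
mu, sigma2 = 2.0, 1.5                       # arbitrary test values for the parameters

# log-likelihood in the rearranged form of eq. (ug-LL)
LL_rearranged = -n / 2 * np.log(2 * np.pi * sigma2) \
                - (yTy - 2 * n * y_bar * mu + n * mu ** 2) / (2 * sigma2)

# direct sum of the individual log-densities (independent observations)
LL_direct = norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)).sum()

assert np.isclose(LL_rearranged, LL_direct)
```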
The derivative of the log-likelihood function \eqref{eq:ug-LL} with respect to $\mu$ is
\[\label{eq:dLL-dmu} \frac{\mathrm{d}\mathrm{LL}(\mu,\sigma^2)}{\mathrm{d}\mu} = -\frac{1}{2 \sigma^2} \left( - 2 n \bar{y} + 2 n \mu \right) = \frac{n}{\sigma^2} (\bar{y} - \mu)\]
and setting this derivative to zero gives the MLE for $\mu$:
\[\label{eq:mu-MLE} \begin{split} \frac{\mathrm{d}\mathrm{LL}(\hat{\mu},\sigma^2)}{\mathrm{d}\mu} &= 0 \\ 0 &= \frac{n}{\sigma^2} (\bar{y}-\hat{\mu}) \\ 0 &= \bar{y}-\hat{\mu} \\ \hat{\mu} &= \bar{y} \\ \hat{\mu} &= \frac{1}{n} \sum_{i=1}^n y_i \; . \end{split}\]
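The derivation above uses only the first-order condition; since the second derivative $-n/\sigma^2$ is negative, the stationary point is indeed a maximum. As an informal numerical confirmation (a sketch only; the data and the fixed variance are made up for illustration), one can evaluate the log-likelihood over a grid of $\mu$ values:

```python
import numpy as np
from scipy.stats import norm

y = np.array([2.1, 3.4, 1.8, 2.9, 3.7, 2.5])
sigma2 = 1.5                                  # any fixed positive value will do

def LL(mu):
    # log-likelihood as a function of mu, with sigma^2 held fixed
    return norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)).sum()

mu_grid = np.linspace(y.min() - 2, y.max() + 2, 2001)
mu_best = mu_grid[np.argmax([LL(m) for m in mu_grid])]

# the grid maximizer agrees with the closed-form MLE, mu_hat = y_bar
assert np.isclose(mu_best, y.mean(), atol=1e-2)
```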
The derivative of the log-likelihood function \eqref{eq:ug-LL} at $\hat{\mu} = \bar{y}$ with respect to $\sigma^2$ is
\[\label{eq:dLL-ds2} \frac{\mathrm{d}\mathrm{LL}(\hat{\mu},\sigma^2)}{\mathrm{d}\sigma^2} = -\frac{n}{2 \sigma^2} + \frac{1}{2 (\sigma^2)^2} \left( y^\mathrm{T} y - 2 n \bar{y} \hat{\mu} + n \hat{\mu}^2 \right) = -\frac{n}{2 \sigma^2} + \frac{1}{2 (\sigma^2)^2} \sum_{i=1}^n (y_i - \hat{\mu})^2\]
and setting this derivative to zero gives the MLE for $\sigma^2$:
\[\label{eq:s2-MLE} \begin{split} \frac{\mathrm{d}\mathrm{LL}(\hat{\mu},\hat{\sigma}^2)}{\mathrm{d}\sigma^2} &= 0 \\ 0 &= -\frac{n}{2 \hat{\sigma}^2} + \frac{1}{2 (\hat{\sigma}^2)^2} \sum_{i=1}^n (y_i - \hat{\mu})^2 \\ \frac{n}{2 \hat{\sigma}^2} &= \frac{1}{2 (\hat{\sigma}^2)^2} \sum_{i=1}^n (y_i - \hat{\mu})^2 \\ \frac{2 (\hat{\sigma}^2)^2}{n} \cdot \frac{n}{2 \hat{\sigma}^2} &= \frac{2 (\hat{\sigma}^2)^2}{n} \cdot \frac{1}{2 (\hat{\sigma}^2)^2} \sum_{i=1}^n (y_i - \hat{\mu})^2 \\ \hat{\sigma}^2 &= \frac{1}{n} \sum_{i=1}^n (y_i - \hat{\mu})^2 \\ \hat{\sigma}^2 &= \frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2 \; . \end{split}\]
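Analogously to the check for $\mu$, one can verify numerically that, with $\mu$ fixed at $\hat{\mu} = \bar{y}$, the log-likelihood over $\sigma^2$ peaks at the $1/n$-normalized sum of squared deviations (again an illustrative sketch with made-up data):

```python
import numpy as np
from scipy.stats import norm

y = np.array([2.1, 3.4, 1.8, 2.9, 3.7, 2.5])
mu_hat = y.mean()

def LL(sigma2):
    # log-likelihood as a function of sigma^2, with mu held at mu_hat
    return norm.logpdf(y, loc=mu_hat, scale=np.sqrt(sigma2)).sum()

s2_grid = np.linspace(0.05, 5.0, 5000)
s2_best = s2_grid[np.argmax([LL(s2) for s2 in s2_grid])]

# the grid maximizer agrees with the closed-form MLE (1/n normalization)
assert np.isclose(s2_best, np.var(y, ddof=0), atol=1e-2)
```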
Together, \eqref{eq:mu-MLE} and \eqref{eq:s2-MLE} constitute the MLE for the univariate Gaussian.
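As a numerical illustration of the result (not part of the proof; the data values and the use of NumPy/SciPy are assumptions made purely for this sketch), the closed-form estimates can be computed directly and compared against a generic numerical maximization of the log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# made-up sample, assumed to come from a univariate Gaussian
y = np.array([2.1, 3.4, 1.8, 2.9, 3.7, 2.5])
n = len(y)

# closed-form MLEs from the theorem (note the 1/n, not 1/(n-1), in the variance)
mu_hat     = y.sum() / n
sigma2_hat = ((y - mu_hat) ** 2).sum() / n

# generic numerical maximization of the log-likelihood for comparison
def neg_LL(theta):
    mu, log_sigma2 = theta                    # optimize log(sigma^2) to keep the variance positive
    sigma2 = np.exp(log_sigma2)
    return -norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)).sum()

res = minimize(neg_LL, x0=np.array([0.0, 0.0]))
mu_num, sigma2_num = res.x[0], np.exp(res.x[1])

assert np.isclose(mu_num, mu_hat, atol=1e-3)
assert np.isclose(sigma2_num, sigma2_hat, atol=1e-3)
```

Note that `np.var(y)` with its default `ddof=0` uses the same $1/n$ normalization as the maximum likelihood estimate $\hat{\sigma}^2$.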
- Bishop CM (2006): "Maximum Likelihood for the Gaussian"; in: Pattern Recognition and Machine Learning, pp. 93-94, eqs. 2.121, 2.122; URL: http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf.
Metadata: ID: P223 | shortcut: ug-mle | author: JoramSoch | date: 2021-04-16, 11:03.