Index: The Book of Statistical Proofs ▷ Probability Distributions ▷ Univariate continuous distributions ▷ Normal distribution ▷ Maximum entropy distribution

Theorem: Among all continuous random variables with a given variance, the normally distributed one has the maximum differential entropy.

Proof: For a random variable $X$ with set of possible values $\mathcal{X}$ and probability density function $p(x)$, the differential entropy is defined as:

\[\label{eq:dent} \mathrm{h}(X) = - \int_{\mathcal{X}} p(x) \log p(x) \, \mathrm{d}x\]

Let $g(x)$ be the probability density function of a normal distribution with mean $\mu$ and variance $\sigma^2$ and let $f(x)$ be an arbitrary probability density function with the same variance. Since differential entropy is translation-invariant, we can assume that $f(x)$ has the same mean as $g(x)$.
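
The translation invariance used here can be checked directly from the definition \eqref{eq:dent}. As a short sketch, let $Y = X + c$ for a constant $c$ (a symbol introduced only for this illustration), so that $Y$ has density $p_Y(y) = p_X(y-c)$ on the shifted set $\mathcal{Y} = \mathcal{X} + c$. Substituting $x = y - c$ gives:

\[ \mathrm{h}(Y) = - \int_{\mathcal{Y}} p_X(y-c) \log p_X(y-c) \, \mathrm{d}y = - \int_{\mathcal{X}} p_X(x) \log p_X(x) \, \mathrm{d}x = \mathrm{h}(X) \; . \]

Shifting $f(x)$ so that its mean equals $\mu$ therefore changes neither its variance nor its differential entropy.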

Consider the Kullback-Leibler divergence of distribution $f(x)$ from distribution $g(x)$, which is non-negative:

\[\label{eq:kl-fg} \begin{split} 0 \leq \mathrm{KL}[f||g] &= \int_{\mathcal{X}} f(x) \log \frac{f(x)}{g(x)} \, \mathrm{d}x \\ &= \int_{\mathcal{X}} f(x) \log f(x) \, \mathrm{d}x - \int_{\mathcal{X}} f(x) \log g(x) \, \mathrm{d}x \\ &\overset{\eqref{eq:dent}}{=} - \mathrm{h}[f(x)] - \int_{\mathcal{X}} f(x) \log g(x) \, \mathrm{d}x \; . \end{split}\]
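
The non-negativity in the first step of \eqref{eq:kl-fg} is known as Gibbs' inequality. One standard way to see it, sketched here under the assumption that all integrals exist, is to apply Jensen's inequality to the concave logarithm:

\[ -\mathrm{KL}[f||g] = \int_{\{f > 0\}} f(x) \log \frac{g(x)}{f(x)} \, \mathrm{d}x \leq \log \int_{\{f > 0\}} f(x) \, \frac{g(x)}{f(x)} \, \mathrm{d}x = \log \int_{\{f > 0\}} g(x) \, \mathrm{d}x \leq \log 1 = 0 \; . \]

Equality holds only if $f(x) = g(x)$ almost everywhere. What remains in \eqref{eq:kl-fg} is to evaluate its second term, $\int_{\mathcal{X}} f(x) \log g(x) \, \mathrm{d}x$.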

By plugging the probability density function of the normal distribution into the second term of \eqref{eq:kl-fg}, we obtain:

\[\label{eq:int-fg-s1} \begin{split} \int_{\mathcal{X}} f(x) \log g(x) \, \mathrm{d}x &= \int_{\mathcal{X}} f(x) \log \left( \frac{1}{\sqrt{2 \pi} \sigma} \cdot \exp \left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right] \right) \, \mathrm{d}x \\ &= \int_{\mathcal{X}} f(x) \log \left( \frac{1}{\sqrt{2 \pi \sigma^2}} \right) \, \mathrm{d}x + \int_{\mathcal{X}} f(x) \log \left( \exp \left[ -\frac{(x-\mu)^2}{2 \sigma^2} \right] \right) \, \mathrm{d}x \\ &= -\frac{1}{2} \log \left( 2 \pi \sigma^2 \right) \int_{\mathcal{X}} f(x) \, \mathrm{d}x - \frac{\log(e)}{2 \sigma^2} \int_{\mathcal{X}} f(x) (x-\mu)^2 \, \mathrm{d}x \; . \end{split}\]

Because a probability density function integrates to one over its entire domain and the second central moment of $f(x)$ equals its variance, which is $\sigma^2$ by assumption, we have:

\[\label{eq:int-fg-s2} \begin{split} \int_{\mathcal{X}} f(x) \log g(x) \, \mathrm{d}x &= -\frac{1}{2} \log \left( 2 \pi \sigma^2 \right) - \frac{\log(e) \sigma^2}{2 \sigma^2} \\ &= -\frac{1}{2} \left[ \log \left( 2 \pi \sigma^2 \right) + \log(e) \right] \\ &= -\frac{1}{2} \log \left( 2 \pi \sigma^2 e \right) \; . \end{split}\]

This is exactly the negative of the differential entropy of the normal distribution, so that:

\[\label{eq:int-fg-s3} \int_{\mathcal{X}} f(x) \log g(x) \, \mathrm{d}x = -\mathrm{h}[\mathcal{N}(\mu,\sigma^2)] = -\mathrm{h}[g(x)] \; .\]
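
The identification with the entropy of the normal distribution can be verified by repeating the computation with $f(x) = g(x)$, since the steps above used only the normalization, the mean and the variance of $f(x)$. As a brief sketch:

\[ \mathrm{h}[g(x)] = - \int_{\mathcal{X}} g(x) \log g(x) \, \mathrm{d}x = \frac{1}{2} \log \left( 2 \pi \sigma^2 \right) + \frac{\log(e)}{2 \sigma^2} \int_{\mathcal{X}} g(x) (x-\mu)^2 \, \mathrm{d}x = \frac{1}{2} \log \left( 2 \pi \sigma^2 e \right) \; . \]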

Combining \eqref{eq:kl-fg} with \eqref{eq:int-fg-s3}, we can show that

\[\label{eq:norm-maxent} \begin{split} 0 &\leq \mathrm{KL}[f||g] \\ 0 &\leq - \mathrm{h}[f(x)] - \left( -\mathrm{h}[g(x)] \right) \\ \mathrm{h}[g(x)] &\geq \mathrm{h}[f(x)] \end{split}\]

which means that the differential entropy of the normal distribution $\mathcal{N}(\mu, \sigma^2)$ is larger than or equal to that of any other distribution with the same variance $\sigma^2$.
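
As an informal numerical illustration, and not part of the proof, the following Python sketch compares the normal distribution with two other densities of equal variance. The choice of the uniform and Laplace distributions, the use of natural logarithms (entropy in nats) and all function names are assumptions made for this example. The closed-form entropies are the standard ones for these families, and `cross_term_uniform` approximates $\int_{\mathcal{X}} f(x) \log g(x) \, \mathrm{d}x$ for a uniform $f(x)$ with a simple midpoint rule.

```python
import math


def entropy_normal(sigma):
    """Differential entropy (in nats) of a normal density with variance sigma^2."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)


def entropy_uniform(sigma):
    """Differential entropy (in nats) of a uniform density with variance sigma^2."""
    width = sigma * math.sqrt(12.0)  # a uniform density has variance width^2 / 12
    return math.log(width)


def entropy_laplace(sigma):
    """Differential entropy (in nats) of a Laplace density with variance sigma^2."""
    scale = sigma / math.sqrt(2.0)  # a Laplace density has variance 2 * scale^2
    return 1.0 + math.log(2.0 * scale)


def cross_term_uniform(mu, sigma, n=100_000):
    """Midpoint-rule estimate of the integral of f(x) * log g(x), where f is
    uniform with mean mu and variance sigma^2 and g is the normal density
    with the same mean and variance."""
    half_width = sigma * math.sqrt(3.0)  # uniform on [mu - a, mu + a] has variance a^2 / 3
    step = 2.0 * half_width / n
    density = 1.0 / (2.0 * half_width)
    total = 0.0
    for i in range(n):
        x = mu - half_width + (i + 0.5) * step
        log_g = -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)
        total += density * log_g * step
    return total


# Print the three entropies for a few standard deviations.
for sigma in (0.5, 1.0, 3.0):
    print(sigma, entropy_normal(sigma), entropy_laplace(sigma), entropy_uniform(sigma))
```

In this sketch, the normal entropy exceeds the other two for every tested $\sigma$, and the midpoint estimate of the cross term agrees with $-\mathrm{h}[g(x)]$ up to discretization error, in line with \eqref{eq:int-fg-s3}.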


Metadata: ID: P250 | shortcut: norm-maxent | author: JoramSoch | date: 2020-08-25, 08:31.