Index: The Book of Statistical Proofs ▷ General Theorems ▷ Information theory ▷ Shannon entropy ▷ Concavity

Theorem: The entropy is concave in the probability mass function $p$, i.e.

$\label{eq:ent-conc} \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] \geq \lambda \mathrm{H}[p_1] + (1-\lambda) \mathrm{H}[p_2]$

where $p_1$ and $p_2$ are probability mass functions defined on the same set of outcomes and $0 \leq \lambda \leq 1$.
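As a quick numerical sanity check (not part of the proof below), the concavity inequality can be verified in Python for arbitrarily chosen example distributions; the pmfs `p1` and `p2` here are illustrative, not special:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[p] = -sum p(x) log p(x), with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(np.where(p > 0, p * np.log(p), 0.0))

# two arbitrary example pmfs on the same three-element support
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.1, 0.3, 0.6])

# check H[lam*p1 + (1-lam)*p2] >= lam*H[p1] + (1-lam)*H[p2] on a grid of lam
for lam in np.linspace(0.0, 1.0, 11):
    mix = lam * p1 + (1 - lam) * p2
    assert entropy(mix) >= lam * entropy(p1) + (1 - lam) * entropy(p2) - 1e-12
```

At the endpoints $\lambda = 0$ and $\lambda = 1$ the inequality holds with equality, as the mixture reduces to $p_2$ or $p_1$.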

Proof: Let $X$ be a discrete random variable with the set of possible outcomes $\mathcal{X}$ and let $u(x)$ be the probability mass function of the discrete uniform distribution over $\mathcal{X}$. Then, the entropy of an arbitrary probability mass function $p(x)$ can be rewritten as

$\label{eq:ent-kl} \begin{split} \mathrm{H}[p] &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log p(x) \\ &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log \left( \frac{p(x)}{u(x)} \cdot u(x) \right) \\ &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log \frac{p(x)}{u(x)} - \sum_{x \in \mathcal{X}} p(x) \cdot \log u(x) \\ &= - \mathrm{KL}[p||u] - \log \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} p(x) \\ &= \log |\mathcal{X}| - \mathrm{KL}[p||u] \\ \log |\mathcal{X}| - \mathrm{H}[p] &= \mathrm{KL}[p||u] \end{split}$
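The identity $\log |\mathcal{X}| - \mathrm{H}[p] = \mathrm{KL}[p||u]$ can also be confirmed numerically; the following sketch uses an arbitrary example pmf and is not part of the proof:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[p], with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(np.where(p > 0, p * np.log(p), 0.0))

def kl(p, q):
    """KL divergence KL[p||q] = sum p(x) log(p(x)/q(x)), with 0*log(0/q) = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(np.where(p > 0, p * np.log(p / q), 0.0))

p = np.array([0.5, 0.25, 0.125, 0.125])   # arbitrary example pmf
u = np.full(p.size, 1.0 / p.size)         # uniform pmf on the same support

# log|X| - H[p] = KL[p||u]
assert np.isclose(np.log(p.size) - entropy(p), kl(p, u))
```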

Note that the KL divergence is jointly convex in the pair of probability distributions $(p,q)$, a standard result which follows from the log-sum inequality:

$\label{eq:kl-conv} \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]$
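This joint convexity can likewise be checked numerically; the two pairs of example pmfs below are arbitrary, and the check is only a sanity test, not a proof:

```python
import numpy as np

def kl(p, q):
    """KL divergence KL[p||q], with the convention 0*log(0/q) = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(np.where(p > 0, p * np.log(p / q), 0.0))

# two arbitrary pairs (p1, q1) and (p2, q2) of example pmfs
p1, q1 = np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.5, 0.3])
p2, q2 = np.array([0.1, 0.1, 0.8]), np.array([0.4, 0.4, 0.2])

# check KL[lam*p1+(1-lam)*p2 || lam*q1+(1-lam)*q2] <= lam*KL[p1||q1] + (1-lam)*KL[p2||q2]
for lam in np.linspace(0.0, 1.0, 11):
    lhs = kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
    rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
    assert lhs <= rhs + 1e-12
```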

A special case of this is given by

$\label{eq:kl-conv-u} \begin{split} \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda u + (1-\lambda) u] &\leq \lambda \mathrm{KL}[p_1||u] + (1-\lambda) \mathrm{KL}[p_2||u] \\ \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||u] &\leq \lambda \mathrm{KL}[p_1||u] + (1-\lambda) \mathrm{KL}[p_2||u] \end{split}$

and applying the identity from equation \eqref{eq:ent-kl} to each of the three KL divergence terms, we have

$\label{eq:ent-conc-qed} \begin{split} \log |\mathcal{X}| - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq \lambda \left( \log |\mathcal{X}| - \mathrm{H}[p_1] \right) + (1-\lambda) \left( \log |\mathcal{X}| - \mathrm{H}[p_2] \right) \\ \log |\mathcal{X}| - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq \log |\mathcal{X}| - \lambda \mathrm{H}[p_1] - (1-\lambda) \mathrm{H}[p_2] \\ - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq - \lambda \mathrm{H}[p_1] - (1-\lambda) \mathrm{H}[p_2] \\ \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\geq \lambda \mathrm{H}[p_1] + (1-\lambda) \mathrm{H}[p_2] \end{split}$

which is equivalent to \eqref{eq:ent-conc}.

Metadata: ID: P149 | shortcut: ent-conc | author: JoramSoch | date: 2020-08-11, 08:29.