Index: The Book of Statistical Proofs ▷ General Theorems ▷ Information theory ▷ Shannon entropy ▷ Concavity

Theorem: The entropy is concave in the probability mass function $p$, i.e.

\[\label{eq:ent-conc} \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] \geq \lambda \mathrm{H}[p_1] + (1-\lambda) \mathrm{H}[p_2]\]

where $p_1$ and $p_2$ are probability mass functions over the same domain and $0 \leq \lambda \leq 1$.
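
As a quick numeric sanity check (not part of the proof), the following Python sketch evaluates both sides of the inequality for two example probability mass functions `p1`, `p2` and a mixing weight `lam`; all of these values are assumptions chosen purely for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[p] = -sum_x p(x) * log p(x), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # by convention, 0 * log 0 = 0
    return -np.sum(p * np.log(p))

# example probability mass functions and mixing weight (assumed values)
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.1, 0.3, 0.6])
lam = 0.4

lhs = entropy(lam * p1 + (1 - lam) * p2)           # H[lam*p1 + (1-lam)*p2]
rhs = lam * entropy(p1) + (1 - lam) * entropy(p2)  # lam*H[p1] + (1-lam)*H[p2]
print(lhs >= rhs)                                  # True
```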

Proof: Let $X$ be a discrete random variable with possible outcomes $\mathcal{X}$ and let $u(x)$ be the probability mass function of the discrete uniform distribution over $\mathcal{X}$. Then, the entropy of an arbitrary probability mass function $p(x)$ on $\mathcal{X}$ can be rewritten as

\[\label{eq:ent-kl} \begin{split} \mathrm{H}[p] &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log p(x) \\ &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log \frac{p(x)}{u(x)} u(x) \\ &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log \frac{p(x)}{u(x)} - \sum_{x \in \mathcal{X}} p(x) \cdot \log u(x) \\ &= - \mathrm{KL}[p||u] - \log \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} p(x) \\ &= \log |\mathcal{X}| - \mathrm{KL}[p||u] \\ \log |\mathcal{X}| - \mathrm{H}[p] &= \mathrm{KL}[p||u] \end{split}\]

where we have used the definition of the Kullback-Leibler divergence, the probability mass function of the discrete uniform distribution, $u(x) = 1/|\mathcal{X}|$, and the fact that a probability mass function sums to one.
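
The rewriting $\log |\mathcal{X}| - \mathrm{H}[p] = \mathrm{KL}[p||u]$ can be spot-checked numerically; in the sketch below, `p` is an example probability mass function and `u` the uniform distribution on its support, both assumed for illustration.

```python
import numpy as np

# example pmf p and uniform pmf u on the same support (assumed values)
p = np.array([0.5, 0.25, 0.125, 0.125])
u = np.full_like(p, 1 / p.size)

H  = -np.sum(p * np.log(p))                # Shannon entropy H[p]
KL = np.sum(p * np.log(p / u))             # KL divergence KL[p||u]
print(np.isclose(np.log(p.size) - H, KL))  # True
```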

Note that the KL divergence is convex in the pair of probability distributions $(p,q)$:

\[\label{eq:kl-conv} \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]\]

A special case of this, obtained by setting $q_1 = q_2 = u$, is given by

\[\label{eq:kl-conv-u} \begin{split} \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda u + (1-\lambda) u] &\leq \lambda \mathrm{KL}[p_1||u] + (1-\lambda) \mathrm{KL}[p_2||u] \\ \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||u] &\leq \lambda \mathrm{KL}[p_1||u] + (1-\lambda) \mathrm{KL}[p_2||u] \end{split}\]
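
This special case can also be illustrated numerically; the sketch below checks the displayed inequality for assumed example values of `p1`, `p2`, `lam` and the uniform distribution `u`.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL[p||q] for strictly positive pmfs."""
    return np.sum(p * np.log(p / q))

# example pmfs, mixing weight and uniform distribution (assumed values)
p1  = np.array([0.7, 0.2, 0.1])
p2  = np.array([0.1, 0.3, 0.6])
u   = np.full(3, 1 / 3)
lam = 0.4

mix = lam * p1 + (1 - lam) * p2
print(kl(mix, u) <= lam * kl(p1, u) + (1 - lam) * kl(p2, u))  # True
```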

Substituting equation \eqref{eq:ent-kl}, i.e. $\mathrm{KL}[p||u] = \log |\mathcal{X}| - \mathrm{H}[p]$, into both sides of this inequality, we have

\[\label{eq:ent-conc-qed} \begin{split} \log |\mathcal{X}| - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq \lambda \left( \log |\mathcal{X}| - \mathrm{H}[p_1] \right) + (1-\lambda) \left( \log |\mathcal{X}| - \mathrm{H}[p_2] \right) \\ \log |\mathcal{X}| - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq \log |\mathcal{X}| - \lambda \mathrm{H}[p_1] - (1-\lambda) \mathrm{H}[p_2] \\ - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq - \lambda \mathrm{H}[p_1] - (1-\lambda) \mathrm{H}[p_2] \\ \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\geq \lambda \mathrm{H}[p_1] + (1-\lambda) \mathrm{H}[p_2] \end{split}\]

which is equivalent to \eqref{eq:ent-conc}.

Sources:

Metadata: ID: P149 | shortcut: ent-conc | author: JoramSoch | date: 2020-08-11, 08:29.