Proof: Concavity of the Shannon entropy
Theorem: The entropy is concave in the probability mass function $p$, i.e.
\[\label{eq:ent-conc} \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] \geq \lambda \mathrm{H}[p_1] + (1-\lambda) \mathrm{H}[p_2]\]where $p_1$ and $p_2$ are probability mass functions defined on the same set of outcomes and $0 \leq \lambda \leq 1$.
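Before turning to the proof, here is an informal numerical illustration of \eqref{eq:ent-conc}, not part of the argument itself: for two arbitrarily chosen probability mass functions and one mixing weight, the entropy of the mixture exceeds the mixture of the entropies. The Python sketch below uses natural logarithms; the distributions and the weight $\lambda = 0.3$ are hypothetical examples.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[p] = -sum_x p(x) log p(x), in nats (0 log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# two example probability mass functions on the same three outcomes
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.1, 0.3, 0.6])
lam = 0.3

lhs = entropy(lam * p1 + (1 - lam) * p2)            # H[lam*p1 + (1-lam)*p2] ≈ 1.069
rhs = lam * entropy(p1) + (1 - lam) * entropy(p2)   # lam*H[p1] + (1-lam)*H[p2] ≈ 0.869
print(lhs >= rhs)                                   # True, consistent with eq:ent-conc
```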
Proof: Let $X$ be a discrete random variable with the set of possible outcomes $\mathcal{X}$ and let $u(x)$ be the probability mass function of the discrete uniform distribution over $\mathcal{X}$, i.e. $u(x) = 1/|\mathcal{X}|$ for all $x \in \mathcal{X}$. Then, the entropy of an arbitrary probability mass function $p(x)$ can be rewritten as
\[\label{eq:ent-kl} \begin{split} \mathrm{H}[p] &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log p(x) \\ &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log \left( \frac{p(x)}{u(x)} \cdot u(x) \right) \\ &= - \sum_{x \in \mathcal{X}} p(x) \cdot \log \frac{p(x)}{u(x)} - \sum_{x \in \mathcal{X}} p(x) \cdot \log u(x) \\ &= - \mathrm{KL}[p||u] - \log \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} p(x) \\ &= \log |\mathcal{X}| - \mathrm{KL}[p||u] \\ \log |\mathcal{X}| - \mathrm{H}[p] &= \mathrm{KL}[p||u] \end{split}\]where we have used the definition of the Kullback-Leibler divergence, the probability mass function of the discrete uniform distribution, $u(x) = 1/|\mathcal{X}|$, and the fact that a probability mass function sums to one, $\sum_{x \in \mathcal{X}} p(x) = 1$.
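As an informal numerical check of \eqref{eq:ent-kl}, not part of the formal proof, the identity $\log |\mathcal{X}| - \mathrm{H}[p] = \mathrm{KL}[p||u]$ can be evaluated for an arbitrary example distribution. The following Python sketch uses natural logarithms and an illustrative probability mass function; the helper functions `entropy` and `kl` are ad-hoc definitions, not taken from the sources below.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[p] = -sum_x p(x) log p(x), in nats (0 log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl(p, q):
    """Kullback-Leibler divergence KL[p||q] = sum_x p(x) log(p(x)/q(x)), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

p = np.array([0.5, 0.25, 0.125, 0.125])   # arbitrary example PMF on 4 outcomes
u = np.full(p.size, 1.0 / p.size)         # discrete uniform distribution u(x) = 1/|X|

# identity (eq:ent-kl): log|X| - H[p] = KL[p||u]
print(np.log(p.size) - entropy(p))        # ≈ 0.1733
print(kl(p, u))                           # ≈ 0.1733
```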
Note that the KL divergence is convex in the pair of probability distributions $(p,q)$:
\[\label{eq:kl-conv} \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]\]A special case of this, obtained by setting $q_1 = q_2 = u$, is given by
\[\label{eq:kl-conv-u} \begin{split} \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda u + (1-\lambda) u] &\leq \lambda \mathrm{KL}[p_1||u] + (1-\lambda) \mathrm{KL}[p_2||u] \\ \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||u] &\leq \lambda \mathrm{KL}[p_1||u] + (1-\lambda) \mathrm{KL}[p_2||u] \end{split}\]and applying equation \eqref{eq:ent-kl} to each of the three Kullback-Leibler divergences, we have
\[\label{eq:ent-conc-qed} \begin{split} \log |\mathcal{X}| - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq \lambda \left( \log |\mathcal{X}| - \mathrm{H}[p_1] \right) + (1-\lambda) \left( \log |\mathcal{X}| - \mathrm{H}[p_2] \right) \\ \log |\mathcal{X}| - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq \log |\mathcal{X}| - \lambda \mathrm{H}[p_1] - (1-\lambda) \mathrm{H}[p_2] \\ - \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\leq - \lambda \mathrm{H}[p_1] - (1-\lambda) \mathrm{H}[p_2] \\ \mathrm{H}[\lambda p_1 + (1-\lambda) p_2] &\geq \lambda \mathrm{H}[p_1] + (1-\lambda) \mathrm{H}[p_2] \end{split}\]which is equivalent to \eqref{eq:ent-conc}.
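As a final informal check, again not a substitute for the proof, both the special case \eqref{eq:kl-conv-u} and the resulting concavity statement \eqref{eq:ent-conc} can be spot-checked for randomly generated probability mass functions; the Dirichlet sampler and the numerical tolerance in the Python sketch below are illustrative choices.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats (0 log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl(p, q):
    """Kullback-Leibler divergence KL[p||q] in nats (assumes q > 0 where p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

rng = np.random.default_rng(0)
n = 5                                # size of the outcome set |X|
u = np.full(n, 1.0 / n)              # discrete uniform distribution u(x) = 1/|X|

for _ in range(10000):
    p1, p2 = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
    lam = rng.uniform()
    pm = lam * p1 + (1 - lam) * p2   # mixture distribution
    # eq:kl-conv-u: divergence of the mixture from u is at most the mixture of divergences
    assert kl(pm, u) <= lam * kl(p1, u) + (1 - lam) * kl(p2, u) + 1e-12
    # eq:ent-conc: entropy of the mixture is at least the mixture of entropies
    assert entropy(pm) >= lam * entropy(p1) + (1 - lam) * entropy(p2) - 1e-12
print("no counterexample found in 10000 random draws")
```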
- Wikipedia (2020): "Entropy (information theory)"; in: Wikipedia, the free encyclopedia, retrieved on 2020-08-11; URL: https://en.wikipedia.org/wiki/Entropy_(information_theory)#Further_properties.
- Cover TM, Thomas JA (1991): "Elements of Information Theory", p. 30; URL: https://www.wiley.com/en-us/Elements+of+Information+Theory%2C+2nd+Edition-p-9780471241959.
- Xie, Yao (2012): "Chain Rules and Inequalities"; in: ECE587: Information Theory, Lecture 3, Slide 25; URL: https://www2.isye.gatech.edu/~yxie77/ece587/Lecture3.pdf.
- Goh, Siong Thye (2016): "Understanding the proof of the concavity of entropy"; in: StackExchange Mathematics, retrieved on 2020-11-08; URL: https://math.stackexchange.com/questions/2000194/understanding-the-proof-of-the-concavity-of-entropy.
Metadata: ID: P149 | shortcut: ent-conc | author: JoramSoch | date: 2020-08-11, 08:29.