Index: The Book of Statistical Proofs ▷ General Theorems ▷ Information theory ▷ Shannon entropy ▷ Convexity of cross-entropy

Theorem: The cross-entropy is convex in the probability distribution $q$, i.e.

\[\label{eq:ent-cross-conv} \mathrm{H}[p,\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{H}[p,q_1] + (1-\lambda) \mathrm{H}[p,q_2]\]

where $p$ is a fixed probability distribution, $q_1$ and $q_2$ are any two probability distributions and $0 \leq \lambda \leq 1$.
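
The inequality can be illustrated numerically for discrete distributions. Below is a minimal sketch in Python, assuming the natural logarithm; the distributions `p`, `q1`, `q2` and the grid of $\lambda$ values are chosen purely for illustration and are not part of the proof.

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H[p,q] = -sum_x p(x) * log q(x) (natural logarithm)."""
    return -np.sum(p * np.log(q))

# illustrative discrete distributions over three outcomes
p  = np.array([0.2, 0.5, 0.3])
q1 = np.array([0.1, 0.6, 0.3])
q2 = np.array([0.4, 0.4, 0.2])

# check H[p, l*q1 + (1-l)*q2] <= l*H[p,q1] + (1-l)*H[p,q2] on a grid of l
for lam in np.linspace(0.0, 1.0, 11):
    lhs = cross_entropy(p, lam * q1 + (1 - lam) * q2)
    rhs = lam * cross_entropy(p, q1) + (1 - lam) * cross_entropy(p, q2)
    assert lhs <= rhs + 1e-12, (lam, lhs, rhs)

print("convexity inequality verified on the test grid")
```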

Proof: The relationship between Kullback-Leibler divergence, entropy and cross-entropy is:

\[\label{eq:kl-ent} \mathrm{KL}[P||Q] = \mathrm{H}[P,Q] - \mathrm{H}[P] \; .\]
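
For discrete distributions this identity follows directly from the definitions $\mathrm{KL}[p||q] = \sum_x p(x) \log \frac{p(x)}{q(x)}$, $\mathrm{H}[p,q] = -\sum_x p(x) \log q(x)$ and $\mathrm{H}[p] = -\sum_x p(x) \log p(x)$. A minimal numerical check of the identity, with illustrative distributions and natural logarithms:

```python
import numpy as np

# illustrative discrete distributions
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])

kl      = np.sum(p * np.log(p / q))   # KL[p||q]
h_cross = -np.sum(p * np.log(q))      # cross-entropy H[p,q]
h       = -np.sum(p * np.log(p))      # entropy H[p]

# KL[p||q] = H[p,q] - H[p]
assert np.isclose(kl, h_cross - h)
```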

Note that the KL divergence is convex in the pair of probability distributions $(p,q)$:

\[\label{eq:kl-conv} \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]\]
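
This joint convexity can likewise be checked numerically; the following sketch again uses illustrative discrete distributions and is not part of the formal proof.

```python
import numpy as np

def kl(p, q):
    """KL[p||q] = sum_x p(x) * log(p(x)/q(x)) (natural logarithm)."""
    return np.sum(p * np.log(p / q))

# illustrative pairs of discrete distributions
p1, p2 = np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.1, 0.3])
q1, q2 = np.array([0.1, 0.6, 0.3]), np.array([0.4, 0.4, 0.2])

# joint convexity: mixing the pairs never increases the averaged divergence
for lam in np.linspace(0.0, 1.0, 11):
    lhs = kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
    rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
    assert lhs <= rhs + 1e-12, (lam, lhs, rhs)
```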

A special case of this is obtained by setting $p_1 = p_2 = p$, so that $\lambda p_1 + (1-\lambda) p_2 = p$:

\[\label{eq:kl-conv-p} \begin{split} \mathrm{KL}[\lambda p + (1-\lambda) p||\lambda q_1 + (1-\lambda) q_2] &\leq \lambda \mathrm{KL}[p||q_1] + (1-\lambda) \mathrm{KL}[p||q_2] \\ \mathrm{KL}[p||\lambda q_1 + (1-\lambda) q_2] &\leq \lambda \mathrm{KL}[p||q_1] + (1-\lambda) \mathrm{KL}[p||q_2] \end{split}\]

Applying equation \eqref{eq:kl-ent} to each Kullback-Leibler divergence in \eqref{eq:kl-conv-p}, we have

\[\label{eq:ent-cross-conv-qed} \begin{split} \mathrm{H}[p,\lambda q_1 + (1-\lambda) q_2] - \mathrm{H}[p] &\leq \lambda \left( \mathrm{H}[p,q_1] - \mathrm{H}[p] \right) + (1-\lambda) \left( \mathrm{H}[p,q_2] - \mathrm{H}[p] \right) \\ \mathrm{H}[p,\lambda q_1 + (1-\lambda) q_2] - \mathrm{H}[p] &\leq \lambda \mathrm{H}[p,q_1] + (1-\lambda) \mathrm{H}[p,q_2] - \mathrm{H}[p] \\ \mathrm{H}[p,\lambda q_1 + (1-\lambda) q_2] &\leq \lambda \mathrm{H}[p,q_1] + (1-\lambda) \mathrm{H}[p,q_2] \end{split}\]

which is equivalent to \eqref{eq:ent-cross-conv}.

Sources:

Metadata: ID: P150 | shortcut: entcross-conv | author: JoramSoch | date: 2020-08-11, 09:16.