Index: The Book of Statistical Proofs ▷ General Theorems ▷ Information theory ▷ Kullback-Leibler divergence ▷ Relation to discrete entropy

Theorem: Let $X$ be a discrete random variable with possible outcomes $\mathcal{X}$ and let $P$ and $Q$ be two probability distributions on $X$. Then, the Kullback-Leibler divergence of $P$ from $Q$ can be expressed as

\[\label{eq:kl-ent} \mathrm{KL}[P||Q] = \mathrm{H}(P,Q) - \mathrm{H}(P)\]

where $\mathrm{H}(P,Q)$ is the cross-entropy of $P$ and $Q$ and $\mathrm{H}(P)$ is the marginal entropy of $P$.

Proof: The discrete Kullback-Leibler divergence is defined as

\[\label{eq:KL} \mathrm{KL}[P||Q] = \sum_{x \in \mathcal{X}} p(x) \cdot \log \frac{p(x)}{q(x)}\]

where $p(x)$ and $q(x)$ are the probability mass functions of $P$ and $Q$.
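To make the definition concrete, here is a minimal numerical sketch in Python (not part of the proof); the mass functions `p` and `q` below are arbitrary example values on a three-element outcome set, not taken from the source.

```python
import math

# Hypothetical probability mass functions of P and Q on X = {x1, x2, x3}
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Discrete KL divergence: sum of p(x) * log(p(x) / q(x)) over all x in X
kl_pq = sum(px * math.log(px / qx) for px, qx in zip(p, q))
print(kl_pq)  # approximately 0.0253 nats (natural logarithm)
```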

Writing the logarithm of the quotient as a difference of logarithms and separating the sums, we have:

\[\label{eq:KL-dev} \mathrm{KL}[P||Q] = - \sum_{x \in \mathcal{X}} p(x) \, \log q(x) + \sum_{x \in \mathcal{X}} p(x) \, \log p(x) \; .\]

Now considering the definitions of marginal entropy and cross-entropy

\[\label{eq:ME-CE} \begin{split} \mathrm{H}(P) &= - \sum_{x \in \mathcal{X}} p(x) \, \log p(x) \\ \mathrm{H}(P,Q) &= - \sum_{x \in \mathcal{X}} p(x) \, \log q(x) \; , \end{split}\]

we can finally show:

\[\label{eq:KL-qed} \mathrm{KL}[P||Q] = \mathrm{H}(P,Q) - \mathrm{H}(P) \; .\]
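As a numerical illustration of this result (again, not part of the proof), the following self-contained sketch reuses the hypothetical mass functions from above and checks that the KL divergence equals the cross-entropy minus the marginal entropy.

```python
import math

# Same hypothetical probability mass functions as in the sketch above
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# KL[P||Q] computed directly from its definition
kl_pq = sum(px * math.log(px / qx) for px, qx in zip(p, q))

# Marginal entropy H(P) = -sum of p(x) * log p(x)
h_p = -sum(px * math.log(px) for px in p)

# Cross-entropy H(P,Q) = -sum of p(x) * log q(x)
h_pq = -sum(px * math.log(qx) for px, qx in zip(p, q))

# The identity KL[P||Q] = H(P,Q) - H(P) holds up to floating-point error
assert math.isclose(kl_pq, h_pq - h_p)
print(kl_pq, h_pq - h_p)  # both approximately 0.0253 nats
```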

Metadata: ID: P113 | shortcut: kl-ent | author: JoramSoch | date: 2020-05-27, 23:20.