Index: The Book of Statistical Proofs ▷ Probability Distributions ▷ Univariate discrete distributions ▷ Binomial distribution ▷ Shannon entropy

Theorem: Let $X$ be a random variable following a binomial distribution:

\[\label{eq:bin} X \sim \mathrm{Bin}(n,p) \; .\]

Then, the (Shannon) entropy of $X$ in bits is

\[\label{eq:bin-ent} \mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{bern}(p) - \mathrm{E}_\mathrm{lbc}(n,p)\]

where $\mathrm{H}_\mathrm{bern}(p)$ is the binary entropy function, i.e. the (Shannon) entropy of the Bernoulli distribution with success probability $p$

\[\label{eq:H-bern} \mathrm{H}_\mathrm{bern}(p) = - p \cdot \log_2 p - (1-p) \log_2 (1-p)\]

and $\mathrm{E}_\mathrm{lbc}(n,p)$ is the expected value of the logarithmized binomial coefficient with superset size $n$

\[\label{eq:E-lbc} \mathrm{E}_\mathrm{lbc}(n,p) = \mathrm{E}\left[ \log_2 {n \choose X} \right] \quad \text{where} \quad X \sim \mathrm{Bin}(n,p) \; .\]
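The claim can be checked numerically. The following is a minimal sketch (not part of the proof), assuming NumPy and SciPy are available; the values $n = 10$ and $p = 0.3$ are arbitrary illustrative choices:

```python
# Minimal numerical check of the theorem (illustrative values only):
# compare the directly computed entropy with n*H_bern(p) - E_lbc(n,p).
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 10, 0.3                      # arbitrary illustrative parameters
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)            # probability mass function of Bin(n,p)

# left-hand side: H(X) computed directly from the definition of entropy
H_direct = -np.sum(pmf * np.log2(pmf))

# right-hand side: n * H_bern(p) - E_lbc(n,p)
H_bern = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
E_lbc = np.sum(pmf * np.log2(comb(n, x)))
H_formula = n * H_bern - E_lbc

print(H_direct, H_formula)          # both approx. 2.5667 bits
```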

Proof: The entropy is defined as the negative probability-weighted average of the logarithmized probabilities of all possible values:

\[\label{eq:ent} \mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; .\]

Entropy is measured in bits by setting $b = 2$. Then, with the probability mass function of the binomial distribution, we have:

\[\label{eq:bin-ent-s1} \begin{split} \mathrm{H}(X) &= - \sum_{x \in \mathcal{X}} f_X(x) \cdot \log_2 f_X(x) \\ &= - \sum_{x=0}^{n} {n \choose x} \, p^x \, (1-p)^{n-x} \cdot \log_2 \left[ {n \choose x} \, p^x \, (1-p)^{n-x} \right] \\ &= - \sum_{x=0}^{n} {n \choose x} \, p^x \, (1-p)^{n-x} \cdot \left[ \log_2 {n \choose x} + x \cdot \log_2 p + (n-x) \cdot \log_2 (1-p) \right] \\ &= - \sum_{x=0}^{n} {n \choose x} \, p^x \, (1-p)^{n-x} \cdot \left[ \log_2 {n \choose x} + x \cdot \log_2 p + n \cdot \log_2 (1-p) - x \cdot \log_2 (1-p) \right] \; . \end{split}\]
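As a numerical cross-check of this step, the following sketch (again assuming NumPy and SciPy; $n = 10$ and $p = 0.3$ are illustrative) confirms that expanding $\log_2 f_X(x)$ into the terms in the square bracket leaves the entropy sum unchanged:

```python
# Minimal sketch (not part of the proof): check that expanding log2 f_X(x)
# into log2 C(n,x) + x*log2(p) + (n-x)*log2(1-p) leaves the sum unchanged.
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 10, 0.3                      # arbitrary illustrative parameters
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)

log_pmf_direct = np.log2(pmf)
log_pmf_split = np.log2(comb(n, x)) + x * np.log2(p) + (n - x) * np.log2(1 - p)
assert np.allclose(log_pmf_direct, log_pmf_split)

H = -np.sum(pmf * log_pmf_split)
print(H)                            # approx. 2.5667 bits for these values
```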

Since the first factor in each summand is the probability mass of $X = x$, each of the four sums arising from the square bracket is the expected value of a function of the discrete random variable $X$, such that we can rewrite this as:

\[\label{eq:bin-ent-s2} \begin{split} \mathrm{H}(X) &= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} - \left\langle x \cdot \log_2 p \right\rangle_{p(x)} - \left\langle n \cdot \log_2 (1-p) \right\rangle_{p(x)} + \left\langle x \cdot \log_2 (1-p) \right\rangle_{p(x)} \\ &= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} - \log_2 p \cdot \left\langle x \right\rangle_{p(x)} - n \cdot \log_2 (1-p) + \log_2 (1-p) \cdot \left\langle x \right\rangle_{p(x)} \; . \end{split}\]
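This step only uses the linearity of the expected value; a short sketch (same illustrative $n$ and $p$, assuming NumPy and SciPy) evaluates each expectation separately and recombines them:

```python
# Minimal sketch of the linearity-of-expectation step: evaluate each
# expectation separately and recombine (illustrative n and p only).
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 10, 0.3                      # arbitrary illustrative parameters
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)

E_lbc = np.sum(pmf * np.log2(comb(n, x)))   # <log2 C(n,x)>_{p(x)}
E_x   = np.sum(pmf * x)                     # <x>_{p(x)}
H = -E_lbc - np.log2(p) * E_x - n * np.log2(1 - p) + np.log2(1 - p) * E_x
print(H)                                    # again approx. 2.5667 bits
```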

Using the expected value of the binomial distribution, i.e. $X \sim \mathrm{Bin}(n,p) \Rightarrow \left\langle x \right\rangle = n p$, this gives:

\[\label{eq:bin-ent-s3} \begin{split} \mathrm{H}(X) &= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} - n p \cdot \log_2 p - n \cdot \log_2 (1-p) + n p \cdot \log_2 (1-p) \\ &= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} + n \left[ - p \cdot \log_2 p - (1-p) \log_2 (1-p) \right] \; . \end{split}\]
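The substitution itself can be verified with a one-line check (same illustrative values), which merely confirms that $\left\langle x \right\rangle_{p(x)} = n p$:

```python
# Minimal sketch of the substitution step: the binomial mean <x>_{p(x)}
# equals n*p (illustrative n = 10, p = 0.3).
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)
print(np.sum(pmf * x), n * p)       # both 3.0 (up to floating-point error)
```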

Finally, we note that the first term is the negative expected value of the logarithm of a binomial coefficient and that the term in square brackets is the entropy of the Bernoulli distribution, such that we get:

\[\label{eq:bin-ent-s4} \mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{bern}(p) - \mathrm{E}_\mathrm{lbc}(n,p) \; .\]

Note that, because $0 \leq \mathrm{H}_\mathrm{bern}(p) \leq 1$, we have $0 \leq n \cdot \mathrm{H}_\mathrm{bern}(p) \leq n$; because the entropy is non-negative, it must hold that $\mathrm{E}_\mathrm{lbc}(n,p) \leq n \cdot \mathrm{H}_\mathrm{bern}(p) \leq n$; and because every binomial coefficient is at least $1$, we also have $\mathrm{E}_\mathrm{lbc}(n,p) \geq 0$.
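These bounds can be probed numerically; the sketch below (assuming NumPy and SciPy; the grid of $(n,p)$ values is arbitrary) asserts $0 \leq \mathrm{E}_\mathrm{lbc}(n,p) \leq n \cdot \mathrm{H}_\mathrm{bern}(p) \leq n$:

```python
# Minimal sketch checking 0 <= E_lbc(n,p) <= n * H_bern(p) <= n
# on a small, arbitrary grid of (n, p) values.
import numpy as np
from scipy.stats import binom
from scipy.special import comb

for n in [1, 5, 20, 100]:
    for p in [0.01, 0.3, 0.5, 0.9]:
        x = np.arange(n + 1)
        pmf = binom.pmf(x, n, p)
        E_lbc = np.sum(pmf * np.log2(comb(n, x)))
        H_bern = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
        assert 0 <= E_lbc <= n * H_bern <= n
```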

Sources:

Metadata: ID: P335 | shortcut: bin-ent | author: JoramSoch | date: 2022-09-02, 13:52.