Index: The Book of Statistical Proofs ▷ Probability Distributions ▷ Multivariate discrete distributions ▷ Multinomial distribution ▷ Shannon entropy

Theorem: Let $X$ be a random vector following a multinomial distribution:

$\label{eq:mult} X \sim \mathrm{Mult}(n,p) \; .$

Then, the (Shannon) entropy of $X$ is

$\label{eq:mult-ent} \mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p)$

where $\mathrm{H}_\mathrm{cat}(p)$ is the categorical entropy function, i.e. the (Shannon) entropy of the categorical distribution with category probabilities $p$

$\label{eq:H-cat} \mathrm{H}_\mathrm{cat}(p) = - \sum_{i=1}^{k} p_i \cdot \log p_i$

and $\mathrm{E}_\mathrm{lmc}(n,p)$ is the expected value of the logarithmized multinomial coefficient with superset size $n$

$\label{eq:E-lmc} \mathrm{E}_\mathrm{lmc}(n,p) = \mathrm{E}\left[ \log {n \choose {X_1, \ldots, X_k}} \right] \quad \text{where} \quad X \sim \mathrm{Mult}(n,p) \; .$
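For small $n$ and $k$, both ingredients of the theorem can be evaluated by brute-force enumeration of the multinomial support. The following is a minimal Python sketch (the helper names `compositions`, `h_cat` and `e_lmc` are ours, not standard), using natural logarithms:

```python
import math

def compositions(n, k):
    """Yield all k-tuples of nonnegative integers summing to n
    (the support of Mult(n, p) with k categories)."""
    if k == 1:
        yield (n,)
    else:
        for first in range(n + 1):
            for rest in compositions(n - first, k - 1):
                yield (first,) + rest

def h_cat(p):
    """Categorical entropy H_cat(p) = -sum_i p_i log p_i (in nats)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def e_lmc(n, p):
    """E[log multinomial coefficient] under X ~ Mult(n, p),
    computed by summing over the whole support."""
    total = 0.0
    for x in compositions(n, len(p)):
        coeff = math.factorial(n)
        for xi in x:
            coeff //= math.factorial(xi)
        pmf = coeff * math.prod(pi ** xi for pi, xi in zip(p, x))
        total += pmf * math.log(coeff)
    return total

print(round(h_cat([0.3, 0.7]), 4))    # 0.6109
print(round(e_lmc(2, [0.3, 0.7]), 4)) # 0.2911
```

For $n = 2$, $k = 2$, only $x = (1,1)$ has a nontrivial coefficient, so $\mathrm{E}_\mathrm{lmc}(2,p) = f_X(1,1) \cdot \log 2$.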

Proof: The entropy is defined as the probability-weighted average of the logarithmized probabilities for all possible values:

$\label{eq:ent} \mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; ,$

and the probability mass function of the multinomial distribution is

$\label{eq:mult-pmf} f_X(x) = {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \; .$
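This defining sum can be evaluated literally for small cases. A brute-force sketch (assuming natural logarithms, i.e. $b = e$; helper names are ours):

```python
import math

def compositions(n, k):
    """All k-tuples of nonnegative integers summing to n."""
    if k == 1:
        yield (n,)
    else:
        for first in range(n + 1):
            for rest in compositions(n - first, k - 1):
                yield (first,) + rest

def mult_pmf(x, p):
    """f_X(x) = multinomial coefficient times prod_i p_i^{x_i}."""
    coeff = math.factorial(sum(x))
    for xi in x:
        coeff //= math.factorial(xi)
    return coeff * math.prod(pi ** xi for pi, xi in zip(p, x))

def mult_entropy(n, p):
    """H(X) = -sum_x f(x) log f(x), summed over the full support."""
    return -sum(f * math.log(f)
                for x in compositions(n, len(p))
                if (f := mult_pmf(x, p)) > 0)
```

As a sanity check, the pmf values over the support sum to one, since every realization of $X$ is one of the enumerated count vectors.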

Let $\mathcal{X}_{n,k}$ be the set of all vectors $x \in \mathbb{N}^{1 \times k}$ satisfying $\sum_{i=1}^{k} x_i = n$. Then, we have:

$\label{eq:mult-ent-s1} \begin{split} \mathrm{H}(X) &= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log f_X(x) \\ &= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log \left[ {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \right] \\ &= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \left[ \log {n \choose {x_1, \ldots, x_k}} + \sum_{i=1}^{k} x_i \cdot \log p_i \right] \; . \end{split}$

Since the first factor in each summand is the probability mass of $X = x$, we can rewrite this as the sum of the expected values of the functions of the random vector $X$ inside the square bracket:

$\label{eq:mult-ent-s2} \begin{split} \mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \left\langle \sum_{i=1}^{k} x_i \cdot \log p_i \right\rangle_{p(x)} \\ &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} \left\langle x_i \cdot \log p_i \right\rangle_{p(x)} \; . \end{split}$

Using the expected value of the multinomial distribution, i.e. $X \sim \mathrm{Mult}(n,p) \Rightarrow \left\langle x_i \right\rangle = n p_i$, this gives:

$\label{eq:mult-ent-s3} \begin{split} \mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} n p_i \cdot \log p_i \\ &= - \left\langle\log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - n \sum_{i=1}^{k} p_i \cdot \log p_i \; . \end{split}$
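The property $\left\langle x_i \right\rangle = n p_i$ used in this step can itself be checked by enumeration. A small brute-force sketch (helper names are ours):

```python
import math

def compositions(n, k):
    """All k-tuples of nonnegative integers summing to n."""
    if k == 1:
        yield (n,)
    else:
        for first in range(n + 1):
            for rest in compositions(n - first, k - 1):
                yield (first,) + rest

def mult_pmf(x, p):
    """Multinomial probability mass function."""
    coeff = math.factorial(sum(x))
    for xi in x:
        coeff //= math.factorial(xi)
    return coeff * math.prod(pi ** xi for pi, xi in zip(p, x))

n, p = 4, (0.2, 0.3, 0.5)
means = [sum(mult_pmf(x, p) * x[i] for x in compositions(n, len(p)))
         for i in range(len(p))]
print(means)  # each entry matches n * p_i, i.e. approx. [0.8, 1.2, 2.0]
```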

Finally, we note that the first term is the negative expected value of the logarithm of a multinomial coefficient and that the second term is $n$ times the entropy of the categorical distribution, such that we get:

$\label{eq:mult-ent-s4} \mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p) \; .$
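The result can be verified numerically by comparing the brute-force entropy from the definition against the closed form just derived (a sketch for small $n$ and $k$; all helper names are ours):

```python
import math

def compositions(n, k):
    """All k-tuples of nonnegative integers summing to n."""
    if k == 1:
        yield (n,)
    else:
        for first in range(n + 1):
            for rest in compositions(n - first, k - 1):
                yield (first,) + rest

def mult_coeff(x):
    """Multinomial coefficient n! / (x_1! * ... * x_k!)."""
    c = math.factorial(sum(x))
    for xi in x:
        c //= math.factorial(xi)
    return c

def mult_pmf(x, p):
    """Multinomial probability mass function."""
    return mult_coeff(x) * math.prod(pi ** xi for pi, xi in zip(p, x))

def entropy_direct(n, p):
    """Left-hand side: -sum_x f(x) log f(x) over the support."""
    return -sum(f * math.log(f)
                for x in compositions(n, len(p))
                if (f := mult_pmf(x, p)) > 0)

def entropy_formula(n, p):
    """Right-hand side: n * H_cat(p) - E_lmc(n, p)."""
    h_cat = -sum(pi * math.log(pi) for pi in p if pi > 0)
    e_lmc = sum(mult_pmf(x, p) * math.log(mult_coeff(x))
                for x in compositions(n, len(p)))
    return n * h_cat - e_lmc

n, p = 3, (0.2, 0.3, 0.5)
print(abs(entropy_direct(n, p) - entropy_formula(n, p)) < 1e-12)  # True
```

Both sides agree up to floating-point error, as the identity is exact.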

Metadata: ID: P337 | shortcut: mult-ent | author: JoramSoch | date: 2022-09-09, 16:33.