Index: The Book of Statistical Proofs ▷ Probability Distributions ▷ Multivariate discrete distributions ▷ Multinomial distribution ▷ Covariance

Theorem: Let $X$ be a random vector following a multinomial distribution:

\[\label{eq:mult} \left[X_1, \ldots, X_k \right] = X \sim \mathrm{Mult}(n, p), \; n \in \mathbb{N}, \; p = \left[p_1, \ldots, p_k \right]^\mathrm{T} \; .\]

Then, the covariance matrix of $X$ is

\[\label{eq:mult-cov} \mathrm{Cov}(X) = n \left(\mathrm{diag}(p) - pp^\mathrm{T} \right) \; .\]
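As an illustration (the particular values $n = 6$ and $p = \left[\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right]^\mathrm{T}$ are chosen here only for this example), \eqref{eq:mult-cov} gives

\[\mathrm{Cov}(X) = 6 \left( \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{3} & 0 \\ 0 & 0 & \tfrac{1}{6} \end{bmatrix} - \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{6} & \tfrac{1}{12} \\ \tfrac{1}{6} & \tfrac{1}{9} & \tfrac{1}{18} \\ \tfrac{1}{12} & \tfrac{1}{18} & \tfrac{1}{36} \end{bmatrix} \right) = \begin{bmatrix} \tfrac{3}{2} & -1 & -\tfrac{1}{2} \\ -1 & \tfrac{4}{3} & -\tfrac{1}{3} \\ -\tfrac{1}{2} & -\tfrac{1}{3} & \tfrac{5}{6} \end{bmatrix} \; ,\]

whose rows and columns each sum to zero, reflecting the linear constraint $\sum_{i=1}^k X_i = n$.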

Proof: We first observe that each coordinate $X_i$ has sample space $\left\lbrace 0, 1, \ldots, n \right\rbrace$ and counts how many of the $n$ independent draws fall into category $i$, each draw landing in category $i$ with probability $p_i$. Thus each coordinate follows a binomial distribution:

\[\label{eq:Marginal} X_i \sim \mathrm{Bin}(n, p_i), \; i = 1,\ldots, k \; ,\]

which has variance $\mathrm{Var}(X_i) = n p_i(1-p_i) = n (p_i - p_i^2)$; these are exactly the main-diagonal elements of $\mathrm{Cov}(X)$ in \eqref{eq:mult-cov}. To prove $\mathrm{Cov}(X_i, X_j) = -n p_i p_j$ for $i \ne j$ (the off-diagonal elements of the covariance matrix), we first recognize that

\[\label{eq:bin-sum} X_i = \sum_{m=1}^n \mathbb{I}_i(m), \quad \text{with} \quad \mathbb{I}_i(m) = \begin{cases} 1 & \text{if the $m$-th draw was of category $i$}, \\ 0 & \text{otherwise} \; , \end{cases}\]

where each indicator $\mathbb{I}_i(m)$ is a Bernoulli-distributed random variable with expected value $p_i$, and indicators belonging to different draws are independent of each other. Then, we have

\[\label{eq:mult-cov-qed} \begin{split} \mathrm{Cov}(X_i, X_j) &= \mathrm{Cov}\left(\sum_{m=1}^n \mathbb{I}_i(m), \sum_{l=1}^n \mathbb{I}_j(l)\right) \\ &= \sum_{m=1}^n\sum_{l=1}^n \mathrm{Cov}\left(\mathbb{I}_i(m), \mathbb{I}_j(l)\right) \\ &= \sum_{m=1}^n \left[ \mathrm{Cov}\left(\mathbb{I}_i(m), \mathbb{I}_j(m)\right) + \sum_{\substack{l=1 \\ l \ne m}}^n \underbrace{\mathrm{Cov}\left(\mathbb{I}_i(m), \mathbb{I}_j(l)\right)}_{=0} \right] \\ & \stackrel{i \ne j}{=} \;\; \sum_{m=1}^n \left(\mathrm{E}\Big( \underbrace{\mathbb{I}_i(m) \,\mathbb{I}_j(m)}_{=0} \Big) - \mathrm{E}\big(\mathbb{I}_i(m)\big) \mathrm{E}\big(\mathbb{I}_j(m)\big) \right) \\ &= -\sum_{m=1}^n \mathrm{E}\big(\mathbb{I}_i(m)\big) \mathrm{E}\big(\mathbb{I}_j(m)\big) \\ &= -n p_i p_j \; , \end{split}\]

as desired. Together, the diagonal entries $n(p_i - p_i^2)$ and the off-diagonal entries $-n p_i p_j$ yield $\mathrm{Cov}(X) = n \left(\mathrm{diag}(p) - pp^\mathrm{T} \right)$.
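As a quick consistency check for $k = 2$: since $X_2 = n - X_1$, one has $\mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(X_1, n - X_1) = -\mathrm{Var}(X_1) = -n p_1 (1 - p_1) = -n p_1 p_2$, in agreement with \eqref{eq:mult-cov}.

The result can also be checked empirically. The following is a minimal simulation sketch (not part of the proof), assuming NumPy and the same hypothetical parameters $n = 6$, $p = \left[\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right]^\mathrm{T}$ as in the example above:

```python
import numpy as np

# Hypothetical parameters chosen only for this sketch (not fixed by the theorem)
n = 6
p = np.array([1/2, 1/3, 1/6])

rng = np.random.default_rng(seed=42)

# Draw many realizations of X ~ Mult(n, p); each row is one sample [X_1, ..., X_k]
samples = rng.multinomial(n, p, size=200_000)

# Empirical covariance matrix (rowvar=False: columns correspond to X_1, ..., X_k)
empirical = np.cov(samples, rowvar=False)

# Theoretical covariance matrix from the theorem: n * (diag(p) - p p^T)
theoretical = n * (np.diag(p) - np.outer(p, p))

print("empirical:\n", np.round(empirical, 3))
print("theoretical:\n", np.round(theoretical, 3))
```

With a sample size this large, the empirical covariance matrix should agree with $n \left(\mathrm{diag}(p) - pp^\mathrm{T} \right)$ to roughly two decimal places.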

Sources:

Metadata: ID: P322 | shortcut: mult-cov | author: adkipnis | date: 2022-05-11, 16:40.