Proof: Conjugate prior distribution for multinomial observations
Index: The Book of Statistical Proofs ▷ Statistical Models ▷ Count data ▷ Multinomial observations ▷ Conjugate prior distribution
Metadata: ID: P79 | shortcut: mult-prior | author: JoramSoch | date: 2020-03-11, 14:15.
Theorem: Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a multinomial distribution:
\[\label{eq:Mult} y \sim \mathrm{Mult}(n,p) \; .\]
Then, the conjugate prior for the model parameter $p$ is a Dirichlet distribution with concentration parameters $\alpha_0 = [\alpha_{01}, \ldots, \alpha_{0k}]$:
\[\label{eq:Dir} \mathrm{p}(p) = \mathrm{Dir}(p; \alpha_0) \; .\]
Proof: With the probability mass function of the multinomial distribution, the likelihood function implied by \eqref{eq:Mult} is given by
\[\label{eq:Mult-LF} \mathrm{p}(y|p) = {n \choose {y_1, \ldots, y_k}} \prod_{j=1}^{k} {p_j}^{y_j} \; .\]
In other words, the likelihood function is proportional to a product of powers of the entries of the vector $p$:
\[\label{eq:Mult-LF-prop} \mathrm{p}(y|p) \propto \prod_{j=1}^{k} {p_j}^{y_j} \; .\]
The same is true for a Dirichlet distribution over $p$
\[\label{eq:Mult-prior-s1} \mathrm{p}(p) = \mathrm{Dir}(p; \alpha_0)\]
the probability density function of which
\[\label{eq:Mult-prior-s2} \mathrm{p}(p) = \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)}{\prod_{j=1}^k \Gamma(\alpha_{0j})} \prod_{j=1}^{k} {p_j}^{\alpha_{0j}-1}\]
exhibits the same proportionality
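\[\label{eq:Mult-prior-s3} \mathrm{p}(p) \propto \prod_{j=1}^{k} {p_j}^{\alpha_{0j}-1}\]
and is therefore conjugate relative to the likelihood: the product of prior and likelihood is again proportional to a product of powers of the entries of $p$, so that the posterior distribution belongs to the same family as the prior.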
∎
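The proportionality argument above can be checked numerically. The following is a minimal sketch, not part of the original proof, using only the Python standard library. It assumes the standard result that the posterior has the form $\mathrm{Dir}(p; \alpha_0 + y)$, which is not stated in the theorem itself. The data `y`, the hyperparameters `alpha_0`, the test points and all function names are hypothetical and chosen only for illustration. If the Dirichlet prior is conjugate, the log-ratio between likelihood times prior and the assumed posterior density must not depend on $p$.

```python
import math

def log_multinomial_pmf(y, p):
    """Log of the multinomial probability mass function, with n = sum(y)."""
    n = sum(y)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(y_j + 1) for y_j in y)
    return log_coef + sum(y_j * math.log(p_j) for y_j, p_j in zip(y, p))

def log_dirichlet_pdf(p, alpha):
    """Log of the Dirichlet probability density function Dir(p; alpha)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a_j) for a_j in alpha)
    return log_norm + sum((a_j - 1) * math.log(p_j) for a_j, p_j in zip(alpha, p))

def log_ratio(p, y, alpha_0):
    """log[p(y|p) p(p)] - log Dir(p; alpha_0 + y); constant in p under conjugacy."""
    # Assumed posterior parameters (standard result, not given in the theorem)
    alpha_n = [a_j + y_j for a_j, y_j in zip(alpha_0, y)]
    return (log_multinomial_pmf(y, p)
            + log_dirichlet_pdf(p, alpha_0)
            - log_dirichlet_pdf(p, alpha_n))

# Hypothetical example values: k = 3 categories, n = 10 trials
y = [3, 5, 2]
alpha_0 = [1.0, 2.0, 0.5]

# Three arbitrary points in the interior of the probability simplex
points = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [1 / 3, 1 / 3, 1 / 3]]
ratios = [log_ratio(p, y, alpha_0) for p in points]
print(ratios)
```

Up to floating-point error, the printed values should agree for every choice of $p$, because all powers of $p_j$ cancel and only normalizing constants remain.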
Sources: - Wikipedia (2020): "Dirichlet distribution"; in: Wikipedia, the free encyclopedia, retrieved on 2020-03-11; URL: https://en.wikipedia.org/wiki/Dirichlet_distribution#Conjugate_to_categorical/multinomial.