Conjugate prior distribution for multinomial observations

Index: The Book of Statistical Proofs ▷ Statistical Models ▷ Count data ▷ Multinomial observations ▷ Conjugate prior distribution

Theorem: Let $y = [y_1, \ldots, y_k]$ be the numbers of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a multinomial distribution:

\[\label{eq:Mult} y \sim \mathrm{Mult}(n,p) \; .\]

Then, the conjugate prior for the model parameter $p$ is a Dirichlet distribution with positive hyperparameters $\alpha_0 = [\alpha_{01}, \ldots, \alpha_{0k}]$:

\[\label{eq:Dir} \mathrm{p}(p) = \mathrm{Dir}(p; \alpha_0) \; .\]
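The Dirichlet prior places all of its mass on vectors $p$ with positive entries summing to one, i.e. on valid category probabilities. As an illustration outside the proof, here is a minimal Python sketch that draws from $\mathrm{Dir}(p; \alpha_0)$ using the standard construction of normalizing independent $\mathrm{Gam}(\alpha_{0j}, 1)$ variates. The function name `sample_dirichlet` and the values in `alpha_0` are hypothetical choices for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector p ~ Dir(alpha) by normalizing independent Gamma(alpha_j, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)      # seeded for reproducibility
alpha_0 = [2.0, 3.0, 5.0]   # hypothetical prior hyperparameters
p = sample_dirichlet(alpha_0, rng)
print(p, sum(p))            # a point on the probability simplex
```

Averaging many such draws should approach the prior mean $\alpha_{0j} / \sum_{i=1}^{k} \alpha_{0i}$.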

Proof: With the probability mass function of the multinomial distribution, the likelihood function implied by \eqref{eq:Mult} is given by

\[\label{eq:Mult-LF} \mathrm{p}(y|p) = {n \choose {y_1, \ldots, y_k}} \prod_{j=1}^{k} {p_j}^{y_j} \; .\]
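To make the likelihood concrete, a minimal Python sketch evaluating it with the standard library only. The helper `multinomial_pmf` is hypothetical; it takes $n$ as the sum of the counts and assumes $p_j > 0$ wherever $y_j > 0$. The computation runs on the log scale so that the factorials in the multinomial coefficient do not overflow for large $n$.

```python
from math import exp, lgamma, log

def multinomial_pmf(y, p):
    """Evaluate p(y|p) for y ~ Mult(n, p) with n = sum(y), on the log scale."""
    n = sum(y)
    # log of the multinomial coefficient n! / (y_1! ... y_k!)
    log_coef = lgamma(n + 1) - sum(lgamma(yj + 1) for yj in y)
    # log of prod_j p_j^{y_j}; categories with y_j = 0 contribute a factor of 1
    log_kernel = sum(yj * log(pj) for yj, pj in zip(y, p) if yj > 0)
    return exp(log_coef + log_kernel)

# two trials with p = [0.5, 0.5], one count per category: 2 * 0.5 * 0.5, i.e. about 0.5
print(multinomial_pmf([1, 1], [0.5, 0.5]))
```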

In other words, the likelihood function is proportional to a product of powers of the entries of the vector $p$:

\[\label{eq:Mult-LF-prop} \mathrm{p}(y|p) \propto \prod_{j=1}^{k} {p_j}^{y_j} \; .\]

The same is true for a Dirichlet distribution over $p$,

\[\label{eq:Mult-prior-s1} \mathrm{p}(p) = \mathrm{Dir}(p; \alpha_0)\]

whose probability density function

\[\label{eq:Mult-prior-s2} \mathrm{p}(p) = \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)}{\prod_{j=1}^k \Gamma(\alpha_{0j})} \prod_{j=1}^{k} {p_j}^{\alpha_{0j}-1}\]

exhibits the same proportionality

\[\label{eq:Mult-prior-s3} \mathrm{p}(p) \propto \prod_{j=1}^{k} {p_j}^{\alpha_{0j}-1}\]

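and is therefore conjugate relative to the likelihood: multiplying the two gives $\mathrm{p}(y|p) \, \mathrm{p}(p) \propto \prod_{j=1}^{k} {p_j}^{\alpha_{0j}+y_j-1}$, which again has the functional form of a Dirichlet density over $p$, with parameters $\alpha_{0j} + y_j$.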

∎
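The conjugacy argument can be checked numerically. The minimal Python sketch below assumes the posterior is $\mathrm{Dir}(p; \alpha_0 + y)$, as implied by multiplying the likelihood kernel with the prior kernel. If that holds, then $\log \mathrm{p}(p) + \log \mathrm{p}(y|p) - \log \mathrm{Dir}(p; \alpha_0 + y)$ equals $\log \mathrm{p}(y)$ and must not vary with $p$. The helper names and the values of `alpha_0` and `y` are hypothetical.

```python
from math import lgamma, log

def dirichlet_logpdf(p, alpha):
    """Log-density of Dir(p; alpha) for p strictly inside the probability simplex."""
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * log(pj) for a, pj in zip(alpha, p))

def multinomial_logpmf(y, p):
    """Log of p(y|p) for y ~ Mult(n, p) with n = sum(y)."""
    n = sum(y)
    log_coef = lgamma(n + 1) - sum(lgamma(yj + 1) for yj in y)
    return log_coef + sum(yj * log(pj) for yj, pj in zip(y, p) if yj > 0)

def posterior_alpha(alpha_0, y):
    """Assumed conjugate update: Dir(alpha_0) prior plus counts y gives Dir(alpha_0 + y)."""
    return [a + yj for a, yj in zip(alpha_0, y)]

alpha_0 = [2.0, 3.0, 5.0]  # hypothetical prior hyperparameters
y = [4, 1, 7]              # hypothetical observed counts, n = 12
alpha_n = posterior_alpha(alpha_0, y)

# log prior + log likelihood - log posterior should equal log p(y) for every p
for p in ([0.2, 0.3, 0.5], [0.6, 0.1, 0.3], [0.05, 0.9, 0.05]):
    diff = dirichlet_logpdf(p, alpha_0) + multinomial_logpmf(y, p) - dirichlet_logpdf(p, alpha_n)
    print(p, diff)
```

Every printed `diff` should agree up to floating-point rounding; working with log-densities keeps the Gamma functions from overflowing when the $\alpha_{0j}$ or the counts are large.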

Sources:

Wikipedia (2020): "Dirichlet distribution"; in: Wikipedia, the free encyclopedia, retrieved on 2020-03-11; URL: https://en.wikipedia.org/wiki/Dirichlet_distribution#Conjugate_to_categorical/multinomial.

Metadata: ID: P79 | shortcut: mult-prior | author: JoramSoch | date: 2020-03-11, 14:15.