Probability density function of the normal-Wishart distribution


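Theorem: Let the $n \times p$ random matrix $X$ and the $p \times p$ positive-definite random matrix $Y$ jointly follow a normal-Wishart distribution with parameters $M$ ($n \times p$), $U$ ($n \times n$), $V$ ($p \times p$) and $\nu$: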

\[\label{eq:nw} X,Y \sim \mathrm{NW}(M, U, V, \nu) \; .\]

Then, the joint probability density function of $X$ and $Y$ is

\[\label{eq:nw-pdf} \begin{split} p(X,Y) = \; & \frac{1}{\sqrt{(2\pi)^{np} |U|^p |V|^{\nu}}} \cdot \frac{\sqrt{2^{-\nu p}}}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot |Y|^{(\nu+n-p-1)/2} \cdot \\ & \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y \left[ (X-M)^\mathrm{T} \, U^{-1} (X-M) + V^{-1} \right] \right) \right] \; . \end{split}\]
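The density in the theorem can be transcribed directly into code. The following is a minimal sketch, assuming NumPy is available; the helper names `log_multigamma` and `nw_logpdf` are hypothetical and chosen for illustration only. It works in log space and uses `slogdet` and `solve` instead of explicit determinants and inverses of $U$, which is numerically safer for larger matrices.

```python
import math
import numpy as np

def log_multigamma(a: float, p: int) -> float:
    """log Gamma_p(a) = p(p-1)/4 * log(pi) + sum_{j=1}^{p} log Gamma(a + (1-j)/2)."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2) for j in range(1, p + 1)
    )

def nw_logpdf(X, Y, M, U, V, nu):
    """Log of the normal-Wishart density p(X, Y), transcribed from eq. (nw-pdf).

    Hypothetical helper. Shapes: X, M are n x p; U is n x n;
    V and Y are p x p, all covariance/scale matrices positive definite.
    """
    X, Y, M, U, V = map(np.asarray, (X, Y, M, U, V))
    n, p = X.shape
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_Y = np.linalg.slogdet(Y)
    D = X - M
    # p x p matrix (X-M)^T U^{-1} (X-M) + V^{-1} inside the trace
    S = D.T @ np.linalg.solve(U, D) + np.linalg.inv(V)
    return (
        -0.5 * (n * p * math.log(2 * math.pi) + p * logdet_U + nu * logdet_V)
        - 0.5 * nu * p * math.log(2)          # log sqrt(2^{-nu p})
        - log_multigamma(nu / 2, p)           # log Gamma_p(nu/2)
        + 0.5 * (nu + n - p - 1) * logdet_Y   # log |Y|^{(nu+n-p-1)/2}
        - 0.5 * np.trace(Y @ S)               # exponent
    )

# Scalar sanity case n = p = 1, M = 0, U = V = Y = 1, nu = 1, X = 0:
# the formula reduces to exp(-1/2) / (2 pi).
print(nw_logpdf([[0.0]], [[1.0]], [[0.0]], [[1.0]], [[1.0]], 1))
```

In the scalar case the printed value should equal $-\tfrac{1}{2} - \log(2\pi)$, since $\Gamma_1(\tfrac12) = \sqrt{\pi}$.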

Proof: By definition, the normal-Wishart distribution arises when $X$, conditional on $Y$, follows a matrix-normal distribution and $Y$ follows a Wishart distribution:

\[\label{eq:matn-wish} \begin{split} X \vert Y &\sim \mathcal{MN}(M, U, Y^{-1}) \\ Y &\sim \mathcal{W}(V, \nu) \; . \end{split}\]
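The hierarchical definition above also suggests how to sample from the distribution: first draw $Y$, then draw $X$ given $Y$. The sketch below assumes NumPy and an integer $\nu \geq p$, so that a Wishart draw can be built as a sum of $\nu$ outer products of $\mathcal{N}(0, V)$ vectors; the function name `sample_nw` is hypothetical. The matrix-normal step uses $X = M + A G B^\mathrm{T}$ with $A A^\mathrm{T} = U$, $B B^\mathrm{T} = Y^{-1}$ and $G$ a matrix of independent standard normals.

```python
import numpy as np

def sample_nw(M, U, V, nu, size, rng=None):
    """Draw `size` samples (X, Y) via Y ~ W(V, nu), X | Y ~ MN(M, U, Y^{-1}).

    Hypothetical helper; assumes integer nu >= p.
    """
    rng = np.random.default_rng() if rng is None else rng
    M, U, V = map(np.asarray, (M, U, V))
    n, p = M.shape
    # Wishart draw: Y = sum_k z_k z_k^T with z_k ~ N(0, V)
    Z = rng.multivariate_normal(np.zeros(p), V, size=(size, nu))  # (size, nu, p)
    Y = np.einsum("ski,skj->sij", Z, Z)                              # (size, p, p)
    # Matrix-normal draw: X = M + A G B^T with A A^T = U and B B^T = Y^{-1}
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(np.linalg.inv(Y))                         # batched over samples
    G = rng.standard_normal((size, n, p))
    X = M + A @ G @ np.swapaxes(B, -1, -2)
    return X, Y

rng = np.random.default_rng(0)
M = np.array([[1.0, -1.0], [0.0, 2.0], [0.5, 0.0]])  # n = 3, p = 2
U = np.eye(3)
V = np.array([[2.0, 0.5], [0.5, 1.0]])
X, Y = sample_nw(M, U, V, nu=5, size=20000, rng=rng)
print(Y.mean(axis=0))  # should be close to the Wishart mean nu * V
print(X.mean(axis=0))  # should be close to M, since E[X | Y] = M
```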

Thus, using the probability density functions of the matrix-normal distribution and of the Wishart distribution, we obtain the following densities:

\[\label{eq:matn-wish-pdf} \begin{split} p(X \vert Y) &= \mathcal{MN}(X; M, U, Y^{-1}) \\ &= \sqrt{\frac{|Y|^n}{(2\pi)^{np} |U|^p}} \cdot \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y (X-M)^\mathrm{T} \, U^{-1} (X-M) \right) \right] \\ p(Y) &= \mathcal{W}(Y; V, \nu) \\ &= \frac{1}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot \frac{1}{\sqrt{2^{\nu p} |V|^{\nu}}} \cdot |Y|^{(\nu-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V^{-1} Y \right) \right] \; . \end{split}\]
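Both factors can be checked independently of the proof. The sketch below, assuming NumPy and using hypothetical helper names, compares the matrix-normal factor with the equivalent multivariate normal density of the column-stacked vector $\mathrm{vec}(X) \sim \mathcal{N}(\mathrm{vec}(M), Y^{-1} \otimes U)$, and compares the Wishart factor for $p = 1$ with a gamma density with shape $\nu/2$ and scale $2v$.

```python
import math
import numpy as np

def matn_logpdf(X, M, U, Sigma):
    """log MN(X; M, U, Sigma) with n x n row covariance U and p x p column covariance Sigma."""
    n, p = X.shape
    D = X - M
    _, ld_U = np.linalg.slogdet(U)
    _, ld_S = np.linalg.slogdet(Sigma)
    quad = np.trace(np.linalg.solve(Sigma, D.T) @ np.linalg.solve(U, D))  # tr(Sigma^-1 D^T U^-1 D)
    return -0.5 * (n * p * math.log(2 * math.pi) + p * ld_U + n * ld_S + quad)

def wishart_logpdf(Y, V, nu):
    """log W(Y; V, nu) for p x p positive-definite Y and V."""
    p = Y.shape[0]
    _, ld_Y = np.linalg.slogdet(Y)
    _, ld_V = np.linalg.slogdet(V)
    log_gamma_p = (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(nu / 2 + (1 - j) / 2) for j in range(1, p + 1))
    return (0.5 * (nu - p - 1) * ld_Y - 0.5 * np.trace(np.linalg.solve(V, Y))
            - 0.5 * nu * p * math.log(2) - 0.5 * nu * ld_V - log_gamma_p)

def mvn_logpdf(x, mu, C):
    """log N(x; mu, C) for a vector x."""
    k = x.size
    d = x - mu
    _, ld_C = np.linalg.slogdet(C)
    return -0.5 * (k * math.log(2 * math.pi) + ld_C + d @ np.linalg.solve(C, d))

rng = np.random.default_rng(1)
X, M = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
U = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
Y = np.array([[1.5, 0.4], [0.4, 0.8]])
Y_inv = np.linalg.inv(Y)
lhs = matn_logpdf(X, M, U, Y_inv)
# column-major flattening gives vec(X); its covariance is Y^{-1} kron U
rhs = mvn_logpdf(X.flatten(order="F"), M.flatten(order="F"), np.kron(Y_inv, U))
print(lhs, rhs)  # the two values should agree up to rounding
```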

The law of conditional probability implies that

\[\label{eq:prob-cond} p(X,Y) = p(X \vert Y) \, p(Y) \; ,\]

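such that, after collecting the powers of $|Y|$ and combining the two exponentials via $\mathrm{tr}\left( V^{-1} Y \right) = \mathrm{tr}\left( Y V^{-1} \right)$ and the linearity of the trace, the normal-Wishart density function becomes: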

\[\label{eq:nw-pdf-qed} \begin{split} p(X,Y) = \; & \mathcal{MN}(X; M, U, Y^{-1}) \cdot \mathcal{W}(Y; V, \nu) \\ = \; & \sqrt{\frac{|Y|^n}{(2\pi)^{np} |U|^p}} \cdot \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y (X-M)^\mathrm{T} \, U^{-1} (X-M) \right) \right] \cdot \\ & \frac{1}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot \frac{1}{\sqrt{2^{\nu p} |V|^{\nu}}} \cdot |Y|^{(\nu-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V^{-1} Y \right) \right] \\ = \; & \frac{1}{\sqrt{(2\pi)^{np} |U|^p |V|^{\nu}}} \cdot \frac{\sqrt{2^{-\nu p}}}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot |Y|^{(\nu+n-p-1)/2} \cdot \\ & \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y \left[ (X-M)^\mathrm{T} \, U^{-1} (X-M) + V^{-1} \right] \right) \right] \; . \end{split}\]
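The algebra in the last step can be verified numerically: the sum of the two log-factors (second line of the derivation) should equal the merged closed form (last line) for arbitrary admissible inputs. This is a sketch assuming NumPy; the helper names are hypothetical, and the random positive-definite matrices are generated as $A A^\mathrm{T} + k I$ purely for testing.

```python
import math
import numpy as np

def log_multigamma(a, p):
    # log Gamma_p(a) = p(p-1)/4 log(pi) + sum_{j=1}^{p} log Gamma(a + (1-j)/2)
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2) for j in range(1, p + 1))

def logdet(A):
    sign, ld = np.linalg.slogdet(A)
    assert sign > 0, "matrix must be positive definite"
    return ld

def log_factors(X, Y, M, U, V, nu):
    """log MN(X; M, U, Y^-1) + log W(Y; V, nu), i.e. the unmerged product."""
    n, p = X.shape
    D = X - M
    log_matn = (0.5 * n * logdet(Y) - 0.5 * n * p * math.log(2 * math.pi)
                - 0.5 * p * logdet(U) - 0.5 * np.trace(Y @ D.T @ np.linalg.solve(U, D)))
    log_wish = (-log_multigamma(nu / 2, p) - 0.5 * nu * p * math.log(2)
                - 0.5 * nu * logdet(V) + 0.5 * (nu - p - 1) * logdet(Y)
                - 0.5 * np.trace(np.linalg.solve(V, Y)))
    return log_matn + log_wish

def log_closed_form(X, Y, M, U, V, nu):
    """The merged expression with a single trace and a single power of |Y|."""
    n, p = X.shape
    D = X - M
    S = D.T @ np.linalg.solve(U, D) + np.linalg.inv(V)
    return (-0.5 * (n * p * math.log(2 * math.pi) + p * logdet(U) + nu * logdet(V))
            - 0.5 * nu * p * math.log(2) - log_multigamma(nu / 2, p)
            + 0.5 * (nu + n - p - 1) * logdet(Y) - 0.5 * np.trace(Y @ S))

rng = np.random.default_rng(42)
n, p, nu = 4, 3, 6.5
M, X = rng.standard_normal((n, p)), rng.standard_normal((n, p))
A = rng.standard_normal((n, n)); U = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); V = B @ B.T + p * np.eye(p)
C = rng.standard_normal((p, p)); Y = C @ C.T + p * np.eye(p)
print(log_factors(X, Y, M, U, V, nu), log_closed_form(X, Y, M, U, V, nu))
```

Agreement up to floating-point rounding is expected, because the two functions differ only in the order in which the same terms are combined.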

∎

Sources:

Bishop, Christopher M. (2006): "Appendix B. Probability Distributions"; in: Pattern Recognition and Machine Learning, p. 690, eq. B.53; URL: http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf.

Metadata: ID: P323 | shortcut: nw-pdf | author: JoramSoch | date: 2022-05-14, 23:58.