Proof: Probability density function of the normal-Wishart distribution
Metadata: ID: P323 | shortcut: nw-pdf | author: JoramSoch | date: 2022-05-14, 23:58.
Theorem: Let the $n \times p$ random matrix $X$ and the $p \times p$ random matrix $Y$ jointly follow a normal-Wishart distribution:
\[\label{eq:nw} X,Y \sim \mathrm{NW}(M, U, V, \nu) \; .\]
Then, the joint probability density function of $X$ and $Y$ is
\[\label{eq:nw-pdf} \begin{split} p(X,Y) = \; & \frac{1}{\sqrt{(2\pi)^{np} |U|^p |V|^{\nu}}} \cdot \frac{\sqrt{2^{-\nu p}}}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot |Y|^{(\nu+n-p-1)/2} \cdot \\ & \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y \left[ (X-M)^\mathrm{T} \, U^{-1} (X-M) + V^{-1} \right] \right) \right] \; . \end{split}\]
Proof: The normal-Wishart distribution is defined as $X$ conditional on $Y$ following a matrix-normal distribution and $Y$ following a Wishart distribution:
\[\label{eq:matn-wish} \begin{split} X \vert Y &\sim \mathcal{MN}(M, U, Y^{-1}) \\ Y &\sim \mathcal{W}(V, \nu) \; . \end{split}\]
Thus, using the probability density functions of the matrix-normal distribution and the Wishart distribution, we have the following densities:
\[\label{eq:matn-wish-pdf} \begin{split} p(X \vert Y) &= \mathcal{MN}(X; M, U, Y^{-1}) \\ &= \sqrt{\frac{|Y|^n}{(2\pi)^{np} |U|^p}} \cdot \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y (X-M)^\mathrm{T} \, U^{-1} (X-M) \right) \right] \\ p(Y) &= \mathcal{W}(Y; V, \nu) \\ &= \frac{1}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot \frac{1}{\sqrt{2^{\nu p} |V|^{\nu}}} \cdot |Y|^{(\nu-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V^{-1} Y \right) \right] \; . \end{split}\]The law of conditional probability implies that
\[\label{eq:prob-cond} p(X,Y) = p(X \vert Y) \, p(Y) \; ,\]such that the normal-Wishart density function becomes:
\[\label{eq:nw-pdf-qed} \begin{split} p(X,Y) = \; & \mathcal{MN}(X; M, U, Y^{-1}) \cdot \mathcal{W}(Y; V, \nu) \\ = \; & \sqrt{\frac{|Y|^n}{(2\pi)^{np} |U|^p}} \cdot \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y (X-M)^\mathrm{T} \, U^{-1} (X-M) \right) \right] \cdot \\ & \frac{1}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot \frac{1}{\sqrt{2^{\nu p} |V|^{\nu}}} \cdot |Y|^{(\nu-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V^{-1} Y \right) \right] \\ = \; & \frac{1}{\sqrt{(2\pi)^{np} |U|^p |V|^{\nu}}} \cdot \frac{\sqrt{2^{-\nu p}}}{\Gamma_p \left( \frac{\nu}{2} \right)} \cdot |Y|^{(\nu+n-p-1)/2} \cdot \\ & \exp\left[-\frac{1}{2} \mathrm{tr}\left( Y \left[ (X-M)^\mathrm{T} \, U^{-1} (X-M) + V^{-1} \right] \right) \right] \; . \end{split}\]∎
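The last equality in the proof uses two identities: $|Y|^{n/2} \cdot |Y|^{(\nu-p-1)/2} = |Y|^{(\nu+n-p-1)/2}$ and, by linearity and the cyclic property of the trace, $\mathrm{tr}(YA) + \mathrm{tr}(V^{-1}Y) = \mathrm{tr}(Y[A + V^{-1}])$ with $A = (X-M)^\mathrm{T} U^{-1} (X-M)$. As a rough numerical sanity check, the following sketch evaluates both sides in the scalar special case $n = p = 1$, where the matrix-normal density reduces to a normal density with variance $U/Y$ and the Wishart density reduces to a gamma density with shape $\nu/2$ and scale $2V$. The function names and parameter values are illustrative assumptions, not part of the original proof, and the check covers only this special case.

```python
import math

def nw_logpdf_scalar(x, y, m, u, v, nu):
    """Log of the closed-form normal-Wishart density, specialized to n = p = 1."""
    n = p = 1
    # log of 1 / sqrt((2*pi)^(n*p) * |U|^p * |V|^nu)
    log_norm = -0.5 * (n * p * math.log(2 * math.pi) + p * math.log(u) + nu * math.log(v))
    # log of sqrt(2^(-nu*p)) / Gamma_p(nu/2); for p = 1, Gamma_p is the ordinary gamma function
    log_norm += -0.5 * nu * p * math.log(2) - math.lgamma(nu / 2)
    # log of |Y|^((nu + n - p - 1) / 2)
    log_det = 0.5 * (nu + n - p - 1) * math.log(y)
    # exponent: -1/2 * tr(Y [(X-M)^T U^{-1} (X-M) + V^{-1}])
    log_exp = -0.5 * y * ((x - m) ** 2 / u + 1 / v)
    return log_norm + log_det + log_exp

def normal_logpdf(x, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gamma_logpdf(y, shape, scale):
    """Log-density of a gamma distribution in shape/scale parametrization."""
    return ((shape - 1) * math.log(y) - y / scale
            - math.lgamma(shape) - shape * math.log(scale))

def factorized_logpdf(x, y, m, u, v, nu):
    """log p(X|Y) + log p(Y) for n = p = 1 (assumed scalar reduction)."""
    # X | Y ~ N(m, u / y) and Y ~ W(v, nu) = Gamma(shape = nu/2, scale = 2v)
    return normal_logpdf(x, m, u / y) + gamma_logpdf(y, nu / 2, 2 * v)

# Illustrative parameter values (arbitrary, chosen only for the check)
m, u, v, nu = 0.5, 2.0, 0.8, 3.5
for x, y in [(0.3, 1.7), (-1.2, 0.4), (2.5, 3.1)]:
    diff = nw_logpdf_scalar(x, y, m, u, v, nu) - factorized_logpdf(x, y, m, u, v, nu)
    print(f"x={x}, y={y}: difference of log-densities = {diff:.2e}")
```

The printed differences should be zero up to floating-point rounding. Working on the log scale avoids underflow for large $\nu$ or extreme values of $Y$.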
Sources: - Bishop, Christopher M. (2006): "Appendix B. Probability Distributions"; in: Pattern Recognition and Machine Learning, p. 690, eq. B.53; URL: http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf.