Theorem: In a full probability model $m$ describing measured data $y$ using model parameters $\theta$, the marginal likelihood is the integral of the joint likelihood across the parameter space $\Theta$

$\label{eq:ml-jl} p(y|m) = \int_{\Theta} p(y,\theta|m) \, \mathrm{d}\theta$

and is related to the likelihood function and the prior distribution as follows:

$\label{eq:ml-lf} p(y|m) = \int_{\Theta} p(y|\theta,m) \, p(\theta|m) \, \mathrm{d}\theta \; .$

Proof: In a full probability model, the marginal likelihood is defined as the marginal probability of the data $y$, given only the model $m$:

$\label{eq:ml-def} p(y|m) \; .$

Using the law of marginal probability, this can be obtained by integrating the joint likelihood function over the entire parameter space:

$\label{eq:ml-jl-qed} p(y|m) = \int_{\Theta} p(y,\theta|m) \, \mathrm{d}\theta \; .$

Applying the law of conditional probability, $p(y,\theta|m) = p(y|\theta,m) \, p(\theta|m)$, the integrand can also be written as the product of the likelihood function and the prior density:

$\label{eq:ml-lf-qed} p(y|m) = \int_{\Theta} p(y|\theta,m) \, p(\theta|m) \, \mathrm{d}\theta \; .$
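
As a numerical illustration of the last equation, the following minimal sketch evaluates the integral of likelihood times prior for one concrete model and compares it with a known closed form. The model is an assumption chosen for illustration and is not part of the theorem: a binomial likelihood for $y$ successes in $n$ trials with a beta prior on $\theta \in (0,1)$, for which the marginal likelihood is the beta-binomial probability mass. All function names and the values of `y`, `n`, `a` and `b` are hypothetical, and the midpoint rule is just one simple way to approximate the integral.

```python
import math


def binom_likelihood(y, n, theta):
    """Likelihood p(y | theta, m): binomial probability of y successes in n trials."""
    return math.comb(n, y) * theta**y * (1.0 - theta) ** (n - y)


def beta_prior(theta, a, b):
    """Prior density p(theta | m): Beta(a, b) density on (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * theta ** (a - 1) * (1.0 - theta) ** (b - 1)


def marginal_likelihood_numeric(y, n, a, b, num_points=20000):
    """Approximate the integral of likelihood * prior over (0, 1) with the midpoint rule."""
    h = 1.0 / num_points
    total = 0.0
    for i in range(num_points):
        theta = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += binom_likelihood(y, n, theta) * beta_prior(theta, a, b)
    return total * h


def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def marginal_likelihood_closed_form(y, n, a, b):
    """Beta-binomial probability: C(n, y) * B(y + a, n - y + b) / B(a, b)."""
    return math.comb(n, y) * math.exp(log_beta(y + a, n - y + b) - log_beta(a, b))


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the text).
    y, n, a, b = 7, 10, 2.0, 2.0
    print("numeric    :", marginal_likelihood_numeric(y, n, a, b))
    print("closed form:", marginal_likelihood_closed_form(y, n, a, b))
```

The two printed values should agree up to the discretization error of the midpoint rule. Because $p(y|m)$ is a probability distribution over the data, summing the closed-form value over $y = 0, \ldots, n$ should also give one, which is a useful sanity check for any implementation.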
