# Marginal likelihood

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Marginal_likelihood
> Markdown URL: https://mediated.wiki/source/Marginal_likelihood.md
> Source: https://en.wikipedia.org/wiki/Marginal_likelihood
> Source revision: 1339863115
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

A **marginal likelihood** is a [likelihood function](/source/Likelihood_function) that has been [integrated](/source/Integral) over the [parameter space](/source/Parameter_space). In [Bayesian statistics](/source/Bayesian_statistics), it represents the probability of generating the [observed sample](/source/Sampling_(statistics)) averaged over all possible values of the parameters, weighted by their prior distribution; it can be understood as the probability of the data under the model itself and is therefore often referred to as **model evidence** or simply **evidence**.

Due to the integration over the parameter space, the marginal likelihood does not directly depend upon the parameters. If the focus is not on model comparison, the marginal likelihood is simply the normalizing constant that ensures that the [posterior](/source/Posterior_probability) is a proper probability distribution. It is related to the [partition function in statistical mechanics](/source/Partition_function_(statistical_mechanics)).[1]

## Concept

Given a set of [independent identically distributed](/source/Independent_identically_distributed) data points $\mathbf{X} = (x_1, \ldots, x_n)$, where $x_i \sim p(x \mid \theta)$ according to some [probability distribution](/source/Probability_distribution) parameterized by $\theta$, where $\theta$ itself is a [random variable](/source/Random_variable) described by a distribution, i.e. $\theta \sim p(\theta \mid \alpha)$, the marginal likelihood in general asks what the probability $p(\mathbf{X} \mid \alpha)$ is, where $\theta$ has been [marginalized out](/source/Marginal_distribution) (integrated out):

$$p(\mathbf{X} \mid \alpha) = \int_{\theta} p(\mathbf{X} \mid \theta)\, p(\theta \mid \alpha)\, \mathrm{d}\theta$$

The above definition is phrased in the context of [Bayesian statistics](/source/Bayesian_statistics), in which case $p(\theta \mid \alpha)$ is called the prior density and $p(\mathbf{X} \mid \theta)$ is the likelihood. Recognizing that the marginal likelihood is the normalizing constant of the Bayesian posterior density $p(\theta \mid \mathbf{X}, \alpha)$, one also has the alternative expression[2]

$$p(\mathbf{X} \mid \alpha) = \frac{p(\mathbf{X} \mid \theta, \alpha)\, p(\theta \mid \alpha)}{p(\theta \mid \mathbf{X}, \alpha)}$$

which is an identity in $\theta$. The marginal likelihood quantifies the agreement between data and prior in a geometric sense made precise via Hilbert spaces in de Carvalho et al. (2019).

In classical ([frequentist](/source/Frequentist_statistics)) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter $\theta = (\psi, \lambda)$, where $\psi$ is the actual parameter of interest, and $\lambda$ is a non-interesting [nuisance parameter](/source/Nuisance_parameter). If there exists a probability distribution for $\lambda$, it is often desirable to consider the likelihood function only in terms of $\psi$, by marginalizing out $\lambda$:

$$\mathcal{L}(\psi; \mathbf{X}) = p(\mathbf{X} \mid \psi) = \int_{\lambda} p(\mathbf{X} \mid \lambda, \psi)\, p(\lambda \mid \psi)\, \mathrm{d}\lambda$$

Unfortunately, marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the prior on the marginalized-out parameter is the [conjugate prior](/source/Conjugate_prior) of the distribution of the data. In other cases, some kind of [numerical integration](/source/Numerical_integration) method is needed, either a general method such as [Gaussian integration](/source/Gaussian_integration) or a [Monte Carlo method](/source/Monte_Carlo_method), or a method specialized to statistical problems such as the [Laplace approximation](/source/Laplace_approximation), [Gibbs](/source/Gibbs_sampling)/[Metropolis](/source/Metropolis%E2%80%93Hastings_algorithm) sampling, or the [EM algorithm](/source/EM_algorithm).
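As a rough illustration of these options, the following Python sketch assumes a model that is not taken from the text above: a specific sequence of $n$ Bernoulli observations containing $k$ ones, with a Beta($a$, $b$) prior on $\theta$ (the conjugate case). It compares the closed form with a plain midpoint-rule integration, a simple Monte Carlo average over prior draws, and the likelihood × prior ÷ posterior identity evaluated at an arbitrary $\theta$. All function names are hypothetical, and only the standard library is used.

```python
import math
import random


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_pdf = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return math.exp(log_pdf - log_beta(a, b))


def likelihood(theta, k, n):
    """Probability of one specific 0/1 sequence with k ones out of n."""
    return theta**k * (1 - theta) ** (n - k)


def evidence_exact(k, n, a, b):
    """Closed form, available here because the Beta prior is conjugate."""
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


def evidence_midpoint(k, n, a, b, steps=20000):
    """Midpoint-rule approximation of the integral over theta in (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += likelihood(theta, k, n) * beta_pdf(theta, a, b)
    return total * h


def evidence_monte_carlo(k, n, a, b, draws=100000, seed=0):
    """Average of the likelihood over independent draws from the prior."""
    rng = random.Random(seed)
    total = sum(likelihood(rng.betavariate(a, b), k, n) for _ in range(draws))
    return total / draws


def evidence_from_identity(theta, k, n, a, b):
    """Likelihood times prior divided by posterior, at any theta in (0, 1)."""
    posterior = beta_pdf(theta, a + k, b + n - k)
    return likelihood(theta, k, n) * beta_pdf(theta, a, b) / posterior


if __name__ == "__main__":
    k, n, a, b = 7, 10, 2.0, 2.0  # illustrative values only
    print("exact      ", evidence_exact(k, n, a, b))
    print("midpoint   ", evidence_midpoint(k, n, a, b))
    print("monte carlo", evidence_monte_carlo(k, n, a, b))
    print("identity   ", evidence_from_identity(0.3, k, n, a, b))
```

The prior-sampling estimator is used here only because it is the simplest Monte Carlo scheme; it tends to become inefficient when the posterior is much narrower than the prior, which is one reason specialized methods are used in practice.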

It is also possible to apply the above considerations to a single random variable (data point) x {\displaystyle x} , rather than a set of observations. In a Bayesian context, this is equivalent to the [prior predictive distribution](/source/Prior_predictive_distribution) of a data point.
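As a worked illustration, assume (purely for the sake of example) a single Bernoulli observation $x$ with success probability $\theta$ and a Beta($a$, $b$) prior on $\theta$, so that the hyperparameter is $\alpha = (a, b)$ and $B(\cdot, \cdot)$ denotes the Beta function. Marginalizing out $\theta$ then gives the prior predictive probability of observing $x = 1$:

$$p(x = 1 \mid a, b) = \int_0^1 \theta \, \frac{\theta^{a-1} (1 - \theta)^{b-1}}{B(a, b)} \, \mathrm{d}\theta = \frac{B(a + 1, b)}{B(a, b)} = \frac{a}{a + b}$$

which is the prior mean of $\theta$.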

## Applications

### Bayesian model comparison

In [Bayesian model comparison](/source/Bayesian_model_comparison), the marginalized variables $\theta$ are parameters for a particular type of model, and the remaining variable $M$ is the identity of the model itself. In this case, the marginalized likelihood is the probability of the data given the model type, not assuming any particular model parameters. Writing $\theta$ for the model parameters, the marginal likelihood for the model $M$ is

$$p(\mathbf{X} \mid M) = \int p(\mathbf{X} \mid \theta, M)\, p(\theta \mid M)\, \mathrm{d}\theta$$

It is in this context that the term *model evidence* is normally used. This quantity is important because the posterior odds ratio for a model $M_1$ against another model $M_2$ involves a ratio of marginal likelihoods, called the [Bayes factor](/source/Bayes_factor):

$$\frac{p(M_1 \mid \mathbf{X})}{p(M_2 \mid \mathbf{X})} = \frac{p(M_1)}{p(M_2)}\, \frac{p(\mathbf{X} \mid M_1)}{p(\mathbf{X} \mid M_2)}$$

which can be stated schematically as

- posterior [odds](/source/Odds) = prior odds × [Bayes factor](/source/Bayes_factor)
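A small Python sketch of this calculation follows, under assumptions that are not part of the text above: the data are one specific binary sequence with $k$ ones in $n$ trials, model $M_1$ fixes the success probability at 0.5, and model $M_2$ places a uniform Beta(1, 1) prior on it. The function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def evidence_point_model(k, n, theta=0.5):
    """M1: theta is fixed, so the evidence is just the likelihood at theta."""
    return theta**k * (1 - theta) ** (n - k)


def evidence_beta_model(k, n, a=1.0, b=1.0):
    """M2: theta has a Beta(a, b) prior; the integral has a closed form."""
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


def posterior_odds(prior_odds, evidence_1, evidence_2):
    """Posterior odds of M1 against M2 = prior odds x Bayes factor."""
    bayes_factor = evidence_1 / evidence_2
    return prior_odds * bayes_factor


if __name__ == "__main__":
    k, n = 7, 10  # illustrative data: 7 ones in 10 trials
    e1 = evidence_point_model(k, n)
    e2 = evidence_beta_model(k, n)
    print("Bayes factor (M1 vs M2):", e1 / e2)
    print("posterior odds with equal prior odds:", posterior_odds(1.0, e1, e2))
```

With equal prior odds the posterior odds coincide with the Bayes factor; other prior odds simply rescale it.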

## See also

- [Empirical Bayes methods](/source/Empirical_Bayes_methods)

- [Lindley's paradox](/source/Lindley's_paradox)

- [Marginal probability](/source/Marginal_probability)

- [Bayesian information criterion](/source/Bayesian_information_criterion)

## References

1. Šmídl, Václav; Quinn, Anthony (2006). "Bayesian Theory". *The Variational Bayes Method in Signal Processing*. Springer. pp. 13–23. [doi](/source/Doi_(identifier)):[10.1007/3-540-28820-1_2](https://doi.org/10.1007%2F3-540-28820-1_2).

2. Chib, Siddhartha (1995). "Marginal likelihood from the Gibbs output". *Journal of the American Statistical Association*. **90** (432): 1313–1321. [doi](/source/Doi_(identifier)):[10.1080/01621459.1995.10476635](https://doi.org/10.1080%2F01621459.1995.10476635).

## Further reading

- Charles S. Bos. "A comparison of marginal likelihood computation methods". In W. Härdle and B. Ronz, editors, *COMPSTAT 2002: Proceedings in Computational Statistics*, pp. 111–117. 2002. *(Available as a preprint on [SSRN](/source/SSRN_(identifier)) [332860](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=332860))*

- de Carvalho, Miguel; Page, Garritt; Barney, Bradley (2019). "On the geometry of Bayesian inference". *Bayesian Analysis*. **14** (4): 1013–1036. *(Available as a [preprint on the web](https://www.maths.ed.ac.uk/~mdecarv/papers/decarvalho2018.pdf))*

- Lambert, Ben (2018). "The devil is in the denominator". *A Student's Guide to Bayesian Statistics*. Sage. pp. 109–120. [ISBN](/source/ISBN_(identifier)) [978-1-4739-1636-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4739-1636-4).

- [The on-line textbook: Information Theory, Inference, and Learning Algorithms](http://www.inference.phy.cam.ac.uk/mackay/itila/), by [David J.C. MacKay](/source/David_J.C._MacKay).

---
Adapted from the Wikipedia article [Marginal likelihood](https://en.wikipedia.org/wiki/Marginal_likelihood) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Marginal_likelihood?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
