# Compound probability distribution

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Compound_probability_distribution
> Markdown URL: https://mediated.wiki/source/Compound_probability_distribution.md
> Source: https://en.wikipedia.org/wiki/Compound_probability_distribution
> Source revision: 1341668908
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Concept in statistics

In [probability](/source/Probability) and [statistics](/source/Statistics), a **compound probability distribution** (also known as a **[mixture distribution](/source/Mixture_distribution)** or **contagious distribution**) is the [probability distribution](/source/Probability_distribution) that results from assuming that a [random variable](/source/Random_variable) is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a [scale parameter](/source/Scale_parameter), the resulting mixture is also called a **scale mixture**.

The compound distribution ("unconditional distribution") is the result of [marginalizing](/source/Marginal_distribution) (integrating) over the *latent* random variable(s) representing the parameter(s) of the parametrized distribution ("conditional distribution").

## Definition

A **compound probability distribution** is the probability distribution that results from assuming that a random variable $X$ is distributed according to some parametrized distribution $F$ with an unknown parameter $\theta$ that is itself distributed according to some other distribution $G$. The resulting distribution $H$ is said to be the distribution that results from compounding $F$ with $G$. The parameter's distribution $G$ is also called the **mixing distribution** or **latent distribution**. Technically, the *unconditional* distribution $H$ results from *[marginalizing](/source/Marginal_distribution)* over $G$, i.e., from integrating out the unknown parameter(s) $\theta$. Its [probability density function](/source/Probability_density_function) is given by:

$$
p_H(x) = \int p_F(x \mid \theta)\, p_G(\theta)\,\mathrm{d}\theta
$$

The same formula applies analogously if some or all of the variables are vectors.

From the above formula, one can see that a compound distribution is essentially a special case of a [marginal distribution](/source/Marginal_distribution): the *[joint distribution](/source/Joint_probability_distribution)* of $x$ and $\theta$ is given by $p(x, \theta) = p(x \mid \theta)\, p(\theta)$, and the compound results as its marginal distribution, $p(x) = \int p(x, \theta)\,\mathrm{d}\theta$. If the domain of $\theta$ is discrete, then the distribution is again a special case of a [mixture distribution](/source/Mixture_distribution).
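For a rough numerical illustration of the marginalization integral, the following Python sketch (standard library only) integrates a Poisson likelihood against a gamma mixing density with a simple midpoint rule. It then compares the result with the closed-form negative binomial probabilities that are listed for this pair under Examples. The helper names, the truncation point `upper` and the step count are illustrative assumptions and do not belong to any library.

```python
import math

def poisson_pmf(k, lam):
    """Conditional distribution F: Poisson probability of k events at rate lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def gamma_pdf(lam, shape, scale):
    """Mixing distribution G: gamma density with the given shape and scale."""
    return math.exp((shape - 1) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def compound_pmf(k, shape, scale, upper=60.0, steps=40_000):
    """p_H(k) = integral of p_F(k | lam) p_G(lam) dlam, midpoint rule on (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoints are strictly positive, so the logs are defined
        total += poisson_pmf(k, lam) * gamma_pdf(lam, shape, scale)
    return total * h

def negative_binomial_pmf(k, shape, scale):
    """Closed form: Gamma(k + r) / (k! Gamma(r)) * p**r * (1 - p)**k with p = 1 / (1 + scale)."""
    p = 1.0 / (1.0 + scale)
    return math.exp(math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
                    + shape * math.log(p) + k * math.log(1.0 - p))

# upper = 60 leaves a negligible tail for shape 2 and scale 1.5 (mean rate 3)
for k in range(5):
    print(k, compound_pmf(k, 2.0, 1.5), negative_binomial_pmf(k, 2.0, 1.5))
```

The two columns should agree to several decimal places. A better quadrature scheme would be preferable in practice; the midpoint rule was chosen only because it keeps the correspondence with the integral transparent.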

## Properties

### General

The compound distribution $H$ will depend on the specific expression of each distribution, as well as on which parameter of $F$ is distributed according to the distribution $G$. The parameters of $H$ comprise the parameters of $G$ together with any parameters of $F$ that are not marginalized, or integrated, out. The [support](/source/Support_(mathematics)) of $H$ is the same as that of $F$. If the latter is a two-parameter distribution parameterized by its mean and variance, some general properties hold for the mean and variance of $H$.

### Mean and variance

The compound distribution's first two [moments](/source/Moment_(mathematics)) are given by the [law of total expectation](/source/Law_of_total_expectation) and the [law of total variance](/source/Law_of_total_variance):

$$
\operatorname{E}_H[X] = \operatorname{E}_G\bigl[\operatorname{E}_F[X \mid \theta]\bigr]
$$

$$
\operatorname{Var}_H(X) = \operatorname{E}_G\bigl[\operatorname{Var}_F(X \mid \theta)\bigr] + \operatorname{Var}_G\bigl(\operatorname{E}_F[X \mid \theta]\bigr)
$$

If the mean of $F$ is distributed as $G$, which in turn has mean $\mu$ and variance $\sigma^2$, the expressions above imply $\operatorname{E}_H[X] = \operatorname{E}_G[\theta] = \mu$ and $\operatorname{Var}_H(X) = \operatorname{Var}_F(X \mid \theta) + \operatorname{Var}_G(\theta) = \tau^2 + \sigma^2$, where $\tau^2$ is the variance of $F$ (assumed not to depend on $\theta$).
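These identities can be checked by simulation. The Python sketch below assumes, purely for illustration, a normal conditional distribution $F = \mathcal{N}(\theta, \tau^2)$ whose mean $\theta$ follows a gamma distribution $G$ with shape 4 and scale 0.5, so that $\mu = 2$ and $\sigma^2 = 1$. With $\tau = 1.5$, the predicted compound mean is $2$ and the predicted compound variance is $2.25 + 1 = 3.25$.

```python
import random

def sample_compound(n, tau, shape, scale, seed=0):
    """Draw n values of X by first drawing theta ~ G, then X | theta ~ F."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        theta = rng.gammavariate(shape, scale)  # G: mean shape*scale, variance shape*scale**2
        xs.append(rng.gauss(theta, tau))        # F: normal with mean theta, fixed variance tau**2
    return xs

def mean_and_variance(xs):
    """Sample mean and (population-style) sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

m, v = mean_and_variance(sample_compound(100_000, tau=1.5, shape=4.0, scale=0.5, seed=1))
print(m, v)  # should be close to mu = 2.0 and tau**2 + sigma**2 = 3.25
```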

### Proof

Let $F$ and $G$ be probability distributions parameterized by mean and variance as

$$
\begin{aligned}
x &\sim \mathcal{F}(\theta, \tau^2)\\
\theta &\sim \mathcal{G}(\mu, \sigma^2).
\end{aligned}
$$

Denote the probability density functions by $f(x \mid \theta) = p_F(x \mid \theta)$ and $g(\theta) = p_G(\theta)$, respectively, and let $h(x)$ be the probability density of $H$. Substituting the definition of $h$ and then exchanging the order of integration gives

$$
\begin{aligned}
\operatorname{E}_H[X] = \int_F x\,h(x)\,dx &= \int_F x \int_G f(x \mid \theta)\,g(\theta)\,d\theta\,dx\\
&= \int_G \int_F x\,f(x \mid \theta)\,dx\; g(\theta)\,d\theta\\
&= \int_G \operatorname{E}_F[X \mid \theta]\,g(\theta)\,d\theta .
\end{aligned}
$$

From the parameterizations $\mathcal{F}$ and $\mathcal{G}$ we also have

$$
\begin{aligned}
\operatorname{E}_F[X \mid \theta] &= \int_F x\,f(x \mid \theta)\,dx = \theta\\
\operatorname{E}_G[\theta] &= \int_G \theta\,g(\theta)\,d\theta = \mu,
\end{aligned}
$$

and therefore the mean of the compound distribution is $\operatorname{E}_H[X] = \mu$, as per the expression for its first moment above.

The variance of $H$ is given by $\operatorname{E}_H[X^2] - (\operatorname{E}_H[X])^2$, where

$$
\begin{aligned}
\operatorname{E}_H[X^2] = \int_F x^2 h(x)\,dx &= \int_F x^2 \int_G f(x \mid \theta)\,g(\theta)\,d\theta\,dx\\
&= \int_G g(\theta) \int_F x^2 f(x \mid \theta)\,dx\,d\theta\\
&= \int_G g(\theta)\,(\tau^2 + \theta^2)\,d\theta\\
&= \tau^2 \int_G g(\theta)\,d\theta + \int_G g(\theta)\,\theta^2\,d\theta\\
&= \tau^2 + (\sigma^2 + \mu^2),
\end{aligned}
$$

given the facts that

$$
\int_F x^2 f(x \mid \theta)\,dx = \operatorname{E}_F[X^2 \mid \theta] = \operatorname{Var}_F(X \mid \theta) + (\operatorname{E}_F[X \mid \theta])^2 = \tau^2 + \theta^2
$$

and

$$
\int_G \theta^2 g(\theta)\,d\theta = \operatorname{E}_G[\theta^2] = \operatorname{Var}_G(\theta) + (\operatorname{E}_G[\theta])^2 = \sigma^2 + \mu^2 .
$$

Finally, we get

$$
\begin{aligned}
\operatorname{Var}_H(X) &= \operatorname{E}_H[X^2] - (\operatorname{E}_H[X])^2\\
&= \tau^2 + \sigma^2 + \mu^2 - \mu^2\\
&= \tau^2 + \sigma^2 .
\end{aligned}
$$

## Applications

### Testing

Distributions of common [test statistics](/source/Test_statistic) result as compound distributions under their null hypothesis. In [Student's t-test](/source/Student's_t-test), for example, the test statistic results as the ratio of a [normal](/source/Normal_distribution) random variable and the square root of an independent [chi-squared](/source/Chi-squared_distribution) random variable divided by its degrees of freedom. In the [F-test](/source/F-test), the test statistic is the ratio of two independent [chi-squared](/source/Chi-squared_distribution) random variables, each divided by its degrees of freedom.
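The $t$ case can be made concrete with a small simulation. The sketch below is an illustration written with Python's standard library, not a reference implementation. It draws $W \sim \chi^2_\nu$ (generated as a gamma variate with shape $\nu/2$ and scale 2) and then $T \mid W \sim \mathcal{N}(0, \nu/W)$, which is equivalent to forming $Z/\sqrt{W/\nu}$ for a standard normal $Z$. For $\nu = 10$, the second moment should approach $\nu/(\nu-2) = 1.25$, and about 5% of the draws should exceed the two-sided 5% critical value of roughly 2.228 in absolute value.

```python
import math
import random

def t_via_compounding(n, nu, seed=0):
    """Student-t draws as a normal scale mixture with a chi-squared mixing variable."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.gammavariate(nu / 2, 2.0)                # W ~ chi-squared with nu degrees of freedom
        draws.append(rng.gauss(0.0, math.sqrt(nu / w)))  # T | W ~ N(0, nu / W)
    return draws

ts = t_via_compounding(100_000, nu=10, seed=2)
second_moment = sum(t * t for t in ts) / len(ts)           # E[T^2] = nu / (nu - 2) for nu > 2
tail_fraction = sum(abs(t) > 2.228 for t in ts) / len(ts)  # should be near 0.05
print(second_moment, tail_fraction)
```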

### Overdispersion modeling

Compound distributions are useful for modeling outcomes exhibiting [overdispersion](/source/Overdispersion), i.e., a greater amount of variability than would be expected under a certain model. For example, count data are commonly modeled using the [Poisson distribution](/source/Poisson_distribution), whose variance is equal to its mean. The distribution may be generalized by allowing for variability in its [rate parameter](/source/Rate_parameter), implemented via a [gamma distribution](/source/Gamma_distribution), which results in a marginal [negative binomial distribution](/source/Negative_binomial_distribution). This distribution is similar in its shape to the Poisson distribution, but it allows for larger variances. Similarly, a [binomial distribution](/source/Binomial_distribution) may be generalized to allow for additional variability by compounding it with a [beta distribution](/source/Beta_distribution) for its success probability parameter, which results in a [beta-binomial distribution](/source/Beta-binomial_distribution).
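A quick simulation shows the extra variability of the beta-binomial case. In this Python sketch the parameter values are arbitrary illustrative choices. A success probability is drawn from a beta distribution with $\alpha = 2$ and $\beta = 3$, and a binomial count with $n = 10$ trials is then drawn using that probability. A plain binomial distribution with the same mean success probability 0.4 has variance $10 \cdot 0.4 \cdot 0.6 = 2.4$. By contrast, the beta-binomial variance $n\alpha\beta(\alpha+\beta+n)/\bigl((\alpha+\beta)^2(\alpha+\beta+1)\bigr)$ equals 6 here.

```python
import random

def beta_binomial_draws(n_draws, n, alpha, beta, seed=0):
    """Compound draws: p ~ Beta(alpha, beta), then X | p ~ Binomial(n, p)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_draws):
        p = rng.betavariate(alpha, beta)
        out.append(sum(rng.random() < p for _ in range(n)))  # binomial as n Bernoulli trials
    return out

xs = beta_binomial_draws(50_000, n=10, alpha=2.0, beta=3.0, seed=5)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # mean near 4.0; variance near 6.0 rather than the binomial 2.4
```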

### Bayesian inference

Compound distributions also arise in [Bayesian inference](/source/Bayesian_inference), beyond the ubiquitous marginal distributions that may be seen as special cases of them. In the notation above, let *F* represent the distribution of future observations and let *G* be the [posterior distribution](/source/Posterior_distribution) of the parameters of *F*, given the information in a set of observed data. Compounding *F* with *G* then gives a [posterior predictive distribution](/source/Posterior_predictive_distribution). Correspondingly, for the [prior predictive distribution](/source/Prior_predictive_distribution), *F* is the distribution of a new data point while *G* is the [prior distribution](/source/Prior_distribution) of the parameters.
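The following is a small illustration of a posterior predictive distribution. It assumes Bernoulli observations with a conjugate beta prior, chosen because the posterior is then again a beta distribution. Compounding the Bernoulli distribution of a new observation with this posterior gives a Bernoulli distribution whose success probability equals the posterior mean, which is consistent with the Bernoulli example listed under Examples. The Python sketch compares a Monte Carlo estimate with that closed form. The helper name is hypothetical.

```python
import random

def posterior_predictive_success(a, b, successes, trials, n_draws=100_000, seed=0):
    """P(next observation = 1) under a Beta(a, b) prior after observing the given data."""
    post_a, post_b = a + successes, b + (trials - successes)  # conjugate beta posterior G
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        theta = rng.betavariate(post_a, post_b)  # theta ~ posterior
        hits += rng.random() < theta             # new observation ~ Bernoulli(theta), i.e. F
    return hits / n_draws, post_a / (post_a + post_b)

mc, exact = posterior_predictive_success(1, 1, successes=7, trials=10, seed=3)
print(mc, exact)  # exact is 8 / 12
```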

### Convolution

[Convolution](/source/Convolution) of probability distributions (to derive the probability distribution of sums of random variables) may also be seen as a special case of compounding; here the sum's distribution essentially results from considering one summand as a random [location parameter](/source/Location_parameter) for the other summand.[1]
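For independent normal summands, the compounding view can be checked numerically. Treat $\theta \sim \mathcal{N}(\mu, \sigma^2)$ as a random location for $\mathcal{N}(\theta, \tau^2)$. Integrating out $\theta$ should then reproduce the convolution $\mathcal{N}(\mu, \sigma^2 + \tau^2)$. The sketch below uses Python's `statistics.NormalDist` for the densities and a plain midpoint rule over $\mu \pm 10\sigma$. The integration range and step count are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def compound_location_density(x, mu, sigma, tau, steps=4000):
    """Density of theta + E, theta ~ N(mu, sigma^2), E ~ N(0, tau^2), by integrating out theta."""
    mixing = NormalDist(mu, sigma)
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * h
        total += NormalDist(theta, tau).pdf(x) * mixing.pdf(theta)  # p_F(x | theta) p_G(theta)
    return total * h

for x in (-1.0, 1.0, 3.0):
    print(x, compound_location_density(x, 1.0, 0.8, 1.5),
          NormalDist(1.0, sqrt(0.8**2 + 1.5**2)).pdf(x))
```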

## Computation

Compound distributions derived from [exponential family](/source/Exponential_family) distributions often have a closed form. If analytical integration is not possible, numerical methods may be necessary.

Compound distributions can be investigated relatively easily using [Monte Carlo methods](/source/Monte_Carlo_method), i.e., by generating random samples. It is often easy to generate random numbers from the distributions $p(\theta)$ as well as $p(x \mid \theta)$. Drawing $\theta$ from $p(\theta)$ and then $x$ from $p(x \mid \theta)$ yields a sample from $p(x)$. Conversely, *[collapsed Gibbs sampling](/source/Collapsed_Gibbs_sampling)* makes use of compound distributions by integrating some parameters out analytically instead of sampling them.
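Here is a minimal sketch of sampling from $p(x)$ by first drawing $\theta \sim p(\theta)$ and then $x \sim p(x \mid \theta)$. For illustration, it assumes a Poisson distribution for $p(x \mid \theta)$ with an exponentially distributed rate $p(\theta)$. According to the examples listed below, the resulting draws should follow a geometric distribution. With rate 1 for the exponential, $P(X = k) = 2^{-(k+1)}$. Poisson variates are generated with Knuth's multiplication method so that the example depends only on `random.random`. That method is adequate only for small rates like the ones that occur here.

```python
import math
import random

def poisson_knuth(lam, rng):
    """Poisson(lam) variate by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_poisson_exponential(n, rate, seed=0):
    """Ancestral sampling: theta from the mixing distribution, then x given theta."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        lam = rng.expovariate(rate)            # theta ~ p(theta): exponential with this rate
        draws.append(poisson_knuth(lam, rng))  # x ~ p(x | theta): Poisson with mean theta
    return draws

xs = sample_poisson_exponential(100_000, rate=1.0, seed=4)
print([round(xs.count(k) / len(xs), 3) for k in range(4)])  # roughly 0.5, 0.25, 0.125, 0.0625
```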

A compound distribution can usually also be approximated to a sufficient degree by a [mixture distribution](/source/Mixture_distribution) with a finite number of mixture components, which allows one to derive an approximate density, distribution function, etc.[1]
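The following Python sketch shows one simple way to build such a finite approximation. It is not the restricted-divergence method of the cited reference. The sketch approximates the normal scale mixture that yields a Laplace distribution (see the examples below) with an equally weighted mixture of normals. The variances of these normals are exponential quantiles at evenly spaced probability levels. When the mean of the exponential mixing distribution is set to 2, the target is the standard Laplace density $\tfrac12 e^{-|x|}$. The grid construction and the number of components are assumptions made for illustration.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def finite_mixture_density(x, mean_var, components=2000):
    """Equal-weight mixture of N(0, v_j), with v_j the exponential quantiles at (j + 0.5) / components."""
    total = 0.0
    for j in range(components):
        u = (j + 0.5) / components
        v = -mean_var * math.log1p(-u)  # quantile function of an exponential with mean mean_var
        total += normal_pdf(x, v)
    return total / components

def laplace_pdf(x, b):
    """Laplace density with location 0 and scale b."""
    return math.exp(-abs(x) / b) / (2 * b)

for x in (0.5, 1.0, 2.0):
    print(x, finite_mixture_density(x, mean_var=2.0), laplace_pdf(x, b=1.0))
```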

[Parameter estimation](/source/Estimation_theory) ([maximum-likelihood](/source/Maximum-likelihood_estimation) or [maximum-a-posteriori](/source/Maximum_a_posteriori_estimation) estimation) within a compound distribution model may sometimes be simplified by utilizing the [EM-algorithm](/source/EM-algorithm).[2]
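The following example shows how EM can exploit the compound structure. The Python sketch estimates the location $\mu$ of a Student's $t$ model with known degrees of freedom $\nu$ and known scale $\sigma$. It uses the normal scale-mixture representation $X \mid u \sim \mathcal{N}(\mu, \sigma^2/u)$ with a latent $u \sim \mathrm{Gamma}(\nu/2,\ \text{rate } \nu/2)$. The E-step replaces each latent precision weight by its conditional expectation $(\nu+1)/(\nu + (x_i-\mu)^2/\sigma^2)$, and the M-step computes a weighted mean. Treating $\nu$ and $\sigma$ as known, the choice of starting value and the fixed iteration count are all simplifying assumptions. The function name is hypothetical.

```python
def t_location_em(xs, nu, sigma, iters=50):
    """EM for the location of a Student-t sample with known nu and sigma."""
    mu = sorted(xs)[len(xs) // 2]  # start near the sample median
    for _ in range(iters):
        # E-step: expected latent precision weight for each observation
        w = [(nu + 1) / (nu + ((x - mu) / sigma) ** 2) for x in xs]
        # M-step: the weighted mean maximizes the expected complete-data log-likelihood in mu
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return mu

data = [4.8, 5.1, 4.9, 5.2, 5.0, 50.0]
print(sum(data) / len(data), t_location_em(data, nu=3.0, sigma=1.0))
```

The outlying value 50.0 receives a weight close to zero, so the EM estimate stays close to 5. The plain sample mean, by contrast, is about 12.5.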

## Examples

- **Gaussian scale mixtures**:[3][4]
  - Compounding a [normal distribution](/source/Normal_distribution) with [variance](/source/Variance) distributed according to an [inverse gamma distribution](/source/Inverse_gamma_distribution) (or equivalently, with [precision](/source/Precision_(statistics)) distributed as a [gamma distribution](/source/Gamma_distribution)) yields a non-standardized **[Student's t-distribution](/source/Student's_t-distribution)**.[5] This distribution has a symmetric shape similar to that of a normal distribution with the same central point, but it has greater variance and [heavy tails](/source/Heavy_tail).
  - Compounding a [Gaussian (or normal) distribution](/source/Gaussian_distribution) with variance distributed according to an [exponential distribution](/source/Exponential_distribution) (or with standard deviation distributed according to a [Rayleigh distribution](/source/Rayleigh_distribution)) yields a **[Laplace distribution](/source/Laplace_distribution)**. More generally, compounding a Gaussian (or normal) distribution with variance distributed according to a [gamma distribution](/source/Gamma_distribution) yields a **[variance-gamma distribution](/source/Variance-gamma_distribution)**.
  - Compounding a [Gaussian distribution](/source/Gaussian_distribution) with variance distributed according to an [exponential distribution](/source/Exponential_distribution) whose rate parameter is itself distributed according to a [gamma distribution](/source/Gamma_distribution) yields a **[Normal-exponential-gamma distribution](/source/Normal-exponential-gamma_distribution)**. (This involves two compounding stages. The variance itself then follows a [Lomax distribution](/source/Lomax_distribution); see below.)
  - Compounding a [Gaussian distribution](/source/Gaussian_distribution) with standard deviation distributed according to a [(standard) inverse uniform distribution](/source/Inverse_distribution#Inverse_uniform_distribution) yields a **[Slash distribution](/source/Slash_distribution)**.
  - Compounding a [Gaussian (normal) distribution](/source/Gaussian_distribution) with standard deviation distributed according to a (rescaled) [Kolmogorov distribution](/source/Kolmogorov%E2%80%93Smirnov_test#Kolmogorov_distribution) yields a **[logistic distribution](/source/Logistic_distribution)**.[6][3]

- **Other Gaussian mixtures**:
  - Compounding a [Gaussian distribution](/source/Gaussian_distribution) with [mean](/source/Mean) distributed according to another [Gaussian distribution](/source/Gaussian_distribution) yields (again) a **[Gaussian distribution](/source/Gaussian_distribution)**.
  - Compounding a [Gaussian distribution](/source/Gaussian_distribution) with [mean](/source/Mean) distributed according to a shifted [exponential distribution](/source/Exponential_distribution) yields an **[exponentially modified Gaussian distribution](/source/Exponentially_modified_Gaussian_distribution)**.

- Compounding a [Bernoulli distribution](/source/Bernoulli_distribution) with probability of success $p$ distributed according to a distribution $X$ that has a defined expected value yields a Bernoulli distribution with success probability $\operatorname{E}[X]$. An interesting consequence is that the dispersion of $X$ does not influence the dispersion of the resulting compound distribution.

- Compounding a [binomial distribution](/source/Binomial_distribution) with probability of success distributed according to a [beta distribution](/source/Beta_distribution) yields a **[beta-binomial distribution](/source/Beta-binomial_distribution)**. It has three parameters: a parameter $n$ (number of samples) from the binomial distribution and the [shape parameters](/source/Shape_parameter) $\alpha$ and $\beta$ from the beta distribution.[7][8]

- Compounding a [multinomial distribution](/source/Multinomial_distribution) with probability vector distributed according to a [Dirichlet distribution](/source/Dirichlet_distribution) yields a **[Dirichlet-multinomial distribution](/source/Dirichlet-multinomial_distribution)**.

- Compounding a [Poisson distribution](/source/Poisson_distribution) with [rate parameter](/source/Rate_parameter) distributed according to a [gamma distribution](/source/Gamma_distribution) yields a **[negative binomial distribution](/source/Negative_binomial_distribution)**.[9][10]

- Compounding a [Poisson distribution](/source/Poisson_distribution) with rate parameter distributed according to an [exponential distribution](/source/Exponential_distribution) yields a **[geometric distribution](/source/Geometric_distribution)**.

- Compounding an [exponential distribution](/source/Exponential_distribution) with its [rate parameter](/source/Rate_parameter) distributed according to a [gamma distribution](/source/Gamma_distribution) yields a **[Lomax distribution](/source/Lomax_distribution)**.[11]

- Compounding a [gamma distribution](/source/Gamma_distribution) with [inverse scale parameter](/source/Rate_parameter) distributed according to another [gamma distribution](/source/Gamma_distribution) yields a three-parameter **[beta prime distribution](/source/Beta_prime_distribution#Compound_gamma_distribution)**.[12]

- Compounding a [half-normal distribution](/source/Half-normal_distribution) with its [scale parameter](/source/Scale_parameter) distributed according to a [Rayleigh distribution](/source/Rayleigh_distribution) yields an **[exponential distribution](/source/Exponential_distribution)**. This follows immediately from the [Laplace distribution](/source/Laplace_distribution) resulting as a [normal](/source/Normal_distribution) scale mixture; see above. The roles of conditional and mixing distributions may also be exchanged here; consequently, compounding a [Rayleigh distribution](/source/Rayleigh_distribution) with its scale parameter distributed according to a [half-normal distribution](/source/Half-normal_distribution) *also* yields an [exponential distribution](/source/Exponential_distribution).

- A [Gamma(k=2,θ)-distributed](/source/Gamma_distribution) random variable whose [scale parameter](/source/Scale_parameter) θ is itself [uniformly](/source/Uniform_distribution_(continuous)) distributed on $(0, c)$ marginally yields an **[exponential distribution](/source/Exponential_distribution)** with mean $c$.
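The last example can be checked by direct integration. Take the scale $\theta$ to be uniform on $(0, c)$, which is the case in which the result holds exactly. Substituting $u = 1/\theta$ then gives $p(x) = \frac{1}{c}\int_0^c \frac{x}{\theta^2} e^{-x/\theta}\,\mathrm{d}\theta = \frac{1}{c}\int_{1/c}^\infty x\,e^{-xu}\,\mathrm{d}u = \frac{1}{c} e^{-x/c}$, which is an exponential density with mean $c$. The Python sketch below evaluates the first integral with a midpoint rule and compares the result with that closed form. The helper names and the step count are illustrative.

```python
import math

def gamma2_pdf(x, scale):
    """Gamma density with shape k = 2 and the given scale."""
    return x * math.exp(-x / scale) / scale ** 2

def gamma2_uniform_scale_density(x, c, steps=20_000):
    """(1/c) * integral over (0, c) of gamma2_pdf(x, theta) dtheta, midpoint rule."""
    h = c / steps
    return sum(gamma2_pdf(x, (i + 0.5) * h) for i in range(steps)) * h / c

c = 2.0
for x in (0.5, 1.0, 3.0):
    print(x, gamma2_uniform_scale_density(x, c), math.exp(-x / c) / c)
```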

## Similar terms

The notion of "compound distribution" as used e.g. in the definition of a [Compound Poisson distribution](/source/Compound_Poisson_distribution) or [Compound Poisson process](/source/Compound_Poisson_process) is different from the definition found in this article. The meaning in this article corresponds to what is used in e.g. [Bayesian hierarchical modeling](/source/Bayesian_hierarchical_modeling).

The special case of compound probability distributions in which the parametrized distribution $F$ is the [Poisson distribution](/source/Poisson_distribution) is also called a [mixed Poisson distribution](/source/Mixed_Poisson_distribution).

## See also

- [Mixture distribution](/source/Mixture_distribution)

- [Mixed Poisson distribution](/source/Mixed_Poisson_distribution)

- [Bayesian hierarchical modeling](/source/Bayesian_hierarchical_modeling)

- [Marginal distribution](/source/Marginal_distribution)

- [Conditional distribution](/source/Conditional_probability_distribution)

- [Joint distribution](/source/Joint_probability_distribution)

- [Convolution](/source/Convolution)

- [Overdispersion](/source/Overdispersion)

- [EM-algorithm](/source/EM-algorithm)

- [Giry monad](/source/Giry_monad)

## References

1. Röver, C.; Friede, T. (2017). ["Discrete approximation of a mixture distribution via restricted divergence"](https://doi.org/10.1080%2F10618600.2016.1276840). *Journal of Computational and Graphical Statistics*. **26** (1): 217–222. [arXiv](/source/ArXiv_(identifier)):[1602.04060](https://arxiv.org/abs/1602.04060). [doi](/source/Doi_(identifier)):[10.1080/10618600.2016.1276840](https://doi.org/10.1080%2F10618600.2016.1276840).

1. Gelman, A.; Carlin, J. B.; Stern, H.; Rubin, D. B. (1997). "9.5 *Finding marginal posterior modes using EM and related algorithms*". *Bayesian Data Analysis* (1st ed.). Boca Raton: Chapman & Hall / CRC. p. 276.

1. Lee, S.X.; McLachlan, G.J. (2019). "Scale Mixture Distribution". *Wiley StatsRef: Statistics Reference Online*. pp. 1–16. [doi](/source/Doi_(identifier)):[10.1002/9781118445112.stat08201](https://doi.org/10.1002%2F9781118445112.stat08201). [ISBN](/source/ISBN_(identifier)) [978-1-118-44511-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-118-44511-2).

1. Gneiting, T. (1997). "Normal scale mixtures and dual probability densities". *Journal of Statistical Computation and Simulation*. **59** (4): 375–384. [doi](/source/Doi_(identifier)):[10.1080/00949659708811867](https://doi.org/10.1080%2F00949659708811867).

1. Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). *Introduction to the theory of statistics* (3rd ed.). New York: McGraw-Hill.

1. Andrews, D.F.; Mallows, C.L. (1974). "Scale mixtures of normal distributions". *Journal of the Royal Statistical Society, Series B*. **36** (1): 99–102. [doi](/source/Doi_(identifier)):[10.1111/j.2517-6161.1974.tb00989.x](https://doi.org/10.1111%2Fj.2517-6161.1974.tb00989.x).

1. Johnson, N. L.; [Kemp, A. W.](/source/Adrienne_W._Kemp); Kotz, S. (2005). "6.2.2". *Univariate discrete distributions* (3rd ed.). New York: Wiley. p. 253.

1. Gelman, A.; Carlin, J. B.; Stern, H.; Dunson, D. B.; Vehtari, A.; Rubin, D. B. (2014). *Bayesian Data Analysis* (3rd ed.). Boca Raton: Chapman & Hall / CRC. [Bibcode](/source/Bibcode_(identifier)):[2014bda..book.....G](https://ui.adsabs.harvard.edu/abs/2014bda..book.....G).

1. Lawless, J.F. (1987). "Negative binomial and mixed Poisson regression". *The Canadian Journal of Statistics*. **15** (3): 209–225. [doi](/source/Doi_(identifier)):[10.2307/3314912](https://doi.org/10.2307%2F3314912). [JSTOR](/source/JSTOR_(identifier)) [3314912](https://www.jstor.org/stable/3314912).

1. Teich, M. C.; Diament, P. (1989). "Multiply stochastic representations for K distributions and their Poisson transforms". *Journal of the Optical Society of America A*. **6** (1): 80–91. [Bibcode](/source/Bibcode_(identifier)):[1989JOSAA...6...80T](https://ui.adsabs.harvard.edu/abs/1989JOSAA...6...80T). [CiteSeerX](/source/CiteSeerX_(identifier)) [10.1.1.64.596](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.64.596). [doi](/source/Doi_(identifier)):[10.1364/JOSAA.6.000080](https://doi.org/10.1364%2FJOSAA.6.000080).

1. Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). "20 *Pareto distributions*". *Continuous univariate distributions*. Vol. 1 (2nd ed.). New York: Wiley. p. 573.

1. Dubey, S. D. (1970). "Compound gamma, beta and F distributions". *Metrika*. **16**: 27–31. [doi](/source/Doi_(identifier)):[10.1007/BF02613934](https://doi.org/10.1007%2FBF02613934).

## Further reading

- Lindsay, B. G. (1995), *Mixture models: theory, geometry and applications*, NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5, Hayward, CA, USA: Institute of Mathematical Statistics, pp. i–163, [ISBN](/source/ISBN_(identifier)) [978-0-940600-32-4](https://en.wikipedia.org/wiki/Special:BookSources/978-0-940600-32-4), [JSTOR](/source/JSTOR_(identifier)) [4153184](https://www.jstor.org/stable/4153184)

- Seidel, W. (2010), "Mixture models", in Lovric, M. (ed.), *International Encyclopedia of Statistical Science*, Heidelberg: Springer, pp. 827–829, [doi](/source/Doi_(identifier)):[10.1007/978-3-642-04898-2_368](https://doi.org/10.1007%2F978-3-642-04898-2_368), [ISBN](/source/ISBN_(identifier)) [978-3-642-04898-2](https://en.wikipedia.org/wiki/Special:BookSources/978-3-642-04898-2)

- Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974), "III.4.3 *Contagious distributions and truncated distributions*", *Introduction to the theory of statistics* (3rd ed.), New York: McGraw-Hill, [ISBN](/source/ISBN_(identifier)) [978-0-07-042864-5](https://en.wikipedia.org/wiki/Special:BookSources/978-0-07-042864-5)

- Johnson, N. L.; [Kemp, A. W.](/source/Adrienne_W._Kemp); Kotz, S. (2005), "8 *Mixture distributions*", *Univariate discrete distributions*, New York: Wiley, [ISBN](/source/ISBN_(identifier)) [978-0-471-27246-5](https://en.wikipedia.org/wiki/Special:BookSources/978-0-471-27246-5)

---
Adapted from the Wikipedia article [Compound probability distribution](https://en.wikipedia.org/wiki/Compound_probability_distribution) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Compound_probability_distribution?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.