# Explained variation

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Explained_variation
> Markdown URL: https://mediated.wiki/source/Explained_variation.md
> Source: https://en.wikipedia.org/wiki/Explained_variation
> Source revision: 1311974482
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Concept in mathematical modelling

In [statistics](/source/Statistics), **explained variation** measures the proportion to which a mathematical model accounts for the variation ([dispersion](/source/Dispersion_(statistics))) of a given [data set](/source/Data_set). When variation is quantified as [variance](/source/Variance), as is common, the more specific term **explained variance** is used.

The complementary part of the total variation is called **[unexplained](/source/Fraction_of_variance_unexplained)** or **[residual](/source/Residual_(statistics)) variation**; likewise, when variation is measured by variance, one speaks of **unexplained** or **residual variance**.

## Definition in terms of information gain

### Information gain by better modelling

Following Kent (1983),[1] we use the Fraser information (Fraser 1965),[2]

$$F(\theta) = \int \mathrm{d}r\, g(r) \ln f(r;\theta)$$

where $g(r)$ is the probability density of a random variable $R$, and $f(r;\theta)$ with $\theta \in \Theta_i$ ($i = 0, 1$) are two families of parametric models. Model family 0 is the simpler one, with a restricted parameter space $\Theta_0 \subset \Theta_1$.

Parameters are determined by [maximum likelihood estimation](/source/Maximum_likelihood_estimation),

$$\theta_i = \operatorname{argmax}_{\theta \in \Theta_i} F(\theta).$$

The information gain of model 1 over model 0 is written as

$$\Gamma(\theta_1 : \theta_0) = 2\,[F(\theta_1) - F(\theta_0)]$$

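where a factor of 2 is included for convenience. Because $\Theta_0 \subset \Theta_1$, the maximum of $F$ over $\Theta_1$ cannot be smaller than its maximum over $\Theta_0$, so $\Gamma$ is always nonnegative; it measures the extent to which the best model of family 1 is better than the best model of family 0 in explaining $g(r)$.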
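As an illustration of these definitions, the sketch below replaces the density $g$ by the empirical distribution of a small sample, so that $F(\theta)$ becomes the average log-density over the sample. The two Gaussian families (family 1: $N(\mu, \sigma^2)$ with both parameters free; family 0: the same with $\mu$ fixed at 0, so that $\Theta_0 \subset \Theta_1$), the function names and the sample values are assumptions chosen for illustration, not part of Kent's formulation.

```python
import math


def gaussian_fraser_information(sample, mu, sigma2):
    """Empirical Fraser information: average log-density of N(mu, sigma2) over the sample.

    Assumption: g(r) is replaced by the empirical distribution of `sample`.
    """
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (r - mu) ** 2 / (2 * sigma2)
        for r in sample
    ) / len(sample)


def information_gain(sample):
    """Gamma(theta_1 : theta_0) = 2 [F(theta_1) - F(theta_0)] for two nested Gaussian families."""
    n = len(sample)
    # Family 0 (restricted): mean fixed at 0; the ML variance is the mean square.
    sigma2_0 = sum(r * r for r in sample) / n
    f0 = gaussian_fraser_information(sample, 0.0, sigma2_0)
    # Family 1: mean and variance free; the ML estimates are the sample mean
    # and the divide-by-n sample variance.
    mu_1 = sum(sample) / n
    sigma2_1 = sum((r - mu_1) ** 2 for r in sample) / n
    f1 = gaussian_fraser_information(sample, mu_1, sigma2_1)
    return 2 * (f1 - f0)


sample = [1.2, 0.7, 2.1, 1.5, 0.9]  # hypothetical data
gamma = information_gain(sample)
print(f"information gain: {gamma:.4f}")
```

For these families $F$ at its maximum equals $-\tfrac{1}{2}\ln(2\pi\hat\sigma^2) - \tfrac{1}{2}$, so $\Gamma = \ln(\hat\sigma_0^2/\hat\sigma_1^2)$. Since $\hat\sigma_0^2 = \hat\sigma_1^2 + \hat\mu^2$, this value is nonnegative, in line with the general argument.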

### Information gain by a conditional model

Assume a two-dimensional random variable $R = (X, Y)$, where $X$ is treated as an explanatory variable and $Y$ as a dependent variable. Models of family 1 "explain" $Y$ in terms of $X$,

$$f(y \mid x; \theta),$$

whereas in family 0, $X$ and $Y$ are assumed to be independent. We define the randomness of $Y$ by $D(Y) = \exp[-2F(\theta_0)]$, and the randomness of $Y$ given $X$ by $D(Y \mid X) = \exp[-2F(\theta_1)]$. Then

$$\rho_C^2 = 1 - D(Y \mid X)/D(Y)$$

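can be interpreted as the proportion of the data dispersion that is "explained" by $X$. Since $D(Y \mid X)/D(Y) = \exp[-\Gamma(\theta_1 : \theta_0)]$, equivalently $\rho_C^2 = 1 - \exp[-\Gamma(\theta_1 : \theta_0)]$, which lies in $[0, 1)$ because $\Gamma$ is nonnegative.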
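A minimal sketch of $\rho_C^2$, assuming hypothetical Gaussian families: family 0 models $Y \sim N(\mu, \sigma^2)$ independently of $X$, and family 1 models $Y \mid X \sim N(a + bX, \sigma^2)$. The density $g$ is again replaced by the empirical distribution of the data, and only the conditional part of the log-likelihood is kept, on the assumption that both families share the same model for the marginal of $X$; that shared term adds the same amount to both $F$ values and cancels in $D(Y \mid X)/D(Y)$. The function names and data are illustrative.

```python
import math


def mean_gaussian_loglik(residuals):
    """Average log-likelihood of residuals under N(0, s2), with s2 set to its ML value."""
    n = len(residuals)
    s2 = sum(e * e for e in residuals) / n  # must be > 0 (no perfect fit)
    return sum(
        -0.5 * math.log(2 * math.pi * s2) - e * e / (2 * s2) for e in residuals
    ) / n


def rho_c_squared(x, y):
    """rho_C^2 = 1 - D(Y|X) / D(Y) for the hypothetical Gaussian families 0 and 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Family 0: Y ~ N(mu, s2), independent of X; the ML mean is the sample mean.
    f0 = mean_gaussian_loglik([yi - my for yi in y])
    # Family 1: Y | X ~ N(a + b X, s2); the ML line is the least-squares line.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    f1 = mean_gaussian_loglik([yi - (a + b * xi) for xi, yi in zip(x, y)])
    d_y = math.exp(-2 * f0)  # randomness of Y
    d_y_given_x = math.exp(-2 * f1)  # randomness of Y given X
    return 1 - d_y_given_x / d_y


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(f"rho_C^2 = {rho_c_squared(x, y):.4f}")
```

With Gaussian families, $D = 2\pi e\,\hat\sigma^2$, so the ratio $D(Y \mid X)/D(Y)$ reduces to the ratio of the residual ML variance to the total ML variance.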

## Special cases and generalized usage

### Linear regression

Main article: [Fraction of variance unexplained](/source/Fraction_of_variance_unexplained)

The fraction of variance unexplained is an established concept in the context of [linear regression](/source/Linear_regression). The usual definition of the [coefficient of determination](/source/Coefficient_of_determination) builds on it: $R^2$ is one minus the fraction of variance unexplained, that is, the fraction of variance explained.
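A minimal NumPy sketch of the usual computation, assuming an ordinary-least-squares fit that includes an intercept: the fraction of variance unexplained is the residual sum of squares divided by the total sum of squares, and $R^2$ is its complement. The function names and data are hypothetical.

```python
import numpy as np


def fraction_of_variance_unexplained(X, y):
    """FVU = RSS / TSS for an ordinary-least-squares fit that includes an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend a column of ones
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    rss = np.sum(residuals ** 2)  # residual (unexplained) sum of squares
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return rss / tss


def coefficient_of_determination(X, y):
    """R^2 as the fraction of variance explained, i.e. 1 - FVU."""
    return 1.0 - fraction_of_variance_unexplained(X, y)


X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
y = [3.1, 7.9, 7.2, 11.8, 10.9, 16.2]  # hypothetical responses
print(f"FVU = {fraction_of_variance_unexplained(X, y):.4f}")
print(f"R^2 = {coefficient_of_determination(X, y):.4f}")
```

The intercept matters: with it, the fitted model can never do worse than the sample mean, so the FVU stays between 0 and 1.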

### Correlation coefficient as measure of explained variance

Let $X$ be a random vector, and $Y$ a random variable that is modeled by a [normal distribution](/source/Normal_distribution) with centre $\mu = \Psi^{\mathrm{T}} X$. In this case, the above-derived proportion of explained variation $\rho_C^2$ equals the squared [correlation coefficient](/source/Pearson_product-moment_correlation_coefficient) $R^2$.

Note the strong model assumptions: the centre of the $Y$ distribution must be a linear function of $X$, and for any given $x$, the conditional distribution of $Y$ must be normal. In other situations, it is generally not justified to interpret $R^2$ as the proportion of explained variance.
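The sketch below, using hypothetical data, illustrates both sides of this statement. For a straight-line least-squares fit, the squared Pearson correlation equals $1 - \mathrm{RSS}/\mathrm{TSS}$, which under the linear-normal model is the $\rho_C^2$ derived above. When the centre of $Y$ is not a linear function of $X$, the squared correlation can be zero even though $Y$ is an exact function of $X$. The function names are illustrative.

```python
import numpy as np


def squared_correlation(x, y):
    """Squared Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1] ** 2


def r_squared_from_line_fit(x, y):
    """1 - RSS/TSS for a least-squares straight line y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest power first
    residuals = y - (intercept + slope * x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# Linear centre plus small hypothetical deviations: the two quantities agree.
linear_y = 1.0 + 3.0 * x + np.array([0.1, -0.2, 0.05, 0.2, -0.1])
# Y is an exact function of X, but not a linear one: the squared correlation is 0.
quadratic_y = x ** 2

print(squared_correlation(x, linear_y), r_squared_from_line_fit(x, linear_y))
print(squared_correlation(x, quadratic_y))
```

In the quadratic case $X$ determines $Y$ completely, yet $R^2$ reports that no variance is explained, because the linearity assumption fails.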

### In principal component analysis

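Explained variance is routinely used in [principal component analysis](/source/Principal_component_analysis), where the variance explained by each principal component is the corresponding eigenvalue of the covariance matrix, often reported as a fraction of the total variance. The relation to the Fraser–Kent information gain remains to be clarified.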
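A minimal NumPy sketch of the conventional PCA quantity: each eigenvalue of the sample covariance matrix divided by the sum of all eigenvalues. The function name and data are hypothetical, and the computation is the standard PCA one, not something derived from $\rho_C^2$.

```python
import numpy as np


def explained_variance_ratio(data):
    """Fraction of total variance explained by each principal component.

    `data` has one row per observation and one column per variable.
    """
    data = np.asarray(data, dtype=float)
    cov = np.cov(data, rowvar=False)  # np.cov centres the columns itself
    eigenvalues = np.linalg.eigh(cov)[0][::-1]  # eigh sorts ascending; reverse it
    return eigenvalues / eigenvalues.sum()


data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]]
print(explained_variance_ratio(data))
```

`np.linalg.eigh` is used because a covariance matrix is symmetric; the ratios sum to 1 by construction.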

## Criticism

As the fraction of "explained variance" equals the squared correlation coefficient $R^2$, it shares all the disadvantages of the latter: it reflects not only the quality of the regression, but also the distribution of the independent (conditioning) variables.

In the words of one critic: "Thus $R^2$ gives the 'percentage of variance explained' by the regression, an expression that, for most social scientists, is of doubtful meaning but great rhetorical value. If this number is large, the regression gives a good fit, and there is little point in searching for additional variables. Other regression equations on different data sets are said to be less satisfactory or less powerful if their $R^2$ is lower. Nothing about $R^2$ supports these claims".[3]: 58 And, after constructing an example where $R^2$ is enhanced just by jointly considering data from two different populations: "'Explained variance' explains nothing."[3][4]: 183
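The dependence of $R^2$ on the distribution of the independent variable can be seen in a small simulation. This is not a reconstruction of Achen's example: it is a hypothetical setup in which two populations share the same regression line and the same noise level and differ only in the range of $X$ observed. Each population alone yields a modest $R^2$, while the pooled data yield a much larger one, although the quality of the regression (slope and noise) is identical.

```python
import numpy as np


def r_squared(x, y):
    """1 - RSS/TSS for a least-squares straight-line fit."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(0)
# Two hypothetical populations: same line y = 1 + 2x, same noise (sd 1),
# but observed over different, narrow ranges of x.
x_a = rng.uniform(0.0, 1.0, 500)
x_b = rng.uniform(5.0, 6.0, 500)
y_a = 1.0 + 2.0 * x_a + rng.normal(0.0, 1.0, 500)
y_b = 1.0 + 2.0 * x_b + rng.normal(0.0, 1.0, 500)

r2_a = r_squared(x_a, y_a)
r2_b = r_squared(x_b, y_b)
r2_pooled = r_squared(np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b]))
print(f"R^2 A: {r2_a:.2f}, R^2 B: {r2_b:.2f}, R^2 pooled: {r2_pooled:.2f}")
```

Pooling widens the spread of $X$ relative to the fixed noise, which alone drives $R^2$ up.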

## See also

- [Analysis of variance](/source/Analysis_of_variance)

- [Variance reduction](/source/Variance_reduction)

- [Variance-based sensitivity analysis](/source/Variance-based_sensitivity_analysis)

## References

1. Kent, J. T. (1983). "Information gain and a general measure of correlation". *[Biometrika](/source/Biometrika)*. **70** (1): 163–173. [doi](/source/Doi_(identifier)):[10.1093/biomet/70.1.163](https://doi.org/10.1093%2Fbiomet%2F70.1.163). [JSTOR](/source/JSTOR_(identifier)) [2335954](https://www.jstor.org/stable/2335954).

2. Fraser, D. A. S. (1965). ["On Information in Statistics"](https://doi.org/10.1214%2Faoms%2F1177700061). *Ann. Math. Statist*. **36** (3): 890–896. [doi](/source/Doi_(identifier)):[10.1214/aoms/1177700061](https://doi.org/10.1214%2Faoms%2F1177700061).

3. Achen, C. H. (1982). *Interpreting and Using Regression*. Beverly Hills: Sage. pp. 58–59. [ISBN](/source/ISBN_(identifier)) [0-8039-1915-8](https://en.wikipedia.org/wiki/Special:BookSources/0-8039-1915-8).

4. Achen, C. H. (1990). "What Does 'Explained Variance' Explain?: Reply". *Political Analysis*. **2** (1): 173–184. [doi](/source/Doi_(identifier)):[10.1093/pan/2.1.173](https://doi.org/10.1093%2Fpan%2F2.1.173).

## External links

- [Explained and Unexplained Variance on a graph](https://web.archive.org/web/20080413144223/http://darwin.cwru.edu/~witte/statistics/explained_variance.htm)

---
Adapted from the Wikipedia article [Explained variation](https://en.wikipedia.org/wiki/Explained_variation) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Explained_variation?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.