# Distance correlation

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Distance_correlation
> Markdown URL: https://mediated.wiki/source/Distance_correlation.md
> Source: https://en.wikipedia.org/wiki/Distance_correlation
> Source revision: 1354896094
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)


In [statistics](/source/Statistics) and in [probability theory](/source/Probability_theory), **distance correlation** is a measure of [dependence](/source/Independence_(probability_theory)) between two paired [random vectors](/source/Random_vector) of arbitrary, not necessarily equal, [dimension](/source/Euclidean_vector). The population distance correlation coefficient is zero [if and only if](/source/If_and_only_if) the random vectors are [independent](/source/Independence_(probability_theory)). Thus, distance correlation measures both linear and nonlinear [association](/source/Association_(statistics)) between two random variables or random vectors. This is in contrast to [Pearson's correlation](/source/Pearson's_correlation), which can only detect linear association between two [random variables](/source/Random_variable).

Distance correlation can be used to perform a [statistical test](/source/Statistical_hypothesis_testing) of dependence with a [permutation test](/source/Permutation_test). One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.
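As a rough illustration of the permutation test described above, the following Python sketch computes the sample distance correlation with the standard library only and estimates a p-value by shuffling one of the samples. The function names (`centered_distances`, `dcov_sq`, `permutation_test`), the choice of 199 permutations, and the parabola data are assumptions made for this example; they are not taken from any particular package.

```python
import math
import random


def centered_distances(points):
    """Doubly centered Euclidean distance matrix of a list of tuples."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]  # row means equal column means (d is symmetric)
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]


def dcov_sq(a, b, perm=None):
    """Squared sample distance covariance from centered matrices a and b.

    If perm is given, the second sample is reordered by that permutation.
    """
    n = len(a)
    p = perm if perm is not None else range(n)
    return sum(a[j][k] * b[p[j]][p[k]]
               for j in range(n) for k in range(n)) / n**2


def permutation_test(xs, ys, n_perm=199, seed=0):
    """Return (sample distance correlation, permutation p-value)."""
    rng = random.Random(seed)
    a, b = centered_distances(xs), centered_distances(ys)
    observed = dcov_sq(a, b)
    # Shuffling ys leaves both distance variances unchanged, so comparing
    # distance covariances is equivalent to comparing distance correlations.
    perm = list(range(len(xs)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if dcov_sq(a, b, perm) >= observed:
            hits += 1
    denom = math.sqrt(dcov_sq(a, a) * dcov_sq(b, b))
    dcor = math.sqrt(max(observed, 0.0) / denom) if denom > 0 else 0.0
    return dcor, (hits + 1) / (n_perm + 1)


# Hypothetical data: y depends on x through a parabola, a nonlinear
# relationship for which Pearson's correlation is close to zero.
xs = [(i / 50 - 1.0,) for i in range(101)]
ys = [(x[0] ** 2,) for x in xs]
dcor_value, p_value = permutation_test(xs, ys)
print(f"dCor_n = {dcor_value:.3f}, permutation p-value = {p_value:.3f}")
```

The sketch centers each distance matrix once and permutes indices rather than recomputing distances, which is valid because double centering commutes with reordering the observations.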

Figure: Several sets of (*x*, *y*) points, with the distance correlation coefficient of *x* and *y* for each set. Compare to the graph on [correlation](/source/Correlation).

## Background

The classical measure of dependence, the [Pearson correlation coefficient](/source/Pearson_product-moment_correlation_coefficient),[1] is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by [Gábor J. Székely](/source/G%C3%A1bor_J._Sz%C3%A9kely) in several lectures to address this deficiency of Pearson's [correlation](/source/Correlation), namely that it can easily be zero for [dependent variables](/source/Dependent_and_independent_variables). Correlation = 0 (uncorrelatedness) does not imply independence while distance correlation = 0 does imply independence. The first results on distance correlation were published in 2007 and 2009.[2][3] It was proved that distance covariance is the same as the Brownian covariance.[3] These measures are examples of [energy distances](/source/Energy_distance).

The distance correlation is derived from a number of other quantities that are used in its specification, specifically: **distance variance**, **distance standard deviation**, and **distance covariance**. These quantities take the same roles as the ordinary [moments](/source/Moment_(mathematics)) with corresponding names in the specification of the [Pearson product-moment correlation coefficient](/source/Pearson_product-moment_correlation_coefficient).

## Definitions

### Distance covariance

Let us start with the definition of the **sample distance covariance**. Let $(X_k, Y_k)$, $k = 1, 2, \ldots, n$, be a [statistical sample](/source/Statistical_sample) from a pair of real-valued or vector-valued random variables $(X, Y)$. First, compute the $n \times n$ [distance matrices](/source/Distance_matrix) $(a_{j,k})$ and $(b_{j,k})$ containing all pairwise [distances](/source/Euclidean_distance)

$$\begin{aligned}a_{j,k}&=\|X_{j}-X_{k}\|,\qquad j,k=1,2,\ldots ,n,\\b_{j,k}&=\|Y_{j}-Y_{k}\|,\qquad j,k=1,2,\ldots ,n,\end{aligned}$$

where $\|\cdot\|$ denotes the [Euclidean norm](/source/Euclidean_norm). Then take all doubly centered distances

$$A_{j,k}:=a_{j,k}-\overline{a}_{j\cdot}-\overline{a}_{\cdot k}+\overline{a}_{\cdot\cdot},\qquad B_{j,k}:=b_{j,k}-\overline{b}_{j\cdot}-\overline{b}_{\cdot k}+\overline{b}_{\cdot\cdot},$$

where $\overline{a}_{j\cdot}$ is the $j$-th row mean, $\overline{a}_{\cdot k}$ is the $k$-th column mean, and $\overline{a}_{\cdot\cdot}$ is the [grand mean](/source/Grand_mean) of the distance matrix of the $X$ sample. The notation is similar for the $b$ values. (In the matrices of centered distances $(A_{j,k})$ and $(B_{j,k})$, all rows and all columns sum to zero.) The squared **sample distance covariance** (a scalar) is simply the arithmetic average of the products $A_{j,k}B_{j,k}$:

$$\operatorname{dCov}_{n}^{2}(X,Y):=\frac{1}{n^{2}}\sum_{j=1}^{n}\sum_{k=1}^{n}A_{j,k}\,B_{j,k}.$$

The statistic $T_n = n\operatorname{dCov}_n^2(X, Y)$ determines a consistent multivariate test of independence of random vectors in arbitrary dimensions. For an implementation, see the *dcov.test* function in the *energy* package for [R](/source/R_(programming_language)).[4]
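The steps above (pairwise distances, double centering, averaging the products) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper names and the three-point data set are assumptions for the example, and the quadratic-time loops are meant for clarity rather than speed.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances; each point is a tuple of floats."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row_mean = [sum(row) / n for row in d]
    col_mean = [sum(d[j][k] for j in range(n)) / n for k in range(n)]
    grand_mean = sum(row_mean) / n
    return [[d[j][k] - row_mean[j] - col_mean[k] + grand_mean
             for k in range(n)] for j in range(n)]


def sample_dcov_sq(xs, ys):
    """Squared sample distance covariance: mean of the products A_jk * B_jk."""
    n = len(xs)
    a = double_center(distance_matrix(xs))
    b = double_center(distance_matrix(ys))
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2


# Hypothetical paired sample; X is one-dimensional and Y two-dimensional,
# since the two samples need not share a dimension.
xs = [(0.0,), (1.0,), (2.0,)]
ys = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

A = double_center(distance_matrix(xs))
dcov2 = sample_dcov_sq(xs, ys)
t_n = len(xs) * dcov2  # the test statistic T_n = n * dCov_n^2
print(dcov2, t_n)
```

For the sample 0, 1, 2 paired with itself, working through the centering by hand gives a squared sample distance covariance of 40/81, which is a convenient check on the code.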

The population value of **distance covariance** can be defined along the same lines. Let $X$ be a random variable that takes values in a $p$-dimensional Euclidean space with [probability distribution](/source/Probability_distribution) $\mu$ and let $Y$ be a random variable that takes values in a $q$-dimensional Euclidean space with probability distribution $\nu$, and suppose that $X$ and $Y$ have finite expectations. Write

$$a_{\mu}(x):=\operatorname{E}[\|X-x\|],\quad D(\mu):=\operatorname{E}[a_{\mu}(X)],\quad d_{\mu}(x,x'):=\|x-x'\|-a_{\mu}(x)-a_{\mu}(x')+D(\mu).$$

Finally, define the population value of squared distance covariance of $X$ and $Y$ as

$$\operatorname{dCov}^{2}(X,Y):=\operatorname{E}\big[d_{\mu}(X,X')\,d_{\nu}(Y,Y')\big].$$

One can show that this is equivalent to the following definition:

$$\begin{aligned}\operatorname{dCov}^{2}(X,Y):={}&\operatorname{E}[\|X-X'\|\,\|Y-Y'\|]+\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|]\\&\qquad{}-\operatorname{E}[\|X-X'\|\,\|Y-Y''\|]-\operatorname{E}[\|X-X''\|\,\|Y-Y'\|]\\={}&\operatorname{E}[\|X-X'\|\,\|Y-Y'\|]+\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|]\\&\qquad{}-2\operatorname{E}[\|X-X'\|\,\|Y-Y''\|],\end{aligned}$$

where $\operatorname{E}$ denotes expected value, and $(X,Y)$, $(X',Y')$, and $(X'',Y'')$ are independent and identically distributed (iid); that is, the primed pairs $(X',Y')$ and $(X'',Y'')$ are iid copies of $(X,Y)$.[5] Distance covariance can be expressed in terms of the classical Pearson's [covariance](/source/Covariance), $\operatorname{cov}$, as follows:

$$\operatorname{dCov}^{2}(X,Y)=\operatorname{cov}(\|X-X'\|,\|Y-Y'\|)-2\operatorname{cov}(\|X-X'\|,\|Y-Y''\|).$$

This identity shows that the distance covariance is not the same as the covariance of distances, $\operatorname{cov}(\|X-X'\|,\|Y-Y'\|)$. The covariance of distances can be zero even if $X$ and $Y$ are not independent.[6]
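The covariance identity can be checked by expanding each covariance as an expectation of a product minus a product of expectations. The short derivation below is added as a reading aid; it uses only the definitions already given, together with the fact that $Y''$ has the same distribution as $Y'$.

$$\begin{aligned}\operatorname{cov}(\|X-X'\|,\|Y-Y'\|)&=\operatorname{E}[\|X-X'\|\,\|Y-Y'\|]-\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|],\\\operatorname{cov}(\|X-X'\|,\|Y-Y''\|)&=\operatorname{E}[\|X-X'\|\,\|Y-Y''\|]-\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|].\end{aligned}$$

Subtracting twice the second line from the first gives

$$\operatorname{E}[\|X-X'\|\,\|Y-Y'\|]+\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|]-2\operatorname{E}[\|X-X'\|\,\|Y-Y''\|]=\operatorname{dCov}^{2}(X,Y).$$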

Alternatively, the distance covariance can be defined as the weighted [*L*² norm](/source/Norm_(mathematics)#Euclidean_norm) of the distance between the joint [characteristic function](/source/Characteristic_function_(probability_theory)) of the random variables and the product of their marginal characteristic functions:[7]

$$\operatorname{dCov}^{2}(X,Y)=\frac{1}{c_{p}c_{q}}\int_{\mathbb{R}^{p+q}}\frac{\left|\varphi_{X,Y}(s,t)-\varphi_{X}(s)\varphi_{Y}(t)\right|^{2}}{|s|_{p}^{1+p}\,|t|_{q}^{1+q}}\,dt\,ds$$

where $\varphi_{X,Y}(s,t)$, $\varphi_{X}(s)$, and $\varphi_{Y}(t)$ are the [characteristic functions](/source/Characteristic_function_(probability_theory)) of $(X, Y)$, $X$, and $Y$, respectively, $p$, $q$ denote the Euclidean dimension of $X$ and $Y$, and thus of $s$ and $t$, and $c_p$, $c_q$ are constants. The [weight function](/source/Weight_function) $(c_{p}c_{q}|s|_{p}^{1+p}|t|_{q}^{1+q})^{-1}$ is chosen to produce a scale equivariant and rotation [invariant measure](/source/Invariant_measure) that does not go to zero for dependent variables.[7][8] One interpretation of the characteristic function definition is that the variables $e^{isX}$ and $e^{itY}$ are cyclic representations of $X$ and $Y$ with different periods given by $s$ and $t$, and the expression $\varphi_{X,Y}(s,t)-\varphi_{X}(s)\varphi_{Y}(t)$ in the numerator of the characteristic function definition of distance covariance is simply the classical covariance of $e^{isX}$ and $e^{itY}$. The characteristic function definition clearly shows that $\operatorname{dCov}^2(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

### Distance variance and distance standard deviation

The *distance variance* is a special case of distance covariance when the two variables are identical. The population value of distance variance is the [square root](/source/Square_root) of

$$\operatorname{dVar}^{2}(X):=\operatorname{E}[\|X-X'\|^{2}]+\operatorname{E}^{2}[\|X-X'\|]-2\operatorname{E}[\|X-X'\|\,\|X-X''\|],$$

where $X$, $X'$, and $X''$ are [independent and identically distributed random variables](/source/Independent_and_identically_distributed_random_variables), $\operatorname{E}$ denotes the [expected value](/source/Expected_value), and $f^{2}(\cdot)=(f(\cdot))^{2}$ for a function $f(\cdot)$, e.g., $\operatorname{E}^{2}[\cdot]=(\operatorname{E}[\cdot])^{2}$.

The *sample distance variance* is the square root of

$$\operatorname{dVar}_{n}^{2}(X):=\operatorname{dCov}_{n}^{2}(X,X)=\tfrac{1}{n^{2}}\sum_{k,\ell}A_{k,\ell}^{2},$$

which is a relative of [Corrado Gini](/source/Corrado_Gini)'s [mean difference](/source/Mean_absolute_difference) introduced in 1912 (but Gini did not work with centered distances).[9]

The *distance standard deviation* is the square root of the *distance variance*.

### Distance correlation

The *distance correlation*[2][3] of two random variables is obtained by dividing their *distance covariance* by the product of their *distance standard deviations*. The distance correlation is the square root of

$$\operatorname{dCor}^{2}(X,Y)=\frac{\operatorname{dCov}^{2}(X,Y)}{\sqrt{\operatorname{dVar}^{2}(X)\,\operatorname{dVar}^{2}(Y)}},$$

and the *sample distance correlation* is defined by substituting the sample distance covariance and distance variances for the population coefficients above.

For easy computation of sample distance correlation see the *dcor* function in the *energy* package for [R](/source/R_(programming_language)).[4]
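For readers without R at hand, the sample distance correlation can also be sketched directly from the definitions. The Python code below is a minimal standard-library illustration, not a port of the *energy* package; the function names and the integer grid used as data are assumptions for the example. It contrasts distance correlation with Pearson's correlation on a parabola, where the latter vanishes.

```python
import math


def centered_distances(points):
    """Doubly centered Euclidean distance matrix of a list of tuples."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]  # row means equal column means (d is symmetric)
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]


def mean_product(a, b):
    """Average of the entrywise products of two n-by-n matrices."""
    n = len(a)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2


def sample_dcor(xs, ys):
    """Sample distance correlation; returns 0.0 if either sample is constant."""
    a, b = centered_distances(xs), centered_distances(ys)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(mean_product(a, b), 0.0) / denom)


def pearson(u, v):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / math.sqrt(sum((p - mu) ** 2 for p in u)
                           * sum((q - mv) ** 2 for q in v))


x = [float(i) for i in range(-5, 6)]  # symmetric grid -5, ..., 5
y = [v * v for v in x]                # y is a function of x, but not linear
xs = [(v,) for v in x]

r = pearson(x, y)
d_parabola = sample_dcor(xs, [(v,) for v in y])
d_linear = sample_dcor(xs, [(2.0 * v + 3.0,) for v in x])
print(f"Pearson r = {r:.3f}, dCor (parabola) = {d_parabola:.3f}, "
      f"dCor (linear) = {d_linear:.3f}")
```

On this grid the Pearson coefficient is zero by symmetry, while the sample distance correlation is clearly positive; for the exactly linear relationship the sample distance correlation equals 1 up to rounding.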

## Properties

### Distance correlation

1. $0\leq\operatorname{dCor}_{n}(X,Y)\leq 1$ and $0\leq\operatorname{dCor}(X,Y)\leq 1$; this is in contrast to Pearson's correlation, which can be negative.
1. $\operatorname{dCor}(X,Y)=0$ if and only if $X$ and $Y$ are independent.
1. $\operatorname{dCor}_{n}(X,Y)=1$ implies that the dimensions of the linear subspaces spanned by the $X$ and $Y$ samples, respectively, are almost surely equal; if we assume that these subspaces are equal, then in this subspace $Y=A+b\,\mathbf{C}X$ for some vector $A$, scalar $b$, and [orthonormal matrix](/source/Orthonormal_matrix) $\mathbf{C}$.

### Distance covariance

1. $\operatorname{dCov}(X,Y)\geq 0$ and $\operatorname{dCov}_{n}(X,Y)\geq 0$;
1. $\operatorname{dCov}^{2}(a_{1}+b_{1}\,\mathbf{C}_{1}\,X,\;a_{2}+b_{2}\,\mathbf{C}_{2}\,Y)=|b_{1}\,b_{2}|\operatorname{dCov}^{2}(X,Y)$ for all constant vectors $a_{1},a_{2}$, scalars $b_{1},b_{2}$, and orthonormal matrices $\mathbf{C}_{1},\mathbf{C}_{2}$.
1. If the random vectors $(X_{1},Y_{1})$ and $(X_{2},Y_{2})$ are independent then $\operatorname{dCov}(X_{1}+X_{2},Y_{1}+Y_{2})\leq\operatorname{dCov}(X_{1},Y_{1})+\operatorname{dCov}(X_{2},Y_{2})$. Equality holds if and only if $X_{1}$ and $Y_{1}$ are both constants, or $X_{2}$ and $Y_{2}$ are both constants, or $X_{1},X_{2},Y_{1},Y_{2}$ are mutually independent.
1. $\operatorname{dCov}(X,Y)=0$ if and only if $X$ and $Y$ are independent.

This last property is the most important effect of working with centered distances.
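The scaling property of distance covariance listed above can be checked numerically on the sample statistic, because a shift, a rotation, and a scalar factor $b$ change every pairwise distance by exactly the factor $|b|$. The following Python sketch is only an illustration under assumed values: the rotation angle, the shifts, the factors $b_1=-3$ and $b_2=0.5$, and the simulated data are all hypothetical.

```python
import math
import random


def centered_distances(points):
    """Doubly centered Euclidean distance matrix of a list of tuples."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]  # row means equal column means (d is symmetric)
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]


def sample_dcov_sq(xs, ys):
    """Squared sample distance covariance of two paired samples."""
    n = len(xs)
    a, b = centered_distances(xs), centered_distances(ys)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2


rng = random.Random(1)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
ys = [(x[0] * x[1] + rng.gauss(0, 0.1),) for x in xs]

B1, B2, THETA = -3.0, 0.5, 0.7  # assumed scalars and rotation angle


def transform_x(p):
    """a1 + b1 * C1 p, where C1 is a rotation by THETA and a1 = (1, 2)."""
    rx = math.cos(THETA) * p[0] - math.sin(THETA) * p[1]
    ry = math.sin(THETA) * p[0] + math.cos(THETA) * p[1]
    return (1.0 + B1 * rx, 2.0 + B1 * ry)


def transform_y(p):
    """a2 + b2 * p, with a2 = 7 (in one dimension C2 is just the identity)."""
    return (7.0 + B2 * p[0],)


lhs = sample_dcov_sq([transform_x(p) for p in xs],
                     [transform_y(p) for p in ys])
rhs = abs(B1 * B2) * sample_dcov_sq(xs, ys)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

The check also reflects non-negativity: the squared sample statistic is never negative, and it is exactly zero when one of the samples is constant.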

The statistic $\operatorname{dCov}_{n}^{2}(X,Y)$ is a biased estimator of $\operatorname{dCov}^{2}(X,Y)$. Under independence of $X$ and $Y$,[10]

$$\begin{aligned}\operatorname{E}[\operatorname{dCov}_{n}^{2}(X,Y)]&=\frac{n-1}{n^{2}}\left\{(n-2)\operatorname{dCov}^{2}(X,Y)+\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|]\right\}\\[6pt]&=\frac{n-1}{n^{2}}\operatorname{E}[\|X-X'\|]\,\operatorname{E}[\|Y-Y'\|].\end{aligned}$$

An [unbiased estimator](/source/Bias_of_an_estimator) of $\operatorname{dCov}^{2}(X,Y)$ is given by Székely and Rizzo.[11]
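To make the bias concrete, the Python sketch below compares the biased statistic with an unbiased estimator based on so-called U-centering, which is the construction commonly attributed to the Székely and Rizzo (2014) paper cited above. The centering formula and the normalization $1/(n(n-3))$ are reproduced here from that construction as understood for this sketch and should be checked against the paper before serious use; the function names and the simulation settings (300 replications, samples of size 10 from independent standard normals) are assumptions for the example.

```python
import math
import random


def distance_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(d):
    """Ordinary double centering, used by the biased statistic."""
    n = len(d)
    row = [sum(r) / n for r in d]  # row means equal column means (d is symmetric)
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]


def u_center(d):
    """U-centered matrix: modified centering with a zero diagonal (n > 2)."""
    n = len(d)
    row_sum = [sum(r) for r in d]
    total = sum(row_sum)
    return [[0.0 if j == k else
             d[j][k] - row_sum[j] / (n - 2) - row_sum[k] / (n - 2)
             + total / ((n - 1) * (n - 2))
             for k in range(n)] for j in range(n)]


def biased_dcov_sq(xs, ys):
    n = len(xs)
    a, b = double_center(distance_matrix(xs)), double_center(distance_matrix(ys))
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2


def unbiased_dcov_sq(xs, ys):
    """Unbiased estimate of dCov^2; needs n >= 4 and may be negative."""
    n = len(xs)
    if n < 4:
        raise ValueError("the unbiased estimator needs at least 4 observations")
    a, b = u_center(distance_matrix(xs)), u_center(distance_matrix(ys))
    return sum(a[j][k] * b[j][k]
               for j in range(n) for k in range(n) if j != k) / (n * (n - 3))


# Small simulation with independent X and Y, so the population dCov^2 is 0.
rng = random.Random(42)
reps, n = 300, 10
biased_mean = unbiased_mean = 0.0
for _ in range(reps):
    xs = [(rng.gauss(0, 1),) for _ in range(n)]
    ys = [(rng.gauss(0, 1),) for _ in range(n)]
    biased_mean += biased_dcov_sq(xs, ys) / reps
    unbiased_mean += unbiased_dcov_sq(xs, ys) / reps
print(f"mean biased = {biased_mean:.4f}, mean unbiased = {unbiased_mean:.4f}")
```

For standard normal variables $\operatorname{E}[\|X-X'\|]=2/\sqrt{\pi}$, so the expectation formula above predicts an average of about $0.09\cdot 4/\pi\approx 0.115$ for the biased statistic at $n=10$, whereas the unbiased estimate should average close to zero.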

### Distance variance

1. $\operatorname{dVar}(X)=0$ if and only if $X=\operatorname{E}[X]$ almost surely.
1. $\operatorname{dVar}_{n}(X)=0$ if and only if every sample observation is identical.
1. $\operatorname{dVar}(A+b\,\mathbf{C}\,X)=|b|\operatorname{dVar}(X)$ for all constant vectors $A$, scalars $b$, and orthonormal matrices $\mathbf{C}$.
1. If $X$ and $Y$ are independent then $\operatorname{dVar}(X+Y)\leq\operatorname{dVar}(X)+\operatorname{dVar}(Y)$.

Equality holds in property 4 if and only if one of the random variables $X$ or $Y$ is a constant.
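Two of these properties are easy to confirm on a tiny sample. The Python sketch below, with a hypothetical helper `sample_dvar` and an assumed three-point data set, checks that the sample distance variance vanishes for identical observations and scales by $|b|$ under a shift and a sign-flipping scale.

```python
import math


def sample_dvar(points):
    """Sample distance variance: square root of the mean squared centered distance."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]  # row means equal column means (d is symmetric)
    grand = sum(row) / n
    total = sum((d[j][k] - row[j] - row[k] + grand) ** 2
                for j in range(n) for k in range(n))
    return math.sqrt(total / n**2)


xs = [(0.0,), (1.0,), (2.0,)]
base = sample_dvar(xs)                                   # sqrt(40/81)
transformed = sample_dvar([(5.0 - 3.0 * p[0],) for p in xs])  # A = 5, b = -3
constant = sample_dvar([(4.0,)] * 3)                     # identical observations
print(base, transformed, constant)
```

For the sample 0, 1, 2 the centered distances can be worked out by hand, giving a squared sample distance variance of 40/81; the transformed sample yields three times the base value, and the constant sample yields zero.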

## Generalization

Distance covariance can be generalized to include powers of Euclidean distance. Define

$$\begin{aligned}\operatorname{dCov}^{2}(X,Y;\alpha):={}&\operatorname{E}[\|X-X'\|^{\alpha}\,\|Y-Y'\|^{\alpha}]+\operatorname{E}[\|X-X'\|^{\alpha}]\,\operatorname{E}[\|Y-Y'\|^{\alpha}]\\&\qquad{}-2\operatorname{E}[\|X-X'\|^{\alpha}\,\|Y-Y''\|^{\alpha}].\end{aligned}$$

Then for every $0<\alpha<2$, $X$ and $Y$ are independent if and only if $\operatorname{dCov}^{2}(X,Y;\alpha)=0$. It is important to note that this characterization does not hold for exponent $\alpha=2$; in this case for bivariate $(X,Y)$, $\operatorname{dCor}(X,Y;\alpha=2)$ is a deterministic function of the Pearson correlation.[2] If $a_{k,\ell}$ and $b_{k,\ell}$ are $\alpha$ powers of the corresponding distances, $0<\alpha\leq 2$, then the $\alpha$ sample distance covariance can be defined as the nonnegative number for which

$$\operatorname{dCov}_{n}^{2}(X,Y;\alpha):=\frac{1}{n^{2}}\sum_{k,\ell}A_{k,\ell}\,B_{k,\ell},$$

where $A_{k,\ell}$ and $B_{k,\ell}$ are the doubly centered versions of $a_{k,\ell}$ and $b_{k,\ell}$.
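The role of the exponent can be illustrated for scalar samples. In the Python sketch below (hypothetical function names, made-up data), distances are raised to the power $\alpha$ before double centering. For $\alpha=2$ the doubly centered squared distances factor as $-2(x_j-\bar{x})(x_k-\bar{x})$, so in this scalar setting the sample statistic reduces to the absolute value of the Pearson coefficient, which is one way to see why independence is no longer characterized.

```python
import math


def centered_power_distances(values, alpha):
    """Doubly centered matrix of |u - v| ** alpha for scalar observations."""
    n = len(values)
    d = [[abs(u - v) ** alpha for v in values] for u in values]
    row = [sum(r) / n for r in d]  # row means equal column means (d is symmetric)
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]


def mean_product(a, b):
    n = len(a)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2


def sample_dcor_alpha(x, y, alpha=1.0):
    """Sample distance correlation with exponent alpha, 0 < alpha <= 2."""
    a = centered_power_distances(x, alpha)
    b = centered_power_distances(y, alpha)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(mean_product(a, b), 0.0) / denom)


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / math.sqrt(sum((p - mu) ** 2 for p in u)
                           * sum((q - mv) ** 2 for q in v))


# Made-up paired scalar data for illustration only.
x = [0.3, 1.1, 2.0, 2.4, 3.7, 4.1, 5.5]
y = [1.0, 0.2, 2.9, 2.5, 1.7, 4.4, 3.8]

r = pearson(x, y)
d1 = sample_dcor_alpha(x, y, alpha=1.0)  # the usual distance correlation
d2 = sample_dcor_alpha(x, y, alpha=2.0)  # coincides with |r| for scalars
print(f"|r| = {abs(r):.6f}, dCor(alpha=2) = {d2:.6f}, dCor(alpha=1) = {d1:.6f}")
```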

One can extend $\operatorname{dCov}$ to [metric-space](/source/Metric_space)-valued [random variables](/source/Random_variables) $X$ and $Y$: If $X$ has law $\mu$ in a metric space with metric $d$, then define $a_{\mu}(x):=\operatorname{E}[d(X,x)]$, $D(\mu):=\operatorname{E}[a_{\mu}(X)]$, and (provided $a_{\mu}$ is finite, i.e., $X$ has finite first moment), $d_{\mu}(x,x'):=d(x,x')-a_{\mu}(x)-a_{\mu}(x')+D(\mu)$. Then if $Y$ has law $\nu$ (in a possibly different metric space with finite first moment), define

$$\operatorname{dCov}^{2}(X,Y):=\operatorname{E}\big[d_{\mu}(X,X')\,d_{\nu}(Y,Y')\big].$$

This is non-negative for all such $X,Y$ iff both metric spaces have negative type.[12] Here, a metric space $(M,d)$ has negative type if $(M,d^{1/2})$ is [isometric](/source/Isometry) to a subset of a [Hilbert space](/source/Hilbert_space).[13] If both metric spaces have strong negative type, then $\operatorname{dCov}^{2}(X,Y)=0$ iff $X,Y$ are independent.[12]
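A sample analogue of the metric-space definition only needs a distance function for each space. The Python sketch below accepts arbitrary metrics as callables; the function name `dcov_sq_metric`, the discrete metric on category labels, and the small data set are assumptions made for illustration. Whether a zero value characterizes independence depends on the (strong) negative type conditions stated above, which the code does not check.

```python
def dcov_sq_metric(xs, ys, metric_x, metric_y):
    """Squared sample distance covariance for user-supplied metrics."""
    n = len(xs)

    def centered(items, metric):
        d = [[metric(u, v) for v in items] for u in items]
        row = [sum(r) / n for r in d]  # a metric is symmetric, so rows = columns
        grand = sum(row) / n
        return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
                for j in range(n)]

    a, b = centered(xs, metric_x), centered(ys, metric_y)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2


def discrete(u, v):
    """Discrete metric: 0 for equal labels, 1 otherwise."""
    return 0.0 if u == v else 1.0


def absdiff(u, v):
    """Usual metric on the real line."""
    return abs(u - v)


# Hypothetical paired observations: a category label and a measurement.
colors = ["red", "red", "blue", "green", "blue", "red"]
sizes = [1.0, 1.2, 3.1, 5.0, 2.9, 0.8]
value = dcov_sq_metric(colors, sizes, discrete, absdiff)
print(value)
```

The discrete metric has negative type, since distinct labels can be mapped to orthogonal vectors of length $1/\sqrt{2}$ in a Hilbert space, so the sample value here is non-negative; with `absdiff` for both samples the function reduces to the Euclidean case on the real line.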

## Alternative definition of distance covariance

The original [distance covariance](#Distance_covariance) has been defined as the square root of $\operatorname{dCov}^{2}(X,Y)$, rather than the squared coefficient itself. $\operatorname{dCov}(X,Y)$ has the property that it is the [energy distance](/source/Energy_distance) between the joint distribution of $X,Y$ and the product of its marginals. Under this definition, however, the distance variance, rather than the distance [standard deviation](/source/Standard_deviation), is measured in the same units as the $X$ distances.

Alternately, one could define ***distance covariance*** to be the square of the energy distance: $\operatorname{dCov}^{2}(X,Y)$. In this case, the distance standard deviation of $X$ is measured in the same units as $X$ distance, and there exists an unbiased estimator for the population distance covariance.[11]

Under these alternate definitions, the distance correlation is also defined as the square $\operatorname{dCor}^{2}(X,Y)$, rather than the square root.

## Alternative formulation: Brownian covariance

Brownian covariance is motivated by generalization of the notion of covariance to stochastic processes. The square of the covariance of random variables X and Y can be written in the following form:

$$\operatorname{cov}(X,Y)^{2}=\operatorname{E}\left[\big(X-\operatorname{E}(X)\big)\big(X'-\operatorname{E}(X')\big)\big(Y-\operatorname{E}(Y)\big)\big(Y'-\operatorname{E}(Y')\big)\right]$$

where $\operatorname{E}$ denotes the [expected value](/source/Expected_value) and the prime denotes independent and identically distributed copies. We need the following generalization of this formula. If $U(s)$, $V(t)$ are arbitrary random processes defined for all real $s$ and $t$ then define the $U$-centered version of $X$ by

$$X_{U}:=U(X)-\operatorname{E}_{X}\left[U(X)\mid\left\{U(t)\right\}\right]$$

whenever the subtracted conditional expected value exists, and denote by $Y_V$ the $V$-centered version of $Y$.[3][14][15] The $(U,V)$ covariance of $(X,Y)$ is defined as the nonnegative number whose square is

$$\operatorname{cov}_{U,V}^{2}(X,Y):=\operatorname{E}\left[X_{U}X_{U}'\,Y_{V}Y_{V}'\right]$$

whenever the right-hand side is nonnegative and finite. The most important example is when $U$ and $V$ are two-sided independent [Brownian motions](/source/Brownian_motion)/[Wiener processes](/source/Wiener_process) with expectation zero and covariance $|s|+|t|-|s-t|=2\min(s,t)$ (for nonnegative $s$, $t$ only). (This is twice the covariance of the standard Wiener process; here the factor 2 simplifies the computations.) In this case the $(U,V)$ covariance is called **Brownian covariance** and is denoted by

$$\operatorname{cov}_{W}(X,Y).$$

There is a surprising coincidence: The Brownian covariance is the same as the distance covariance:

$$\operatorname{cov}_{W}(X,Y)=\operatorname{dCov}(X,Y),$$

and thus **Brownian correlation** is the same as distance correlation.

On the other hand, if we replace the Brownian motion with the deterministic [identity function](/source/Identity_function) $\mathrm{id}$ then $\operatorname{cov}_{\mathrm{id}}(X,Y)$ is simply the [absolute value](/source/Absolute_value) of the classical Pearson [covariance](/source/Covariance),

$$\operatorname{cov}_{\mathrm{id}}(X,Y)=\left\vert\operatorname{cov}(X,Y)\right\vert.$$
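To see why the identity choice recovers the classical covariance, note that for the deterministic process $U=\mathrm{id}$ the centering reduces to $X_{\mathrm{id}}=X-\operatorname{E}[X]$, and likewise $Y_{\mathrm{id}}=Y-\operatorname{E}[Y]$. The following short derivation is a reading aid under that interpretation of the definition; it uses only the independence of $(X,Y)$ and its copy $(X',Y')$.

$$\begin{aligned}\operatorname{cov}_{\mathrm{id}}^{2}(X,Y)&=\operatorname{E}\left[(X-\operatorname{E}X)(X'-\operatorname{E}X')(Y-\operatorname{E}Y)(Y'-\operatorname{E}Y')\right]\\&=\operatorname{E}\left[(X-\operatorname{E}X)(Y-\operatorname{E}Y)\right]\,\operatorname{E}\left[(X'-\operatorname{E}X')(Y'-\operatorname{E}Y')\right]\\&=\operatorname{cov}(X,Y)^{2}.\end{aligned}$$

Taking the nonnegative square root gives $\left\vert\operatorname{cov}(X,Y)\right\vert$.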

## Related metrics

Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert-Schmidt Independence Criterion or HSIC) can also detect linear and nonlinear interactions. Both distance correlation and kernel-based metrics can be used in methods such as [canonical correlation analysis](/source/Canonical_correlation_analysis) and [independent component analysis](/source/Independent_component_analysis) to yield stronger [statistical power](/source/Statistical_power).

## See also

- [RV coefficient](/source/RV_coefficient)

- For a related third-order statistic, see [Distance skewness](/source/Skewness#Distance_skewness).

## Notes

1. **[^](#cite_ref-1)** Pearson [1895a](#CITEREFPearson1895a), [1895b](#CITEREFPearson1895b)

1. ^ [***a***](#cite_ref-FOOTNOTESzékelyRizzoBakirov2007_2-0) [***b***](#cite_ref-FOOTNOTESzékelyRizzoBakirov2007_2-1) [***c***](#cite_ref-FOOTNOTESzékelyRizzoBakirov2007_2-2) [Székely, Rizzo & Bakirov 2007](#CITEREFSzékelyRizzoBakirov2007).

1. ^ [***a***](#cite_ref-FOOTNOTESzékelyRizzo2009a_3-0) [***b***](#cite_ref-FOOTNOTESzékelyRizzo2009a_3-1) [***c***](#cite_ref-FOOTNOTESzékelyRizzo2009a_3-2) [***d***](#cite_ref-FOOTNOTESzékelyRizzo2009a_3-3) [Székely & Rizzo 2009a](#CITEREFSzékelyRizzo2009a).

1. ^ [***a***](#cite_ref-FOOTNOTERizzoSzékely2021_4-0) [***b***](#cite_ref-FOOTNOTERizzoSzékely2021_4-1) [Rizzo & Székely 2021](#CITEREFRizzoSzékely2021).

1. **[^](#cite_ref-FOOTNOTESzékelyRizzo201411_5-0)** [Székely & Rizzo 2014](#CITEREFSzékelyRizzo2014), p. 11.

1. **[^](#cite_ref-Raymaekers2025_6-0)** Raymaekers, Jakob; Rousseeuw, Peter J. (2 January 2025). "Distance Covariance, Independence, and Pairwise Differences". *The American Statistician*. **79** (1): 122–128. [doi](/source/Doi_(identifier)):[10.1080/00031305.2024.2374966](https://doi.org/10.1080%2F00031305.2024.2374966).

1. ^ [***a***](#cite_ref-SR2009a_7-0) [***b***](#cite_ref-SR2009a_7-1) [Székely & Rizzo 2009a](#CITEREFSzékelyRizzo2009a), p. 1249, Theorem 7, (3.7).

1. **[^](#cite_ref-FOOTNOTESzékelyRizzo2012_8-0)** [Székely & Rizzo 2012](#CITEREFSzékelyRizzo2012).

1. **[^](#cite_ref-FOOTNOTEGini1912_9-0)** [Gini 1912](#CITEREFGini1912).

1. **[^](#cite_ref-FOOTNOTESzékelyRizzo2009b_10-0)** [Székely & Rizzo 2009b](#CITEREFSzékelyRizzo2009b).

1. ^ [***a***](#cite_ref-FOOTNOTESzékelyRizzo2014_11-0) [***b***](#cite_ref-FOOTNOTESzékelyRizzo2014_11-1) [Székely & Rizzo 2014](#CITEREFSzékelyRizzo2014).

1. ^ [***a***](#cite_ref-FOOTNOTELyons2014_12-0) [***b***](#cite_ref-FOOTNOTELyons2014_12-1) [Lyons 2014](#CITEREFLyons2014).

1. **[^](#cite_ref-FOOTNOTEKlebanov2005_13-0)** [Klebanov 2005](#CITEREFKlebanov2005).

1. **[^](#cite_ref-FOOTNOTEBickelXu2009_14-0)** [Bickel & Xu 2009](#CITEREFBickelXu2009).

1. **[^](#cite_ref-FOOTNOTEKosorok2009_15-0)** [Kosorok 2009](#CITEREFKosorok2009).

## References

- Bickel, Peter J.; Xu, Ying (2009). ["Discussion of: Brownian distance covariance"](http://projecteuclid.org/download/pdfview_1/euclid.aoas/1267453934). *[The Annals of Applied Statistics](/source/The_Annals_of_Applied_Statistics)*. **3** (4): 1266–1269. [arXiv](/source/ArXiv_(identifier)):[0912.3295](https://arxiv.org/abs/0912.3295). [doi](/source/Doi_(identifier)):[10.1214/09-AOAS312A](https://doi.org/10.1214%2F09-AOAS312A).

- Gini, C. (1912). *Variabilità e Mutabilità*. Bologna: Tipografia di Paolo Cuppini. [Bibcode](/source/Bibcode_(identifier)):[1912vamu.book.....G](https://ui.adsabs.harvard.edu/abs/1912vamu.book.....G).

- Klebanov, L. B. (2005). *N-distances and their applications*. Prague: [Karolinum Press](/source/Karolinum_Press), Charles University. [ISBN](/source/ISBN_(identifier)) [9788024611525](https://en.wikipedia.org/wiki/Special:BookSources/9788024611525).

- Kosorok, Michael R. (2009). "Discussion of: Brownian distance covariance". *[The Annals of Applied Statistics](/source/The_Annals_of_Applied_Statistics)*. **3** (4): 1270–1278. [arXiv](/source/ArXiv_(identifier)):[1010.0822](https://arxiv.org/abs/1010.0822). [doi](/source/Doi_(identifier)):[10.1214/09-AOAS312B](https://doi.org/10.1214%2F09-AOAS312B). [S2CID](/source/S2CID_(identifier)) [88518490](https://api.semanticscholar.org/CorpusID:88518490).

- Lyons, Russell (2014). "Distance covariance in metric spaces". *The Annals of Probability*. **41** (5): 3284–3305. [arXiv](/source/ArXiv_(identifier)):[1106.5758](https://arxiv.org/abs/1106.5758). [doi](/source/Doi_(identifier)):[10.1214/12-AOP803](https://doi.org/10.1214%2F12-AOP803). [S2CID](/source/S2CID_(identifier)) [73677891](https://api.semanticscholar.org/CorpusID:73677891).

- Pearson, K. (1895a). "Note on regression and inheritance in the case of two parents". *[Proceedings of the Royal Society](/source/Proceedings_of_the_Royal_Society)*. **58**: 240–242. [Bibcode](/source/Bibcode_(identifier)):[1895RSPS...58..240P](https://ui.adsabs.harvard.edu/abs/1895RSPS...58..240P).

- Pearson, K. (1895b). ["Notes on the history of correlation"](https://zenodo.org/record/1431597). *[Biometrika](/source/Biometrika)*. **13**: 25–45. [doi](/source/Doi_(identifier)):[10.1093/biomet/13.1.25](https://doi.org/10.1093%2Fbiomet%2F13.1.25).

- Rizzo, Maria; Székely, Gábor (2021-02-22). ["energy: E-Statistics: Multivariate Inference via the Energy of Data"](https://cran.r-project.org/web/packages/energy/index.html). Version: 1.7-8. Retrieved 2021-10-31.

- Székely, Gábor J.; Rizzo, Maria L.; Bakirov, Nail K. (2007). "Measuring and testing independence by correlation of distances". *[The Annals of Statistics](/source/The_Annals_of_Statistics)*. **35** (6): 2769–2794. [arXiv](/source/ArXiv_(identifier)):[0803.4101](https://arxiv.org/abs/0803.4101). [doi](/source/Doi_(identifier)):[10.1214/009053607000000505](https://doi.org/10.1214%2F009053607000000505). [S2CID](/source/S2CID_(identifier)) [5661488](https://api.semanticscholar.org/CorpusID:5661488).

- Székely, Gábor J.; Rizzo, Maria L. (2009a). ["Brownian distance covariance"](http://projecteuclid.org/download/pdfview_1/euclid.aoas/1267453933). *[The Annals of Applied Statistics](/source/The_Annals_of_Applied_Statistics)*. **3** (4): 1236–1265. [doi](/source/Doi_(identifier)):[10.1214/09-AOAS312](https://doi.org/10.1214%2F09-AOAS312). [PMC](/source/PMC_(identifier)) [2889501](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2889501). [PMID](/source/PMID_(identifier)) [20574547](https://pubmed.ncbi.nlm.nih.gov/20574547).

- Székely, Gábor J.; Rizzo, Maria L. (2009b). ["Rejoinder: Brownian distance covariance"](https://doi.org/10.1214%2F09-AOAS312REJ). *[The Annals of Applied Statistics](/source/The_Annals_of_Applied_Statistics)*. **3** (4): 1303–1308. [arXiv](/source/ArXiv_(identifier)):[1010.0844](https://arxiv.org/abs/1010.0844). [doi](/source/Doi_(identifier)):[10.1214/09-AOAS312REJ](https://doi.org/10.1214%2F09-AOAS312REJ).

- Székely, Gábor J.; Rizzo, Maria L. (2012). "On the uniqueness of distance covariance". *[Statistics & Probability Letters](https://en.wikipedia.org/w/index.php?title=Statistics_%26_Probability_Letters&action=edit&redlink=1)*. **82** (12): 2278–2282. [doi](/source/Doi_(identifier)):[10.1016/j.spl.2012.08.007](https://doi.org/10.1016%2Fj.spl.2012.08.007).

- Székely, Gábor J.; Rizzo, Maria L. (2014). "Partial Distance Correlation with Methods for Dissimilarities". *[The Annals of Statistics](/source/The_Annals_of_Statistics)*. **42** (6): 2382–2412. [arXiv](/source/ArXiv_(identifier)):[1310.2926](https://arxiv.org/abs/1310.2926). [Bibcode](/source/Bibcode_(identifier)):[2014arXiv1310.2926S](https://ui.adsabs.harvard.edu/abs/2014arXiv1310.2926S). [doi](/source/Doi_(identifier)):[10.1214/14-AOS1255](https://doi.org/10.1214%2F14-AOS1255). [S2CID](/source/S2CID_(identifier)) [55801702](https://api.semanticscholar.org/CorpusID:55801702).

## External links

- [E-statistics (energy statistics)](http://personal.bgsu.edu/~mrizzo/energy.htm) [Archived](https://web.archive.org/web/20190913232038/http://personal.bgsu.edu/~mrizzo/energy.htm) 2019-09-13 at the [Wayback Machine](/source/Wayback_Machine)

---
Adapted from the Wikipedia article [Distance correlation](https://en.wikipedia.org/wiki/Distance_correlation) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Distance_correlation?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
