Multivariate t-distribution

{{Short description|Multivariable generalization of the Student's t-distribution}} {{DISPLAYTITLE:Multivariate ''t''-distribution}} {{Probability distribution| name =Multivariate ''t''| type =density| pdf_image =| cdf_image =| notation =<math>t_p(\boldsymbol\mu,\boldsymbol\Sigma,\nu)</math>| parameters =<math>\boldsymbol\mu = [\mu_1, \dots, \mu_p]^\mathsf{T}</math> [[location parameter|location]] ([[real number|real]] <math>p\times 1</math> [[random vector|vector]]) <math>\boldsymbol\Sigma</math> [[scale parameter|scale matrix]] ([[positive-definite matrix|positive-definite]] real <math>p\times p</math> [[matrix (mathematics)|matrix]]) <math>\nu > 0</math> (real) represents the [[Degrees of freedom (statistics)|degrees of freedom]] | support =<math>\mathbf{x} \in\mathbb{R}^p\!</math>| pdf =<math> \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}\left|{\boldsymbol\Sigma}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf x}-{\boldsymbol\mu})^\mathsf{T}{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\right]^{-(\nu+p)/2}</math>| cdf =No analytic expression, but see text for approximations| mean =<math>\boldsymbol\mu</math> if <math>\nu > 1</math>; else undefined| median =<math>\boldsymbol\mu</math>| mode =<math>\boldsymbol\mu</math>| variance =<math>\frac{\nu}{\nu-2} \boldsymbol\Sigma</math> (covariance matrix) if <math>\nu > 2</math>; else undefined| skewness =0 if <math>\nu > 3</math>; else undefined| kurtosis =| entropy =| mgf =| char =| }}

In [[statistics]], the '''multivariate ''t''-distribution''' (or '''multivariate Student distribution''') is a [[multivariate probability distribution]]. It is a generalization to [[random vector]]s of the [[Student's t-distribution|Student's ''t''-distribution]], which is a distribution applicable to univariate [[random variable]]s. While the case of a [[random matrix]] could be treated within this structure, the [[matrix t-distribution|matrix ''t''-distribution]] is distinct and makes particular use of the matrix structure.

==Definition==

One common method of construction of a multivariate ''t''-distribution, for the case of <math>p</math> dimensions, is based on the observation that if <math>\mathbf y</math> and <math>u</math> are independent and distributed as <math>N({\mathbf 0},{\boldsymbol\Sigma})</math> and <math>\chi^2_\nu</math> (i.e. [[multivariate normal distribution|multivariate normal]] and [[chi-squared distribution]]s) respectively, the matrix <math>\mathbf{\Sigma}\,</math> is a ''p'' × ''p'' matrix, and <math>{\boldsymbol\mu}</math> is a constant vector, then the random variable <math display="inline">{\mathbf x}={\mathbf y}/\sqrt{u/\nu} +{\boldsymbol\mu}</math> has the density<ref name=":0">{{Cite web |last=Roth |first=Michael |date=17 April 2013 |title=On the Multivariate t Distribution |url=http://users.isy.liu.se/en/rt/roth/student.pdf |url-status=live |access-date=1 June 2022 |website=Automatic Control group. Linköping University, Sweden |archive-date=31 July 2022 |archive-url=https://web.archive.org/web/20220731142649/http://users.isy.liu.se/en/rt/roth/student.pdf }}</ref>

<math display="block"> \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}\left|{\boldsymbol\Sigma}\right|^{1/2}} \left[1+\frac{1}{\nu} \left({\mathbf x}-{\boldsymbol\mu}\right)^\mathsf{T} {\boldsymbol\Sigma}^{-1} \left({\mathbf x}-{\boldsymbol\mu}\right)\right]^{-(\nu+p)/2}</math>

and is said to be distributed as a multivariate ''t''-distribution with parameters <math>{\boldsymbol\Sigma},{\boldsymbol\mu},\nu</math>. Note that <math>\mathbf\Sigma</math> is not the covariance matrix since the covariance is given by <math>\nu/(\nu-2)\mathbf\Sigma</math> (for <math>\nu > 2</math>).
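As an illustration only, the density above can be evaluated numerically. The following Python sketch is not taken from the cited sources: the function names <code>cholesky</code> and <code>mvt_logpdf</code> are hypothetical, and the code assumes that <math>\boldsymbol\Sigma</math> is symmetric positive-definite and supplied as a nested list. It works on the log scale and uses a Cholesky factor to obtain both the quadratic form and the determinant.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive-definite)."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mvt_logpdf(x, mu, Sigma, nu):
    """Log-density of t_p(mu, Sigma, nu) evaluated at the point x."""
    p = len(mu)
    L = cholesky(Sigma)
    # Solve L z = x - mu by forward substitution; z.z is the quadratic form
    # (x - mu)^T Sigma^{-1} (x - mu).
    z = []
    for i in range(p):
        r = x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    quad = sum(v * v for v in z)
    # log|Sigma| = 2 * sum(log L_ii)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
            - 0.5 * p * math.log(nu * math.pi) - 0.5 * log_det
            - 0.5 * (nu + p) * math.log1p(quad / nu))

# Example: density of a bivariate t with 3 degrees of freedom at one point.
density = math.exp(mvt_logpdf([1.0, -2.0], [0.0, 0.0],
                              [[2.0, 0.5], [0.5, 1.0]], 3.0))
```

Working with logarithms avoids overflow in the gamma functions when <math>\nu</math> or <math>p</math> is large.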

The constructive definition of a multivariate ''t''-distribution simultaneously serves as a sampling algorithm:
# Generate <math>u \sim \chi^2_\nu</math> and <math>\mathbf{y} \sim N(\mathbf{0}, \boldsymbol{\Sigma})</math>, independently.
# Compute <math display="inline">\mathbf{x} \gets \mathbf{y}\sqrt{\nu/u}+ \boldsymbol{\mu}</math>.

This formulation gives rise to the hierarchical representation of a multivariate ''t''-distribution as a scale-mixture of normals: <math>u \sim \mathrm{Ga}(\nu/2,\nu/2)</math> where <math>\mathrm{Ga}(a,b)</math> indicates a gamma distribution with density proportional to <math>x^{a-1}e^{-bx}</math>, and <math>\mathbf{x}\mid u</math> conditionally follows <math>N(\boldsymbol{\mu},u^{-1}\boldsymbol{\Sigma})</math>.
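The two sampling steps can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the names <code>cholesky</code> and <code>sample_mvt</code> are hypothetical, the normal vector is produced from a Cholesky factor of <math>\boldsymbol\Sigma</math> (one of several possible choices), and the chi-squared variate is drawn as a gamma variate with shape <math>\nu/2</math> and scale 2.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive-definite)."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_mvt(mu, Sigma, nu, n, rng):
    """Draw n vectors from t_p(mu, Sigma, nu) by the constructive definition."""
    p = len(mu)
    L = cholesky(Sigma)
    out = []
    for _ in range(n):
        # Step 1: u ~ chi^2_nu and y ~ N(0, Sigma), independently.
        u = rng.gammavariate(nu / 2.0, 2.0)
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        y = [sum(L[i][k] * g[k] for k in range(i + 1)) for i in range(p)]
        # Step 2: x = y * sqrt(nu / u) + mu.
        s = math.sqrt(nu / u)
        out.append([mu[i] + s * y[i] for i in range(p)])
    return out

draws = sample_mvt([1.0, -1.0], [[2.0, 0.6], [0.6, 1.0]], 10.0, 5,
                   random.Random(0))
```

For <math>\nu > 2</math> the sample covariance of a large number of draws should approach <math>\nu/(\nu-2)\,\boldsymbol\Sigma</math> rather than <math>\boldsymbol\Sigma</math> itself.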

In the special case <math>\nu = 1</math>, the distribution is a [[Cauchy distribution#Multivariate Cauchy distribution|multivariate Cauchy distribution]].

==Derivation==

There are in fact many candidates for the multivariate generalization of [[Student's t-distribution|Student's ''t''-distribution]]. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension (<math>p=1</math>), with <math>t=x-\mu</math> and <math>\Sigma=1</math>, we have the [[probability density function]] <math display="block">f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi\,}\,\Gamma[\nu/2]} (1+t^2/\nu)^{-(\nu+1)/2}</math> and one approach is to use a corresponding function of several variables. This is the basic idea of [[elliptical distribution]] theory, where one writes down a corresponding function of <math>p</math> variables <math>t_i</math> that replaces <math>t^2</math> by a quadratic function of all the <math>t_i</math>. It is clear that this only makes sense when all the marginal distributions have the same [[Degrees of freedom (statistics)|degrees of freedom]] <math>\nu</math>. With <math> \mathbf{A} = \boldsymbol\Sigma^{-1}</math>, one has a simple choice of multivariate density function

<math display="block">f(\mathbf t) = \frac{\Gamma((\nu+p)/2)\left|\mathbf{A}\right|^{1/2}}{\sqrt{\nu^p\pi^p\,}\,\Gamma(\nu/2)} \left(1+\sum_{i,j=1}^{p,p} A_{ij} t_i t_j/\nu\right)^{-(\nu+p)/2}</math>

which is the standard but not the only choice.

An important special case is the standard '''bivariate ''t''-distribution'''{{anchor|bivariate}}, ''p'' = 2:

<math display="block">f(t_1,t_2) = \frac{\left|\mathbf{A}\right|^{1/2}}{2\pi} \left(1+\sum_{i,j=1}^{2,2} A_{ij} t_i t_j/\nu\right)^{-(\nu+2)/2}</math>

Note that <math>\frac{\Gamma{\left(\frac{\nu +2}{2}\right)}}{\pi\nu \, \Gamma{\left(\frac{\nu}{2}\right)}} = \frac{1}{2\pi}</math>.

Now, if <math>\mathbf{A}</math> is the identity matrix, the density is

<math display="block">f(t_1,t_2) = \frac{1}{2\pi} \left(1+(t_1^2 + t_2^2)/\nu\right)^{-(\nu+2)/2}.</math>

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When <math> \Sigma</math> is diagonal the standard representation can be shown to have zero [[Pearson product-moment correlation coefficient|correlation]] but the [[marginal distribution]]s are not [[statistical independence|statistically independent]].
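The failure to factorize can be checked numerically. The sketch below is an illustration with hypothetical function names <code>t1_pdf</code> and <code>t2_pdf</code>; it evaluates the standard bivariate density with <math>\mathbf{A}</math> equal to the identity at one point and compares it with the product of two univariate Student densities, for an arbitrarily chosen <math>\nu = 3</math>. It also evaluates the normalizing-constant identity noted above.

```python
import math

def t1_pdf(t, nu):
    """Univariate standard Student t density."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t2_pdf(t1, t2, nu):
    """Standard bivariate t density with A equal to the identity matrix."""
    return (1 + (t1 * t1 + t2 * t2) / nu) ** (-(nu + 2) / 2) / (2 * math.pi)

nu = 3.0
joint = t2_pdf(1.0, 1.0, nu)                   # joint density at (1, 1)
product = t1_pdf(1.0, nu) * t1_pdf(1.0, nu)    # product of the two marginals
# Normalizing constant: Gamma((nu+2)/2) / (pi nu Gamma(nu/2)) equals 1/(2 pi).
const = math.gamma((nu + 2) / 2) / (math.pi * nu * math.gamma(nu / 2))
```

Here <code>joint</code> and <code>product</code> differ, which is the dependence described in the text, while <code>const</code> agrees with <math>1/(2\pi)</math> up to rounding.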

A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data such as the classical Markowitz minimum variance econometric solution for asset portfolios.<ref name=":2" />

== Cumulative distribution function ==

The definition of the [[cumulative distribution function]] (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here <math>\mathbf{x}</math> is a real vector):

<math display="block"> F(\mathbf{x}) = \mathbb{P}(\mathbf{X}\leq \mathbf{x}), \quad \textrm{where}\;\; \mathbf{X}\sim t_\nu(\boldsymbol\mu,\boldsymbol\Sigma).</math> There is no simple formula for <math>F(\mathbf{x})</math>, but it can be [http://www.mathworks.com/matlabcentral/fileexchange/53796 approximated numerically] via [[Monte Carlo integration]].<ref name="bochen22">{{cite book |last1=Botev |first1=Z. |last2=Chen |first2=Y.-L. |date=2022 |editor-last1=Botev|editor-first1=Zdravko|editor-last2=Keller|editor-first2=Alexander|editor-last3=Lemieux|editor-first3=Christiane| editor-last4=Tuffin|editor-first4=Bruno|title=Advances in Modeling and Simulation: Festschrift for Pierre L'Ecuyer |publisher=Springer|pages=65–87 |chapter=Chapter 4: Truncated Multivariate Student Computations via Exponential Tilting. |doi=10.1007/978-3-031-10193-9_4 |chapter-url=https://doi.org/10.1007/978-3-031-10193-9_4 |isbn=978-3-031-10192-2}}</ref><ref name="boLec16">{{cite conference |title=Efficient probability estimation and simulation of the truncated multivariate student-t distribution |last1=Botev |first1=Z. I. |last2=L'Ecuyer |first2=P. |date=6 December 2015 |publisher=IEEE |book-title=2015 Winter Simulation Conference (WSC) |pages=380–391 |location=Huntington Beach, CA, USA |doi=10.1109/WSC.2015.7408180 |hdl=1959.4/unsworks_38275 |hdl-access=free }} </ref><ref name=Genz>{{cite book|last=Genz|first=Alan|title=Computation of Multivariate Normal and t Probabilities|series=Lecture Notes in Statistics |date=2009|volume=195 |publisher=Springer|doi=10.1007/978-3-642-01689-9 |isbn=978-3-642-01689-9|url=https://www.springer.com/statistics/computational+statistics/book/978-3-642-01688-2|access-date=2017-09-05|archive-date=2022-08-27|archive-url=https://web.archive.org/web/20220827214814/https://link.springer.com/book/10.1007/978-3-642-01689-9|url-status=live}}</ref>
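A plain Monte Carlo estimate of <math>F(\mathbf{x})</math> can be sketched as follows. This is a naive estimator for illustration only, not the exponential-tilting or quasi-Monte Carlo methods of the cited references; the function name <code>mc_cdf_spherical</code> is hypothetical, and the sketch assumes <math>\boldsymbol\mu = \mathbf{0}</math> and <math>\boldsymbol\Sigma = \operatorname{I}</math> to stay short.

```python
import math
import random

def mc_cdf_spherical(x, nu, n, rng):
    """Plain Monte Carlo estimate of P(X <= x) for X ~ t_p(0, I, nu)."""
    p = len(x)
    hits = 0
    for _ in range(n):
        u = rng.gammavariate(nu / 2.0, 2.0)   # chi-squared with nu degrees of freedom
        s = math.sqrt(nu / u)                 # common scale shared by all coordinates
        # Count the draw if every coordinate is below its threshold.
        if all(s * rng.gauss(0.0, 1.0) <= x[i] for i in range(p)):
            hits += 1
    return hits / n

estimate = mc_cdf_spherical([0.0, 0.0], 4.0, 100000, random.Random(1))
```

By symmetry the exact value at the origin in two dimensions is 1/4, which gives a simple sanity check. The standard error of such an estimate decays only as <math>n^{-1/2}</math>, which is why specialised methods are preferred for small tail probabilities.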

==Conditional Distribution==

The conditional distribution was developed by Muirhead<ref name=":1">{{Cite book |last=Muirhead |first=Robb |title=Aspects of Multivariate Statistical Theory |publisher=Wiley |year=1982 |isbn=978-0-471-76985-9 |location=USA |pages=32–36 Theorem 1.5.4}}</ref> and Cornish,<ref>{{Cite journal |last=Cornish |first=E A |date=1954 |title=The Multivariate t-Distribution Associated with a Set of Normal Sample Deviates. |url=https://www.publish.csiro.au/PH/pdf/PH540531 |journal=Australian Journal of Physics |volume=7 |pages=531–542 |doi=10.1071/PH550193|doi-access=free }}</ref> but later derived using the simpler chi-squared ratio representation above, by Roth<ref name=":0" /> and Ding.<ref>{{cite journal |last1=Ding |first1=Peng |year=2016 |title=On the Conditional Distribution of the Multivariate t Distribution |url=https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1164756 |journal=The American Statistician |volume=70 |issue=3 |pages=293–295 |arxiv=1604.00561 |doi=10.1080/00031305.2016.1164756 |s2cid=55842994}}</ref> Let the vector <math> X </math> follow a multivariate ''t''-distribution and be partitioned into two subvectors of <math> p_1 </math> and <math> p_2 </math> elements: <math display="block"> X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p \left( \mu_p, \Sigma_{p \times p}, \nu \right) </math>

where <math> p_1 + p_2 = p </math>, the known mean vectors are <math> \mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}</math> and the scale matrix is <math> \Sigma_{p \times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} </math>.

Roth and Ding find the conditional distribution <math> p(X_1|X_2) </math> to be a new ''t''-distribution with modified parameters.

<math display="block"> X_1|X_2 \sim t_{p_1}\left( \mu_{1|2},\, \frac{\nu + d_2}{\nu + p_2} \Sigma_{11|2}, \, \nu + p_2 \right)</math>

An equivalent expression in Kotz et al. is somewhat less concise.

Thus the conditional distribution is most easily represented as a two-step procedure. Form first the intermediate distribution <math> X_1|X_2 \sim t_{ p_1}\left( \mu_{1|2}, \Psi ,\tilde{\nu} \right)</math> above then, using the parameters below, the explicit conditional distribution becomes

<math display="block"> f(X_1|X_2) =\frac{\Gamma{\left(\frac{\tilde \nu + p_1}{2}\right)}}{\Gamma{\left(\frac{\tilde\nu}{2}\right)} \left(\pi \,\tilde \nu \right)^{p_1/2} \left|{\boldsymbol\Psi}\right|^{1/2}} \left[1 + \frac{1}{\tilde \nu} \left( X_1 - \mu_{1|2} \right)^\mathsf{T} {\boldsymbol\Psi}^{-1} \left(X_1- \mu_{1|2} \right)\right]^{-(\tilde \nu + p_1)/2}</math>

where
* <math> \tilde \nu = \nu + p_2 </math> is the effective degrees of freedom: <math> \nu </math> is augmented by the number of conditioning variables <math> p_2 </math>;
* <math> \mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} \left(X_2 - \mu_2 \right ) </math> is the conditional mean of <math> X_1 </math>;
* <math> \Sigma_{11|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22} ^{-1} \Sigma_{21} </math> is the [[Schur complement]] of <math> \Sigma_{22} \text{ in } \Sigma </math>;
* <math> d_2 = (X_2 - \mu_2)^\mathsf{T} \Sigma_{22}^{-1} (X_2 - \mu_2) </math> is the squared [[Mahalanobis distance]] of <math> X_2 </math> from <math>\mu_2 </math> with scale matrix <math> \Sigma_{22} </math>;
* <math> \Psi = \frac{\nu + d_2}{\tilde{\nu}} \Sigma_{11|2} </math> is the conditional scale matrix for <math> \tilde{\nu} > 0</math>;
* <math> \Sigma_{cov} = \frac{\tilde{\nu}}{\tilde{\nu}-2}\Psi= \frac{\nu + d_2}{\tilde{\nu}-2}\Sigma_{11|2} </math> is the conditional covariance matrix for <math> \tilde{\nu} > 2</math>.
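The conditional parameters can be checked against the definition of a conditional density, <math>f(X_1|X_2) = f(X_1,X_2)/f(X_2)</math>. The Python sketch below is an illustration restricted to the bivariate case <math>p_1 = p_2 = 1</math>, so that all blocks of <math>\Sigma</math> are scalars; the function names <code>t_pdf</code>, <code>t2_pdf</code> and <code>conditional_params</code> and the numerical values are hypothetical choices, not taken from the cited sources.

```python
import math

def t_pdf(x, loc, scale2, nu):
    """Univariate t density with location loc and squared scale scale2."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2)
                                    * math.sqrt(nu * math.pi * scale2))
    return c * (1 + (x - loc) ** 2 / (nu * scale2)) ** (-(nu + 1) / 2)

def t2_pdf(x1, x2, mu, S, nu):
    """Bivariate t density; S = [[s11, s12], [s12, s22]] is the scale matrix."""
    det = S[0][0] * S[1][1] - S[0][1] ** 2
    a, b = x1 - mu[0], x2 - mu[1]
    quad = (S[1][1] * a * a - 2 * S[0][1] * a * b + S[0][0] * b * b) / det
    return (1 + quad / nu) ** (-(nu + 2) / 2) / (2 * math.pi * math.sqrt(det))

def conditional_params(x2, mu, S, nu):
    """Location, scale Psi and degrees of freedom of X1 | X2 = x2 (p1 = p2 = 1)."""
    mu_c = mu[0] + S[0][1] / S[1][1] * (x2 - mu[1])   # conditional mean
    schur = S[0][0] - S[0][1] ** 2 / S[1][1]          # Schur complement
    d2 = (x2 - mu[1]) ** 2 / S[1][1]                  # squared Mahalanobis distance
    nu_t = nu + 1                                     # nu + p2 with p2 = 1
    psi = (nu + d2) / nu_t * schur                    # conditional scale
    return mu_c, psi, nu_t

mu, S, nu = [0.5, -1.0], [[2.0, 0.8], [0.8, 1.5]], 4.0
loc, psi, nu_t = conditional_params(0.2, mu, S, nu)
ratio = t2_pdf(1.3, 0.2, mu, S, nu) / t_pdf(0.2, mu[1], S[1][1], nu)
direct = t_pdf(1.3, loc, psi, nu_t)
```

The quantities <code>ratio</code> and <code>direct</code> agree up to rounding. Unlike the multivariate normal case, the conditional scale <code>psi</code> depends on the observed value of <math>X_2</math> through <math>d_2</math>.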

==Copulas based on the multivariate ''t''==

The use of such distributions is enjoying renewed interest due to applications in [[mathematical finance]], especially through the use of the Student's ''t'' [[copula (statistics)|copula]].<ref>{{Cite web |last1=Demarta |first1=Stefano |last2=McNeil |first2=Alexander |date=2004 |title=The t Copula and Related Copulas |url=https://www.risknet.de/uploads/tx_bxelibrary/t-Copula-Demarta-ETH.pdf |website=Risknet}}</ref>

==Elliptical representation==

Constructed as an [[elliptical distribution]],<ref>{{Cite book |last1=Osiewalski |first1=Jacek |title=Bayesian Analysis in Statistics and Econometrics |chapter=Posterior Moments of Scale Parameters in Elliptical Sampling Models |last2=Steele |first2=Mark |publisher=Wiley |year=1996 |isbn=0-471-11856-7 |pages=323–335}}</ref> and taking the simplest centralised case with spherical symmetry and no scaling, <math> \Sigma = \operatorname{I} \, </math>, the multivariate ''t''-PDF takes the form

<math display="block"> f_X(X)= g(X^\mathsf{T}X) = \frac{\Gamma{\left( \frac{\nu + p}{2} \right)}}{ ( \nu \pi)^{\,p/2} \Gamma{\left( \frac{\nu}{2} \right)}} \left( 1 + \nu^{-1} X^\mathsf{T} X \right)^{-( \nu + p )/2 } </math>

where <math> X = (x_1, \cdots ,x_p )^\mathsf{T} </math> is a <math>p</math>-vector and <math> \nu </math> is the degrees of freedom as defined in Muirhead<ref name=":1" /> section 1.5. The covariance of <math>X</math> is

<math display="block"> \operatorname{E} \left( XX^\mathsf{T} \right) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p) XX^\mathsf{T} \, dx_1 \dots dx_p = \frac{ \nu }{ \nu - 2 } \operatorname{I} </math>

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder<ref>{{Cite journal |last1=Kibria |first1=K M G |last2=Joarder |first2=A H |date=Jan 2006 |title=A short review of multivariate t distribution |url=https://jsr.isrt.ac.bd/wp-content/uploads/40n1_5.pdf |journal=Journal of Statistical Research |volume=40 |issue=1 |pages=59–72|doi=10.1007/s42979-021-00503-0 |s2cid=232163198 }}</ref> define the radial measure <math> r_2 = R^2 = \frac{X^\mathsf{T}X}{p} </math> and, noting that the density depends only on <math> r_2 </math>, we get<blockquote><math> \operatorname{E} [ r_2 ] = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p) \frac {X^\mathsf{T}X}{p}\, dx_1 \dots dx_p = \frac{\nu}{ \nu -2} </math></blockquote>which is equivalent to the variance of the <math> p </math>-element vector <math>X</math> treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements.

=== Radial Distribution ===

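<math>r_2 = \frac{X^\mathsf{T}X}{p}</math> follows the [[Fisher-Snedecor distribution|Fisher-Snedecor]] or <math> F </math> distribution with <math> p </math> and <math> \nu </math> degrees of freedom:

<math display="block"> r_2 \sim f_F(r_2 \,|\, p, \nu) = B \left( \frac {p}{2}, \frac {\nu}{2} \right)^{-1} \left(\frac{p}{\nu} \right)^{p/2} r_2^{\,p/2 - 1} \left( 1 + \frac{p}{\nu} \, r_2 \right)^{-(p + \nu)/2 } </math>

having mean value <math> \operatorname{E} [ r_2 ] = \frac{\nu}{\nu - 2} </math>. <math> F </math>-distributions arise naturally in tests of sums of squares of sampled data after normalization by the sample standard deviation.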

By a change of random variable to <math> y = \frac{p}{\nu} r_2 = \frac {X^\mathsf{T} X}{\nu} </math> in the <math> F </math> density of <math> r_2 </math>, retaining <math> p </math>-vector <math> X </math>, we have <math> \operatorname{E} [ y ] = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(X) \frac {X^\mathsf{T}X}{\nu}\, dx_1 \dots dx_p = \frac { p }{ \nu - 2 }</math> and probability distribution <math display="block"> \begin{align} f_Y(y| \,p,\nu) & = \left | \frac {p}{\nu} \right|^{-1} B \left( \frac {p}{2}, \frac {\nu}{2} \right)^{-1} \left(\frac{p}{\nu} \right)^{p/2} \left(\frac{p}{\nu}\right)^{ -p/2 + 1} y^{\,p/2 - 1} \bigl( 1 + y \bigr)^{-(p + \nu)/2 } \\[2ex] & = B \left( \frac {p}{2}, \frac {\nu}{2} \right)^{-1} y^{ \,p/2 -1 } \bigl(1 + y\bigr)^{-(\nu + p)/2} \end{align}</math>

which is a regular [[Beta-prime distribution]] <math> y \sim \beta \, ' \bigg(y; \frac {p}{2}, \frac {\nu}{2} \bigg ) </math> having mean value <math> \frac { \frac{1}{2} p }{ \frac{1}{2}\nu - 1 } = \frac { p }{ \nu - 2 }</math>.

===Cumulative Radial Distribution===

Given the Beta-prime distribution, the radial cumulative distribution function of <math> y</math> is known: <math display="block"> F_Y(y) = I \bigg(\frac {y}{1+y}; \, \frac {p}{2}, \frac {\nu}{2} \bigg ) \, B\bigg( \frac {p}{2}, \frac {\nu}{2} \bigg )^{-1} </math>

where <math> I</math> is the incomplete [[Beta function]] and applies with a spherical <math> \Sigma </math> assumption.
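As a rough numerical illustration, the radial CDF can be evaluated by integrating the incomplete Beta function directly. The Python standard library has no incomplete Beta function, so the sketch below approximates it with Simpson's rule; the names <code>incomplete_beta</code> and <code>radial_cdf</code> are hypothetical, and the quadrature assumes <math>p \ge 2</math> so that the integrand is bounded at zero.

```python
import math

def incomplete_beta(x, a, b, n=2000):
    """Unregularized incomplete beta B(x; a, b) by Simpson's rule.

    Assumes 0 <= x < 1, a >= 1 and an even number of intervals n.
    """
    h = x / n
    f = lambda t: t ** (a - 1) * (1 - t) ** (b - 1)
    s = f(0.0) + f(x)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * h)
    return s * h / 3

def radial_cdf(y, p, nu):
    """F_Y(y) for y = X^T X / nu under the spherical assumption Sigma = I."""
    a, b = p / 2, nu / 2
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return incomplete_beta(y / (1 + y), a, b) / beta

prob = radial_cdf(1.5, 2, 5.0)   # P(X^T X / nu <= 1.5) for p = 2, nu = 5
```

For <math>p = 2</math> the integral has the closed form <math>F_Y(y) = 1 - (1+y)^{-\nu/2}</math>, which can be used to check the quadrature.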

In the scalar case, <math> p = 1</math>, the distribution is equivalent to Student-''t'' with the equivalence <math> t^2 = y^2 \sigma^{-1} </math>, the variable ''t'' having double-sided tails for CDF purposes, i.e. the "two-tail-t-test".

The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant radius surface at <math display="inline"> R = \left(X^\mathsf{T}X\right)^{1/2} </math> with PDF <math display="inline"> p_X(X) \propto \left( 1 + \nu^{-1} R^2 \right)^{-(\nu+p)/2} </math> is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area <math> A_R </math> and thickness <math> \delta R </math> at <math> R </math> is <math> \delta P = p_X(R) \, A_R \delta R </math>.

The enclosed <math> p </math>-sphere of radius <math> R </math> has surface area <math> A_R = \frac { 2\pi^{p/2} R^{ \, p-1 } }{ \Gamma (p/2)} </math>. Substitution into <math> \delta P </math> shows that the shell has element of probability <math> \delta P = p_X(R) \frac { 2\pi^{p/2} R^{ p-1 } }{ \Gamma (p/2)} \delta R </math> which is equivalent to radial density function <math display="block"> f_R(R) = \frac{\Gamma \big ( \frac{1}{2} (\nu + p ) \, \big )}{\nu^{\,p/2} \pi^{\,p/2} \Gamma \big( \frac{1}{2} \nu \big)} \frac{2 \pi^{p/2} R^{ p-1 } }{ \Gamma (p/2)} \bigg( 1 + \frac{ R^2 }{\nu} \bigg)^{-( \nu + p )/2 } </math> which further simplifies to <math> f_R(R) = \frac { 2}{ \nu ^{1/2} B \big( \frac{1}{2} p, \frac{1}{2} \nu \big)} \bigg( \frac {R^2}{ \nu } \bigg)^{ (p-1)/2 } \bigg( 1 + \frac{ R^2 }{\nu} \bigg)^{-( \nu + p )/2 } </math> where <math> B(*,*) </math> is the [[Beta function]].

Changing the radial variable to <math> y=R^2 / \nu </math> returns the previous Beta Prime distribution <math display="block"> f_Y(y) = \frac { 1}{ B{\left( \frac{1}{2} p, \frac{1}{2} \nu \right)}} y^{\, p/2 - 1} \left( 1 + y \right)^{-( \nu + p )/2 } </math>

To scale the radial variables without changing the radial shape function, define scale matrix <math> \Sigma = \alpha \operatorname{I} </math>, yielding a 3-parameter Cartesian density function, i.e. the probability <math> \Delta_P </math> in volume element <math> dx_1 \dots dx_p </math> is

<math display="block"> \Delta_P \big (f_X(X \,|\alpha, p, \nu) \big ) = \frac{\Gamma{\left( \frac{1}{2} (\nu + p ) \, \right)}}{ ( \nu \pi)^{\,p/2} \alpha^{\,p/2} \Gamma{\left( \frac{1}{2} \nu \right)}} \left( 1 + \frac{X^\mathsf{T} X}{ \alpha \nu} \right)^{-( \nu + p )/2 } \; dx_1 \dots dx_p </math>

or, in terms of scalar radial variable <math> R </math>,

<math display="block"> f_R(R \,|\alpha, p, \nu) = \frac { 2}{\alpha^{1/2} \; \nu ^{1/2} B \big( \frac{1}{2} p, \frac{1}{2} \nu \big)} \bigg( \frac {R^2}{ \alpha \, \nu } \bigg)^{ (p-1)/2 } \bigg( 1 + \frac{ R^2 }{ \alpha \, \nu} \bigg)^{-( \nu + p )/2 } </math>

=== Radial Moments ===

The moments of all the radial variables, with the spherical distribution assumption, can be derived from the Beta Prime distribution. If <math> Z \sim \beta'(a,b) </math> then <math> \operatorname{E} (Z^m) = {\frac {B(a + m, b - m)}{B(a,b)}} </math>, a known result. Thus, for variable <math> y = \frac {p}{\nu} r_2 = \frac{X^\mathsf{T}X}{\nu} </math> we have <math display="block"> \operatorname{E} (y^m) = {\frac {B(\frac{1}{2}p + m, \frac{1}{2} \nu - m)}{B( \frac{1}{2} p ,\frac{1}{2} \nu)}} = \frac{\Gamma \big(\frac{1}{2} p + m \big)\; \Gamma \big(\frac{1}{2} \nu - m \big) }{ \Gamma \big( \frac{1}{2} p \big) \; \Gamma \big( \frac{1}{2} \nu \big) }, \; \nu/2 > m </math> The moments of <math> r_2 = \frac{\nu}{p} \, y </math> are <math display="block"> \operatorname{E} (r_2^m) = \left(\frac{\nu}{p}\right)^m \operatorname{E} (y^m) </math> while introducing the scale matrix <math> \alpha \operatorname{I} </math> yields <math display="block"> \operatorname{E} (r_2^m | \alpha) = \alpha^m \left(\frac{\nu}{p}\right)^m \operatorname{E} (y^m) </math> Moments relating to the radial variable <math> R = \left(X^\mathsf{T}X\right)^{1/2} </math> are found by setting <math> R =(\alpha\nu y)^{1/2} </math> and <math> M=2m </math> whereupon <math display="block"> \begin{align} \operatorname{E} (R^M) &= \operatorname{E}\!\left[\left((\alpha \nu y)^{1/2} \right)^{2 m }\right] = (\alpha \nu)^{M/2} \operatorname{E} (y^{M/2}) \\[1ex] &= (\alpha \nu)^{M/2} {\frac {B \big(\frac{1}{2} (p + M), \frac{1}{2} (\nu - M) \big )}{B{\left( \frac{p}{2}, \frac{\nu}{2} \right)}}} \end{align} </math>
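The Beta Prime moment formula is straightforward to evaluate with log-gamma functions. The following Python sketch is illustrative only; the names <code>log_beta</code>, <code>moment_y</code> and <code>moment_R</code> are hypothetical, and <code>alpha</code> stands for the scalar in the scale matrix <math>\alpha \operatorname{I}</math>.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def moment_y(m, p, nu):
    """E(y^m) for y ~ beta'(p/2, nu/2); the moment exists only if nu/2 > m."""
    if nu / 2 <= m:
        raise ValueError("moment exists only for nu/2 > m")
    return math.exp(log_beta(p / 2 + m, nu / 2 - m) - log_beta(p / 2, nu / 2))

def moment_R(M, p, nu, alpha=1.0):
    """E(R^M) for the radius R = (alpha * nu * y)^(1/2)."""
    return (alpha * nu) ** (M / 2) * moment_y(M / 2, p, nu)

first = moment_y(1, 3, 7.0)        # mean of y for p = 3, nu = 7
second_R = moment_R(2, 3, 7.0)     # E(R^2) = E(X^T X) for the same case
```

With <math>m = 1</math> the first function reproduces the mean <math>p/(\nu - 2)</math> given earlier, and <code>moment_R(2, p, nu)</code> returns <math>p\nu/(\nu-2)</math>, the trace of the covariance matrix in the spherical case.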

==Linear Combinations and Affine Transformation==

=== Full Rank Transform ===

This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf: <math> f_X(X) = \frac {\Kappa }{ \left|\Sigma \right|^{1/2} } \left( 1+ \nu^{-1} X^\mathsf{T} \Sigma^{-1} X \right) ^ { -\left(\nu + p \right)/2} </math>, where <math> \Kappa </math> is a constant and <math> \nu </math> is arbitrary but fixed, let <math> \Theta \in \mathbb{R}^{p \times p}</math> be a full-rank matrix and form vector <math> Y = \Theta X </math>. Then, by straightforward change of variables

<math display="block"> f_Y(Y) = \frac {\Kappa }{ \left|\Sigma \right|^{1/2} } \left( 1+ \nu^{-1}Y^\mathsf{T} \Theta^{-\mathsf{T}} \Sigma^{-1} \Theta^{-1} Y \right) ^ { -\left(\nu + p \right)/2} \left| \frac{\partial Y }{\partial X} \right| ^{-1} </math>

The matrix of partial derivatives is <math> \frac{\partial Y_i }{\partial X_j} = \Theta_{i,j} </math> and the Jacobian becomes <math> \left| \frac{\partial Y }{\partial X} \right| = \left| \Theta \right| </math>. Thus <math display="block"> f_Y(Y) = \frac {\Kappa }{ \left|\Sigma \right|^{1/2} \left| \Theta \right| } \left( 1 + \nu^{-1} Y^\mathsf{T} \Theta^{-\mathsf{T}} \Sigma^{-1} \Theta^{-1} Y \right) ^ { -\left(\nu + p \right)/2} </math>

The denominator reduces to <math display="block"> \left|\Sigma \right|^{1/2} \left| \Theta \right| = \left|\Sigma \right|^{1/2} \left| \Theta \right|^{1/2} \left|\Theta^\mathsf{T} \right|^{1/2} = \left| \Theta \Sigma \Theta^\mathsf{T} \right|^{1/2} </math> In full: <math display="block"> f_Y(Y) = \frac { \Gamma\left[(\nu+p) / 2\right] }{ \Gamma(\nu/2) \, (\nu \, \pi)^{\, p /2} \left| \Theta \Sigma \Theta^\mathsf{T} \right|^{1/2} } \left( 1 + \nu^{-1} Y^\mathsf{T} \left( \Theta \Sigma \Theta^\mathsf{T} \right) ^{-1} Y \right) ^ { -\left(\nu + p \right)/2} </math>

which is a regular MV-''t'' distribution.

In general if <math> X \sim t_p ( \mu, \Sigma, \nu ) </math> and <math> \Theta^{p \times p } </math> has full rank <math> p </math> then <math display="block"> \Theta X + c \sim t_p( \Theta \mu +c, \Theta \Sigma \Theta^\mathsf{T}, \nu ) </math>

=== Marginal Distributions ===

This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition <math> X \sim t (p, \mu, \Sigma, \nu ) </math> into two subvectors of <math> p_1, p_2 </math> elements: <math display="block"> X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t \left ( p_1 + p_2, \mu_p, \Sigma_{p \times p}, \nu \right ) </math>

with <math> p_1 + p_2 = p </math>, means <math> \mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}</math>, scale matrix <math> \Sigma_{p \times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} </math>

then <math> X_1 \sim t \left ( p_1, \mu_1, \Sigma_{11}, \nu \right ) </math>, <math> X_2 \sim t \left ( p_2, \mu_2, \Sigma_{ 22}, \nu \right ) </math> such that <math display="block"> f(X_1) = \frac{\Gamma\left[(\nu+p_1)/2\right]}{\Gamma(\nu/2) \, (\nu \,\pi)^ {\, p_1/2}\left|{\boldsymbol\Sigma_{11}}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf X_1}-{\boldsymbol\mu_1})^\mathsf{T}{\boldsymbol\Sigma}_{11}^{-1}({\mathbf X_1}-{\boldsymbol\mu_1})\right]^{-(\nu \,+ \, p_1)/2}</math>

<math display="block"> f(X_2) = \frac{\Gamma\left[(\nu+p_2)/2\right]}{\Gamma(\nu/2) \, (\nu \, \pi)^{\, p_2 /2}\left|{\boldsymbol\Sigma_{22}}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf X_2} - {\boldsymbol\mu_2})^\mathsf{T}{\boldsymbol\Sigma}_{22}^{-1}({\mathbf X_2}-{\boldsymbol\mu_2})\right]^{-(\nu \,+ \, p_2)/2}</math>

If a transformation is constructed in the form <math display="block"> \Theta_{p_1 \times \, p} = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 \\ 0 & \ddots & 0 & \cdots & 0 \\ 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} </math>

then vector <math> Y = \Theta X </math>, as discussed below, has the same distribution as the marginal distribution of <math> X_1 </math>.

=== Rank-Reducing Linear Transform ===

In the linear transform case, if <math> \Theta </math> is a rectangular matrix <math> \Theta \in \mathbb{R}^{m \times p}, m < p </math>, of rank <math> m </math>, the result is dimensionality reduction. Here, the Jacobian <math> \left| \Theta \right| </math> is seemingly rectangular but the value <math> \left| \Theta \Sigma \Theta^\mathsf{T} \right|^{1/2} </math> in the denominator of the pdf is nevertheless correct. There is a discussion of rectangular matrix product determinants in Aitken.<ref>{{Cite book |last=Aitken |first=A C |title=Determinants and Matrices |publisher=Oliver and Boyd |year=1948 |edition=5th |location=Edinburgh |pages=Chapter IV, section 36}}</ref> In general if <math> X \sim t (p, \mu, \Sigma, \nu ) </math> and <math> \Theta^{m \times p } </math> has full rank <math> m </math> then

<math display="block"> Y = \Theta X + c \sim t ( m, \Theta \mu + c, \Theta \Sigma \Theta^\mathsf{T}, \nu ) </math> <math display="block"> f_Y(Y) = \frac{\Gamma\left[(\nu + m)/2\right]}{\Gamma(\nu/2) \, (\nu \,\pi)^{\, m / 2} \left| \Theta \Sigma \Theta^\mathsf{T} \right|^{1/2}}\left[1+\frac{1}{\nu}( Y - c_1 )^\mathsf{T} ( \Theta \Sigma \Theta^\mathsf{T} )^{-1} (Y-c_1) \right]^{-(\nu \,+ \, m)/2}, \; c_1 = \Theta \mu + c</math>

''In extremis'', if ''m'' = 1 and <math> \Theta </math> becomes a row vector, then scalar ''Y'' follows a univariate double-sided Student-''t'' distribution defined by <math> t^2 = Y^2 / \sigma^2 </math> with the same <math> \nu </math> degrees of freedom. Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-''t''.
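The parameter mapping for an affine transform <math> Y = \Theta X + c </math> can be sketched in a few lines of Python. This is an illustration with a hypothetical function name <code>affine_t</code> and made-up numbers; matrices are nested lists, and <math>\Theta</math> is assumed to have full row rank so that <math>\Theta \Sigma \Theta^\mathsf{T}</math> is a valid scale matrix.

```python
def affine_t(Theta, c, mu, Sigma, nu):
    """Parameters (location, scale, nu) of Y = Theta X + c for X ~ t_p(mu, Sigma, nu)."""
    m, p = len(Theta), len(mu)
    # Location: Theta mu + c
    loc = [sum(Theta[i][k] * mu[k] for k in range(p)) + c[i] for i in range(m)]
    # Scale: Theta Sigma Theta^T, computed as (Theta Sigma) Theta^T
    TS = [[sum(Theta[i][k] * Sigma[k][j] for k in range(p)) for j in range(p)]
          for i in range(m)]
    scale = [[sum(TS[i][k] * Theta[j][k] for k in range(p)) for j in range(m)]
             for i in range(m)]
    # The degrees of freedom are unchanged by the transform.
    return loc, scale, nu

mu = [1.0, 2.0, 3.0]
Sigma = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 3.0]]

# Selection matrix: picks the first two coordinates, giving the marginal of X_1.
marg_loc, marg_scale, _ = affine_t([[1, 0, 0], [0, 1, 0]], [0, 0], mu, Sigma, 5.0)

# Row vector (m = 1): a scalar Y with a univariate t distribution.
y_loc, y_scale, y_nu = affine_t([[1.0, -1.0, 0.0]], [0.5], mu, Sigma, 5.0)
```

With the selection matrix, the returned scale is the upper-left block <math>\Sigma_{11}</math>, in agreement with the marginal distributions above. In the row-vector case the single entry of the returned scale plays the role of <math>\sigma^2</math>.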

* During affine transformations of variables with elliptical distributions all vectors must ultimately derive from one initial isotropic spherical vector <math> Z </math> whose elements remain 'entangled' and are not statistically independent.
* A vector of independent Student-''t'' samples is not consistent with the multivariate ''t'' distribution.
* Adding two sample multivariate ''t'' vectors generated with independent Chi-squared samples and different <math> \nu </math> values: <math display="inline">{1}/\sqrt{u_1/\nu_1}, \; \; {1}/\sqrt{u_2/\nu_2}</math> will not produce internally consistent distributions, though they will yield a [[Behrens-Fisher problem]].<ref>{{Cite journal |last1=Giron |first1=Javier |last2=del Castilo |first2=Carmen |date=2010 |title=The multivariate Behrens–Fisher distribution |journal=Journal of Multivariate Analysis |volume=101 |issue=9 |pages=2091–2102 |doi=10.1016/j.jmva.2010.04.008 |doi-access=free }}</ref>
* Taleb compares many examples of fat-tail elliptical ''vs'' non-elliptical multivariate distributions.

==Related concepts==

* In univariate statistics, the [[Student's t-test|Student's ''t''-test]] makes use of [[Student's t-distribution|Student's ''t''-distribution]].
* The elliptical multivariate-''t'' distribution arises spontaneously in linearly constrained least squares solutions involving multivariate normal source data, for example the Markowitz global minimum variance solution in financial portfolio analysis,<ref>{{Cite journal |last1=Okhrin |first1=Y |last2=Schmid |first2=W |date=2006 |title=Distributional Properties of Portfolio Weights |url=https://www.sciencedirect.com/science/article/abs/pii/S0304407605001442 |journal=Journal of Econometrics |volume=134 |pages=235–256|doi=10.1016/j.jeconom.2005.06.022 }}</ref><ref>{{Cite journal |last1=Bodnar |first1=T |last2=Dmytriv |first2=S |last3=Parolya |first3=N |last4=Schmid |first4=W |date=2019 |title=Tests for the Weights of the Global Minimum Variance Portfolio in a High-Dimensional Setting |journal= IEEE Transactions on Signal Processing|volume=67 |issue=17 |pages=4479–4493|doi=10.1109/TSP.2019.2929964 |arxiv=1710.09587 |bibcode=2019ITSP...67.4479B }}</ref><ref name=":2">{{Cite journal |last1=Bodnar |first1=T |last2=Okhrin |first2=Y |date=2008 |title=Properties of the Singular, Inverse and Generalized inverse Partitioned Wishart Distribution. |url=https://core.ac.uk/download/pdf/82469023.pdf |journal=Journal of Multivariate Analysis |volume=99 |issue=Eqn.20 |pages=2389–2405|doi=10.1016/j.jmva.2008.02.024 }}</ref> which addresses an ensemble of normal random vectors or a random matrix. It does not arise in ordinary least squares (OLS) or multiple regression with fixed dependent and independent variables, a problem that tends to produce well-behaved normal error probabilities.
* [[Hotelling's T-squared distribution|Hotelling's ''T''-squared distribution]] is a distribution that arises in multivariate statistics.
* The [[matrix t-distribution|matrix ''t''-distribution]] is a distribution for random variables arranged in a matrix structure.

{{more footnotes|date=May 2012}}

== See also ==

* [[Multivariate normal distribution]], which is the limiting case of the multivariate Student's t-distribution when <math>\nu\uparrow\infty</math>.
* [[Chi distribution]], the [[probability density function|pdf]] of the scaling factor in the construction of the Student's t-distribution and also the [[Norm (mathematics)#p-norm|2-norm]] (or [[Euclidean norm]]) of a multivariate normally distributed vector (centered at zero).
** {{slink|Rayleigh distribution#Student's t}}, random vector length of multivariate ''t''-distribution
* [[Mahalanobis distance]]

== References ==

{{Reflist|refs= }}

==Literature==

{{refbegin}}
* {{cite book |title= Multivariate ''t'' Distributions and Their Applications |last= Kotz |first= Samuel |author2=Nadarajah, Saralees |year= 2004 |publisher= Cambridge University Press |isbn= 978-0521826549 }}
* {{cite book |title= Copula methods in finance |last= Cherubini |first= Umberto |author2=Luciano, Elisa |author3=Vecchiato, Walter |year= 2004 |publisher= John Wiley & Sons |isbn= 978-0470863442 }}
* {{Cite book |last=Taleb |first=Nassim Nicholas |title=Statistical Consequences of Fat Tails |publisher=Academic Press |year=2023 |isbn=979-8218248031 |edition=1st}}
{{refend}}

==External links==

*[https://web.archive.org/web/20061202010900/http://www.mth.kcl.ac.uk/~shaww/web_page/papers/MultiStudentc.pdf Copula Methods vs Canonical Multivariate Distributions: the multivariate Student T distribution with general degrees of freedom]
*[http://www.statlect.com/mcdstu1.htm Multivariate Student's ''t'' distribution]

{{ProbDistributions|multivariate}}

{{DEFAULTSORT:Multivariate Normal Distribution}} [[Category:Continuous distributions]] [[Category:Multivariate continuous distributions]]