{{short description|In mathematics, a quantitative measure of the shape of a set of points}}
{{about||the physical concept|Moment (physics)}}
<!-- It could be interesting to comment on whether the moments are invariant over coordinate changes. Usually we use moments to compare functions that are in the same coordinate system, so it doesn't usually matter. -->

'''Moments''' of a [[Function (mathematics)|function]] in [[mathematics]] are certain quantitative measures related to the shape of the function's [[Graph of a function|graph]]. For example, if the function represents mass density, then the zeroth moment is the total mass, the first moment (normalized by total mass) is the [[center of mass]], and the second moment is the [[moment of inertia]]. If the function is a [[probability distribution]], then the first moment is the [[expected value]], the second [[central moment]] is the [[variance]], the third [[standardized moment]] is the [[skewness]], and the fourth standardized moment is the [[kurtosis]].

For a distribution of mass or probability on a [[bounded set|bounded interval]], the collection of all the moments (of all orders, from {{math|0}} to {{math|∞}}) uniquely determines the distribution ([[Hausdorff moment problem]]). The same is not true on unbounded intervals ([[Hamburger moment problem]]).

In the mid-nineteenth century, [[Pafnuty Chebyshev]] became the first person to think systematically in terms of the moments of [[random variables]].<ref>{{cite journal|journal=Bulletin of the American Mathematical Society |series=New Series| volume=3| number=1|date=July 1980|title=HARMONIC ANALYSIS AS THE EXPLOITATION OF SYMMETRY - A HISTORICAL SURVEY| author=George Mackey| page=549}}</ref>

== Significance of the moments ==
The {{mvar|n}}th raw moment (i.e., moment about zero) of a random variable <math>X</math> with density function <math>f(x)</math> is defined by<ref>{{cite book| last=Papoulis| first=A.| title=Probability, Random Variables, and Stochastic Processes, 2nd ed.|publisher=[[McGraw Hill]]| year=1984| location=New York| pages=145–149}}</ref><math display="block">\mu'_n = \langle X^{n} \rangle ~\overset{\mathrm{def}}{=}~ \begin{cases} \sum_i x^n_i f(x_i), & \text{discrete distribution} \\[1.2ex] \int x^n f(x) \, dx, & \text{continuous distribution} \end{cases}</math>
The {{mvar|n}}th moment of a [[real number|real]]-valued continuous random variable with density function <math>f(x)</math> about a value <math>c</math> is the [[integral]]<math display="block">\mu_n = \int_{-\infty}^\infty (x - c)^n\,f(x)\,\mathrm{d}x.</math>

It is possible to define moments for [[random variable]]s in a more general fashion than moments for real-valued functions – see [[#Central moments in metric spaces|moments in metric spaces]]. The moment of a function, without further explanation, usually refers to the above expression with <math>c=0</math>. For the second and higher moments, the [[central moment]]s (moments about the mean, with ''c'' being the mean) are usually used rather than the moments about zero, because they provide clearer information about the distribution's shape.

Other moments may also be defined. For example, the {{mvar|n}}th inverse moment about zero is <math>\operatorname{E}\left[X^{-n}\right]</math> and the {{mvar|n}}th logarithmic moment about zero is <math>\operatorname{E}\left[\ln^n(X)\right].</math>

The {{mvar|n}}th moment about zero of a probability density function <math>f(x)</math> is the [[expected value]] of <math>X^n</math> and is called a ''raw moment'' or ''crude moment''.<ref>{{cite web |url=http://mathworld.wolfram.com/RawMoment.html |title=Raw Moment -- from Wolfram MathWorld |access-date=2009-06-24 |url-status=live |archive-url=https://web.archive.org/web/20090528152407/http://mathworld.wolfram.com/RawMoment.html |archive-date=2009-05-28 }} Raw Moments at Math-world</ref> The moments about its mean <math>\mu</math> are called [[central moment|''central'' moments]]; these describe the shape of the function, independently of [[translation (geometry)|translation]].

If <math>f</math> is a [[probability density function]], then the value of the integral above is called the {{mvar|n}}th moment of the [[probability distribution]]. More generally, if ''F'' is a [[cumulative distribution function|cumulative probability distribution function]] of any probability distribution, which may not have a density function, then the {{mvar|n}}th moment of the probability distribution is given by the [[Riemann–Stieltjes integral]]<math display="block">\mu'_n = \operatorname{E} \left[X^n\right] = \int_{-\infty}^\infty x^n\,\mathrm{d}F(x)</math>where ''X'' is a [[random variable]] that has this cumulative distribution ''F'', and {{math|E}} is the [[expectation operator]] or mean. When<math display="block">\operatorname{E}\left[ \left|X^n \right| \right] = \int_{-\infty}^\infty \left|x^n\right|\,\mathrm{d}F(x) = \infty</math>the moment is said not to exist. If the {{mvar|n}}th moment about any point exists, so does the {{math|(''n'' − 1)}}th moment (and thus, all lower-order moments) about every point. The zeroth moment of any [[probability density function]] is {{math|1}}, since the area under any [[probability density function]] must be equal to one.
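As an illustrative sketch of the discrete case of these definitions, the moments of a fair six-sided die can be computed exactly with rational arithmetic. The die example and the helper names <code>raw_moment</code> and <code>moment_about</code> are assumptions made here for illustration and do not come from the cited sources.

```python
from fractions import Fraction


def raw_moment(pmf, n):
    """n-th raw moment: sum of x**n * f(x) over the support of the pmf."""
    return sum(Fraction(x) ** n * p for x, p in pmf.items())


def moment_about(pmf, n, c):
    """n-th moment about the point c: sum of (x - c)**n * f(x)."""
    return sum((Fraction(x) - c) ** n * p for x, p in pmf.items())


# Probability mass function of a fair six-sided die (illustrative choice).
die = {x: Fraction(1, 6) for x in range(1, 7)}

zeroth = raw_moment(die, 0)            # total probability
mean = raw_moment(die, 1)              # first raw moment
variance = moment_about(die, 2, mean)  # second central moment

print(zeroth, mean, variance)
```

Under these assumptions the zeroth moment is 1, the first raw moment is 7/2, and the second central moment is 35/12.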

{|class="wikitable"
|+ Significance of moments (raw, central, standardised) and cumulants (raw, normalised), in connection with named properties of distributions
|-
! rowspan=2 | Moment <br/>ordinal
! colspan=3 | Moment
! colspan=2 | [[Cumulant]]
|-
! Raw
! Central
! Standardized
! Raw
! Normalized
|-
| 1 || [[Mean]] || 0 || 0 || Mean || {{n/a}}
|-
| 2 || — || [[Variance]] || 1 || Variance || 1
|-
| 3 || — || — || [[Skewness]] || — || Skewness
|-
| 4 || — || — || (Non-excess or historical) [[kurtosis]] || — || [[Kurtosis#Excess kurtosis|Excess kurtosis]]
|-
| 5 || — || — || Hyperskewness || — || —
|-
| 6 || — || — || Hypertailedness || — || —
|-
| 7+ || — || — || — || — || —
|}

=== Standardized moments ===
{{main|Standardized moment}}
The ''normalised'' {{mvar|n}}th central moment or standardised moment is the {{mvar|n}}th central moment divided by {{mvar|σ<sup>n</sup>}}; the normalised {{mvar|n}}th central moment of the random variable {{mvar|X}} is <math display="block">\frac{\mu_n}{\sigma^n} = \frac{\operatorname{E}\left[(X - \mu)^n\right]}{\sigma^n} = \frac{\operatorname{E}\left[(X - \mu)^n\right]}{\operatorname{E}\left[(X - \mu)^2\right]^\frac{n}{2}} .</math>

These normalised central moments are [[dimensionless number|dimensionless quantities]], which represent the distribution independently of any linear change of scale.
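The scale independence can be checked numerically. The following is a minimal sketch that treats a small made-up data set as an empirical distribution; the function name <code>standardized_moment</code> and the data values are illustrative assumptions. The rescaling uses a positive scale factor, since a negative factor would flip the sign of the odd-order standardized moments.

```python
import math


def standardized_moment(sample, n):
    """n-th standardized moment of the empirical distribution of `sample`."""
    m = len(sample)
    mean = sum(sample) / m

    def central(k):
        # k-th central moment of the empirical distribution
        return sum((x - mean) ** k for x in sample) / m

    return central(n) / central(2) ** (n / 2)


data = [1.0, 2.0, 2.0, 3.0, 9.0]            # made-up, right-skewed values
rescaled = [3.0 * x + 10.0 for x in data]   # positive linear change of scale

skew_original = standardized_moment(data, 3)
skew_rescaled = standardized_moment(rescaled, 3)
print(skew_original, skew_rescaled)
```

Both calls should agree up to floating-point rounding, and the second standardized moment is 1 by construction.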

=== Notable moments ===
==== Mean ====
{{main|Mean}}
The first raw moment is the [[mean]], usually denoted <math>\mu \equiv \operatorname{E}[X].</math>

==== Variance ====
{{main|Variance}}
The second [[central moment]] is the [[variance]]. The positive [[square root]] of the variance is the [[standard deviation]] <math>\sigma \equiv \left(\operatorname{E}\left[(X - \mu)^2\right]\right)^\frac{1}{2}.</math>

==== Skewness ====
{{main|Skewness}}
The third central moment is the measure of the lopsidedness of the distribution; any symmetric distribution will have a third central moment, if defined, of zero. The normalised third central moment is called the [[skewness]], often denoted {{mvar|γ}}. A distribution that is skewed to the left (the tail of the distribution is longer on the left) will have a negative skewness. A distribution that is skewed to the right (the tail of the distribution is longer on the right) will have a positive skewness.

For distributions that are not too different from the [[normal distribution]], the [[median]] will be somewhere near {{math|''μ'' − ''γσ''/6}}; the [[Mode (statistics)|mode]] about {{math|''μ'' − ''γσ''/2}}.

==== Kurtosis ====
{{main|Kurtosis}}

The fourth central moment is a measure of the heaviness of the tail of the distribution. Since it is the expectation of a fourth power, the fourth central moment, where defined, is always nonnegative; and except for a [[degenerate probability distribution|point distribution]], it is always strictly positive. The fourth central moment of a normal distribution is {{math|3''σ''<sup>4</sup>}}.

The [[kurtosis]] {{mvar|κ}} is defined to be the standardized fourth central moment. (Equivalently, as in the next section, excess kurtosis is the fourth [[cumulant]] divided by the square of the second [[cumulant]].)<ref name="CasellaBerger">{{cite book | last1 = Casella | first1 = George | last2 = Berger | first2 = Roger L. | author-link1 = George Casella | author-link2 = Roger Lee Berger | title = Statistical Inference | publisher = Duxbury | location = Pacific Grove | year = 2002 | edition = 2 | isbn = 0-534-24312-6 }}</ref><ref name="BalandaMacGillivray88">{{cite journal | last1 = Balanda | first1 = Kevin P. | last2 = MacGillivray | first2 = H. L. | author2-link = Helen MacGillivray | title = Kurtosis: A Critical Review | journal = The American Statistician | volume = 42 | issue = 2 | pages = 111–119 | year = 1988 | doi = 10.2307/2684482 | jstor = 2684482 | publisher = American Statistical Association}}</ref> If a distribution has heavy tails, the kurtosis will be high (sometimes called leptokurtic); conversely, light-tailed distributions (for example, bounded distributions such as the uniform) have low kurtosis (sometimes called platykurtic).

The kurtosis can be positive without limit, but {{mvar|κ}} must be greater than or equal to {{math|''γ''<sup>2</sup> + 1}}; equality only holds for [[Bernoulli distribution|binary distributions]]. For unbounded skew distributions not too far from normal, {{mvar|κ}} tends to be somewhere in the area of {{math|''γ''<sup>2</sup>}} and {{math|2''γ''<sup>2</sup>}}.

The inequality can be proven by considering<math display="block">\operatorname{E}\left[\left(T^2 - aT - 1\right)^2\right]</math>where {{math|1=''T'' = (''X'' − ''μ'')/''σ''}}. This is the expectation of a square, so it is non-negative for all ''a''; however it is also a quadratic [[polynomial]] in ''a''. Its [[discriminant]] must be non-positive, which gives the required relationship.
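The intermediate step can be written out as follows, using only the standardization facts <math>\operatorname{E}[T] = 0</math>, <math>\operatorname{E}\left[T^2\right] = 1</math>, <math>\operatorname{E}\left[T^3\right] = \gamma</math> and <math>\operatorname{E}\left[T^4\right] = \kappa</math>:
<math display="block">\begin{align}
\operatorname{E}\left[\left(T^2 - aT - 1\right)^2\right] &= \operatorname{E}\left[T^4\right] - 2a\operatorname{E}\left[T^3\right] + \left(a^2 - 2\right)\operatorname{E}\left[T^2\right] + 2a\operatorname{E}[T] + 1 \\
&= \kappa - 2a\gamma + a^2 - 1 \\
&= a^2 - 2\gamma a + (\kappa - 1) \ge 0 .
\end{align}</math>
A quadratic in ''a'' with positive leading coefficient is non-negative for all ''a'' only if its discriminant satisfies <math>(2\gamma)^2 - 4(\kappa - 1) \le 0</math>, that is, <math>\kappa \ge \gamma^2 + 1</math>.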

=== Higher moments ===
'''High-order moments''' are moments beyond 4th-order moments.

As with variance, skewness, and kurtosis, these are [[higher-order statistics]], involving non-linear combinations of the data, and can be used for description or estimation of further [[shape parameter]]s. The higher the moment, the harder it is to estimate, in the sense that larger samples are required in order to obtain estimates of similar quality. This is due to the excess [[Degrees of freedom (statistics)|degrees of freedom]] consumed by the higher orders. Further, they can be subtle to interpret, often being most easily understood in terms of lower order moments – compare the higher-order derivatives of [[Jerk (physics)|jerk]] and [[jounce]] in [[physics]]. For example, just as the 4th-order moment (kurtosis) can be interpreted as "relative importance of tails as compared to shoulders in contribution to dispersion" (for a given amount of dispersion, higher kurtosis corresponds to thicker tails, while lower kurtosis corresponds to broader shoulders), the 5th-order moment can be interpreted as measuring "relative importance of tails as compared to center ([[Mode (statistics)|mode]] and shoulders) in contribution to skewness" (for a given amount of skewness, higher 5th moment corresponds to higher skewness in the tail portions and little skewness of mode, while lower 5th moment corresponds to more skewness in shoulders).

=== Mixed moments ===
'''Mixed moments''' are moments involving multiple variables.

The value <math>E[X^k]</math> is called the moment of order <math>k</math> (moments are also defined for non-integral <math>k</math>). The moments of the joint distribution of random variables <math>X_1, \ldots, X_n</math> are defined similarly. For any integers <math>k_i\geq0</math>, the mathematical expectation <math>E[{X_1}^{k_1}\cdots{X_n}^{k_n}]</math> is called a mixed moment of order <math>k</math> (where <math>k=k_1+\cdots+k_n</math>), and <math>E[(X_1-E[X_1])^{k_1}\cdots(X_n-E[X_n])^{k_n}]</math> is called a central mixed moment of order <math>k</math>. The mixed moment <math>E[(X_1-E[X_1])(X_2-E[X_2])]</math> is called the covariance and is one of the basic characteristics of dependency between random variables.

Some examples are [[covariance]], [[coskewness]] and [[cokurtosis]]. While there is a unique covariance, there are multiple co-skewnesses and co-kurtoses.
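A minimal sketch of central mixed moments for paired observations, treated as an empirical joint distribution, is given below. The function name <code>central_mixed_moment</code> and the data values are illustrative assumptions. With two variables, the orders (1, 1) give the covariance, while (2, 1) and (1, 2) give two distinct third-order co-moments, which is why the co-skewness is not unique.

```python
from math import prod


def central_mixed_moment(columns, orders):
    """Empirical E[prod_i (X_i - E[X_i])**k_i] for paired observations.

    `columns` holds one list of observations per variable and `orders`
    holds the exponents k_i in the same order.
    """
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return sum(
        prod((col[j] - m) ** k for col, m, k in zip(columns, means, orders))
        for j in range(n)
    ) / n


x = [1.0, 2.0, 3.0, 4.0]   # made-up paired data
y = [2.0, 4.0, 6.0, 9.0]

covariance = central_mixed_moment([x, y], (1, 1))
co_moment_xxy = central_mixed_moment([x, y], (2, 1))
co_moment_xyy = central_mixed_moment([x, y], (1, 2))
print(covariance, co_moment_xxy, co_moment_xyy)
```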

== Properties of moments ==
=== Transformation of center ===
Since <math display="block">(x - b)^n = (x - a + a - b)^n = \sum_{i=0}^n {n \choose i}(x - a)^i(a - b)^{n-i}</math> where <math display="inline">\binom{n}{i}</math> is the [[binomial coefficient]], it follows that the moments about ''b'' can be calculated from the moments about ''a'' by: <math display="block">E\left[(x - b)^n\right] = \sum_{i=0}^n {n \choose i} E\left[(x - a)^i\right](a - b)^{n-i}.</math>
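The change-of-center formula translates directly into code. The sketch below is illustrative only: the function name <code>shift_center</code> and the fair-die moments used as input are assumptions made here. It converts the raw moments of a fair six-sided die (center ''a'' = 0) into moments about its mean (''b'' = 7/2) and back.

```python
from fractions import Fraction
from math import comb


def shift_center(moments_about_a, a, b):
    """Convert [E[(X-a)^0], ..., E[(X-a)^N]] into the moments about b.

    Implements E[(X-b)^n] = sum_i C(n, i) * E[(X-a)^i] * (a-b)^(n-i).
    """
    return [
        sum(comb(n, i) * moments_about_a[i] * (a - b) ** (n - i)
            for i in range(n + 1))
        for n in range(len(moments_about_a))
    ]


# Raw moments of orders 0, 1, 2 of a fair six-sided die (illustrative input).
raw = [Fraction(1), Fraction(7, 2), Fraction(91, 6)]

central = shift_center(raw, a=0, b=Fraction(7, 2))
round_trip = shift_center(central, a=Fraction(7, 2), b=0)
print(central, round_trip)
```

Under these assumptions the moments about the mean are 1, 0 and 35/12, and shifting back recovers the raw moments.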

=== Moment of a convolution of function ===
{{main|Convolution}}
The raw moment of a convolution <math display="inline">h(t) = (f * g)(t) = \int_{-\infty}^\infty f(\tau) g(t - \tau) \, d\tau</math> reads <math display="block">\mu_n[h] = \sum_{i=0}^{n} {n\choose i} \mu_i[f] \mu_{n-i}[g]</math> where <math>\mu_n[\,\cdot\,]</math> denotes the <math>n</math>th moment of the function given in the brackets. This identity follows from the convolution theorem for moment-generating functions, under which the moment-generating function of <math>h</math> is the product of those of <math>f</math> and <math>g</math>, and from applying the [[general Leibniz rule]] for [[differentiation (mathematics)|differentiating]] a product <math>n</math> times.
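The identity can be checked on a discrete analogue, in which the functions are finite sequences indexed by the integers 0, 1, 2, ... and the integrals become sums. This swap from integrals to sums, the sequences <code>f</code> and <code>g</code>, and the helper names are assumptions made for this sketch.

```python
from math import comb


def moment(seq, n):
    """n-th raw moment of a sequence viewed as a function on 0, 1, 2, ..."""
    return sum(t ** n * value for t, value in enumerate(seq))


def convolve(f, g):
    """Discrete convolution h[t] = sum over tau of f[tau] * g[t - tau]."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            h[i + j] += fv * gv
    return h


f = [1, 2, 3]        # arbitrary integer-valued test sequences
g = [4, 0, 5, 6]
h = convolve(f, g)

direct = [moment(h, n) for n in range(4)]
via_identity = [
    sum(comb(n, i) * moment(f, i) * moment(g, n - i) for i in range(n + 1))
    for n in range(4)
]
print(direct, via_identity)
```

Because the inputs are integers, the two lists agree exactly rather than only up to rounding.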

== Cumulants ==
{{main|Cumulant}}

The first raw moment and the second and third ''unnormalized central'' moments are additive in the sense that if ''X'' and ''Y'' are [[statistical independence|independent]] random variables then <math display="block">\begin{align} m_1(X + Y) &= m_1(X) + m_1(Y) \\ \operatorname{Var}(X + Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) \\ \mu_3(X + Y) &= \mu_3(X) + \mu_3(Y) \end{align}</math>

(These can also hold for variables that satisfy weaker conditions than independence. The first always holds; if the second holds, the variables are called [[correlation|uncorrelated]].)

These are the first three cumulants, and all cumulants share this additivity property.
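The additivity can be verified exactly for small discrete distributions. In the sketch below, the two probability mass functions and the helper names are illustrative assumptions; the fourth cumulant is computed from the standard relation <math>\kappa_4 = \mu_4 - 3\mu_2^2</math>. The fourth ''central moment'' is included to show that it is not additive, whereas the fourth cumulant is.

```python
from fractions import Fraction
from itertools import product


def central_moment(pmf, n):
    """n-th central moment of a discrete distribution given as {value: prob}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** n * p for x, p in pmf.items())


def fourth_cumulant(pmf):
    """Fourth cumulant via kappa_4 = mu_4 - 3 * mu_2**2."""
    return central_moment(pmf, 4) - 3 * central_moment(pmf, 2) ** 2


def sum_pmf(px, py):
    """Distribution of X + Y for independent X and Y."""
    out = {}
    for (x, p), (y, q) in product(px.items(), py.items()):
        out[x + y] = out.get(x + y, 0) + p * q
    return out


# Two made-up independent discrete random variables.
X = {0: Fraction(3, 4), 1: Fraction(1, 4)}
Y = {0: Fraction(1, 2), 1: Fraction(1, 3), 5: Fraction(1, 6)}
S = sum_pmf(X, Y)

for n in (2, 3):
    print(n, central_moment(S, n) == central_moment(X, n) + central_moment(Y, n))
print(4, central_moment(S, 4) == central_moment(X, 4) + central_moment(Y, 4))
print("kappa_4", fourth_cumulant(S) == fourth_cumulant(X) + fourth_cumulant(Y))
```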

== Sample moments ==
For all ''k'', the {{mvar|k}}th raw moment of a population can be estimated using the {{mvar|k}}th raw sample moment <math display="block">\frac{1}{n}\sum_{i = 1}^{n} X^k_i</math> applied to a sample {{math|''X''<sub>1</sub>, ..., ''X<sub>n</sub>''}} drawn from the population.

It can be shown that the expected value of the raw sample moment is equal to the {{mvar|k}}th raw moment of the population, if that moment exists, for any sample size {{mvar|n}}. It is thus an unbiased estimator. This contrasts with the situation for central moments, whose computation uses up a degree of freedom by using the sample mean. So for example an unbiased estimate of the population variance (the second central moment) is given by <math display="block">\frac{1}{n - 1}\sum_{i = 1}^n \left(X_i - \bar{X}\right)^2</math> in which the previous denominator {{mvar|n}} has been replaced by the degrees of freedom {{math|''n'' − 1}}, and in which <math>\bar X</math> refers to the sample mean. This estimate of the population moment is greater than the unadjusted observed sample moment by a factor of <math>\tfrac{n}{n-1},</math> and it is referred to as the "adjusted sample variance" or sometimes simply the "sample variance".
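These bias statements can be checked exactly by enumerating every equally likely sample of a small size. The sketch below assumes, purely for illustration, a population given by a fair six-sided die and samples of size 3 drawn with replacement; the helper names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

population = range(1, 7)   # fair six-sided die (illustrative population)
n = 3                      # sample size


def raw_sample_moment(sample, k):
    """k-th raw sample moment: (1/n) * sum of X_i**k."""
    return Fraction(sum(x ** k for x in sample), len(sample))


def sample_variance(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - ddof)


# All 6**3 = 216 samples are equally likely when drawing with replacement.
samples = list(product(population, repeat=n))


def expectation(statistic):
    """Exact expected value of a statistic over all equally likely samples."""
    return sum(statistic(s) for s in samples) / len(samples)


exp_raw2 = expectation(lambda s: raw_sample_moment(s, 2))
exp_adjusted = expectation(lambda s: sample_variance(s, ddof=1))
exp_unadjusted = expectation(lambda s: sample_variance(s, ddof=0))
print(exp_raw2, exp_adjusted, exp_unadjusted)
```

Under these assumptions the raw sample moment averages to the population value 91/6, the adjusted sample variance averages to the population variance 35/12, and the unadjusted version averages to (''n'' − 1)/''n'' times that value.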

== Problem of moments ==
{{main|Moment problem}}
Problems of determining a probability distribution from its sequence of moments are called ''problems of moments''. Such problems were first discussed by P.L. Chebyshev (1874)<ref>Feller, W. (1957-1971). ''An introduction to probability theory and its applications''. New York: John Wiley & Sons. 419 p.</ref> in connection with research on limit theorems. In order that the probability distribution of a random variable <math>X</math> be uniquely defined by its moments <math>\alpha_k = E\left[X^k\right]</math> it is sufficient, for example, that Carleman's condition be satisfied: <math display="block">\sum_{k=1}^\infin\frac{1}{\alpha_{2k}^{1/(2k)}} = \infin</math>
A similar result even holds for moments of random vectors.

The ''problem of moments'' seeks characterizations of sequences <math>\left\{\mu'_n : n = 1,2,3,\dots\right\}</math> that are sequences of moments of some function {{math|''f''}}.

In the setting of limit theorems, let <math>\mu_n</math> be a sequence of distribution functions, all moments <math>\alpha_k(n)</math> of which are finite, and for each integer <math>k\geq1</math> let <math display="block">\alpha_k(n)\rightarrow \alpha_k \quad \text{as } n\rightarrow \infin,</math> where <math>\alpha_k</math> is finite. Then there is a subsequence of <math>\mu_n</math> that weakly converges to a distribution function <math>\mu</math> having <math>\alpha_k</math> as its moments. If the moments determine <math>\mu</math> uniquely, then the whole sequence <math>\mu_n</math> weakly converges to <math>\mu</math>.

== Partial moments ==
Partial moments are sometimes referred to as "one-sided moments". The {{mvar|n}}th order lower and upper partial moments with respect to a reference point ''r'' may be expressed as <math display="block">\mu_n^- (r) = \int_{-\infty}^r (r - x)^n\,f(x)\,\mathrm{d}x,</math> <math display="block">\mu_n^+ (r) = \int_r^\infty (x - r)^n\,f(x)\,\mathrm{d}x.</math>

If the integral does not converge, the partial moment does not exist.

Partial moments are normalized by being raised to the power {{math|1/''n''}}. The [[upside potential ratio]] may be expressed as a ratio of a first-order upper partial moment to a normalized second-order lower partial moment.
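A sample-based sketch of these quantities follows. It replaces the integrals against the density by averages over observed values, which is an assumption of this example, as are the function names and the made-up return series. The order is taken to be at least 1.

```python
import math


def lower_partial_moment(sample, r, n):
    """Empirical n-th order lower partial moment about r (n >= 1)."""
    return sum(max(r - x, 0.0) ** n for x in sample) / len(sample)


def upper_partial_moment(sample, r, n):
    """Empirical n-th order upper partial moment about r (n >= 1)."""
    return sum(max(x - r, 0.0) ** n for x in sample) / len(sample)


def upside_potential_ratio(sample, r):
    """First-order upper partial moment over the normalized
    (square root of the) second-order lower partial moment."""
    return upper_partial_moment(sample, r, 1) / math.sqrt(
        lower_partial_moment(sample, r, 2)
    )


returns = [-0.02, 0.01, 0.03, -0.01, 0.04]   # made-up observations
threshold = 0.0                               # reference point r

ratio = upside_potential_ratio(returns, threshold)
print(ratio)
```

For first-order partial moments, the difference between the upper and lower partial moment equals the mean minus the reference point, which gives a quick consistency check.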

== Central moments in metric spaces ==
Let {{math|(''M'', ''d'')}} be a [[metric space]], and let {{math|B(''M'')}} be the [[Borel sigma algebra|Borel {{mvar|σ}}-algebra]] on {{math|''M''}}, the [[sigma algebra|{{mvar|σ}}-algebra]] generated by the {{math|''d''}}-[[open set|open subsets]] of {{math|''M''}}. (For technical reasons, it is also convenient to assume that {{math|''M''}} is a [[separable space]] with respect to the [[metric (mathematics)|metric]] {{math|''d''}}.) Let {{math|1 ≤ ''p'' ≤ ∞}}.

The '''{{mvar|p}}th central moment''' of a measure {{mvar|μ}} on the [[measurable space]] {{math|(''M'', B(''M''))}} about a given point {{math|''x''<sub>0</sub> ∈ ''M''}} is defined to be <math display="block">\int_{M} d\left(x, x_0\right)^p \, \mathrm{d} \mu (x).</math>

{{math|''μ''}} is said to have '''finite {{mvar|p}}th central moment''' if the {{mvar|p}}th central moment of {{mvar|μ}} about {{math|''x''<sub>0</sub>}} is finite for some {{math|''x''<sub>0</sub> ∈ ''M''}}.

This terminology for measures carries over to random variables in the usual way: if {{math|(Ω, Σ, '''P''')}} is a [[probability space]] and {{math|''X'' : Ω → ''M''}} is a random variable, then the '''{{mvar|p}}th central moment''' of {{math|''X''}} about {{math|''x''<sub>0</sub> ∈ ''M''}} is defined to be <math display="block"> \int_M d \left(x, x_0\right)^p \, \mathrm{d} \left( X_* \left(\mathbf{P}\right) \right) (x) = \int_\Omega d \left(X(\omega), x_0\right)^p \, \mathrm{d} \mathbf{P} (\omega) = \operatorname{\mathbf{E}}[d(X, x_0)^p],</math> and ''X'' has '''finite {{mvar|p}}th central moment''' if the {{mvar|p}}th central moment of {{math|''X''}} about {{math|''x''<sub>0</sub>}} is finite for some {{math|''x''<sub>0</sub> ∈ ''M''}}.

== See also ==
{{div col|colwidth=20em}}
* [[Energy (signal processing)]]
* [[Factorial moment]]
* [[Generalised mean]]
* [[Image moment]]
* [[L-moment]]
* [[Method of moments (probability theory)]]
* [[Method of moments (statistics)]]
* [[Moment-generating function#Calculations of moments|Moment-generating function]]
* [[Moment measure]]
* [[Second moment method]]
* [[Standardized moment|Standardised moment]]
* [[Stieltjes moment problem]]
* [[Taylor expansions for the moments of functions of random variables]]
{{div col end}}

== References ==
* [[File:CC BY-SA icon.svg|50px]] Text was copied from [https://encyclopediaofmath.org/wiki/Moment Moment] at the Encyclopedia of Mathematics, which is released under a [https://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 (Unported) (CC-BY-SA 3.0) license] and the [[Wikipedia:Text of the GNU Free Documentation License|GNU Free Documentation License]].
{{Reflist}}

== Further reading ==
* {{cite book |last=Spanos |first=Aris |pages=[https://archive.org/details/probabilitytheor00span_893/page/n135 109]–130 |title=Probability Theory and Statistical Inference |url=https://archive.org/details/probabilitytheor00span_893 |url-access=limited |location=New York |publisher=Cambridge University Press |year=1999 |isbn=0-521-42408-9 }}
* {{cite book |last1=Walker |first1=Helen M. |author-link=Helen M. Walker |title=Studies in the history of statistical method, with special reference to certain educational problems |page=[https://archive.org/details/studiesinhistory00walk/page/71 71] |date=1929 |publisher=Baltimore, Williams & Wilkins Co. |url=https://archive.org/details/studiesinhistory00walk/page/71 }}

== External links ==
* {{springer|title=Moment|id=p/m064580}}
* [http://mathworld.wolfram.com/topics/Moments.html Moments at Mathworld]

{{theory of probability distributions}}
{{statistics|descriptive}}

{{DEFAULTSORT:Moment (Mathematics)}}
[[Category:Moments (mathematics)| ]]
[[Category:Moment (physics)]]