# Variance function

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Variance_function
> Markdown URL: https://mediated.wiki/source/Variance_function.md
> Source: https://en.wikipedia.org/wiki/Variance_function
> Source revision: 1175406517
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Smooth function in statistics


For variance as a function of space-time separation, see [Variogram](/source/Variogram).


In [statistics](/source/Statistics), the **variance function** is a [smooth function](/source/Smooth_function) that depicts the [variance](/source/Variance) of a [random quantity](/source/Random_quantity) as a function of its [mean](/source/Mean). The variance function is a measure of [heteroscedasticity](/source/Heteroscedasticity) and plays a large role in many settings of statistical modelling. It is a main ingredient in the [generalized linear model](/source/Generalized_linear_model) framework and a tool used in [non-parametric regression](/source/Non-parametric_regression),[1] [semiparametric regression](/source/Semiparametric_regression)[1] and [functional data analysis](/source/Functional_data_analysis).[2] In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a [smooth function](/source/Smooth_function).

## Intuition

In a regression model setting, the goal is to establish whether or not a relationship exists between a response variable and a set of predictor variables. Further, if a relationship does exist, the goal is then to describe this relationship as well as possible. A main assumption in [linear regression](/source/Linear_regression) is constant variance (homoscedasticity), meaning that the errors of the response have the same variance at every predictor level. This assumption works well when the response variable and the predictor variable are jointly [normal](/source/Normal_distribution). As we will see later, the variance function in the Normal setting is constant; however, we must find a way to quantify heteroscedasticity (non-constant variance) in the absence of joint Normality.

When it is likely that the response follows a distribution that is a member of the exponential family, a [generalized linear model](/source/Generalized_linear_model) may be more appropriate to use, and moreover, when we wish not to force a parametric model onto our data, a [non-parametric regression](/source/Non-parametric_regression) approach can be useful. The importance of being able to model the variance as a function of the mean lies in improved inference (in a parametric setting), and estimation of the regression function in general, for any setting.

Variance functions play a very important role in parameter estimation and inference. In general, maximum likelihood estimation requires that a likelihood function be defined. This requirement implies that one must first specify the distribution of the observed response variables. However, to define a quasi-likelihood, one need only specify a relationship between the mean and the variance of the observations to then be able to use the quasi-likelihood function for estimation.[3] [Quasi-likelihood](/source/Quasi-likelihood) estimation is particularly useful when there is [overdispersion](/source/Overdispersion). Overdispersion occurs when there is more variability in the data than would be expected under the assumed distribution of the data.

In summary, to ensure efficient inference of the regression parameters and the regression function, the heteroscedasticity must be accounted for. Variance functions quantify the relationship between the variance and the mean of the observed data and hence play a significant role in regression estimation and inference.

## Types

The variance function and its applications come up in many areas of statistical analysis. A very important use of this function is in the framework of [generalized linear models](/source/Generalized_linear_models) and [non-parametric regression](/source/Non-parametric_regression).

### Generalized linear model

When a member of the [exponential family](/source/Exponential_family) has been specified, the variance function can easily be derived.[4]: 29 The general form of the variance function is presented under the exponential family context, as well as specific forms for Normal, Bernoulli, Poisson, and Gamma. In addition, we describe the applications and use of variance functions in maximum likelihood estimation and quasi-likelihood estimation.

#### Derivation

The **generalized linear model (GLM)** is a generalization of ordinary regression analysis that extends to any member of the [exponential family](/source/Exponential_family). It is particularly useful when the response variable is categorical, binary or subject to a constraint (e.g. only positive responses make sense). The components of a GLM are summarized briefly on this page; for more details and information see the page on [generalized linear models](/source/Generalized_linear_models).

A **GLM** consists of three main ingredients:

- 1. Random Component: a distribution of **y** from the exponential family, E [ y ∣ X ] = μ {\displaystyle E[y\mid X]=\mu }

- 2. Linear predictor: η = X B = ∑ j = 1 p X i j T B j {\displaystyle \eta =XB=\sum _{j=1}^{p}X_{ij}^{T}B_{j}}

- 3. Link function: η = g ( μ ) , μ = g − 1 ( η ) {\displaystyle \eta =g(\mu ),\mu =g^{-1}(\eta )}
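As a rough illustration of how the three ingredients fit together, the following sketch computes a linear predictor and maps it to a mean through an inverse link. The toy design matrix, the coefficient values and the choice of a log link are assumptions made only for this example; they do not come from the article.

```python
import math

# Hypothetical toy data: each row is (intercept, one predictor).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [0.5, 0.3]  # illustrative coefficients, not estimates


def linear_predictor(X, beta):
    """eta_i = sum_j X_ij * beta_j (the linear predictor)."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


def log_link(mu):
    """g(mu) = log(mu); an assumed choice of link function."""
    return math.log(mu)


def inverse_log_link(eta):
    """g^{-1}(eta) = exp(eta), mapping the linear predictor to the mean."""
    return math.exp(eta)


eta = linear_predictor(X, beta)
mu = [inverse_log_link(e) for e in eta]  # E[y | X] for the random component
```

Applying `log_link` to each entry of `mu` recovers `eta`, which is the defining property of the link and its inverse.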

First it is important to derive a couple key properties of the exponential family.

Any random variable y {\displaystyle {\textit {y}}} in the exponential family has a probability density function of the form,

- f ( y , θ , ϕ ) = exp ⁡ ( y θ − b ( θ ) ϕ − c ( y , ϕ ) ) {\displaystyle f(y,\theta ,\phi )=\exp \left({\frac {y\theta -b(\theta )}{\phi }}-c(y,\phi )\right)}

with loglikelihood,

- ℓ ( θ , y , ϕ ) = log ⁡ ( f ( y , θ , ϕ ) ) = y θ − b ( θ ) ϕ − c ( y , ϕ ) {\displaystyle \ell (\theta ,y,\phi )=\log(f(y,\theta ,\phi ))={\frac {y\theta -b(\theta )}{\phi }}-c(y,\phi )}

Here, θ {\displaystyle \theta } is the canonical parameter and the parameter of interest, and ϕ {\displaystyle \phi } is a nuisance parameter which plays a role in the variance. We use **Bartlett's identities** to derive a general expression for the **variance function**. The first and second Bartlett results ensure that, under suitable conditions (see [Leibniz integral rule](/source/Leibniz_integral_rule)), for a density function f θ {\displaystyle f_{\theta }} dependent on θ {\displaystyle \theta } ,

- E θ ⁡ [ ∂ ∂ θ log ⁡ ( f θ ( y ) ) ] = 0 {\displaystyle \operatorname {E} _{\theta }\left[{\frac {\partial }{\partial \theta }}\log(f_{\theta }(y))\right]=0}

- Var θ ⁡ [ ∂ ∂ θ log ⁡ ( f θ ( y ) ) ] + E θ ⁡ [ ∂ 2 ∂ θ 2 log ⁡ ( f θ ( y ) ) ] = 0 {\displaystyle \operatorname {Var} _{\theta }\left[{\frac {\partial }{\partial \theta }}\log(f_{\theta }(y))\right]+\operatorname {E} _{\theta }\left[{\frac {\partial ^{2}}{\partial \theta ^{2}}}\log(f_{\theta }(y))\right]=0}

These identities lead to simple calculations of the expected value and variance of any random variable y {\displaystyle {\textit {y}}} in the exponential family E θ [ y ] , V a r θ [ y ] {\displaystyle E_{\theta }[y],Var_{\theta }[y]} .

**Expected value of *Y*:** Taking the first derivative with respect to θ {\displaystyle \theta } of the log of the density in the exponential family form described above, we have

- ∂ ∂ θ log ⁡ ( f ( y , θ , ϕ ) ) = ∂ ∂ θ [ y θ − b ( θ ) ϕ − c ( y , ϕ ) ] = y − b ′ ( θ ) ϕ {\displaystyle {\frac {\partial }{\partial \theta }}\log(f(y,\theta ,\phi ))={\frac {\partial }{\partial \theta }}\left[{\frac {y\theta -b(\theta )}{\phi }}-c(y,\phi )\right]={\frac {y-b'(\theta )}{\phi }}}

Then taking the expected value and setting it equal to zero leads to,

- E θ ⁡ [ y − b ′ ( θ ) ϕ ] = E θ ⁡ [ y ] − b ′ ( θ ) ϕ = 0 {\displaystyle \operatorname {E} _{\theta }\left[{\frac {y-b'(\theta )}{\phi }}\right]={\frac {\operatorname {E} _{\theta }[y]-b'(\theta )}{\phi }}=0}

- E θ ⁡ [ y ] = b ′ ( θ ) {\displaystyle \operatorname {E} _{\theta }[y]=b'(\theta )}

**Variance of Y:** To compute the variance we use the second Bartlett identity,

- Var θ ⁡ [ ∂ ∂ θ ( y θ − b ( θ ) ϕ − c ( y , ϕ ) ) ] + E θ ⁡ [ ∂ 2 ∂ θ 2 ( y θ − b ( θ ) ϕ − c ( y , ϕ ) ) ] = 0 {\displaystyle \operatorname {Var} _{\theta }\left[{\frac {\partial }{\partial \theta }}\left({\frac {y\theta -b(\theta )}{\phi }}-c(y,\phi )\right)\right]+\operatorname {E} _{\theta }\left[{\frac {\partial ^{2}}{\partial \theta ^{2}}}\left({\frac {y\theta -b(\theta )}{\phi }}-c(y,\phi )\right)\right]=0}

- Var θ ⁡ [ y − b ′ ( θ ) ϕ ] + E θ ⁡ [ − b ″ ( θ ) ϕ ] = 0 {\displaystyle \operatorname {Var} _{\theta }\left[{\frac {y-b'(\theta )}{\phi }}\right]+\operatorname {E} _{\theta }\left[{\frac {-b''(\theta )}{\phi }}\right]=0}

- Var θ ⁡ [ y ] = b ″ ( θ ) ϕ {\displaystyle \operatorname {Var} _{\theta }\left[y\right]=b''(\theta )\phi }

We have now a relationship between μ {\displaystyle \mu } and θ {\displaystyle \theta } , namely

- μ = b ′ ( θ ) {\displaystyle \mu =b'(\theta )} and θ = b ′ − 1 ( μ ) {\displaystyle \theta =b'^{-1}(\mu )} , which allows for a relationship between μ {\displaystyle \mu } and the variance,

- V ( θ ) = b ″ ( θ ) = the part of the variance that depends on θ {\displaystyle V(\theta )=b''(\theta )={\text{the part of the variance that depends on }}\theta }

- V ⁡ ( μ ) = b ″ ( b ′ − 1 ( μ ) ) . {\displaystyle \operatorname {V} (\mu )=b''(b'^{-1}(\mu )).\,}

Note that because Var θ [ y ] > 0 {\displaystyle \operatorname {Var} _{\theta }\left[y\right]>0} , we have b ″ ( θ ) > 0 {\displaystyle b''(\theta )>0} , so b ′ : θ → μ {\displaystyle b':\theta \rightarrow \mu } is strictly increasing and hence invertible. We derive the variance function for a few common distributions.
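Before turning to specific distributions, the general formula V(μ) = b″(b′⁻¹(μ)) can be checked numerically from the function b alone. The sketch below is one possible way to do so, using finite differences and bisection. The helper names, step sizes and search interval are assumptions chosen for illustration.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def invert_increasing(f, target, lo, hi, tol=1e-12):
    """Solve f(x) = target by bisection, assuming f is increasing on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def variance_function(b, mu, theta_lo=-20.0, theta_hi=20.0):
    """V(mu) = b''(b'^{-1}(mu)), computed numerically from b alone."""
    def b_prime(theta):
        return derivative(b, theta)

    theta = invert_increasing(b_prime, mu, theta_lo, theta_hi)
    return second_derivative(b, theta)


def poisson_b(theta):
    """b(theta) = e^theta."""
    return math.exp(theta)


def bernoulli_b(theta):
    """b(theta) = ln(1 + e^theta)."""
    return math.log(1.0 + math.exp(theta))


v_poisson = variance_function(poisson_b, 3.0)      # should be close to mu = 3
v_bernoulli = variance_function(bernoulli_b, 0.2)  # should be close to 0.2 * 0.8
```

The bisection step relies on b′ being increasing, which is exactly the invertibility property noted above.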

#### Example – normal

The [normal distribution](/source/Normal_distribution) is a special case where the variance function is a constant. Let y ∼ N ( μ , σ 2 ) {\displaystyle y\sim N(\mu ,\sigma ^{2})} then we put the density function of **y** in the form of the exponential family described above:

- f ( y ) = exp ⁡ ( y μ − μ 2 2 σ 2 − y 2 2 σ 2 − 1 2 ln ⁡ 2 π σ 2 ) {\displaystyle f(y)=\exp \left({\frac {y\mu -{\frac {\mu ^{2}}{2}}}{\sigma ^{2}}}-{\frac {y^{2}}{2\sigma ^{2}}}-{\frac {1}{2}}\ln {2\pi \sigma ^{2}}\right)}

where

- θ = μ , {\displaystyle \theta =\mu ,}

- b ( θ ) = μ 2 2 , {\displaystyle b(\theta )={\frac {\mu ^{2}}{2}},}

- ϕ = σ 2 , {\displaystyle \phi =\sigma ^{2},}

- c ( y , ϕ ) = y 2 2 σ 2 + 1 2 ln ⁡ 2 π σ 2 {\displaystyle c(y,\phi )={\frac {y^{2}}{2\sigma ^{2}}}+{\frac {1}{2}}\ln {2\pi \sigma ^{2}}}

To calculate the variance function V ( μ ) {\displaystyle V(\mu )} , we first express θ {\displaystyle \theta } as a function of μ {\displaystyle \mu } . Then we transform V ( θ ) {\displaystyle V(\theta )} into a function of μ {\displaystyle \mu } :

- θ = μ {\displaystyle \theta =\mu }

- b ′ ( θ ) = θ = E ⁡ [ y ] = μ {\displaystyle b'(\theta )=\theta =\operatorname {E} [y]=\mu }

- V ( θ ) = b ″ ( θ ) = 1 {\displaystyle V(\theta )=b''(\theta )=1}

Therefore, the variance function is constant.

#### Example – Bernoulli

Let y ∼ Bernoulli ( p ) {\displaystyle y\sim {\text{Bernoulli}}(p)} , then we express the density of the [Bernoulli distribution](/source/Bernoulli_distribution) in exponential family form,

- f ( y ) = exp ⁡ ( y ln ⁡ p 1 − p + ln ⁡ ( 1 − p ) ) {\displaystyle f(y)=\exp \left(y\ln {\frac {p}{1-p}}+\ln(1-p)\right)}

- θ = ln ⁡ p 1 − p = {\displaystyle \theta =\ln {\frac {p}{1-p}}=} [logit](/source/Logit)(p), which gives us p = e θ 1 + e θ = {\displaystyle p={\frac {e^{\theta }}{1+e^{\theta }}}=} [expit](/source/Expit) ( θ ) {\displaystyle (\theta )}

- b ( θ ) = ln ⁡ ( 1 + e θ ) {\displaystyle b(\theta )=\ln(1+e^{\theta })} and

- b ′ ( θ ) = e θ 1 + e θ = {\displaystyle b'(\theta )={\frac {e^{\theta }}{1+e^{\theta }}}=} [expit](/source/Expit) ( θ ) = p = μ {\displaystyle (\theta )=p=\mu }

- b ″ ( θ ) = e θ 1 + e θ − ( e θ 1 + e θ ) 2 {\displaystyle b''(\theta )={\frac {e^{\theta }}{1+e^{\theta }}}-\left({\frac {e^{\theta }}{1+e^{\theta }}}\right)^{2}}

This gives us

- V ( μ ) = μ ( 1 − μ ) {\displaystyle V(\mu )=\mu (1-\mu )}

#### Example – Poisson

Let y ∼ Poisson ( λ ) {\displaystyle y\sim {\text{Poisson}}(\lambda )} , then we express the density of the [Poisson distribution](/source/Poisson_distribution) in exponential family form,

- f ( y ) = exp ⁡ ( y ln ⁡ λ − λ − ln ⁡ y ! ) {\displaystyle f(y)=\exp(y\ln \lambda -\lambda -\ln y!)}

- θ = ln ⁡ λ {\displaystyle \theta =\ln \lambda } , which gives us λ = e θ {\displaystyle \lambda =e^{\theta }}

- b ( θ ) = e θ {\displaystyle b(\theta )=e^{\theta }} and

- b ′ ( θ ) = e θ = λ = μ {\displaystyle b'(\theta )=e^{\theta }=\lambda =\mu }

- b ″ ( θ ) = e θ = μ {\displaystyle b''(\theta )=e^{\theta }=\mu }

This gives us

- V ( μ ) = μ {\displaystyle V(\mu )=\mu }

Here we see the central property of Poisson data, that the variance is equal to the mean.

#### Example – Gamma

The [Gamma distribution](/source/Gamma_distribution) and density function can be expressed under different parametrizations. We will use the form of the gamma with parameters ( μ , ν ) {\displaystyle (\mu ,\nu )}

- f μ , ν ( y ) = 1 Γ ( ν ) y ( ν y μ ) ν e − ν y μ {\displaystyle f_{\mu ,\nu }(y)={\frac {1}{\Gamma (\nu )y}}\left({\frac {\nu y}{\mu }}\right)^{\nu }e^{-{\frac {\nu y}{\mu }}}}

Then in exponential family form we have

- f μ , ν ( y ) = exp ⁡ ( − 1 μ y + ln ⁡ ( 1 μ ) 1 ν + ln ⁡ ( ν ν y ν − 1 Γ ( ν ) ) ) {\displaystyle f_{\mu ,\nu }(y)=\exp \left({\frac {-{\frac {1}{\mu }}y+\ln({\frac {1}{\mu }})}{\frac {1}{\nu }}}+\ln \left({\frac {\nu ^{\nu }y^{\nu -1}}{\Gamma (\nu )}}\right)\right)}

- θ = − 1 μ → μ = − 1 θ {\displaystyle \theta ={\frac {-1}{\mu }}\rightarrow \mu ={\frac {-1}{\theta }}}

- ϕ = 1 ν {\displaystyle \phi ={\frac {1}{\nu }}}

- b ( θ ) = − ln ⁡ ( − θ ) {\displaystyle b(\theta )=-\ln(-\theta )}

- b ′ ( θ ) = − 1 θ = − 1 − 1 μ = μ {\displaystyle b'(\theta )={\frac {-1}{\theta }}={\frac {-1}{\frac {-1}{\mu }}}=\mu }

- b ″ ( θ ) = 1 θ 2 = μ 2 {\displaystyle b''(\theta )={\frac {1}{\theta ^{2}}}=\mu ^{2}}

And we have V ( μ ) = μ 2 {\displaystyle V(\mu )=\mu ^{2}}
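The variance functions derived in these examples can be compared against simulated data. The sketch below assumes Python's standard `random` module as the source of draws (it has no Poisson sampler, so only the normal, Bernoulli and Gamma cases are simulated). The parameter values, sample size and seed are arbitrary choices for illustration. For each family it compares the sample variance with ϕ·V(sample mean).

```python
import random

# Variance functions derived in the examples above.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "bernoulli": lambda mu: mu * (1.0 - mu),
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
}


def empirical_check(sampler, family, phi, n=50_000, seed=0):
    """Return (sample variance, phi * V(sample mean)) for simulated draws."""
    rng = random.Random(seed)
    draws = [sampler(rng) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    return var, phi * VARIANCE_FUNCTIONS[family](mean)


# Gamma with mean mu = 2 and shape nu = 4: scale is mu / nu and phi = 1 / nu.
gamma_var, gamma_pred = empirical_check(
    lambda rng: rng.gammavariate(4.0, 2.0 / 4.0), "gamma", phi=1.0 / 4.0
)

# Bernoulli with p = 0.3 and phi = 1.
bern_var, bern_pred = empirical_check(
    lambda rng: 1.0 if rng.random() < 0.3 else 0.0, "bernoulli", phi=1.0
)

# Normal with mean 5 and sigma^2 = 9, so phi = 9 and V(mu) = 1.
norm_var, norm_pred = empirical_check(
    lambda rng: rng.gauss(5.0, 3.0), "normal", phi=9.0
)
```

In each pair the two numbers should agree up to simulation noise; for 0/1 data the agreement is exact apart from rounding, because the sample variance of a binary sample equals its mean times one minus its mean.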

#### Application – weighted least squares

A very important application of the variance function is its use in parameter estimation and inference when the response variable is of the required exponential family form as well as in some cases when it is not (which we will discuss in [quasi-likelihood](/source/Quasi-likelihood)). Weighted [least squares](/source/Least_squares) (WLS) is a special case of generalized least squares. Each term in the WLS criterion includes a weight that determines the influence each observation has on the final parameter estimates. As in regular least squares, the goal is to estimate the unknown parameters in the regression function by finding values for parameter estimates that minimize the sum of the squared deviations between the observed responses and the functional portion of the model.

While WLS assumes independence of observations it does not assume equal variance and is therefore a solution for parameter estimation in the presence of heteroscedasticity. The [Gauss–Markov theorem](/source/Gauss%E2%80%93Markov_theorem) and [Aitken](/source/Alexander_Aitken) demonstrate that the [best linear unbiased estimator](/source/Best_linear_unbiased_estimator) (BLUE), the unbiased estimator with minimum variance, has each weight equal to the reciprocal of the variance of the measurement.

In the GLM framework, our goal is to estimate parameters β {\displaystyle \beta } , where Z = g ( E [ y ∣ X ] ) = X β {\displaystyle Z=g(E[y\mid X])=X\beta } . Therefore, we would like to minimize ( Z − X β ) T W ( Z − X β ) {\displaystyle (Z-X\beta )^{T}W(Z-X\beta )} and if we define the weight matrix **W** as

- W ⏟ n × n = [ 1 ϕ V ( μ 1 ) g ′ ( μ 1 ) 2 0 ⋯ 0 0 0 1 ϕ V ( μ 2 ) g ′ ( μ 2 ) 2 0 ⋯ 0 ⋮ ⋮ ⋮ ⋮ 0 ⋮ ⋮ ⋮ ⋮ 0 0 ⋯ ⋯ 0 1 ϕ V ( μ n ) g ′ ( μ n ) 2 ] , {\displaystyle \underbrace {W} _{n\times n}={\begin{bmatrix}{\frac {1}{\phi V(\mu _{1})g'(\mu _{1})^{2}}}&0&\cdots &0&0\\0&{\frac {1}{\phi V(\mu _{2})g'(\mu _{2})^{2}}}&0&\cdots &0\\\vdots &\vdots &\vdots &\vdots &0\\\vdots &\vdots &\vdots &\vdots &0\\0&\cdots &\cdots &0&{\frac {1}{\phi V(\mu _{n})g'(\mu _{n})^{2}}}\end{bmatrix}},}

where ϕ , V ( μ ) , g ( μ ) {\displaystyle \phi ,V(\mu ),g(\mu )} are defined in the previous section, it allows for [iteratively reweighted least squares](/source/Iteratively_reweighted_least_squares) (IRLS) estimation of the parameters. See the section on [iteratively reweighted least squares](/source/Iteratively_reweighted_least_squares) for more derivation and information.

It is also important to note that when the weight matrix is of the form described here, minimizing the expression ( Z − X β ) T W ( Z − X β ) {\displaystyle (Z-X\beta )^{T}W(Z-X\beta )} also minimizes the Pearson distance. See [Distance correlation](/source/Distance_correlation) for more.

The matrix **W** falls right out of the estimating equations for estimation of β {\displaystyle \beta } . Maximum likelihood estimation for each parameter β r , 1 ≤ r ≤ p {\displaystyle \beta _{r},1\leq r\leq p} , requires

- ∑ i = 1 n ∂ l i ∂ β r = 0 {\displaystyle \sum _{i=1}^{n}{\frac {\partial l_{i}}{\partial \beta _{r}}}=0} , where l ⁡ ( θ , y , ϕ ) = log ⁡ ( f ⁡ ( y , θ , ϕ ) ) = y θ − b ( θ ) ϕ − c ( y , ϕ ) {\displaystyle \operatorname {l} (\theta ,y,\phi )=\log(\operatorname {f} (y,\theta ,\phi ))={\frac {y\theta -b(\theta )}{\phi }}-c(y,\phi )} is the log-likelihood.

Looking at a single observation we have,

- ∂ l ∂ β r = ∂ l ∂ θ ∂ θ ∂ μ ∂ μ ∂ η ∂ η ∂ β r {\displaystyle {\frac {\partial l}{\partial \beta _{r}}}={\frac {\partial l}{\partial \theta }}{\frac {\partial \theta }{\partial \mu }}{\frac {\partial \mu }{\partial \eta }}{\frac {\partial \eta }{\partial \beta _{r}}}}

- ∂ η ∂ β r = x r {\displaystyle {\frac {\partial \eta }{\partial \beta _{r}}}=x_{r}}

- ∂ l ∂ θ = y − b ′ ( θ ) ϕ = y − μ ϕ {\displaystyle {\frac {\partial l}{\partial \theta }}={\frac {y-b'(\theta )}{\phi }}={\frac {y-\mu }{\phi }}}

- ∂ θ ∂ μ = ∂ b ′ − 1 ( μ ) ∂ μ = 1 b ″ ( b ′ − 1 ( μ ) ) = 1 V ( μ ) {\displaystyle {\frac {\partial \theta }{\partial \mu }}={\frac {\partial b'^{-1}(\mu )}{\partial \mu }}={\frac {1}{b''(b'^{-1}(\mu ))}}={\frac {1}{V(\mu )}}}

This gives us

- ∂ l ∂ β r = y − μ ϕ V ( μ ) ∂ μ ∂ η x r {\displaystyle {\frac {\partial l}{\partial \beta _{r}}}={\frac {y-\mu }{\phi V(\mu )}}{\frac {\partial \mu }{\partial \eta }}x_{r}} , and noting that

- ∂ η ∂ μ = g ′ ( μ ) {\displaystyle {\frac {\partial \eta }{\partial \mu }}=g'(\mu )} we have that

- ∂ l ∂ β r = ( y − μ ) W ∂ η ∂ μ x r {\displaystyle {\frac {\partial l}{\partial \beta _{r}}}=(y-\mu )W{\frac {\partial \eta }{\partial \mu }}x_{r}}

The Hessian matrix is determined in a similar manner and can be shown to be,

- H = X T ( y − μ ) [ ∂ β s W ∂ β r ] − X T W X {\displaystyle H=X^{T}(y-\mu )\left[{\frac {\partial }{\beta _{s}}}W{\frac {\partial }{\beta _{r}}}\right]-X^{T}WX}

Noticing that the Fisher Information (FI),

- FI = − E [ H ] = X T W X {\displaystyle {\text{FI}}=-E[H]=X^{T}WX} , allows for asymptotic approximation of β ^ {\displaystyle {\hat {\beta }}}

- β ^ ∼ N p ( β , ( X T W X ) − 1 ) {\displaystyle {\hat {\beta }}\sim N_{p}(\beta ,(X^{T}WX)^{-1})} , and hence inference can be performed.
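The weighting scheme above can be sketched for one concrete case. The code below assumes a Poisson variance function V(μ) = μ with ϕ = 1 and a log link, so that g′(μ) = 1/μ and each weight 1/(ϕ V(μ) g′(μ)²) reduces to μ. The function name, the starting values, the single-predictor design and the toy counts are all assumptions made for illustration; this is a minimal sketch rather than a general-purpose fitter.

```python
import math


def irls_poisson_log(x, y, iterations=25):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares.

    Assumes V(mu) = mu, phi = 1 and the log link, so W_ii = mu_i.
    """
    mu = [yi + 0.1 for yi in y]  # crude positive starting values
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    info = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iterations):
        w = list(mu)  # W_ii = 1 / (phi * V(mu_i) * g'(mu_i)^2) = mu_i
        # Working response z_i = eta_i + (y_i - mu_i) * g'(mu_i).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Solve the 2x2 weighted normal equations (X^T W X) beta = X^T W z.
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
        info = [[s_w, s_wx], [s_wx, s_wxx]]  # X^T W X at the weights just used
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
    return (b0, b1), mu, info


# Hypothetical count data.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0, 13.0]
(b0_hat, b1_hat), mu_hat, info = irls_poisson_log(x, y)

# At convergence the likelihood equations sum_i (y_i - mu_i) x_ir = 0 hold.
score_intercept = sum(yi - mi for yi, mi in zip(y, mu_hat))
score_slope = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu_hat))
```

The returned `info` matrix plays the role of XᵀWX, whose inverse gives the asymptotic covariance used for inference.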

#### Application – quasi-likelihood

Because most features of **GLMs** only depend on the first two moments of the distribution, rather than the entire distribution, the quasi-likelihood can be developed by just specifying a link function and a variance function. That is, we need to specify

- the link function, E [ y ] = μ = g − 1 ( η ) {\displaystyle E[y]=\mu =g^{-1}(\eta )}

- the variance function, V ( μ ) {\displaystyle V(\mu )} , where the Var θ ⁡ ( y ) = σ 2 V ( μ ) {\displaystyle \operatorname {Var} _{\theta }(y)=\sigma ^{2}V(\mu )}

With a specified variance function and link function we can develop, as alternatives to the log-[likelihood function](/source/Likelihood_function), the [score function](/source/Score_(statistics)), and the [Fisher information](/source/Fisher_information), a **[quasi-likelihood](/source/Quasi-likelihood)**, a **quasi-score**, and the **quasi-information**. This allows for full inference of β {\displaystyle \beta } .

**Quasi-likelihood (QL)**

Though called a [quasi-likelihood](/source/Quasi-likelihood), this is in fact a quasi-**log**-likelihood. The QL for one observation is

- Q i ( μ i , y i ) = ∫ y i μ i y i − t σ 2 V ( t ) d t {\displaystyle Q_{i}(\mu _{i},y_{i})=\int _{y_{i}}^{\mu _{i}}{\frac {y_{i}-t}{\sigma ^{2}V(t)}}\,dt}

And therefore the QL for all **n** observations is

- Q ( μ , y ) = ∑ i = 1 n Q i ( μ i , y i ) = ∑ i = 1 n ∫ y i μ i y i − t σ 2 V ( t ) d t {\displaystyle Q(\mu ,y)=\sum _{i=1}^{n}Q_{i}(\mu _{i},y_{i})=\sum _{i=1}^{n}\int _{y_{i}}^{\mu _{i}}{\frac {y_{i}-t}{\sigma ^{2}V(t)}}\,dt}

From the **QL** we have the **quasi-score**

**Quasi-score (QS)**

Recall the [score function](/source/Score_(statistics)), **U**, for data with log-likelihood l ⁡ ( μ ∣ y ) {\displaystyle \operatorname {l} (\mu \mid y)} is

- U = ∂ l ∂ μ . {\displaystyle U={\frac {\partial l}{\partial \mu }}.}

We obtain the quasi-score in an identical manner,

- U = y − μ σ 2 V ( μ ) {\displaystyle U={\frac {y-\mu }{\sigma ^{2}V(\mu )}}}

Noting that, for one observation the score is

- ∂ Q ∂ μ = y − μ σ 2 V ( μ ) {\displaystyle {\frac {\partial Q}{\partial \mu }}={\frac {y-\mu }{\sigma ^{2}V(\mu )}}}

The first two Bartlett equations are satisfied for the quasi-score, namely

- E [ U ] = 0 {\displaystyle E[U]=0}

and

- Cov ⁡ ( U ) + E [ ∂ U ∂ μ ] = 0. {\displaystyle \operatorname {Cov} (U)+E\left[{\frac {\partial U}{\partial \mu }}\right]=0.}

In addition, the quasi-score is linear in **y**.
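The quasi-likelihood integral and the quasi-score can be evaluated numerically for any chosen variance function. The following sketch assumes V(t) = t and σ² = 1 as the worked case, and uses a composite Simpson rule for the integral. The function names and the numerical values are assumptions for illustration only.

```python
import math


def quasi_log_likelihood(y, mu, V, sigma2=1.0, steps=2000):
    """Q(mu, y) = integral from y to mu of (y - t) / (sigma2 * V(t)) dt.

    Evaluated with the composite Simpson rule; `steps` must be even.
    """
    if mu == y:
        return 0.0
    h = (mu - y) / steps

    def integrand(t):
        return (y - t) / (sigma2 * V(t))

    total = integrand(y) + integrand(mu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(y + k * h)
    return total * h / 3.0


def quasi_score(y, mu, V, sigma2=1.0):
    """U = (y - mu) / (sigma2 * V(mu)), the derivative of Q with respect to mu."""
    return (y - mu) / (sigma2 * V(mu))


def v_linear(t):
    """Variance function V(mu) = mu."""
    return t


y_obs, mu_val = 4.0, 2.5
q_numeric = quasi_log_likelihood(y_obs, mu_val, v_linear)
# Closed form for V(t) = t and sigma2 = 1: y * ln(mu / y) - (mu - y).
q_closed = y_obs * math.log(mu_val / y_obs) - (mu_val - y_obs)

# A finite-difference derivative of Q in mu should match the quasi-score.
delta = 1e-4
q_slope = (
    quasi_log_likelihood(y_obs, mu_val + delta, v_linear)
    - quasi_log_likelihood(y_obs, mu_val - delta, v_linear)
) / (2 * delta)
u_value = quasi_score(y_obs, mu_val, v_linear)
```

For this variance function the closed form coincides, up to a term not involving μ, with the Poisson log-likelihood, which is why specifying only V(μ) = μ reproduces Poisson-type estimates.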

Ultimately the goal is to find information about the parameters of interest β {\displaystyle \beta } . Both the QS and the QL are actually functions of β {\displaystyle \beta } . Recall, μ = g − 1 ( η ) {\displaystyle \mu =g^{-1}(\eta )} , and η = X β {\displaystyle \eta =X\beta } , therefore,

- μ = g − 1 ( X β ) . {\displaystyle \mu =g^{-1}(X\beta ).}

**Quasi-information (QI)**

The **quasi-information** is similar to the [Fisher information](/source/Fisher_information),

- i b = − E ⁡ [ ∂ U ∂ β ] {\displaystyle i_{b}=-\operatorname {E} \left[{\frac {\partial U}{\partial \beta }}\right]}

**QL, QS, QI as functions of β {\displaystyle \beta }**

The QL, QS and QI all provide the building blocks for inference about the parameters of interest and therefore it is important to express the QL, QS and QI all as functions of β {\displaystyle \beta } .

Recalling again that μ = g − 1 ( X β ) {\displaystyle \mu =g^{-1}(X\beta )} , we derive the expressions for QL, QS and QI parametrized under β {\displaystyle \beta } .

Quasi-likelihood in β {\displaystyle \beta } ,

- Q ( β , y ) = ∫ y μ ( β ) y − t σ 2 V ( t ) d t {\displaystyle Q(\beta ,y)=\int _{y}^{\mu (\beta )}{\frac {y-t}{\sigma ^{2}V(t)}}\,dt}

The QS as a function of β {\displaystyle \beta } is therefore

- U j ( β j ) = ∂ ∂ β j Q ( β , y ) = ∑ i = 1 n ∂ μ i ∂ β j y i − μ i ( β j ) σ 2 V ( μ i ) {\displaystyle U_{j}(\beta _{j})={\frac {\partial }{\partial \beta _{j}}}Q(\beta ,y)=\sum _{i=1}^{n}{\frac {\partial \mu _{i}}{\partial \beta _{j}}}{\frac {y_{i}-\mu _{i}(\beta _{j})}{\sigma ^{2}V(\mu _{i})}}}

- U ( β ) = [ U 1 ( β ) U 2 ( β ) ⋮ ⋮ U p ( β ) ] = D T V − 1 ( y − μ ) σ 2 {\displaystyle U(\beta )={\begin{bmatrix}U_{1}(\beta )\\U_{2}(\beta )\\\vdots \\\vdots \\U_{p}(\beta )\end{bmatrix}}=D^{T}V^{-1}{\frac {(y-\mu )}{\sigma ^{2}}}}

Where,

- D ⏟ n × p = [ ∂ μ 1 ∂ β 1 ⋯ ⋯ ∂ μ 1 ∂ β p ∂ μ 2 ∂ β 1 ⋯ ⋯ ∂ μ 2 ∂ β p ⋮ ⋮ ∂ μ n ∂ β 1 ⋯ ⋯ ∂ μ n ∂ β p ] , V ⏟ n × n = diag ⁡ ( V ( μ 1 ) , V ( μ 2 ) , … , V ( μ n ) ) {\displaystyle \underbrace {D} _{n\times p}={\begin{bmatrix}{\frac {\partial \mu _{1}}{\partial \beta _{1}}}&\cdots &\cdots &{\frac {\partial \mu _{1}}{\partial \beta _{p}}}\\{\frac {\partial \mu _{2}}{\partial \beta _{1}}}&\cdots &\cdots &{\frac {\partial \mu _{2}}{\partial \beta _{p}}}\\\vdots &&&\vdots \\{\frac {\partial \mu _{n}}{\partial \beta _{1}}}&\cdots &\cdots &{\frac {\partial \mu _{n}}{\partial \beta _{p}}}\end{bmatrix}},\quad \underbrace {V} _{n\times n}=\operatorname {diag} (V(\mu _{1}),V(\mu _{2}),\ldots ,V(\mu _{n}))}

The quasi-information matrix in β {\displaystyle \beta } is,

- i b = − E ⁡ [ ∂ U ∂ β ] = Cov ⁡ ( U ( β ) ) = D T V − 1 D σ 2 {\displaystyle i_{b}=-\operatorname {E} \left[{\frac {\partial U}{\partial \beta }}\right]=\operatorname {Cov} (U(\beta ))={\frac {D^{T}V^{-1}D}{\sigma ^{2}}}}

Obtaining the score function and the information of β {\displaystyle \beta } allows for parameter estimation and inference in a similar manner as described in the section "Application – weighted least squares" above.
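The matrix expressions for the quasi-score and the quasi-information can be written out directly. The sketch below assumes a log link, so that μᵢ = exp(xᵢ·β) and Dᵢⱼ = μᵢ Xᵢⱼ. The function name, the toy design matrix and the parameter values are assumptions for illustration, and the variance function is passed in as an argument.

```python
import math


def quasi_score_and_information(X, y, beta, V, sigma2=1.0):
    """Return U(beta) = D^T V^{-1} (y - mu) / sigma2 and i_b = D^T V^{-1} D / sigma2.

    Assumes the log link, so mu_i = exp(x_i . beta) and D_ij = mu_i * X_ij.
    """
    p = len(beta)
    U = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(X, y):
        mu_i = math.exp(sum(xj * bj for xj, bj in zip(row, beta)))
        d_i = [mu_i * xj for xj in row]  # i-th row of D
        v_i = sigma2 * V(mu_i)           # Var(y_i) = sigma2 * V(mu_i)
        for j in range(p):
            U[j] += d_i[j] * (yi - mu_i) / v_i
            for k in range(p):
                info[j][k] += d_i[j] * d_i[k] / v_i
    return U, info


def v_quadratic(mu):
    """Gamma-type variance function V(mu) = mu^2."""
    return mu ** 2


# Hypothetical design (intercept plus one predictor) and parameter value.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [0.2, 0.1]
y_at_mean = [math.exp(0.2 + 0.1 * row[1]) for row in X]

U, info = quasi_score_and_information(X, y_at_mean, beta, v_quadratic, sigma2=2.0)
```

With V(μ) = μ² and a log link the factors of μ cancel, so the quasi-information reduces to XᵀX/σ², and the quasi-score vanishes when every observation equals its mean.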

### Non-parametric regression analysis

A scatter plot of years in the major league against salary (in units of $1000). The line is the trend in the mean. The plot demonstrates that the variance is not constant.

The smoothed conditional variance against the smoothed conditional mean. The quadratic shape is indicative of the Gamma distribution. The variance function of a Gamma is V ( μ ) = μ 2 {\displaystyle V(\mu )=\mu ^{2}} .

Non-parametric estimation of the variance function and its importance has been discussed widely in the literature.[5][6][7] In [non-parametric regression](/source/Non-parametric_regression) analysis, the goal is to express the expected value of the response variable (**y**) as a function of the predictors (**X**). That is, we are looking to estimate a **mean** function, g ( x ) = E ⁡ [ y ∣ X = x ] {\displaystyle g(x)=\operatorname {E} [y\mid X=x]} , without assuming a parametric form. There are many non-parametric [smoothing](/source/Smoothing) methods to help estimate the function g ( x ) {\displaystyle g(x)} . An interesting approach is to also look at a non-parametric **variance function**, g v ( x ) = Var ⁡ ( Y ∣ X = x ) {\displaystyle g_{v}(x)=\operatorname {Var} (Y\mid X=x)} . A non-parametric variance function allows one to look at the mean function as it relates to the variance function and notice patterns in the data.

- g v ( x ) = Var ⁡ ( Y ∣ X = x ) = E ⁡ [ y 2 ∣ X = x ] − [ E ⁡ [ y ∣ X = x ] ] 2 {\displaystyle g_{v}(x)=\operatorname {Var} (Y\mid X=x)=\operatorname {E} [y^{2}\mid X=x]-\left[\operatorname {E} [y\mid X=x]\right]^{2}}

An example is detailed in the two figures captioned above. The goal of the project was to determine (among other things) whether or not the predictor, **number of years in the major leagues** (baseball), had an effect on the response, **salary**, a player made. An initial scatter plot of the data indicates that there is heteroscedasticity in the data, as the variance is not constant at each level of the predictor. Because we can visually detect the non-constant variance, it is useful now to plot g v ( x ) = Var ⁡ ( Y ∣ X = x ) = E ⁡ [ y 2 ∣ X = x ] − [ E ⁡ [ y ∣ X = x ] ] 2 {\displaystyle g_{v}(x)=\operatorname {Var} (Y\mid X=x)=\operatorname {E} [y^{2}\mid X=x]-\left[\operatorname {E} [y\mid X=x]\right]^{2}} , and look to see if the shape is indicative of any known distribution. One can estimate E ⁡ [ y 2 ∣ X = x ] {\displaystyle \operatorname {E} [y^{2}\mid X=x]} and [ E ⁡ [ y ∣ X = x ] ] 2 {\displaystyle \left[\operatorname {E} [y\mid X=x]\right]^{2}} using a general [smoothing](/source/Smoothing) method. The plot of the non-parametric smoothed variance function can give the researcher an idea of the relationship between the variance and the mean. The second figure indicates a quadratic relationship between the mean and the variance. As we saw above, the Gamma variance function is quadratic in the mean.
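The smoothing approach described here can be sketched as follows. The code assumes a Nadaraya–Watson smoother with a Gaussian kernel as the "general smoothing method", which is one choice among many. It is applied to synthetic gamma-distributed data rather than the baseball data, which are not reproduced in this article. The bandwidth, sample size and seed are arbitrary assumptions.

```python
import math
import random


def kernel_smooth(x0, xs, values, bandwidth):
    """Nadaraya-Watson estimate of E[value | X = x0] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def variance_function_estimate(x0, xs, ys, bandwidth=0.5):
    """Return (smoothed mean, smoothed variance) at x0.

    The variance is E[y^2 | X = x0] - (E[y | X = x0])^2, with both
    conditional expectations estimated by the same smoother.
    """
    mean = kernel_smooth(x0, xs, ys, bandwidth)
    mean_sq = kernel_smooth(x0, xs, [y * y for y in ys], bandwidth)
    return mean, mean_sq - mean ** 2


# Synthetic heteroscedastic data (hypothetical, not the baseball data):
# gamma responses with mean 1 + x and shape 4, so Var(y | x) = (1 + x)^2 / 4.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(3000)]
ys = [rng.gammavariate(4.0, (1.0 + x) / 4.0) for x in xs]

low_mean, low_var = variance_function_estimate(1.0, xs, ys)
high_mean, high_var = variance_function_estimate(9.0, xs, ys)
```

Plotting the smoothed variance against the smoothed mean over a grid of x values would give a curve of the kind shown in the second figure; for these synthetic data it should rise roughly quadratically in the mean.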

## Notes

1. ^ [***a***](#cite_ref-Muller1_1-0) [***b***](#cite_ref-Muller1_1-1) Muller and Zhao (1995). ["On a semi parametric variance function model and a test for heteroscedasticity"](https://doi.org/10.1214%2Faos%2F1176324630). *The Annals of Statistics*. **23** (3): 946–967. [doi](/source/Doi_(identifier)):[10.1214/aos/1176324630](https://doi.org/10.1214%2Faos%2F1176324630). [JSTOR](/source/JSTOR_(identifier)) [2242430](https://www.jstor.org/stable/2242430).

1. **[^](#cite_ref-2)** Muller, Stadtmuller and Yao (2006). "Functional Variance Processes". *Journal of the American Statistical Association*. **101** (475): 1007–1018. [doi](/source/Doi_(identifier)):[10.1198/016214506000000186](https://doi.org/10.1198%2F016214506000000186). [JSTOR](/source/JSTOR_(identifier)) [27590778](https://www.jstor.org/stable/27590778). [S2CID](/source/S2CID_(identifier)) [13712496](https://api.semanticscholar.org/CorpusID:13712496).

1. **[^](#cite_ref-3)** Wedderburn, R.W.M. (1974). "Quasi-likelihood functions, generalized linear models, and the Gauss–Newton Method". *Biometrika*. **61** (3): 439–447. [doi](/source/Doi_(identifier)):[10.1093/biomet/61.3.439](https://doi.org/10.1093%2Fbiomet%2F61.3.439). [JSTOR](/source/JSTOR_(identifier)) [2334725](https://www.jstor.org/stable/2334725).

1. **[^](#cite_ref-4)** McCullagh, Peter; [Nelder, John](/source/John_Nelder) (1989). *Generalized Linear Models* (second ed.). London: Chapman and Hall. [ISBN](/source/ISBN_(identifier)) [0-412-31760-5](https://en.wikipedia.org/wiki/Special:BookSources/0-412-31760-5).

1. **[^](#cite_ref-5)** Muller and StadtMuller (1987). ["Estimation of Heteroscedasticity in Regression Analysis"](https://doi.org/10.1214%2Faos%2F1176350364). *The Annals of Statistics*. **15** (2): 610–625. [doi](/source/Doi_(identifier)):[10.1214/aos/1176350364](https://doi.org/10.1214%2Faos%2F1176350364). [JSTOR](/source/JSTOR_(identifier)) [2241329](https://www.jstor.org/stable/2241329).

1. **[^](#cite_ref-6)** Cai, T.; Wang, Lie (2008). "Adaptive Variance Function Estimation in Heteroscedastic Nonparametric Regression". *The Annals of Statistics*. **36** (5): 2025–2054. [arXiv](/source/ArXiv_(identifier)):[0810.4780](https://arxiv.org/abs/0810.4780). [Bibcode](/source/Bibcode_(identifier)):[2008arXiv0810.4780C](https://ui.adsabs.harvard.edu/abs/2008arXiv0810.4780C). [doi](/source/Doi_(identifier)):[10.1214/07-AOS509](https://doi.org/10.1214%2F07-AOS509). [JSTOR](/source/JSTOR_(identifier)) [2546470](https://www.jstor.org/stable/2546470). [S2CID](/source/S2CID_(identifier)) [9184727](https://api.semanticscholar.org/CorpusID:9184727).

1. **[^](#cite_ref-7)** Rice and Silverman (1991). "Estimating the Mean and Covariance Structure Nonparametrically When the Data are Curves". *Journal of the Royal Statistical Society*. **53** (1): 233–243. [JSTOR](/source/JSTOR_(identifier)) [2345738](https://www.jstor.org/stable/2345738).

## References

- [McCullagh, Peter](/source/Peter_McCullagh); [Nelder, John](/source/John_Nelder) (1989). *Generalized Linear Models* (second ed.). London: Chapman and Hall. [ISBN](/source/ISBN_(identifier)) [0-412-31760-5](https://en.wikipedia.org/wiki/Special:BookSources/0-412-31760-5).

- Henrik Madsen and Poul Thyregod (2011). *Introduction to General and Generalized Linear Models*. Chapman & Hall/CRC. [ISBN](/source/ISBN_(identifier)) [978-1-4200-9155-7](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4200-9155-7).

## External links

- Media related to [Variance function](https://commons.wikimedia.org/wiki/Category:Variance_function) at Wikimedia Commons


---
Adapted from the Wikipedia article [Variance function](https://en.wikipedia.org/wiki/Variance_function) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Variance_function?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
