# Loss function

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Loss_function
> Markdown URL: https://mediated.wiki/source/Loss_function.md
> Source: https://en.wikipedia.org/wiki/Loss_function
> Source revision: 1353824402
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Mathematical relation assigning a probability event to a cost

In [mathematical optimization](/source/Mathematical_optimization) and [decision theory](/source/Decision_theory), a **loss function** or **cost function** (sometimes also called an error function)[1] is a function that maps an [event](/source/Event_(probability_theory)) or values of one or more variables onto a [real number](/source/Real_number) intuitively representing some "cost" associated with the event. An [optimization problem](/source/Optimization_problem) seeks to minimize a loss function. An **objective function** is either a loss function or its opposite (in specific domains, variously called a [reward function](/source/Reward_function), a [profit function](/source/Profit_function), a [utility function](/source/Utility_function), a [fitness function](/source/Fitness_function), etc.), in which case it is to be maximized. The loss function could include terms from several levels of the hierarchy.

In statistics, typically a loss function is used for [parameter estimation](/source/Parameter_estimation), and the event in question is some function of the difference between estimated and true values for an instance of data. The concept, as old as [Laplace](/source/Pierre-Simon_Laplace), was reintroduced in statistics by [Abraham Wald](/source/Abraham_Wald) in the middle of the 20th century.[2] In the context of [economics](/source/Economics), for example, this is usually [economic cost](/source/Economic_cost) or [regret](/source/Regret_(decision_theory)). In [classification](/source/Statistical_classification), it is the penalty for an incorrect classification of an example. In [actuarial science](/source/Actuarial_science), it is used in an insurance context to model benefits paid over premiums, particularly since the works of [Harald Cramér](/source/Harald_Cram%C3%A9r) in the 1920s.[3] In [optimal control](/source/Optimal_control), the loss is the penalty for failing to achieve a desired value. In [financial risk management](/source/Financial_risk_management), the function is mapped to a monetary loss.

*Figure: Comparison of common loss functions ([MAE](/source/Mean_absolute_error), SMAE, [Huber loss](/source/Huber_loss), and log-cosh loss) used for regression.*

## Examples

### Regret

Main article: [Regret (decision theory)](/source/Regret_(decision_theory))

[Leonard J. Savage](/source/Leonard_J._Savage) argued that when using non-Bayesian methods such as [minimax](/source/Minimax), the loss function should be based on the idea of *[regret](/source/Regret_(decision_theory))*: the loss associated with a decision should be the difference between the consequences of the best decision that could have been made had the underlying circumstances been known and the consequences of the decision that was in fact taken before they were known.
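The definition of regret can be illustrated with a small sketch. The loss table, the decision names, and the helper functions `regret_table` and `minimax_regret` below are hypothetical and chosen only for illustration; they are not taken from Savage's work.

```python
# Hypothetical loss table: outer keys are decisions, inner keys are states of nature.
losses = {
    "carry umbrella": {"rain": 1.0, "sun": 2.0},
    "no umbrella": {"rain": 10.0, "sun": 0.0},
}


def regret_table(losses):
    """Regret = loss of a decision minus the best achievable loss in that state."""
    states = next(iter(losses.values())).keys()
    best = {s: min(row[s] for row in losses.values()) for s in states}
    return {d: {s: row[s] - best[s] for s in states} for d, row in losses.items()}


def minimax_regret(losses):
    """Pick the decision whose worst-case regret is smallest."""
    regrets = regret_table(losses)
    return min(regrets, key=lambda d: max(regrets[d].values()))


regrets = regret_table(losses)
choice = minimax_regret(losses)
print(regrets)
print(choice)
```

In this toy table, carrying the umbrella has a worst-case regret of 2 while leaving it at home has a worst-case regret of 9, so the minimax-regret choice is to carry it.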

### Quadratic loss function

The use of a [quadratic](/source/Quadratic_function) loss function is common, for example when using [least squares](/source/Least_squares) techniques. It is often more mathematically tractable than other loss functions because of the properties of [variances](/source/Variance), as well as being symmetric: an error above the target causes the same loss as the same magnitude of error below the target. If the target is $t$, then a quadratic loss function is

$$\lambda(x) = C(t-x)^{2}$$

for some constant $C$; the value of the constant makes no difference to a decision, and can be ignored by setting it equal to 1. This is also known as the **squared error loss** (**SEL**).[1]

Many common [statistics](/source/Statistic), including [t-tests](/source/T-test), [regression](/source/Regression_analysis) models, [design of experiments](/source/Design_of_experiments), and much else, use [least squares](/source/Least_squares) methods applied using [linear regression](/source/Linear_regression) theory, which is based on the quadratic loss function.

The quadratic loss function is also used in [linear-quadratic optimal control problems](/source/Linear-quadratic_regulator). In these problems, even in the absence of uncertainty, it may not be possible to achieve the desired values of all target variables. Often loss is expressed as a [quadratic form](/source/Quadratic_form) in the deviations of the variables of interest from their desired values; this approach is [tractable](/source/Closed-form_expression) because it results in linear [first-order conditions](/source/First-order_condition). In the context of [stochastic control](/source/Stochastic_control), the expected value of the quadratic form is used. Because errors are squared, the quadratic loss gives large deviations (outliers) disproportionate weight relative to typical data points, so alternatives such as the [Huber](/source/Huber_loss), log-cosh and SMAE losses are used when the data has many large outliers.

*Figure: Effect of using different loss functions when the data has outliers.*
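The following sketch compares how the quadratic loss and two of the alternatives named in this section penalise a single large residual. The Huber threshold `delta=1.0` and the residual values are assumptions made for illustration, and SMAE is left out because the text does not define it.

```python
import math


def squared_loss(r):
    """Quadratic loss with the constant C set to 1."""
    return r * r


def huber_loss(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it (delta is an assumed threshold)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def log_cosh_loss(r):
    """log(cosh(r)), rearranged so that large |r| does not overflow."""
    a = abs(r)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


# Hypothetical residuals; the last one plays the role of an outlier.
residuals = [0.1, -0.2, 0.3, 8.0]
for loss in (squared_loss, huber_loss, log_cosh_loss):
    per_point = [loss(r) for r in residuals]
    print(loss.__name__, [round(v, 4) for v in per_point], round(sum(per_point), 4))
```

The outlier contributes 64 to the quadratic total but only about 7.5 under the Huber loss and about 7.3 under the log-cosh loss, since both grow roughly linearly for large residuals.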

### 0-1 loss function

In [statistics](/source/Statistics) and [decision theory](/source/Decision_theory), a frequently used loss function is the *0-1 loss function*

$$L(\hat{y}, y) = \begin{cases} 0 & \text{if } y = \hat{y} \\ 1 & \text{if } y \neq \hat{y} \end{cases}$$

In [information theory](/source/Information_theory), this loss function is known as [Hamming distortion](/source/Rate%E2%80%93distortion_theory#Hamming_distortion).
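As a minimal sketch, the 0-1 loss and its average over a set of predictions (the misclassification rate) can be written as follows. The function names and the label lists are hypothetical.

```python
def zero_one_loss(y_pred, y_true):
    """Return 0 when the prediction matches the true value, 1 otherwise."""
    return 0 if y_pred == y_true else 1


def mean_zero_one_loss(preds, truths):
    """Average 0-1 loss over paired predictions and true values."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    return sum(zero_one_loss(p, t) for p, t in zip(preds, truths)) / len(preds)


# Hypothetical labels: one of the four predictions is wrong.
predicted = ["cat", "dog", "dog", "bird"]
actual = ["cat", "dog", "cat", "bird"]
error_rate = mean_zero_one_loss(predicted, actual)
print(error_rate)
```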

## Constructing loss and objective functions

See also: [Scoring rule](/source/Scoring_rule)

In many applications, objective functions, including loss functions as a particular case, are determined by the problem formulation. In other situations, the decision maker's preference must be elicited and represented by a scalar-valued function (also called a [utility](/source/Utility) function) in a form suitable for optimization, a problem that [Ragnar Frisch](/source/Ragnar_Frisch) highlighted in his [Nobel Prize](/source/Nobel_Prize) lecture.[4] The existing methods for constructing objective functions are collected in the proceedings of two dedicated conferences.[5][6] In particular, [Andranik Tangian](/source/Andranik_Tangian) showed that the most usable objective functions (quadratic and additive) are determined by a few [indifference](/source/Principle_of_indifference) points. He used this property in models for constructing these objective functions from either [ordinal](/source/Ordinal_utility) or [cardinal](/source/Cardinal_utility) data elicited through computer-assisted interviews with decision makers.[7][8] Among other things, he constructed objective functions to optimally distribute budgets for 16 Westphalian universities[9] and the European subsidies for equalizing unemployment rates among 271 German regions.[10]

## Expected loss

See also: [Empirical risk minimization](/source/Empirical_risk_minimization)

In some contexts, the value of the loss function itself is a random quantity because it depends on the outcome of a random variable $X$.

### Statistics

Both [frequentist](/source/Frequentist_probability) and [Bayesian](/source/Bayesian_probability) statistical theory involve making a decision based on the [expected value](/source/Expected_value) of the loss function; however, this quantity is defined differently under the two paradigms.

#### Frequentist expected loss

We first define the expected loss in the frequentist context. It is obtained by taking the expected value with respect to the [probability distribution](/source/Probability_distribution), $P_{\theta}$, of the observed data, $X$. This is also referred to as the **risk function**[11][12][13][14] of the decision rule $\delta$ and the parameter $\theta$. Here the decision rule depends on the outcome of $X$. The risk function is given by:

$$R(\theta,\delta) = \operatorname{E}_{\theta} L\big(\theta,\delta(X)\big) = \int_{X} L\big(\theta,\delta(x)\big)\,\mathrm{d}P_{\theta}(x).$$

Here, $\theta$ is a fixed but possibly unknown state of nature, $X$ is a vector of observations stochastically drawn from a [population](/source/Statistical_population), $\operatorname{E}_{\theta}$ is the expectation over all population values of $X$, $\mathrm{d}P_{\theta}$ is a [probability measure](/source/Probability_measure) over the event space of $X$ (parametrized by $\theta$), and the integral is evaluated over the entire [support](/source/Support_(measure_theory)) of $X$.
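The risk function can be approximated by simulation: draw many data sets from the distribution indexed by a fixed parameter, apply the decision rule to each, and average the resulting losses. The sketch below assumes normally distributed observations with unit variance, a squared-error loss, and the sample mean and sample median as candidate decision rules. These choices, and the helper name `monte_carlo_risk`, are illustrative assumptions.

```python
import random
import statistics


def squared_error(theta, estimate):
    return (theta - estimate) ** 2


def monte_carlo_risk(theta, decision_rule, loss, n_obs, n_trials, rng, sigma=1.0):
    """Approximate R(theta, delta) = E_theta[L(theta, delta(X))] by simulation."""
    total = 0.0
    for _ in range(n_trials):
        # One simulated data set X drawn from the assumed normal model.
        x = [rng.gauss(theta, sigma) for _ in range(n_obs)]
        total += loss(theta, decision_rule(x))
    return total / n_trials


rng = random.Random(0)  # fixed seed so the run is reproducible
risk_mean = monte_carlo_risk(2.0, statistics.fmean, squared_error, 10, 20000, rng)
risk_median = monte_carlo_risk(2.0, statistics.median, squared_error, 10, 20000, rng)
print(risk_mean, risk_median)
```

Under these assumptions the risk of the sample mean is the variance of the mean, here 1/10, so the first estimate should be close to 0.1. The sample median has a somewhat larger risk in this normal model.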

#### Bayes Risk

In a Bayesian approach, the expectation is calculated using the [prior distribution](/source/Prior_distribution) $\pi^{*}$ of the parameter $\theta$:

$$\rho(\pi^{*},a) = \int_{\Theta}\int_{\mathbf{X}} L(\theta,a(\mathbf{x}))\,\mathrm{d}P(\mathbf{x}\vert\theta)\,\mathrm{d}\pi^{*}(\theta) = \int_{\mathbf{X}}\int_{\Theta} L(\theta,a(\mathbf{x}))\,\mathrm{d}\pi^{*}(\theta\vert\mathbf{x})\,\mathrm{d}M(\mathbf{x})$$

where $M(\mathbf{x})$ is known as the *predictive likelihood*, in which $\theta$ has been "integrated out", $\pi^{*}(\theta\vert\mathbf{x})$ is the posterior distribution, and the order of integration has been changed. One should then choose the action $a^{*}$ that minimises this expected loss, which is referred to as the *Bayes Risk*. In the latter expression, the inner integral over $\theta$ (the integrand with respect to $\mathrm{d}M(\mathbf{x})$) is known as the *Posterior Risk*, and minimising it with respect to the decision $a$ also minimises the overall Bayes Risk. This optimal decision $a^{*}$ is known as the *Bayes (decision) Rule*: it minimises the average loss over all possible states of nature $\theta$ and over all possible (probability-weighted) data outcomes. One advantage of the Bayesian approach is that one need only choose the optimal action under the actual observed data to obtain a uniformly optimal one, whereas choosing the frequentist optimal decision rule as a function of all possible observations is a much more difficult problem. Of equal importance, the Bayes Rule reflects consideration of loss outcomes under different states of nature $\theta$.
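The idea of minimising the posterior risk for the data actually observed can be sketched with a discrete toy problem. The three candidate coin biases, the uniform prior, the observed counts, and the function names below are all assumptions made for illustration.

```python
# Hypothetical setup: a coin whose heads probability is one of three values.
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]


def posterior(heads, n):
    """Posterior over thetas after observing `heads` heads in `n` tosses."""
    # The binomial coefficient is the same for every theta, so it cancels.
    weights = [p * t**heads * (1 - t) ** (n - heads) for p, t in zip(prior, thetas)]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_risk(action, post, loss):
    """Expected loss of an action under the posterior distribution."""
    return sum(p * loss(t, action) for p, t in zip(post, thetas))


def bayes_action(post, actions, loss):
    """Action with the smallest posterior risk among the candidates."""
    return min(actions, key=lambda a: posterior_risk(a, post, loss))


post = posterior(heads=7, n=10)
post_mean = sum(p * t for p, t in zip(post, thetas))

# 0-1 loss: the best action is the most probable theta.
best_01 = bayes_action(post, thetas, lambda t, a: 0.0 if t == a else 1.0)
# Squared loss over a grid of actions: the best action is near the posterior mean.
grid = [i / 100 for i in range(101)]
best_sq = bayes_action(post, grid, lambda t, a: (t - a) ** 2)
print(best_01, best_sq, post_mean)
```

Only the observed data (7 heads in 10 tosses) enter the calculation. No decision rule covering every possible data set has to be constructed.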

#### Examples in statistics

- For a scalar parameter $\theta$, a decision function whose output $\hat{\theta}$ is an estimate of $\theta$, and a quadratic loss function ([squared error loss](/source/Squared_error_loss)) $L(\theta,\hat{\theta})=(\theta-\hat{\theta})^{2},$ the risk function becomes the [mean squared error](/source/Mean_squared_error) of the estimate, $R(\theta,\hat{\theta})=\operatorname{E}_{\theta}\left[(\theta-\hat{\theta})^{2}\right].$ An [estimator](/source/Estimator) found by minimizing the [mean squared error](/source/Mean_squared_error) estimates the [posterior distribution](/source/Posterior_distribution)'s mean.

- In [density estimation](/source/Density_estimation), the unknown parameter is [probability density](/source/Probability_density_function) itself. The loss function is typically chosen to be a [norm](/source/Norm_(mathematics)) in an appropriate [function space](/source/Function_space). For example, for [$L^{2}$ norm](/source/L2_norm), $L(f,\hat{f})=\|f-\hat{f}\|_{2}^{2},$ the risk function becomes the [mean integrated squared error](/source/Mean_integrated_squared_error) $R(f,\hat{f})=\operatorname{E}\left(\|f-\hat{f}\|^{2}\right).$
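As a supplementary note on the first example, a standard identity (not stated in the text above, and assuming the estimator has finite variance) splits the mean squared error into a variance term and a squared bias term:

$$\operatorname{E}_{\theta}\left[(\theta-\hat{\theta})^{2}\right] = \operatorname{Var}_{\theta}(\hat{\theta}) + \left(\operatorname{E}_{\theta}[\hat{\theta}]-\theta\right)^{2}.$$

It follows from writing $\theta-\hat{\theta} = (\operatorname{E}_{\theta}[\hat{\theta}]-\hat{\theta}) + (\theta-\operatorname{E}_{\theta}[\hat{\theta}])$ and expanding the square; the cross term has expectation zero.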

### Economic choice under uncertainty

In economics, decision-making under uncertainty is often modelled using the [von Neumann–Morgenstern utility function](/source/Von_Neumann%E2%80%93Morgenstern_utility_function) of the uncertain variable of interest, such as end-of-period wealth. Since the value of this variable is uncertain, so is the value of the utility function; it is the expected value of utility that is maximized.

## Decision rules

A [decision rule](/source/Decision_rule) makes a choice using an optimality criterion. Some commonly used criteria are:

- **[Minimax](/source/Minimax)**: Choose the decision rule with the lowest worst loss, that is, minimize the worst-case (maximum possible) loss: $\underset{\delta}{\operatorname{arg\,min}}\ \max_{\theta\in\Theta}\ R(\theta,\delta).$

- **[Invariance](/source/Invariant_estimator)**: Choose the decision rule which satisfies an invariance requirement.

- Choose the decision rule with the lowest average loss (i.e., minimize the [expected value](/source/Expected_value) of the loss function): $\underset{\delta}{\operatorname{arg\,min}}\ \operatorname{E}_{\theta\in\Theta}[R(\theta,\delta)] = \underset{\delta}{\operatorname{arg\,min}}\ \int_{\theta\in\Theta} R(\theta,\delta)\,p(\theta)\,d\theta.$
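The minimax and average-loss criteria listed above can disagree. The sketch below applies both to a finite risk table. The table, the rule and state names, and the weights over states are hypothetical, and the integral over the parameter space is replaced by a weighted sum over two states.

```python
# Hypothetical risk table R(theta, delta): outer keys are rules, inner keys are states.
risk = {
    "delta_1": {"theta_a": 1.0, "theta_b": 6.0},
    "delta_2": {"theta_a": 4.0, "theta_b": 4.0},
    "delta_3": {"theta_a": 3.0, "theta_b": 5.0},
}
# Assumed weights p(theta) over the two states.
weights = {"theta_a": 0.9, "theta_b": 0.1}


def minimax_rule(risk):
    """Rule with the smallest worst-case risk."""
    return min(risk, key=lambda d: max(risk[d].values()))


def average_loss_rule(risk, weights):
    """Rule with the smallest weighted-average risk."""
    return min(risk, key=lambda d: sum(weights[t] * r for t, r in risk[d].items()))


print(minimax_rule(risk))
print(average_loss_rule(risk, weights))
```

Here the minimax criterion selects `delta_2`, whose worst case is 4, while the average-loss criterion selects `delta_1`, which performs well in the state that carries most of the weight.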

## Selecting a loss function

Sound statistical practice requires selecting an estimator consistent with the actual acceptable variation experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances.[15]

A common example involves estimating "[location](/source/Location_parameter)". Under typical statistical assumptions, the [mean](/source/Mean) or average is the statistic for estimating location that minimizes the expected loss experienced under the [squared-error](/source/Least_squares) loss function, while the [median](/source/Median) is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.
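A small numerical check of this claim on a sample: a grid search for the location that minimizes the total squared loss lands on the sample mean, and the same search under the absolute loss lands on the sample median. The data values and the grid step are arbitrary assumptions.

```python
import statistics

# Hypothetical sample with one very large value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]


def total_loss(center, data, loss):
    """Sum of losses of the deviations of the data from a candidate location."""
    return sum(loss(x - center) for x in data)


def argmin_on_grid(data, loss, step=0.01):
    """Candidate location on a regular grid with the smallest total loss."""
    lo, hi = min(data), max(data)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda c: total_loss(c, data, loss))


best_sq = argmin_on_grid(data, lambda a: a * a)
best_abs = argmin_on_grid(data, abs)
print(best_sq, statistics.fmean(data))
print(best_abs, statistics.median(data))
```

The two minimizers differ markedly here (about 22 versus 3) because the single large value pulls the mean but not the median.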

In economics, when an agent is [risk neutral](/source/Risk_neutral), the objective function is simply expressed as the expected value of a monetary quantity, such as profit, income, or end-of-period wealth. For [risk-averse](/source/Risk_aversion) or [risk-loving](/source/Risk-loving) agents, loss is measured as the negative of a [utility function](/source/Utility), and the objective function to be optimized is the expected value of utility.
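The difference between a risk-neutral and a risk-averse agent can be sketched by treating loss as the negative of utility and minimizing its expected value. The two lotteries, the wealth figures, and the use of the logarithm as a risk-averse utility are assumptions for illustration only.

```python
import math

# Hypothetical options, each a list of (probability, end-of-period wealth) pairs.
options = {
    "gamble": [(0.5, 50.0), (0.5, 150.0)],
    "sure": [(1.0, 95.0)],
}


def expected_value(lottery, f):
    """Expected value of f(wealth) under the lottery."""
    return sum(p * f(w) for p, w in lottery)


def best_by_expected_loss(options, utility):
    """Loss is the negative of utility, so minimize the expected value of -u(w)."""
    return min(options, key=lambda name: expected_value(options[name], lambda w: -utility(w)))


risk_neutral_choice = best_by_expected_loss(options, lambda w: w)
risk_averse_choice = best_by_expected_loss(options, math.log)
print(risk_neutral_choice, risk_averse_choice)
```

The risk-neutral agent prefers the gamble, whose expected wealth is 100 rather than 95. With the concave log utility, the certain 95 has the higher expected utility and is chosen instead.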

Other measures of cost are possible, for example [mortality](/source/Mortality_rate) or [morbidity](/source/Morbidity) in the field of [public health](/source/Public_health) or [safety engineering](/source/Safety_engineering).

For most [optimization algorithms](/source/Optimization_algorithm), it is desirable to have a loss function that is globally [continuous](/source/Continuous_function) and [differentiable](/source/Differentiable_function).

Two very commonly used loss functions are the [squared loss](/source/Mean_squared_error), $L(a)=a^{2}$, and the [absolute loss](/source/Absolute_deviation), $L(a)=|a|$. However, the absolute loss has the disadvantage that it is not differentiable at $a=0$. The squared loss has the disadvantage that it has the tendency to be dominated by [outliers](/source/Outlier): when summing over a set of $a$'s (as in $\sum_{i=1}^{n}L(a_{i})$), the final sum tends to be the result of a few particularly large $a$-values, rather than an expression of the average $a$-value.
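The dominance of large values in a sum of squared losses can be made concrete by computing the share of the total contributed by the single largest term. The residual values and the helper name `largest_term_share` are hypothetical.

```python
def largest_term_share(residuals, loss):
    """Fraction of the summed loss that comes from the single largest term."""
    terms = [loss(a) for a in residuals]
    return max(terms) / sum(terms)


# Hypothetical residuals with one large value.
residuals = [1.0, -1.0, 2.0, -2.0, 10.0]
share_squared = largest_term_share(residuals, lambda a: a * a)  # 100 / 110
share_absolute = largest_term_share(residuals, abs)  # 10 / 16
print(share_squared, share_absolute)
```

With these values the largest residual accounts for about 91% of the squared-loss total but 62.5% of the absolute-loss total.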

The choice of a loss function is not arbitrary. It is very restrictive and sometimes the loss function may be characterized by its desirable properties.[16] Among the choice principles are, for example, the requirement of completeness of the class of symmetric statistics in the case of [i.i.d.](/source/I.i.d.) observations, the principle of complete information, and some others.

[W. Edwards Deming](/source/W._Edwards_Deming) and [Nassim Nicholas Taleb](/source/Nassim_Nicholas_Taleb) argue that empirical reality, not nice mathematical properties, should be the sole basis for selecting loss functions, and that real losses often are not mathematically nice: they may fail to be differentiable, continuous, symmetric, etc. For example, a person who arrives before a plane gate closure can still make the plane, but a person who arrives after cannot, a discontinuity and asymmetry which makes arriving slightly late much more costly than arriving slightly early. In drug dosing, the cost of too little drug may be lack of efficacy, while the cost of too much may be tolerable toxicity, another example of asymmetry. Traffic, pipes, beams, ecologies, climates, etc. may tolerate increased load or stress with little noticeable change up to a point, then become backed up or break catastrophically. These situations, Deming and Taleb argue, are common in real-life problems, perhaps more common than the classical smooth, continuous, symmetric, differentiable cases.[17]
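The plane-gate example can be turned into a toy calculation. The cost figures, the list of equally likely travel delays, and the convention that arriving exactly at closure still counts as making the plane are all assumptions for illustration, not values from Deming or Taleb.

```python
def gate_loss(minutes_after_closure, wait_cost_per_min=1.0, missed_flight_cost=500.0):
    """Hypothetical costs: waiting is cheap, missing the plane is expensive."""
    if minutes_after_closure > 0:
        return missed_flight_cost  # discontinuous jump once the gate has closed
    return wait_cost_per_min * (-minutes_after_closure)


def expected_loss(planned_margin, delays):
    """Average loss when arrival relative to closure is delay minus planned margin."""
    return sum(gate_loss(d - planned_margin) for d in delays) / len(delays)


# Hypothetical, equally likely travel delays in minutes (their average is 12).
delays = [0, 5, 10, 15, 30]
margins = range(0, 61, 5)
best_margin = min(margins, key=lambda m: expected_loss(m, delays))
print(best_margin, expected_loss(best_margin, delays))
```

Because lateness is so much more costly than waiting, the best planned margin in this toy setup is 30 minutes, which covers the largest delay and is well above the average delay of 12 minutes.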

## See also

- [Bayesian regret](/source/Bayesian_regret)

- [Loss functions for classification](/source/Loss_functions_for_classification)

- [Discounted maximum loss](/source/Discounted_maximum_loss)

- [Hinge loss](/source/Hinge_loss)

- [Scoring rule](/source/Scoring_rule)

- [Statistical risk](/source/Statistical_risk)

## References

1. Hastie, Trevor; [Tibshirani, Robert](/source/Robert_Tibshirani); [Friedman, Jerome H.](/source/Jerome_H._Friedman) (2001). [*The Elements of Statistical Learning*](https://web.stanford.edu/~hastie/ElemStatLearn/). Springer. p. 18. [ISBN](/source/ISBN_(identifier)) [0-387-95284-5](https://en.wikipedia.org/wiki/Special:BookSources/0-387-95284-5).

2. Wald, A. (1950). [*Statistical Decision Functions*](https://psycnet.apa.org/record/1951-01400-000). Wiley – via APA Psycnet.

3. Cramér, H. (1930). *On the mathematical theory of risk*. Centraltryckeriet.

4. Frisch, Ragnar (1969). "From utopian theory to practical applications: the case of econometrics". [*The Nobel Prize–Prize Lecture*](https://www.nobelprize.org/prizes/economic-sciences/1969/frisch/lecture/). Retrieved 15 February 2021.

5. Tangian, Andranik; Gruber, Josef (1997). *Constructing Scalar-Valued Objective Functions. Proceedings of the Third International Conference on Econometric Decision Models: Constructing Scalar-Valued Objective Functions, University of Hagen, held in Katholische Akademie Schwerte September 5–8, 1995*. Lecture Notes in Economics and Mathematical Systems. Vol. 453. Berlin: Springer. [doi](/source/Doi_(identifier)):[10.1007/978-3-642-48773-6](https://doi.org/10.1007%2F978-3-642-48773-6). [ISBN](/source/ISBN_(identifier)) [978-3-540-63061-6](https://en.wikipedia.org/wiki/Special:BookSources/978-3-540-63061-6).

6. Tangian, Andranik; Gruber, Josef (2002). *Constructing and Applying Objective Functions. Proceedings of the Fourth International Conference on Econometric Decision Models Constructing and Applying Objective Functions, University of Hagen, held in Haus Nordhelle, August 28–31, 2000*. Lecture Notes in Economics and Mathematical Systems. Vol. 510. Berlin: Springer. [doi](/source/Doi_(identifier)):[10.1007/978-3-642-56038-5](https://doi.org/10.1007%2F978-3-642-56038-5). [ISBN](/source/ISBN_(identifier)) [978-3-540-42669-1](https://en.wikipedia.org/wiki/Special:BookSources/978-3-540-42669-1).

7. Tangian, Andranik (2002). "Constructing a quasi-concave quadratic objective function from interviewing a decision maker". *European Journal of Operational Research*. **141** (3): 608–640. [doi](/source/Doi_(identifier)):[10.1016/S0377-2217(01)00185-0](https://doi.org/10.1016%2FS0377-2217%2801%2900185-0). [S2CID](/source/S2CID_(identifier)) [39623350](https://api.semanticscholar.org/CorpusID:39623350).

8. Tangian, Andranik (2004). "A model for ordinally constructing additive objective functions". *European Journal of Operational Research*. **159** (2): 476–512. [doi](/source/Doi_(identifier)):[10.1016/S0377-2217(03)00413-2](https://doi.org/10.1016%2FS0377-2217%2803%2900413-2). [S2CID](/source/S2CID_(identifier)) [31019036](https://api.semanticscholar.org/CorpusID:31019036).

9. Tangian, Andranik (2004). "Redistribution of university budgets with respect to the status quo". *European Journal of Operational Research*. **157** (2): 409–428. [doi](/source/Doi_(identifier)):[10.1016/S0377-2217(03)00271-6](https://doi.org/10.1016%2FS0377-2217%2803%2900271-6).

10. Tangian, Andranik (2008). ["Multi-criteria optimization of regional employment policy: A simulation analysis for Germany"](https://onlinelibrary.wiley.com/doi/10.1111/j.1467-940X.2008.00144.x). *Review of Urban and Regional Development*. **20** (2): 103–122. [doi](/source/Doi_(identifier)):[10.1111/j.1467-940X.2008.00144.x](https://doi.org/10.1111%2Fj.1467-940X.2008.00144.x).

11. Nikulin, M.S. (2001) [1994], ["Risk of a statistical procedure"](https://www.encyclopediaofmath.org/index.php?title=Risk_of_a_statistical_procedure), *[Encyclopedia of Mathematics](/source/Encyclopedia_of_Mathematics)*, [EMS Press](/source/European_Mathematical_Society)

12. [Berger, James O.](/source/James_Berger_(statistician)) (1985). [*Statistical decision theory and Bayesian Analysis*](https://books.google.com/books?id=oY_x7dE15_AC) (2nd ed.). New York: Springer-Verlag. [Bibcode](/source/Bibcode_(identifier)):[1985sdtb.book.....B](https://ui.adsabs.harvard.edu/abs/1985sdtb.book.....B). [ISBN](/source/ISBN_(identifier)) [978-0-387-96098-2](https://en.wikipedia.org/wiki/Special:BookSources/978-0-387-96098-2). [MR](/source/MR_(identifier)) [0804611](https://mathscinet.ams.org/mathscinet-getitem?mr=0804611).

13. [DeGroot, Morris](/source/Morris_H._DeGroot) (2004) [1970]. *Optimal Statistical Decisions*. Wiley Classics Library. [ISBN](/source/ISBN_(identifier)) [978-0-471-68029-1](https://en.wikipedia.org/wiki/Special:BookSources/978-0-471-68029-1). [MR](/source/MR_(identifier)) [2288194](https://mathscinet.ams.org/mathscinet-getitem?mr=2288194).

14. Robert, Christian P. (2007). *The Bayesian Choice*. Springer Texts in Statistics (2nd ed.). New York: Springer. [doi](/source/Doi_(identifier)):[10.1007/0-387-71599-1](https://doi.org/10.1007%2F0-387-71599-1). [ISBN](/source/ISBN_(identifier)) [978-0-387-95231-4](https://en.wikipedia.org/wiki/Special:BookSources/978-0-387-95231-4). [MR](/source/MR_(identifier)) [1835885](https://mathscinet.ams.org/mathscinet-getitem?mr=1835885).

15. Pfanzagl, J. (1994). *Parametric Statistical Theory*. Berlin: Walter de Gruyter. [ISBN](/source/ISBN_(identifier)) [978-3-11-013863-4](https://en.wikipedia.org/wiki/Special:BookSources/978-3-11-013863-4).

16. Detailed information on mathematical principles of the loss function choice is given in Chapter 2 of the book Klebanov, B.; Rachev, Svetlozat T.; Fabozzi, Frank J. (2009). *Robust and Non-Robust Models in Statistics*. New York: Nova Scientific Publishers, Inc. (and references there).

17. Deming, W. Edwards (2000). *Out of the Crisis*. The MIT Press. [ISBN](/source/ISBN_(identifier)) [9780262541152](https://en.wikipedia.org/wiki/Special:BookSources/9780262541152).

## Further reading

- Aretz, Kevin; Bartram, Söhnke M.; Pope, Peter F. (April–June 2011). ["Asymmetric Loss Functions and the Rationality of Expected Stock Returns"](https://mpra.ub.uni-muenchen.de/47343/1/MPRA_paper_47343.pdf) (PDF). *International Journal of Forecasting*. **27** (2): 413–437. [doi](/source/Doi_(identifier)):[10.1016/j.ijforecast.2009.10.008](https://doi.org/10.1016%2Fj.ijforecast.2009.10.008). [SSRN](/source/SSRN_(identifier)) [889323](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=889323).

- [Berger, James O.](/source/James_Berger_(statistician)) (1985). *Statistical decision theory and Bayesian Analysis* (2nd ed.). New York: Springer-Verlag. [Bibcode](/source/Bibcode_(identifier)):[1985sdtb.book.....B](https://ui.adsabs.harvard.edu/abs/1985sdtb.book.....B). [ISBN](/source/ISBN_(identifier)) [978-0-387-96098-2](https://en.wikipedia.org/wiki/Special:BookSources/978-0-387-96098-2). [MR](/source/MR_(identifier)) [0804611](https://mathscinet.ams.org/mathscinet-getitem?mr=0804611).

- Cecchetti, S. (2000). ["Making monetary policy: Objectives and rules"](https://www.researchgate.net/publication/5216117). *Oxford Review of Economic Policy*. **16** (4): 43–59. [doi](/source/Doi_(identifier)):[10.1093/oxrep/16.4.43](https://doi.org/10.1093%2Foxrep%2F16.4.43).

- Horowitz, Ann R. (1987). "Loss functions and public policy". *Journal of Macroeconomics*. **9** (4): 489–504. [doi](/source/Doi_(identifier)):[10.1016/0164-0704(87)90016-4](https://doi.org/10.1016%2F0164-0704%2887%2990016-4).

- Waud, Roger N. (1976). "Asymmetric Policymaker Utility Functions and Optimal Policy under Uncertainty". *Econometrica*. **44** (1): 53–66. [doi](/source/Doi_(identifier)):[10.2307/1911380](https://doi.org/10.2307%2F1911380). [JSTOR](/source/JSTOR_(identifier)) [1911380](https://www.jstor.org/stable/1911380).


---
Adapted from the Wikipedia article [Loss function](https://en.wikipedia.org/wiki/Loss_function) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Loss_function?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
