Symmetric mean absolute percentage error

{{Short description|Statistical accuracy measure}}
The '''symmetric mean absolute percentage error''' ('''SMAPE''' or '''sMAPE''') is an accuracy measure based on percentage (or relative) errors. It is usually defined{{Citation needed|reason=S. Makridakis didn't use the following definition in his article ''Accuracy measures: theoretical and practical concerns,'' 1993.|date=May 2017}} as follows:

: <math> \text{SMAPE} = \frac{2}{n} \sum_{t=1}^n \frac{\left|F_t-A_t\right|}{|A_t|+|F_t|}</math>

where <math>A_t</math> are the actual values and <math>F_t</math> are the forecasted values. Note that if <math>A_t = F_t = 0</math>, then term <math>t</math> is undefined (<math>0/0</math>), and is usually ignored in the summation.

In words, the absolute difference between ''A''<sub>''t''</sub> and ''F''<sub>''t''</sub> is divided by half the sum of the absolute values of the actual value ''A''<sub>''t''</sub> and the forecast value ''F''<sub>''t''</sub>. These ratios are summed over every fitted point ''t'', and the total is divided by the number of fitted points ''n''.
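The definition can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name <code>smape</code> is hypothetical, and dividing by the full count ''n'' when a 0/0 term is skipped is an assumption, since the text only says that such terms are usually ignored in the summation.

```python
def smape(actual, forecast):
    """SMAPE as a fraction (multiply by 100 for a percentage).

    Terms with A_t = F_t = 0 are undefined (0/0) and are skipped.
    Assumption for this sketch: the sum is still divided by the full n.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    n = len(actual)
    if n == 0:
        raise ValueError("at least one point is required")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom == 0:
            continue  # undefined 0/0 term, ignored in the summation
        total += abs(f - a) / denom
    return 2.0 * total / n


# One over-forecast and one under-forecast of the same absolute size.
print(f"{100 * smape([100, 100], [110, 90]):.2f}%")
```

A forecast of zero against a non-zero actual value gives the largest possible term, so <code>smape([100], [0])</code> evaluates to 2.0, that is, 200%.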

== History ==

The earliest reference to a similar formula appears to be Armstrong (1985, p. 348), where it is called "adjusted MAPE" and is defined without the absolute values in the denominator. It was later discussed, modified, and re-proposed by Flores (1986).

Armstrong's original definition is as follows:

: <math> \text{SMAPE} = \frac 1 n \sum_{t=1}^n \frac{\left|F_t-A_t\right|}{(A_t+F_t)/2}</math>

The problem is that a term of this sum is negative whenever <math>A_t + F_t < 0</math>, so the measure itself can be negative. Therefore, the currently accepted version of SMAPE uses absolute values in the denominator.
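The sign problem can be checked numerically for a single term. The sketch below uses hypothetical helper names and an arbitrary pair of negative values chosen only for illustration.

```python
def adjusted_mape_term(a, f):
    """One term of the formula without absolute values: |F - A| / ((A + F) / 2)."""
    return abs(f - a) / ((a + f) / 2)


def smape_term(a, f):
    """One term with absolute values in the denominator."""
    return abs(f - a) / ((abs(a) + abs(f)) / 2)


# Illustrative values only: both the actual and the forecast are negative.
a, f = -100.0, -110.0
print(adjusted_mape_term(a, f))  # negative, because A + F < 0
print(smape_term(a, f))          # same magnitude, but positive
```

Neither helper guards against a zero denominator; the first one fails whenever ''A'' + ''F'' = 0, the second only when both values are zero.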

== Discussion ==

=== Comparison with MAPE ===

The idea behind '''SMAPE''' is that over- and under-forecasts are treated in a relative way, rather than an absolute way, as with the mean absolute percentage error ('''MAPE'''). For example, applying the formula above to some actual <math>A</math> and forecasted <math>F</math> values:

{| class="wikitable"
|-
! <math>A</math> !! <math>F</math> !! MAPE !! SMAPE
|-
| 100 || 110 || 10% || 9.52%
|-
| 100 || 90 || 10% || 10.53%
|}

we see that MAPE considers an over- and an underestimation of 10% as equivalent, whereas SMAPE considers the underestimation to be slightly "worse" than the overestimation.

Extending this to larger forecast errors:

{| class="wikitable"
|-
! <math>A</math> !! <math>F</math> !! MAPE !! SMAPE
|-
| 100 || 200 || 100% || 66.67%
|-
| 100 || 50 || 50% || 66.67%
|}

Here, ''double'' overestimation and ''half'' underestimation are treated equivalently by SMAPE, whereas MAPE considers the overestimation to be "twice as bad" as the underestimation.

Extending to an even more extreme case:

{| class="wikitable"
|-
! <math>A</math> !! <math>F</math> !! MAPE !! SMAPE
|-
| 100 || 1,000 || 900% || 163.64%
|-
| 100 || 10 || 90% || 163.64%
|}

Here it becomes clear that MAPE is unbounded from above, and can assign extremely large penalties to overestimations – but cannot do the same for extreme underestimations, because a non-negative forecast can never be more than 100% below a positive actual value. SMAPE, on the other hand, is bounded between 0% and 200%, and penalises these larger over- and underestimations in a more "symmetric" manner.
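The comparisons in this section can be recomputed with a short script. The helper names <code>ape</code> and <code>sape</code> are hypothetical; each returns the single-point percentage error, which equals MAPE or SMAPE for a one-point series.

```python
def ape(a, f):
    """Absolute percentage error of one point, in percent (the MAPE term)."""
    return abs(f - a) / abs(a) * 100


def sape(a, f):
    """Symmetric absolute percentage error of one point, in percent (the SMAPE term)."""
    return 2 * abs(f - a) / (abs(a) + abs(f)) * 100


# (actual, forecast) pairs taken from the tables in this section.
pairs = [(100, 110), (100, 90), (100, 200), (100, 50), (100, 1000), (100, 10)]
for a, f in pairs:
    print(f"A={a:>4}  F={f:>5}  MAPE={ape(a, f):7.2f}%  SMAPE={sape(a, f):7.2f}%")
```

Printing to two decimal places rounds the result, so values such as 1800/11 appear rounded rather than truncated.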

Therefore, the choice between MAPE and SMAPE depends on the problem at hand, and on whether or not a '''relative''' metric is more appropriate. This may be the case if the expected forecasting errors are much larger than 10%; for smaller errors, MAPE is more frequently chosen because of its simplicity and ease of interpretation.

=== Alternative versions ===

Because a "percentage error" bounded between 0% and 100% can be considered easier to interpret, an alternative formula that omits the factor of 2 is sometimes used in practice:

: <math> \text{SMAPE} = \frac{1}{n} \sum_{t=1}^n \frac{|F_t-A_t|}{|A_t|+|F_t|}</math>
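A minimal sketch of this variant is given below. The function name <code>smape_0_100</code> is hypothetical, and skipping terms where both values are zero while still dividing by ''n'' is an assumption carried over from the usual treatment of undefined terms.

```python
def smape_0_100(actual, forecast):
    """Alternative SMAPE without the factor 2, as a fraction in [0, 1]."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    n = len(actual)
    if n == 0:
        raise ValueError("at least one point is required")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom == 0:
            continue  # assumption: undefined 0/0 terms are skipped
        total += abs(f - a) / denom
    return total / n


print(f"{100 * smape_0_100([100], [200]):.2f}%")  # double overestimation
print(f"{100 * smape_0_100([100], [0]):.2f}%")    # worst case for one point
```

Each term is exactly half of the corresponding term in the 0% to 200% definition, so the two versions rank forecasts identically and differ only in scale.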

There is also a third version of SMAPE, which allows measuring the direction of the bias in the data by generating a positive and a negative error at the line-item level. Furthermore, it is better protected against outliers and the bias effect{{clarify|date=October 2025}}. The formula is:

: <math> \text{SMAPE} = \frac{\sum_{t=1}^n \left|F_t-A_t\right|}{\sum_{t=1}^n (A_t+F_t)}</math>
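The sketch below implements this formula exactly as displayed, with absolute differences in the numerator and plain sums in the denominator. The function names and the sample data are hypothetical and serve only to show how a single small-valued point affects the pooled ratio and the per-point average differently.

```python
def smape_aggregate(actual, forecast):
    """Ratio of summed absolute errors to the summed (A_t + F_t)."""
    denom = sum(a + f for a, f in zip(actual, forecast))
    if denom == 0:
        raise ZeroDivisionError("sum of actual and forecast values is zero")
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / denom


def smape_per_point(actual, forecast):
    """Per-point average of |F - A| / (|A| + |F|), for comparison only."""
    terms = [abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)


# Illustrative data: the third point is small but badly forecast.
actual = [100, 100, 1]
forecast = [110, 90, 10]
print(f"aggregate: {100 * smape_aggregate(actual, forecast):.2f}%")
print(f"per point: {100 * smape_per_point(actual, forecast):.2f}%")
```

In the pooled ratio the third point contributes in proportion to its size, whereas in the per-point average it carries the same weight as the other two points.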

== Alternatives ==

Provided the data are strictly positive, an alternative measure of relative accuracy can be obtained based on the log of the accuracy ratio: log(''F''<sub>''t''</sub> / ''A''<sub>''t''</sub>). This measure is easier to analyze statistically and has valuable symmetry and unbiasedness properties. When used in constructing forecasting models, the resulting prediction corresponds to the geometric mean (Tofallis, 2015){{clarify|date=October 2025}}, whereas ordinary least squares models predict the arithmetic mean. The geometric mean is less affected by outliers than the arithmetic mean.
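The symmetry of the log accuracy ratio and its link to the geometric mean can be illustrated as follows. The helper names and the sample data are hypothetical, and using the sum of squared log ratios as the fitting criterion is an assumption made for this sketch.

```python
import math


def log_accuracy_ratio(a, f):
    """Natural log of F / A; requires strictly positive values."""
    if a <= 0 or f <= 0:
        raise ValueError("actual and forecast must be strictly positive")
    return math.log(f / a)


def sum_squared_log_ratio(actuals, f):
    """Assumed fitting criterion: sum of squared log accuracy ratios."""
    return sum(log_accuracy_ratio(a, f) ** 2 for a in actuals)


# Doubling and halving give errors of equal size and opposite sign.
print(log_accuracy_ratio(100, 200), log_accuracy_ratio(100, 50))

# Illustrative data: compare two constant forecasts for the same series.
actuals = [10.0, 100.0, 1000.0]
geometric = math.exp(sum(math.log(a) for a in actuals) / len(actuals))
arithmetic = sum(actuals) / len(actuals)
print(sum_squared_log_ratio(actuals, geometric))
print(sum_squared_log_ratio(actuals, arithmetic))
```

For a constant forecast, the squared-log criterion is minimised where log ''F'' equals the mean of the log ''A''<sub>''t''</sub>, which is the geometric mean of the actual values.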

==References==
* Armstrong, J. S. (1985) ''Long-range Forecasting: From Crystal Ball to Computer'', 2nd ed. Wiley. {{ISBN|978-0-471-82260-8}}
* Flores, B. E. (1986) "A pragmatic view of accuracy measurement in forecasting", ''Omega (Oxford)'', 14(2), 93–98. {{doi|10.1016/0305-0483(86)90013-7}}
* Tofallis, C. (2015) "A Better Measure of Relative Prediction Accuracy for Model Selection and Model Estimation", ''Journal of the Operational Research Society'', 66(8), 1352–1362. [https://ssrn.com/abstract=2635088 archived preprint]

==External links==
* [http://robjhyndman.com/hyndsight/smape/ Rob J. Hyndman: Errors on Percentage Errors]

Category:Statistical deviation and dispersion