Ordinal data

{{Short description|Statistical data type}} {{distinguish|Ordinal data (programming)}} {{redirect|Ordinal scale|the film|Ordinal Scale}}

'''Ordinal data''' is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known.<ref name="agresti">{{cite book|last1=Agresti|first1=Alan|title=Categorical Data Analysis|date=2013|publisher=John Wiley & Sons|location=Hoboken, New Jersey|isbn=978-0-470-46363-5|edition=3}}</ref>{{rp|2}} These data exist on an '''ordinal scale''', one of four levels of measurement described by S. S. Stevens in 1946. The ordinal scale is distinguished from the ''nominal scale'' by having a ranking.<ref name=":0" /> It also differs from the ''interval scale'' and ''ratio scale'' by not having category widths that represent equal increments of the underlying attribute.<ref name="stevens">{{Cite journal|last=Stevens|first=S. S.|year=1946|title=On the Theory of Scales of Measurement|journal=Science|series=New Series|volume=103|issue=2684|pages=677–680|doi=10.1126/science.103.2684.677|pmid=17750512|bibcode=1946Sci...103..677S}}</ref>

==Examples of ordinal data==

A well-known example of ordinal data is the Likert scale. An example of a Likert scale is:<ref name="cohenetal">{{Cite book|title=Psychological Testing and Assessment: An Introduction to Tests and Measurement|last1=Cohen|first1=Ronald Jay|last2=Swerdlik|first2=Mark E.|last3=Phillips|first3=Suzanne M.|publisher=Mayfield|year=1996|isbn=1-55934-427-X|edition=3rd|location=Mountain View, CA|pages=[https://archive.org/details/psychologicaltes0000cohe/page/685 685]|url=https://archive.org/details/psychologicaltes0000cohe/page/685}}</ref>{{rp|685}}
{| class="wikitable" style="text-align: center;"
!Like
!Like Somewhat
!Neutral
!Dislike Somewhat
!Dislike
|-
|1
|2
|3
|4
|5
|}

Examples of ordinal data are often found in questionnaires: for example, the survey question "Is your general health poor, reasonable, good, or excellent?" may have those answers coded respectively as 1, 2, 3, and 4. Sometimes data on an interval scale or ratio scale are grouped onto an ordinal scale: for example, individuals whose income is known might be grouped into the income categories $0–$19,999, $20,000–$39,999, $40,000–$59,999, ..., which then might be coded as 1, 2, 3, 4, .... Other examples of ordinal data include socioeconomic status, military ranks, and letter grades for coursework.<ref name="s&c">{{Cite book|title=Nonparametric Statistics for the Behavioral Sciences|last1=Siegel|first1=Sidney|last2=Castellan|first2=N. John Jr.|publisher=McGraw-Hill|year=1988|isbn=0-07-057357-3|edition=2nd|location=Boston|pages=25–26}}</ref>
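The income grouping described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name <code>income_to_ordinal</code> is hypothetical, only the first four brackets are written out, and the last bracket is treated as open-ended purely for the example.

```python
import bisect

# Lower bounds of the brackets $0-$19,999, $20,000-$39,999, $40,000-$59,999
# and (assumption, for illustration only) an open-ended $60,000-and-above bracket.
BRACKET_LOWER_BOUNDS = [0, 20_000, 40_000, 60_000]


def income_to_ordinal(income: float) -> int:
    """Return the 1-based ordinal code of the bracket containing `income`."""
    if income < 0:
        raise ValueError("income must be non-negative")
    # bisect_right counts the lower bounds that are <= income, which is
    # exactly the 1-based index of the bracket the income falls into.
    return bisect.bisect_right(BRACKET_LOWER_BOUNDS, income)


incomes = [5_000, 19_999, 20_000, 45_500, 120_000]
codes = [income_to_ordinal(x) for x in incomes]
print(codes)  # [1, 1, 2, 3, 4]
```

The coded values preserve the order of the incomes but not the exact amounts: $5,000 and $19,999 both become 1.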

==Ways to analyse ordinal data==

The analysis of ordinal data calls for different methods from those used for unordered (nominal) categorical variables. These methods incorporate the natural ordering of the categories in order to avoid a loss of statistical power.<ref name="agresti" />{{rp|88}} Computing the mean of a sample of ordinal data is discouraged; other measures of central tendency, such as the median or mode, are generally more appropriate.<ref>{{cite journal|last1=Jamieson|first1=Susan|title=Likert scales: how to (ab)use them|journal=Medical Education|date=December 2004|volume=38|issue=12|pages=1212–1218|doi=10.1111/j.1365-2929.2004.02012.x|pmid=15566531|s2cid=42509064|url=http://eprints.gla.ac.uk/59552/1/59552.pdf }}</ref>
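As a small illustration, the median and mode of coded answers to the general-health question from the examples above can be computed with Python's standard <code>statistics</code> module. The responses are made up for the example.

```python
import statistics

# Codes for the survey question "Is your general health poor, reasonable,
# good, or excellent?"; the responses below are made up for illustration.
LABELS = {1: "poor", 2: "reasonable", 3: "good", 4: "excellent"}
responses = [2, 3, 3, 4, 1, 3, 2, 4, 3]

# median_low returns an observed value (the lower middle value when the
# sample size is even), so the result is always a valid category.
median_code = statistics.median_low(responses)
# The mode is the most frequent category.
mode_code = statistics.mode(responses)

print(LABELS[median_code], LABELS[mode_code])  # good good
```

By contrast, the mean of these codes (25/9, about 2.78) is not a category at all, and its value would change under any other order-preserving choice of codes (for example 1, 2, 5, 10), whereas the median and mode categories would not.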

===General===

Stevens (1946) argued that, because the assumption of equal distance between categories does not hold for ordinal data, means and standard deviations were not appropriate for describing ordinal distributions, nor were inferential statistics based on means and standard deviations. Instead, positional measures like the median and percentiles, in addition to descriptive statistics appropriate for nominal data (number of cases, mode, contingency correlation), should be used.<ref name="stevens" />{{rp|678}} Nonparametric methods, especially those developed for the analysis of ranked measurements (e.g., Kendall's W and Spearman's rank correlation coefficient), have been proposed as the most appropriate procedures for inferential statistics involving ordinal data.<ref name="s&c" />{{rp|25–28}} However, parametric statistics may be permissible for ordinal data, with certain caveats, to take advantage of the greater range of available statistical procedures.<ref>{{Cite web|url=ftp://ftp.sas.com/pub/neural/measurement.html|title=Measurement theory: Frequently asked questions|last=Sarle|first=Warren S.|archive-url=https://web.archive.org/web/20170705060825/ftp://ftp.sas.com/pub/neural/measurement.html|archive-date=2017-07-05|date=Sep 14, 1997}}</ref><ref>{{Cite book|title=Statistical Rules of Thumb|last=van Belle|first=Gerald|publisher=John Wiley & Sons|year=2002|isbn=0-471-40227-3|location=New York|pages=23–24}}</ref><ref name="cohenetal" />{{rp|90}}

===Univariate statistics===
In place of means and standard deviations, univariate statistics appropriate for ordinal data include the median,<ref name="blalock">{{Cite book|title=Social Statistics|last=Blalock|first=Hubert M. Jr.|publisher=McGraw-Hill|year=1979|isbn=0-07-005752-4|edition=Rev. 2nd|location=New York}}</ref>{{rp|59–61}} other percentiles (such as quartiles and deciles),<ref name="blalock" />{{rp|71}} and the quartile deviation.<ref name="blalock" />{{rp|77}} One-sample tests for ordinal data include the Kolmogorov–Smirnov one-sample test,<ref name="s&c" />{{rp|51–55}} the one-sample runs test,<ref name="s&c" />{{rp|58–64}} and the change-point test.<ref name="s&c" />{{rp|64–71}}
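A minimal sketch of these positional summaries for ordinal codes follows. It uses the nearest-rank definition of a percentile, which always returns an observed category; this is one of several percentile conventions and is an assumption of the sketch, not taken from the cited sources. The quartile deviation is half the distance between the third and first quartiles.

```python
import math


def nearest_rank_percentile(data, p):
    """Smallest observed value with at least p percent of the data at or below it."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up responses coded 1 = poor, 2 = reasonable, 3 = good, 4 = excellent.
responses = [2, 3, 3, 4, 1, 3, 2, 4, 3]

q1 = nearest_rank_percentile(responses, 25)
median = nearest_rank_percentile(responses, 50)
q3 = nearest_rank_percentile(responses, 75)
# Quartile deviation (semi-interquartile range).
quartile_deviation = (q3 - q1) / 2

print(q1, median, q3, quartile_deviation)  # 2 3 3 0.5
```

Deciles follow in the same way by passing p = 10, 20, ..., 90.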

===Bivariate statistics===

In lieu of testing differences in means with ''t''-tests, differences in distributions of ordinal data from two independent samples can be tested with Mann–Whitney,<ref name="blalock" />{{rp|259–264}} runs,<ref name="blalock" />{{rp|253–259}} Smirnov,<ref name="blalock" />{{rp|266–269}} and signed-ranks<ref name="blalock" />{{rp|269–273}} tests. Tests for two related or matched samples include the sign test<ref name="s&c" />{{rp|80–87}} and the Wilcoxon signed-ranks test.<ref name="s&c" />{{rp|87–95}} Analysis of variance with ranks<ref name="blalock" />{{rp|367–369}} and the Jonckheere test for ordered alternatives<ref name="s&c" />{{rp|216–222}} can be used with ordinal data in place of independent-samples ANOVA. Tests for more than two related samples include the Friedman two-way analysis of variance by ranks<ref name="s&c" />{{rp|174–183}} and the Page test for ordered alternatives.<ref name="s&c" />{{rp|184–188}} Correlation measures appropriate for two ordinal-scaled variables include Kendall's tau,<ref name="blalock" />{{rp|436–439}} Goodman and Kruskal's gamma,<ref name="blalock" />{{rp|442–443}} Spearman's ''r<sub>s</sub>'',<ref name="blalock" />{{rp|434–436}} and Somers' ''d<sub>yx</sub>''/''d<sub>xy</sub>''.<ref name="blalock" />{{rp|443}}
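To illustrate how a rank test uses only the ordering of the data, the sketch below computes the Mann–Whitney ''U'' statistic by comparing every pair of observations across two independent samples, counting ties as one half. It then applies the usual large-sample normal approximation with a correction for ties, which are common with ordinal scales. The function name and the data are hypothetical, and the normal approximation (here without a continuity correction) is only one way to obtain a ''p''-value; library routines such as SciPy's <code>scipy.stats.mannwhitneyu</code> offer exact and asymptotic options.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_whitney(x, y):
    """Return (U, z, two-sided p) for independent samples x and y."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x observation is larger; ties count 1/2.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n = n1 + n2
    # Tie correction: sum of t^3 - t over each group of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Made-up Likert codes (1 = Like ... 5 = Dislike) from two independent groups.
group_a = [1, 2, 2, 3, 3, 4, 2, 1]
group_b = [3, 4, 4, 5, 3, 5, 4, 2]
u, z, p = mann_whitney(group_a, group_b)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```

Swapping the two samples turns ''U'' into ''n''<sub>1</sub>''n''<sub>2</sub> − ''U'' and flips the sign of ''z'', leaving the two-sided ''p''-value unchanged.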

===Regression applications===
Ordinal data can be treated as a quantitative variable by assigning scores to the categories. In logistic regression, the model
: <math> \operatorname{logit}[P(Y=1)] = \alpha + \beta_1 c + \beta_2 x </math>
treats an ordinal predictor ''c'' in this way: ''c'' takes the scores assigned to the levels of the categorical scale, and ''x'' is another explanatory variable.<ref name="agresti" />{{rp|189}} In regression analysis, outcomes (dependent variables) that are ordinal variables can be predicted using a variant of ordinal regression, such as ordered logit or ordered probit.
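A minimal sketch of the logistic model with a scored ordinal predictor, using made-up coefficient values chosen only for illustration (they are not estimates from any data). The ordinal predictor ''c'' receives the equally spaced scores 1 to 4 for a hypothetical four-level variable, and ''x'' stands for another explanatory variable. Because the scores are equally spaced, moving up one category changes the log-odds by the same amount, β<sub>1</sub>, at every step; a different choice of scores would imply a different structure.

```python
import math

# Hypothetical coefficients for logit[P(Y=1)] = alpha + beta_1*c + beta_2*x.
ALPHA, BETA_1, BETA_2 = -3.0, -0.5, 0.2
# Equally spaced scores assigned to a hypothetical four-level ordinal predictor.
SCORES = {"low": 1, "medium": 2, "high": 3, "very high": 4}


def prob_y1(level: str, x: float) -> float:
    """P(Y = 1) for an ordinal level (via its score c) and a covariate x."""
    c = SCORES[level]
    eta = ALPHA + BETA_1 * c + BETA_2 * x  # linear predictor on the log-odds scale
    return 1 / (1 + math.exp(-eta))


def logit(p: float) -> float:
    return math.log(p / (1 - p))


for level in SCORES:
    print(level, round(prob_y1(level, x=10.0), 3))
```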

In multiple regression/correlation analysis, ordinal data can be accommodated using power polynomials and through normalization of scores and ranks.<ref>{{Cite book|title=Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences|last1=Cohen|first1=Jacob|last2=Cohen|first2=Patricia|publisher=Lawrence Erlbaum Associates|year=1983|isbn=0-89859-268-2|edition=2nd|location=Hillsdale, New Jersey|page=273}}</ref>
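The two devices mentioned above can be sketched as follows, under explicit assumptions. Power polynomial terms are taken to be the columns ''c'', ''c''<sup>2</sup>, ... of an integer score ''c''. For normalizing ranks, the sketch converts tied (mid) ranks ''r'' to Blom-type normal scores, Φ<sup>−1</sup>((''r'' − 3/8)/(''n'' + 1/4)), which is one common convention and not necessarily the one used in the cited text. The function names are hypothetical.

```python
from statistics import NormalDist


def power_polynomial_terms(scores, degree):
    """Rows of [c, c**2, ..., c**degree] for each ordinal score c."""
    return [[c ** d for d in range(1, degree + 1)] for c in scores]


def midranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    ordered = sorted(values)
    first, last = {}, {}
    for position, v in enumerate(ordered, start=1):
        first.setdefault(v, position)
        last[v] = position
    return [(first[v] + last[v]) / 2 for v in values]


def normal_scores(values):
    """Blom-type normal scores of the midranks (one convention among several)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in midranks(values)]


codes = [1, 2, 2, 3, 4]
print(power_polynomial_terms(codes, 2))  # [[1, 1], [2, 4], [2, 4], [3, 9], [4, 16]]
print([round(z, 3) for z in normal_scores(codes)])
```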

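===Linear trends===
Linear trends are also used to find associations between ordinal data and other categorical variables, typically arranged in a contingency table. After scores are assigned to the categories of each variable, the correlation ''r'' between the two variables is computed; ''r'' lies between −1 and 1. To test for a linear trend, the test statistic
: <math> M^2 = (n-1)r^2 </math>
is used, where ''n'' is the sample size. For large samples, under independence, ''M''<sup>2</sup> has an approximate chi-squared distribution with one degree of freedom.<ref name="agresti" />{{rp|87}}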

The correlation ''r'' is computed from the scores as follows. Let <math> u_1 \leq u_2 \leq \dots \leq u_I </math> be the row scores and <math> v_1 \leq v_2 \leq \dots \leq v_J </math> be the column scores. Let <math> p_{ij} </math> be the proportion of observations in cell <math>(i, j)</math>, and let <math> p_{i+} </math> and <math> p_{+j} </math> be the marginal row and column proportions. The means of the row and column scores are <math> \bar u = \sum_i u_i p_{i+} </math> and <math> \bar v = \sum_j v_j p_{+j} </math>. Then ''r'' is calculated by:
: <math> r = \frac{ \sum_{i,j} (u_i - \bar u)(v_j - \bar v)\, p_{ij} }{ \sqrt{ \left[ \sum_i (u_i - \bar u)^2 p_{i+} \right] \left[ \sum_j (v_j - \bar v)^2 p_{+j} \right] } } </math>
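The computation of ''r'' and ''M''<sup>2</sup> from a contingency table can be sketched directly from these formulas. The function name, table, and scores in the example are made up. The ''p''-value uses the large-sample chi-squared approximation with one degree of freedom, for which the upper-tail probability of a value ''m'' is erfc(√(''m''/2)).

```python
import math


def linear_trend_test(table, row_scores, col_scores):
    """Return (r, M2, p) for a two-way table of counts with ordered scores."""
    n = sum(sum(row) for row in table)
    p = [[count / n for count in row] for row in table]   # p_ij
    p_row = [sum(row) for row in p]                        # p_{i+}
    p_col = [sum(col) for col in zip(*p)]                  # p_{+j}
    u_bar = sum(u * pi for u, pi in zip(row_scores, p_row))
    v_bar = sum(v * pj for v, pj in zip(col_scores, p_col))
    cov = sum((u - u_bar) * (v - v_bar) * p[i][j]
              for i, u in enumerate(row_scores)
              for j, v in enumerate(col_scores))
    var_u = sum((u - u_bar) ** 2 * pi for u, pi in zip(row_scores, p_row))
    var_v = sum((v - v_bar) ** 2 * pj for v, pj in zip(col_scores, p_col))
    r = cov / math.sqrt(var_u * var_v)
    m2 = (n - 1) * r ** 2
    # Upper tail of chi-squared with 1 df: P(Z^2 > m2) = erfc(sqrt(m2 / 2)).
    p_value = math.erfc(math.sqrt(m2 / 2))
    return r, m2, p_value


# Made-up counts: rows are three ordered groups, columns three ordered responses.
table = [[20, 10, 5],
         [10, 15, 10],
         [5, 10, 20]]
r, m2, p_value = linear_trend_test(table, row_scores=[1, 2, 3], col_scores=[1, 2, 3])
print(f"r = {r:.3f}, M^2 = {m2:.2f}, p = {p_value:.2g}")
```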

===Classification methods===
Classification methods have also been developed for ordinal data. The observations are divided into groups such that observations within each group are similar to one another. The dispersion within each group is measured and minimized to improve the classification. The dispersion function used is drawn from information theory.<ref>{{cite journal|last1=Laird|first1=Nan M.|title=A Note on Classifying Ordinal-Scale Data|journal=Sociological Methodology|date=1979|volume=10|pages=303–310|doi=10.2307/270775|jstor=270775}}</ref>

==Statistical models for ordinal data==
There are several different models that can be used to describe the structure of ordinal data.<ref name="Agresti 2010">{{cite book |last=Agresti |first=Alan |title=Analysis of Ordinal Categorical Data |location=Hoboken, New Jersey |publisher=Wiley |edition=2nd |year=2010 |isbn=978-0-470-08289-8 }}</ref> Four major classes of model are described below, each defined for a random variable <math>Y</math>, with levels indexed by <math>k = 1, 2, \dots, q</math>.

Note that in the model definitions below, the values of <math>\mu_k</math> and <math>\mathbf{\beta}</math> will not be the same for all the models for the same set of data, but the notation is used to compare the structure of the different models.

===Proportional odds model===
The most commonly used model for ordinal data is the proportional odds model, defined by
:<math> \log\left[\frac{\Pr(Y \leq k)}{\Pr(Y > k)}\right] = \log\left[\frac{\Pr(Y \leq k)}{1-\Pr(Y \leq k)}\right] = \mu_k + \mathbf{\beta}^T\mathbf{x} </math>
where the parameters <math>\mu_k</math> describe the base distribution of the ordinal data, <math>\mathbf{x}</math> are the covariates and <math>\mathbf{\beta}</math> are the coefficients describing the effects of the covariates.
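The following sketch turns the proportional odds model into category probabilities. It assumes the sign convention written above, <math>\mu_k + \mathbf{\beta}^T\mathbf{x}</math>, under which a positive coefficient raises <math>\Pr(Y \leq k)</math> and so shifts probability toward lower categories; some software, such as R's <code>MASS::polr</code>, writes the model with a minus sign instead. The cutpoints <math>\mu_1 \leq \dots \leq \mu_{q-1}</math> must be non-decreasing so that the cumulative probabilities are. The function names and all parameter values are made up for illustration.

```python
import math


def logistic(t: float) -> float:
    return 1 / (1 + math.exp(-t))


def proportional_odds_probs(mu, beta, x):
    """Pr(Y = 1), ..., Pr(Y = q) when logit Pr(Y <= k) = mu_k + beta^T x.

    `mu` holds the q - 1 cutpoints mu_1 <= ... <= mu_{q-1}; Pr(Y <= q) = 1.
    """
    if any(a > b for a, b in zip(mu, mu[1:])):
        raise ValueError("cutpoints mu_k must be non-decreasing")
    eta = sum(b * xi for b, xi in zip(beta, x))  # beta^T x, shared by every k
    cumulative = [logistic(m + eta) for m in mu] + [1.0]
    # Category probabilities are successive differences of cumulative ones.
    return [cumulative[0]] + [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


mu = [-1.0, 0.5, 2.0]  # hypothetical cutpoints for q = 4 levels
beta = [0.8]           # hypothetical coefficient for one covariate
for x1 in (0.0, 1.0):
    print(x1, [round(p, 3) for p in proportional_odds_probs(mu, beta, [x1])])
```

Changing ''x'' by one unit moves every cumulative log-odds by the same amount, β, which is the "proportional odds" property that gives the model its name.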

This model can be generalized by defining the model using <math>\mu_k + \mathbf{\beta}_k^T\mathbf{x}</math> instead of <math>\mu_k + \mathbf{\beta}^T\mathbf{x}</math>, and this would make the model suitable for nominal data (in which the categories have no natural ordering) as well as ordinal data. However, this generalization can make it much more difficult to fit the model to the data.

===Baseline category logit model===
The baseline category model is defined by
:<math> \log\left[\frac{\Pr(Y = k)}{\Pr(Y = 1)}\right] = \mu_k + \mathbf{\beta}_k^T\mathbf{x} </math>
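A sketch of how the baseline category logit model yields probabilities: category 1 is the baseline with its linear predictor fixed at 0, each other category has its own intercept and coefficient vector, and the probabilities are obtained by normalizing the exponentiated linear predictors (a softmax). The function name and parameter values are made up for illustration.

```python
import math


def baseline_category_probs(mu, betas, x):
    """Pr(Y = 1), ..., Pr(Y = q) when log[Pr(Y=k)/Pr(Y=1)] = mu_k + beta_k^T x.

    `mu` and `betas` hold the parameters for k = 2, ..., q; category 1 is the
    baseline, so its linear predictor is 0.
    """
    etas = [0.0] + [m + sum(b * xi for b, xi in zip(beta_k, x))
                    for m, beta_k in zip(mu, betas)]
    top = max(etas)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


mu = [0.5, -0.2, -1.0]         # hypothetical mu_2, mu_3, mu_4
betas = [[0.3], [0.9], [1.4]]  # hypothetical beta_2, beta_3, beta_4
print([round(p, 3) for p in baseline_category_probs(mu, betas, [1.0])])
```

Nothing in the computation uses the order of the categories: relabeling categories 2 to ''q'' only permutes the parameters.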

This model does not impose an ordering on the categories and so can be applied to nominal data as well as ordinal data.

===Ordered stereotype model===
The ordered stereotype model is defined by
:<math> \log\left[\frac{\Pr(Y = k)}{\Pr(Y = 1)}\right] = \mu_k + \phi_k\mathbf{\beta}^T\mathbf{x} </math>
where the score parameters are constrained such that <math>0=\phi_1 \leq \phi_2 \leq \dots \leq \phi_q=1</math>.
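In the sketch below, each category's linear predictor is <math>\mu_k + \phi_k\mathbf{\beta}^T\mathbf{x}</math>, with <math>\mu_1 = 0</math>, and the score constraints are checked on input. The function name and parameter values are made up. The final lines illustrate a consequence of the model: when two adjacent scores are equal, the ratio of their probabilities does not depend on <math>\mathbf{x}</math>, so the covariates carry no information for telling those two levels apart.

```python
import math


def stereotype_probs(mu, phi, beta, x):
    """Pr(Y = 1), ..., Pr(Y = q) when log[Pr(Y=k)/Pr(Y=1)] = mu_k + phi_k * beta^T x.

    `mu` holds mu_2, ..., mu_q (mu_1 = 0); `phi` holds all q scores, which must
    satisfy 0 = phi_1 <= phi_2 <= ... <= phi_q = 1.
    """
    if len(phi) != len(mu) + 1:
        raise ValueError("need one score per category")
    if phi[0] != 0 or phi[-1] != 1 or any(a > b for a, b in zip(phi, phi[1:])):
        raise ValueError("scores must satisfy 0 = phi_1 <= ... <= phi_q = 1")
    eta = sum(b * xi for b, xi in zip(beta, x))  # one shared beta^T x
    etas = [m + f * eta for m, f in zip([0.0] + list(mu), phi)]
    top = max(etas)
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


mu = [0.4, 0.1, -0.5]       # hypothetical mu_2, mu_3, mu_4
phi = [0.0, 0.5, 0.5, 1.0]  # hypothetical scores; levels 2 and 3 share a score
beta = [1.2]
for x1 in (0.0, 2.0):
    probs = stereotype_probs(mu, phi, beta, [x1])
    # Equal scores: Pr(Y=3)/Pr(Y=2) = exp(mu_3 - mu_2) whatever the value of x.
    print(x1, round(probs[2] / probs[1], 6))
```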

This is a more parsimonious, and more specialised, model than the baseline category logit model: <math>\phi_k\mathbf{\beta}</math> can be thought of as similar to <math>\mathbf{\beta}_k</math>.

The non-ordered stereotype model has the same form as the ordered stereotype model, but without the ordering imposed on <math>\phi_k</math>. This model can be applied to nominal data.

Note that the fitted scores, <math>\hat{\phi}_k</math>, indicate how easy it is to distinguish between the different levels of <math>Y</math>. If <math>\hat{\phi}_k \approx \hat{\phi}_{k-1}</math>, then the current set of data for the covariates <math>\mathbf{x}</math> does not provide much information to distinguish between levels <math>k</math> and <math>k-1</math>, but that does '''not''' necessarily imply that the actual values <math>k</math> and <math>k-1</math> are close together. If the values of the covariates change, then for the new data the fitted scores <math>\hat{\phi}_k</math> and <math>\hat{\phi}_{k-1}</math> might be far apart.

===Adjacent categories logit model===
The adjacent categories model is defined by
:<math> \log\left[\frac{\Pr(Y = k)}{\Pr(Y = k+1)}\right] = \mu_k + \mathbf{\beta}_k^T\mathbf{x} </math>
although the most common form, referred to in Agresti (2010)<ref name="Agresti 2010"/> as the "proportional odds form", is defined by
:<math> \log\left[\frac{\Pr(Y = k)}{\Pr(Y = k+1)}\right] = \mu_k + \mathbf{\beta}^T\mathbf{x} </math>
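For the proportional odds form of the adjacent categories model, the probabilities can be built up by chaining the adjacent log-ratios from category 1 upward, as in this sketch. The function name and parameter values are made up for illustration.

```python
import math


def adjacent_category_probs(mu, beta, x):
    """Pr(Y = 1), ..., Pr(Y = q) when log[Pr(Y=k)/Pr(Y=k+1)] = mu_k + beta^T x.

    `mu` holds mu_1, ..., mu_{q-1}.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    # log[Pr(Y=k+1)/Pr(Y=1)] = -sum_{j<=k} (mu_j + eta), built up step by step.
    log_rel = [0.0]
    for m in mu:
        log_rel.append(log_rel[-1] - (m + eta))
    top = max(log_rel)
    weights = [math.exp(v - top) for v in log_rel]
    total = sum(weights)
    return [w / total for w in weights]


mu = [-0.5, 0.0, 0.7]  # hypothetical mu_1, mu_2, mu_3 for q = 4 levels
beta = [0.6]
probs = adjacent_category_probs(mu, beta, [1.0])
print([round(p, 3) for p in probs])
```

Written in baseline-category form, level ''k'' receives the coefficient −(''k'' − 1)β under this direction of the ratio; the sign is simply absorbed into β, in line with the earlier note that parameter values are not the same across models.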

This model can only be applied to ordinal data, since modelling the probabilities of shifts from one category to the next category implies that an ordering of those categories exists.

The adjacent categories logit model can be thought of as a special case of the baseline category logit model, where <math>\mathbf{\beta}_k = \mathbf{\beta}(k-1)</math>. The adjacent categories logit model can also be thought of as a special case of the ordered stereotype model, where <math>\phi_k \propto k-1</math>, i.e. the distances between the <math>\phi_k</math> are defined in advance, rather than being estimated based on the data.

===Comparisons between the models===
The proportional odds model has a very different structure to the other three models, and also a different underlying meaning. Note that the size of the reference category in the proportional odds model varies with <math>k</math>, since <math>Y \leq k</math> is compared to <math>Y > k</math>, whereas in the other models the size of the reference category remains fixed, as <math>Y=k</math> is compared to <math>Y=1</math> or <math>Y=k+1</math>.

===Different link functions===
There are variants of all the models that use different link functions, such as the probit link or the complementary log-log link.
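The choice of link only changes the function that maps the linear predictor to a probability. The sketch below shows the inverse links for the logit, probit, and complementary log-log cases applied to the cumulative form <math>\Pr(Y \leq k) = F(\mu_k + \mathbf{\beta}^T\mathbf{x})</math>; the names and cutpoint values are illustrative only.

```python
import math
from statistics import NormalDist

# Inverse link functions F, each mapping a linear predictor to a probability.
INVERSE_LINKS = {
    "logit": lambda t: 1 / (1 + math.exp(-t)),
    "probit": NormalDist().cdf,                       # standard normal CDF
    "cloglog": lambda t: 1 - math.exp(-math.exp(t)),  # inverse of log(-log(1 - p))
}


def cumulative_probs(mu, eta, link="logit"):
    """Pr(Y <= 1), ..., Pr(Y <= q) = F(mu_k + eta), with Pr(Y <= q) = 1."""
    inverse_link = INVERSE_LINKS[link]
    return [inverse_link(m + eta) for m in mu] + [1.0]


mu = [-1.0, 0.5, 2.0]  # hypothetical cutpoints
for link in INVERSE_LINKS:
    print(link, [round(c, 3) for c in cumulative_probs(mu, eta=0.3, link=link)])
```

Unlike the logit and probit inverse links, the complementary log-log inverse link is not symmetric about a probability of 0.5.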

== Statistical tests ==
Differences in ordinal data can be tested using rank tests.

==Visualization and display==
Ordinal data can be visualized in several different ways. Common visualizations include the bar chart and the pie chart. Tables can also be useful for displaying ordinal data and frequencies. Mosaic plots can be used to show the relationship between an ordinal variable and a nominal or ordinal variable.<ref>{{Cite web|url=http://www-stat.wharton.upenn.edu/~buja/mba/plotting-techniques.html|title=Plotting Techniques}}</ref> A bump chart, which is a line chart showing the relative ranking of items from one time point to the next, is also appropriate for ordinal data.<ref>{{Cite book|title=Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations|last=Berinato|first=Scott|publisher=Harvard Business Review Press|year=2016|isbn=978-1-63369-070-7|location=Boston|page=228}}</ref>

Color or grayscale gradation can be used to represent the ordered nature of the data. A single-direction scale, such as income ranges, can be represented with a bar chart where increasing (or decreasing) saturation or lightness of a single color indicates higher (or lower) income. The ordinal distribution of a variable measured on a dual-direction scale, such as a Likert scale, could also be illustrated with color in a stacked bar chart. A neutral color (white or gray) might be used for the middle (zero or neutral) point, with contrasting colors used in the opposing directions from the midpoint, where increasing saturation or darkness of the colors could indicate categories at increasing distance from the midpoint.<ref>{{Cite book|title=Data Visualisation: A Handbook for Data Driven Design|last=Kirk|first=Andy|publisher=SAGE|year=2016|isbn=978-1-4739-1214-4|edition=1st|location=London|pages=269}}</ref> Choropleth maps also use color or grayscale shading to display ordinal data.<ref>{{Cite book|title=The Truthful Art: Data, Charts, and Maps for Communication|last=Cairo|first=Alberto|publisher=New Riders|year=2016|isbn=978-0-321-93407-9|edition=1st|location=San Francisco|pages=280}}</ref>
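As an illustration of the dual-direction scheme described above, the sketch below builds hex colors for an odd number of ordered categories: a neutral light gray at the midpoint and two contrasting hues whose intensity grows with distance from it. The endpoint colors and the function name are arbitrary choices for the example.

```python
def diverging_palette(n_levels, low=(202, 0, 32), high=(5, 113, 176),
                      neutral=(247, 247, 247)):
    """Hex colors for an odd number of ordered levels on a two-direction scale.

    The middle level gets `neutral`; levels further from the middle are blended
    further toward `low` (first half) or `high` (second half).
    """
    if n_levels < 3 or n_levels % 2 == 0:
        raise ValueError("need an odd number of levels, at least 3")
    half = n_levels // 2
    colors = []
    for i in range(n_levels):
        offset = (i - half) / half  # -1 at the first level, +1 at the last
        target = low if offset < 0 else high
        t = abs(offset)             # 0 at the midpoint, 1 at the ends
        rgb = tuple(round(n + (e - n) * t) for n, e in zip(neutral, target))
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Five Likert levels: Like, Like Somewhat, Neutral, Dislike Somewhat, Dislike.
print(diverging_palette(5))
```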

Figures: example bar plot of opinion on defense spending; example bump plot, mosaic plot, and stacked bar plot of opinion on defense spending by political party.

==Applications==

The use of ordinal data can be found in most areas of research where categorical data are generated. Settings where ordinal data are often collected include the social and behavioral sciences, as well as governmental and business settings where measurements are collected from persons by observation, testing, or questionnaires. Some common contexts for the collection of ordinal data include survey research;<ref>{{Cite book|chapter=Assessing the Reliability and Validity of Survey Measures|last=Alwin|first=Duane F.|title=Handbook of Survey Research|publisher=Emerald House|year=2010|isbn=978-1-84855-224-1|editor-last=Marsden|editor-first=Peter V.|location=Bingley, UK|page=420|editor-last2=Wright|editor-first2=James D.}}</ref><ref>{{Cite book|title=Improving Survey Questions: Design and Evaluation|last=Fowler|first=Floyd J. Jr.|publisher=Sage|year=1995|isbn=0-8039-4583-3|location=Thousand Oaks, CA|pages=[https://archive.org/details/improvingsurveyq00fowl/page/156 156–165]|url=https://archive.org/details/improvingsurveyq00fowl/page/156}}</ref> intelligence, aptitude, and personality testing; and decision-making.<ref name=":0">{{Cite journal |last1=Ataei |first1=Younes |last2=Mahmoudi |first2=Amin |last3=Feylizadeh |first3=Mohammad Reza |last4=Li |first4=Deng-Feng |date=January 2020 |title=Ordinal Priority Approach (OPA) in Multiple Attribute Decision-Making |journal=Applied Soft Computing |volume=86 |article-number=105893 |doi=10.1016/j.asoc.2019.105893 |s2cid=209928171 |issn=1568-4946}}</ref><ref name="cohenetal" />{{rp|89–90}}

Cliff's delta (''d''), an effect size calculated from ordinal data, has been recommended as a measure of statistical dominance.<ref>{{Cite journal |last=Cliff |first=Norman |date=November 1993 |title=Dominance statistics: Ordinal analyses to answer ordinal questions. |url=http://doi.apa.org/getdoi.cfm?doi=10.1037/0033-2909.114.3.494 |journal=Psychological Bulletin |language=en |volume=114 |issue=3 |pages=494–509 |doi=10.1037/0033-2909.114.3.494 |issn=1939-1455|url-access=subscription }}</ref>
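Cliff's ''d'' compares every observation in one group with every observation in the other and uses only the order of the values: it is the proportion of pairs in which the first group's value is larger minus the proportion in which it is smaller, so it ranges from −1 to 1. A minimal sketch with a hypothetical function name and made-up data:

```python
def cliffs_delta(x, y):
    """Cliff's d = [#(x_i > y_j) - #(x_i < y_j)] / (len(x) * len(y))."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


# Made-up ordinal codes for two groups.
treated = [3, 4, 4, 5, 3]
control = [2, 3, 3, 4, 1]
print(cliffs_delta(treated, control))  # 0.6
```

Counting ties as one half, ''d'' equals 2''U''/(''n''<sub>''x''</sub>''n''<sub>''y''</sub>) − 1, where ''U'' is the Mann–Whitney statistic for the first group.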

==See also==
{{Portal|Mathematics}}
* List of analyses of categorical data
* Ordinal Priority Approach
* Ordinal number
* Ordinal space

==References==
{{reflist}}

==Further reading==
* {{cite book |last=Agresti |first=Alan |title=Analysis of Ordinal Categorical Data |location=Hoboken, New Jersey |publisher=Wiley |edition=2nd |year=2010 |isbn=978-0-470-08289-8 }}

Category:Statistical data types
Category:Comparison (mathematical)