# Box plot

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Box_plot
> Markdown URL: https://mediated.wiki/source/Box_plot.md
> Source: https://en.wikipedia.org/wiki/Box_plot
> Source revision: 1356238966
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Data visualization

Box plot of data from the [Michelson experiment](/source/Michelson%E2%80%93Morley_experiment#Michelson_experiment_(1881))

In [descriptive statistics](/source/Descriptive_statistics), a **box plot** or **boxplot** is a method for graphically demonstrating the locality, spread and skewness of groups of numerical data through their [quartiles](/source/Quartile).[1]

A box plot representing data

In addition to the box, a box plot can have lines (called *whiskers*) extending from the box to indicate variability outside the upper and lower quartiles; for this reason, the plot is also called a **box-and-whisker plot** or **box-and-whisker diagram**. [Outliers](/source/Outlier) that differ significantly from the rest of the dataset[2] may be plotted as individual points beyond the whiskers. Box plots are [non-parametric](/source/Non-parametric): they display variation in samples of a [statistical population](/source/Statistical_population) without making any assumptions about the underlying [statistical distribution](/source/Probability_distribution)[3] (though [Tukey](/source/John_Tukey)'s box plot assumes symmetry for the whiskers and normality for their length).

The spacings in each subsection of the box plot indicate the degree of [dispersion](/source/Statistical_dispersion) (spread) and [skewness](/source/Skewness) of the data, which are usually described using the [five-number summary](/source/Five-number_summary). In addition, the box plot allows one to visually estimate various [L-estimators](/source/L-estimator), notably the [interquartile range](/source/Interquartile_range), [midhinge](/source/Midhinge), [range](/source/Range_(statistics)), [mid-range](/source/Mid-range), and [trimean](/source/Trimean). Box plots can be drawn either horizontally or vertically.
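Each of these L-estimators is a simple combination of the five-number summary, using the standard definitions midhinge $= (Q_1 + Q_3)/2$, mid-range $= (\min + \max)/2$ and trimean $= (Q_1 + 2Q_2 + Q_3)/4$. The following Python sketch computes them; the helper name `l_estimators` is hypothetical, and the sample input is the five-number summary of the hourly-temperature example later in this article (57, 66, 70, 75, 81 °F).

```python
def l_estimators(minimum, q1, median, q3, maximum):
    """Compute L-estimators that can be read off a box plot.

    Hypothetical helper for illustration. The five-number summary is
    assumed to be computed already (quartile conventions vary).
    """
    return {
        "interquartile_range": q3 - q1,
        "midhinge": (q1 + q3) / 2,
        "range": maximum - minimum,
        "mid_range": (minimum + maximum) / 2,
        "trimean": (q1 + 2 * median + q3) / 4,
    }


# Five-number summary of the hourly-temperature example (°F).
summary = l_estimators(57, 66, 70, 75, 81)
print(summary)
```

For these inputs the sketch gives an IQR of 9, a midhinge of 70.5, a range of 24, a mid-range of 69.0 and a trimean of 70.25.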

## History

The range-bar method was first introduced by [Mary Eleanor Spear](/source/Mary_Eleanor_Spear) in her book "Charting Statistics" in 1952[4] and again in her book "Practical Charting Techniques" in 1969.[5] The box-and-whisker plot was first introduced in 1970 by [John Tukey](/source/John_Tukey), who later published on the subject in his book "Exploratory Data Analysis" in 1977.[6]

## Elements

Box plot with whiskers from minimum to maximum

The same box plot with whiskers drawn within the 1.5 IQR value

A box plot is a standardized way of displaying a dataset based on the [five-number summary](/source/Five-number_summary): the minimum, the maximum, the sample median, and the first and third quartiles.

- **[Minimum](/source/Sample_minimum) (*Q*0 or 0th [percentile](/source/Percentile))**: the lowest data point in the data set excluding any outliers

- **[Maximum](/source/Sample_maximum) (*Q*4 or 100th percentile)**: the highest data point in the data set excluding any outliers

- **[Median](/source/Median) (*Q*2 or 50th percentile)**: the middle value in the data set

- **[First quartile](/source/First_quartile) (*Q*1 or 25th percentile)**: also known as the *lower quartile* $q_n(0.25)$, it is the median of the lower half of the dataset

- **[Third quartile](/source/Third_quartile) (*Q*3 or 75th percentile)**: also known as the *upper quartile* $q_n(0.75)$, it is the median of the upper half of the dataset[7]

In addition to the minimum and maximum values, another important element used to construct a box plot is the interquartile range (IQR), defined as follows:

- **[Interquartile range](/source/Interquartile_range) (IQR)**: the distance between the upper and lower quartiles

$$\text{IQR} = Q_3 - Q_1 = q_n(0.75) - q_n(0.25)$$

A box plot usually includes two parts, a box and a set of whiskers.

### Box

The box is drawn from *Q*1 to *Q*3 with a horizontal line drawn inside it to denote the median. Some box plots include an additional symbol to represent the mean of the data.[8][9]

### Whiskers

The whiskers must end at an observed data point, but can be defined in various ways. In the most straightforward method, the boundary of the lower whisker is the minimum value of the data set, and the boundary of the upper whisker is the maximum value of the data set. Because of this variability, it is appropriate to describe the convention that is being used for the whiskers and outliers in the caption of the box plot.

Another popular choice for the boundaries of the whiskers is based on the 1.5 IQR value. Above the upper quartile (***Q*3**), a distance of 1.5 times the IQR is measured out, and the upper whisker is drawn *up to* the largest observed data point that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile (***Q*1**), and the lower whisker is drawn *down to* the smallest observed data point that falls within this distance. Because the whiskers must end at an observed data point, the two whiskers can look unequal in length, even though 1.5 IQR is the same on both sides. All observed data points outside the whiskers are plotted as **outliers**.[10] Outliers can be plotted on the box plot as a dot, a small circle, a star, *etc.* (see the examples below).
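A minimal Python sketch of the 1.5 IQR rule follows. It assumes the quartiles have already been computed, because quartile conventions differ between methods; the function name `tukey_whiskers` and its multiplier parameter `k` are hypothetical. The sample data are the temperatures from the example with outliers later in this article.

```python
def tukey_whiskers(data, q1, q3, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) under the k*IQR rule.

    q1 and q3 are passed in because quartile conventions differ;
    k is the fence multiplier (1.5 in Tukey's convention).
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observed points inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Every point beyond a fence is drawn individually as an outlier.
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return min(inside), max(inside), outliers


temps = [52, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 89]
print(tukey_whiskers(temps, q1=66, q3=75))  # (57, 79, [52, 89])
```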

Whiskers can also represent other quantities, such as:

- One [standard deviation](/source/Standard_deviation) above and below the mean of the data set

- The 9th percentile and the 91st percentile of the data set

- The 2nd percentile and the 98th percentile of the data set

Rarely, a box plot is drawn without whiskers. This can be appropriate for sensitive information, to prevent the whiskers (and outliers) from disclosing actual observed values.[11]

The unusual percentiles 2%, 9%, 91% and 98% are sometimes used for whisker cross-hatches and whisker ends to depict the [seven-number summary](/source/Seven-number_summary). If the data are [normally distributed](/source/Normal_distribution), the seven marks on the box plot will be approximately equally spaced. On some box plots, a cross-hatch is placed before the end of each whisker.
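For normally distributed data the spacing of the seven marks can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` to place the marks at the 2nd, 9th, 25th, 50th, 75th, 91st and 98th percentiles and prints the gaps between consecutive marks; they are close to, but not exactly, equal (roughly 0.67 to 0.71 standard deviations). The helper `seven_number_marks` is a hypothetical name.

```python
from statistics import NormalDist

# Percentiles marked by the seven-number summary (whisker ends, cross-hatches, box).
PERCENTILES = [0.02, 0.09, 0.25, 0.50, 0.75, 0.91, 0.98]


def seven_number_marks(mu=0.0, sigma=1.0):
    """Positions of the seven marks for a normal distribution N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(p) for p in PERCENTILES]


marks = seven_number_marks()
# Distances between neighbouring marks, in units of sigma.
gaps = [b - a for a, b in zip(marks, marks[1:])]
print([round(g, 3) for g in gaps])
```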

## Variations

Four box plots, with and without notches and variable width

Since the mathematician [John W. Tukey](/source/John_W._Tukey) first popularized this type of visual data display in 1969, several variations on the classical box plot have been developed, and the two most commonly found variations are the variable-width box plots and the notched box plots.

**Variable-width box** plots illustrate the size of each group whose data is being plotted by making the width of the box proportional to the size of the group. A popular convention is to make the box width proportional to the square root of the size of the group.[12]
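The square-root convention can be sketched as a small Python function that scales each width relative to the largest group. The `max_width` parameter (the width assigned to the largest group) and the name `box_widths` are hypothetical plotting choices, not part of the convention itself.

```python
import math


def box_widths(group_sizes, max_width=0.8):
    """Box widths proportional to the square root of each group's size.

    max_width is a hypothetical parameter: the width given to the largest
    group; all other widths are scaled relative to it.
    """
    roots = [math.sqrt(n) for n in group_sizes]
    largest = max(roots)
    return [max_width * r / largest for r in roots]


print(box_widths([25, 100, 400]))
```

With group sizes 25, 100 and 400 the widths are 0.2, 0.4 and 0.8: each fourfold increase in group size doubles the box width.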

**Notched box** plots apply a "notch", or narrowing, of the box around the median. Notches offer a rough guide to the significance of the difference between medians: if the notches of two boxes do not overlap, this provides evidence of a statistically significant difference between the medians. The height of the notches is proportional to the interquartile range (IQR) of the sample and inversely proportional to the square root of the sample size. However, there is uncertainty about the most appropriate multiplier (as this may depend on how similar the variances of the samples are).[12] The width of the notch is chosen arbitrarily to be visually pleasing, and should be consistent among all box plots displayed on the same page.

One convention for obtaining the boundaries of these notches is to use a distance of $\pm \frac{1.58\,\text{IQR}}{\sqrt{n}}$ around the median.[13]
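The notch convention above can be sketched in Python as follows. The quartiles here come from `statistics.quantiles` with its default method, which is an assumption: R's `boxplot.stats`, for instance, computes the IQR from Tukey's hinges, so notch limits may differ slightly between tools. The helper names and the shifted comparison sample are hypothetical.

```python
import math
import statistics


def notch_interval(data):
    """Median +/- 1.58 * IQR / sqrt(n).

    Assumption: quartiles use Python's default 'exclusive' quantile method.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(data))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True if the notch intervals of samples a and b overlap."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


temps = [57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]
warmer = [t + 20 for t in temps]  # hypothetical shifted sample
print(notch_interval(temps))
print(notches_overlap(temps, warmer))  # False
```

Non-overlapping notches, as for the two samples here, are read only as rough evidence that the medians differ, in line with the caveat about the multiplier above.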

**Adjusted box** plots are intended to describe [skewed distributions](/source/Skewness), and they rely on the [medcouple](/source/Medcouple) statistic of skewness.[14] For a medcouple value of MC, the lengths of the upper and lower whiskers on the box plot are respectively defined to be:

$$
\begin{matrix}
1.5\,\text{IQR}\cdot e^{3\,\text{MC}}, & 1.5\,\text{IQR}\cdot e^{-4\,\text{MC}} & \text{if } \text{MC}\geq 0,\\
1.5\,\text{IQR}\cdot e^{4\,\text{MC}}, & 1.5\,\text{IQR}\cdot e^{-3\,\text{MC}} & \text{if } \text{MC}\leq 0.
\end{matrix}
$$

For a symmetric data distribution, the medcouple is zero, which reduces the adjusted box plot to Tukey's box plot with equal whisker lengths of $1.5\,\text{IQR}$ for both whiskers.
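A naive implementation of the medcouple and of the adjusted whisker lengths is sketched below in Python. It is an illustration under a stated simplification, not the exact published definition: it evaluates the medcouple kernel over all O(n²) pairs and skips pairs in which both values equal the median, which the full definition handles with a special kernel. The function names are hypothetical.

```python
import math
import statistics


def medcouple_naive(data):
    """O(n^2) medcouple, simplified.

    Simplification: pairs where both values equal the median are skipped
    instead of being handled by the special kernel of the full definition.
    """
    xs = sorted(data)
    m = statistics.median(xs)
    lower = [x for x in xs if x <= m]
    upper = [x for x in xs if x >= m]
    kernels = [
        ((xj - m) - (m - xi)) / (xj - xi)
        for xi in lower
        for xj in upper
        if xj != xi  # xj == xi only when both equal the median
    ]
    return statistics.median(kernels)


def adjusted_whisker_lengths(q1, q3, mc):
    """Return (upper, lower) whisker lengths of the adjusted box plot."""
    iqr = q3 - q1
    if mc >= 0:
        return 1.5 * iqr * math.exp(3 * mc), 1.5 * iqr * math.exp(-4 * mc)
    return 1.5 * iqr * math.exp(4 * mc), 1.5 * iqr * math.exp(-3 * mc)


skewed = [1, 2, 3, 4, 10]
mc = medcouple_naive(skewed)
q1, _, q3 = statistics.quantiles(skewed, n=4)
print(mc, adjusted_whisker_lengths(q1, q3, mc))
```

For the symmetric sample `[1, 2, 3, 4, 5]` the sketch returns a medcouple of 0, so both whiskers have length 1.5 IQR; for the right-skewed sample `[1, 2, 3, 4, 10]` it returns 5/18 ≈ 0.28, making the upper whisker longer than the lower one.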

**Other kinds of box plots**, such as [violin plots](/source/Violin_plot) and bean plots, can show the difference between unimodal and [multimodal distributions](/source/Multimodal_distribution), which cannot be observed in the classical box plot.[6]

## Examples

### Example without outliers

A box plot with no outliers

A series of hourly temperatures was measured throughout the day in degrees Fahrenheit. The recorded values, listed in order, are (°F): 57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81.

A box plot of the data set can be generated by first calculating five relevant values of this data set: minimum, maximum, median (***Q*2**), first quartile (***Q*1**), and third quartile (***Q*3**).

The minimum is the smallest number of the data set. In this case, the minimum recorded day temperature is 57°F.

The maximum is the largest number of the data set. In this case, the maximum recorded day temperature is 81°F.

The median is the "middle" number of the ordered data set: at least half of the elements are less than or equal to the median, and at least half are greater than or equal to it. With 24 values, the median is the average of the 12th and 13th values, which are both 70°F, so the median of this ordered data set is 70°F.

The first quartile value (***Q*1**, or 25th percentile) is the number that marks one quarter of the ordered data set. In other words, about 25% of the elements are less than the first quartile and about 75% are greater than it; the split is not exact here because several values are tied. The first quartile can be determined by finding the "middle" number of the lower half of the data, between the minimum and the median. For the hourly temperatures, the "middle" number between 57°F and 70°F is 66°F.

The third quartile value (***Q*3**, or 75th percentile) is the number that marks three quarters of the ordered data set. In other words, about 75% of the elements are less than the third quartile and about 25% are greater than it. The third quartile can be obtained by finding the "middle" number of the upper half of the data, between the median and the maximum. For the hourly temperatures, the "middle" number between 70°F and 81°F is 75°F.
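The "middle of each half" procedure can be written as a short Python function. This is a sketch under one stated assumption: when the number of data points is odd, the median is excluded from both halves (Tukey's hinges, by contrast, include it), so results may differ slightly from other software. `five_number_summary` is a hypothetical name.

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median, Q3 and maximum, with quartiles as medians of halves.

    Assumption: for an odd number of points the median is excluded from
    both halves; other conventions (e.g. Tukey's hinges) include it.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + (n % 2):]  # skip the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


temps = [57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]
print(five_number_summary(temps))  # (57, 66.0, 70.0, 75.0, 81)
```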

The interquartile range, or IQR, can be calculated by subtracting the first quartile value (***Q*1**) from the third quartile value (***Q*3**):

$$\text{IQR} = Q_3 - Q_1 = 75^{\circ}\text{F} - 66^{\circ}\text{F} = 9^{\circ}\text{F}.$$

Hence, $1.5\,\text{IQR} = 1.5 \cdot 9^{\circ}\text{F} = 13.5^{\circ}\text{F}$.

1.5 IQR above the third quartile is:

$$Q_3 + 1.5\,\text{IQR} = 75^{\circ}\text{F} + 13.5^{\circ}\text{F} = 88.5^{\circ}\text{F}.$$

1.5 IQR below the first quartile is:

$$Q_1 - 1.5\,\text{IQR} = 66^{\circ}\text{F} - 13.5^{\circ}\text{F} = 52.5^{\circ}\text{F}.$$

The upper whisker boundary of the box plot is the largest data value that is within 1.5 IQR above the third quartile. Here, 1.5 IQR above the third quartile is 88.5°F and the maximum is 81°F. Therefore, the upper whisker is drawn at the value of the maximum, which is 81°F.

Similarly, the lower whisker boundary of the box plot is the smallest data value that is within 1.5 IQR below the first quartile. Here, 1.5 IQR below the first quartile is 52.5°F and the minimum is 57°F. Therefore, the lower whisker is drawn at the value of the minimum, which is 57°F.

### Example with outliers

A box plot with outliers

The example above has no outliers. The following example shows how to construct a box plot when outliers are present:

The ordered set for the recorded temperatures is (°F): 52, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 89.

In this example, only the first and the last number are changed. The median, third quartile, and first quartile remain the same.

In this case, the maximum value in this data set is 89°F, and 1.5 IQR above the third quartile is 88.5°F. The maximum is greater than 1.5 IQR plus the third quartile, so the maximum is an outlier. Therefore, the upper whisker is drawn at the greatest value smaller than 1.5 IQR above the third quartile, which is 79°F.

Similarly, the minimum value in this data set is 52°F, and 1.5 IQR below the first quartile is 52.5°F. The minimum is smaller than the first quartile minus 1.5 IQR, so the minimum is also an outlier. Therefore, the lower whisker is drawn at the smallest value greater than 1.5 IQR below the first quartile, which is 57°F.
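The two temperature examples can be checked together in Python. This sketch assumes Python's `statistics.quantiles` with its default `'exclusive'` method for the quartiles; for both data sets it yields 66, 70 and 75 °F, matching the values used above, although other quartile methods can give slightly different values on other data. The helper `box_plot_stats` is a hypothetical name.

```python
import statistics


def box_plot_stats(data, k=1.5):
    """Quartiles, fences, whisker ends and outliers under the k*IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {"quartiles": (q1, median, q3), "fences": (low_fence, high_fence),
            "whiskers": (min(inside), max(inside)), "outliers": outliers}


without = [57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
           70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]
# Only the first and last values change in the second example.
with_outliers = [52] + without[1:-1] + [89]

for name, data in [("without outliers", without), ("with outliers", with_outliers)]:
    print(name, box_plot_stats(data))
```

The quartiles and fences are identical for both data sets; only the whisker ends and the list of outliers change.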

### In the case of large datasets

For a data set containing a large number of data points, the quartiles needed for the box plot can be obtained from a general formula for empirical quantiles:

#### General equation to compute empirical quantiles

$$q_n(p) = x_{(k)} + \alpha\,\bigl(x_{(k+1)} - x_{(k)}\bigr)$$

with $k = \lfloor p(n+1) \rfloor$ and $\alpha = p(n+1) - k$.

Here $x_{(k)}$ denotes the $k$-th smallest data point, i.e. the data are ordered so that $x_{(i)} \le x_{(k)}$ whenever $i < k$.
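The general equation translates directly into code. In this Python sketch the 1-indexed order statistic $x_{(k)}$ becomes the 0-indexed list element `xs[k - 1]`; the clamping for $k < 1$ or $k \ge n$ is an added assumption, since the formula is not defined there. For the quartiles of the 24-point temperature data, the result agrees with Python's `statistics.quantiles`, whose default `'exclusive'` method is also based on the $(n+1)$ positions. `empirical_quantile` is a hypothetical name.

```python
import math
import statistics


def empirical_quantile(data, p):
    """q_n(p) = x_(k) + alpha * (x_(k+1) - x_(k)), k = floor(p(n+1)).

    Order statistics are 1-indexed in the formula, so x_(k) is xs[k - 1].
    Clamping outside 1 <= k < n is an assumption (the formula is undefined there).
    """
    xs = sorted(data)
    n = len(xs)
    h = p * (n + 1)
    k = math.floor(h)
    alpha = h - k
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])


temps = [52, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 89]
print([empirical_quantile(temps, p) for p in (0.25, 0.5, 0.75)])  # [66.0, 70.0, 75.0]
print(statistics.quantiles(temps, n=4))                          # [66.0, 70.0, 75.0]
```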

Using the above example, which has 24 data points (*n* = 24), one can calculate the median and the first and third quartiles either mathematically or visually.

**Median**

$$
\begin{aligned}
q_n(0.5) &= x_{(12)} + (0.5\cdot 25 - 12)\cdot\bigl(x_{(13)} - x_{(12)}\bigr)\\
&= 70 + (0.5\cdot 25 - 12)\cdot(70 - 70) = 70^{\circ}\text{F}
\end{aligned}
$$

**First quartile**

$$
\begin{aligned}
q_n(0.25) &= x_{(6)} + (0.25\cdot 25 - 6)\cdot\bigl(x_{(7)} - x_{(6)}\bigr)\\
&= 66 + (0.25\cdot 25 - 6)\cdot(66 - 66) = 66^{\circ}\text{F}
\end{aligned}
$$

**Third quartile**

$$
\begin{aligned}
q_n(0.75) &= x_{(18)} + (0.75\cdot 25 - 18)\cdot\bigl(x_{(19)} - x_{(18)}\bigr)\\
&= 75 + (0.75\cdot 25 - 18)\cdot(75 - 75) = 75^{\circ}\text{F}
\end{aligned}
$$

Box plot and a [probability density function](/source/Probability_density_function) (pdf) of a normal N(0, *σ*²) population

Box plots displaying the skewness of the data set


## Visualization

Although box plots may seem more primitive than [histograms](/source/Histogram) or [kernel density estimates](/source/Kernel_density_estimation), they do have a number of advantages. First, the box plot enables statisticians to do a quick graphical examination of one or more data sets. Box plots also take up less space and are therefore particularly useful for comparing distributions between several groups or sets of data in parallel. Lastly, the overall structure of histograms and kernel density estimates can be strongly influenced by the choice of the [number and width of bins](/source/Histogram#Number_of_bins_and_width) and the choice of bandwidth, respectively.

Although looking at a statistical distribution is more common than looking at a box plot, it can be useful to compare the box plot against the probability density function (theoretical histogram) for a normal N(0, *σ*²) distribution and observe their characteristics directly.
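For a normal distribution the positions of the box-plot elements can be computed exactly. This Python sketch, using the standard-library `statistics.NormalDist`, finds the quartiles, the 1.5 IQR fences and the probability of falling beyond them for N(0, σ²); the helper `normal_box_plot` is a hypothetical name. With these definitions the quartiles lie at about ±0.6745σ, the fences at about ±2.698σ, and roughly 0.7% of a normal population falls outside the whiskers.

```python
from statistics import NormalDist


def normal_box_plot(sigma=1.0):
    """Q3, IQR, upper 1.5*IQR fence and outlier probability for N(0, sigma^2)."""
    dist = NormalDist(0.0, sigma)
    q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Probability mass beyond either fence: the expected share of "outliers".
    p_outside = dist.cdf(lower_fence) + (1.0 - dist.cdf(upper_fence))
    return q3, iqr, upper_fence, p_outside


q3, iqr, upper_fence, p_outside = normal_box_plot()
print(f"Q3 = {q3:.4f}σ, IQR = {iqr:.4f}σ, fence = {upper_fence:.4f}σ")
print(f"share beyond the fences ≈ {p_outside:.4%}")
```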

## See also

- [Bagplot](/source/Bagplot)

- [Contour boxplot](/source/Contour_boxplot)

- [Data and information visualization](/source/Data_and_information_visualization)

- [Exploratory data analysis](/source/Exploratory_data_analysis)

- [Fan chart](/source/Fan_chart_(statistics))

- [Five-number summary](/source/Five-number_summary)

- [Functional boxplot](/source/Functional_boxplot)

- [Seasonality](/source/Seasonality)

- [Seven-number summary](/source/Seven-number_summary)

- [Sina plot](/source/Sina_plot)

- [Violin plot](/source/Violin_plot)

## References

1. **[^](#cite_ref-1)** Dutoit, S. H. C. (2012). [*Graphical exploratory data analysis*](http://worldcat.org/oclc/1019645745). Springer. [ISBN](/source/ISBN_(identifier)) [978-1-4612-9371-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4612-9371-2). [OCLC](/source/OCLC_(identifier)) [1019645745](https://search.worldcat.org/oclc/1019645745).

1. **[^](#cite_ref-2)** Grubbs, Frank E. (February 1969). ["Procedures for Detecting Outlying Observations in Samples"](https://dx.doi.org/10.1080/00401706.1969.10490657). *Technometrics*. **11** (1): 1–21. [doi](/source/Doi_(identifier)):[10.1080/00401706.1969.10490657](https://doi.org/10.1080%2F00401706.1969.10490657). [ISSN](/source/ISSN_(identifier)) [0040-1706](https://search.worldcat.org/issn/0040-1706).

1. **[^](#cite_ref-3)** Boddy, Richard (2009). [*Statistical Methods in Practice: for Scientists and Technologists*](http://worldcat.org/oclc/940679163). John Wiley & Sons. [ISBN](/source/ISBN_(identifier)) [978-0-470-74664-6](https://en.wikipedia.org/wiki/Special:BookSources/978-0-470-74664-6). [OCLC](/source/OCLC_(identifier)) [940679163](https://search.worldcat.org/oclc/940679163).

1. **[^](#cite_ref-4)** Spear, Mary Eleanor (1952). *Charting Statistics*. McGraw Hill. p. 166.

1. **[^](#cite_ref-5)** Spear, Mary Eleanor (1969). *Practical charting techniques*. New York: McGraw-Hill. [ISBN](/source/ISBN_(identifier)) [0070600104](https://en.wikipedia.org/wiki/Special:BookSources/0070600104). [OCLC](/source/OCLC_(identifier)) [924909765](https://search.worldcat.org/oclc/924909765).

1. ^ [***a***](#cite_ref-:0_6-0) [***b***](#cite_ref-:0_6-1) Wickham, Hadley; Stryjewski, Lisa. ["40 years of boxplots"](https://vita.had.co.nz/papers/boxplots.pdf) (PDF). Retrieved December 24, 2020.

1. **[^](#cite_ref-7)** Holmes, Alexander; Illowsky, Barbara; Dean, Susan (31 March 2015). ["Introductory Business Statistics"](https://web.archive.org/web/20200727025431/https://opentextbc.ca/introbusinessstatopenstax/chapter/measures-of-the-location-of-the-data/). *OpenStax*. Archived from [the original](https://opentextbc.ca/introbusinessstatopenstax/chapter/measures-of-the-location-of-the-data/) on 27 July 2020. Retrieved 29 April 2020.

1. **[^](#cite_ref-frigge_hoaglin_iglewicz2_8-0)** Frigge, Michael; Hoaglin, David C.; Iglewicz, Boris (February 1989). "Some Implementations of the Boxplot". *[The American Statistician](/source/The_American_Statistician)*. **43** (1): 50–54. [doi](/source/Doi_(identifier)):[10.2307/2685173](https://doi.org/10.2307%2F2685173). [JSTOR](/source/JSTOR_(identifier)) [2685173](https://www.jstor.org/stable/2685173).

1. **[^](#cite_ref-9)** Marmolejo-Ramos, F.; Tian, S. (2010). ["The shifting boxplot. A box plot based on essential summary statistics around the mean"](https://doi.org/10.21500%2F20112084.823). *International Journal of Psychological Research*. **3** (1): 37–46. [doi](/source/Doi_(identifier)):[10.21500/20112084.823](https://doi.org/10.21500%2F20112084.823). [hdl](/source/Hdl_(identifier)):[10819/6492](https://hdl.handle.net/10819%2F6492).

1. **[^](#cite_ref-10)** Dekking, F.M. (2005). [*A Modern Introduction to Probability and Statistics*](https://archive.org/details/modernintroducti00dekk_722). Springer. pp. [234](https://archive.org/details/modernintroducti00dekk_722/page/n240)–238. [ISBN](/source/ISBN_(identifier)) [1-85233-896-2](https://en.wikipedia.org/wiki/Special:BookSources/1-85233-896-2).

1. **[^](#cite_ref-DGRW_11-0)** Derrick, Ben; Green, Elizabeth; Ritchie, Felix; White, Paul (September 2022). "The Risk of Disclosure When Reporting Commonly Used Univariate Statistics". *Privacy in Statistical Databases*. Lecture Notes in Computer Science. Vol. 13463. pp. 119–129. [doi](/source/Doi_(identifier)):[10.1007/978-3-031-13945-1_9](https://doi.org/10.1007%2F978-3-031-13945-1_9). [ISBN](/source/ISBN_(identifier)) [978-3-031-13944-4](https://en.wikipedia.org/wiki/Special:BookSources/978-3-031-13944-4).

1. ^ [***a***](#cite_ref-mcgill_tukey_larsen_12-0) [***b***](#cite_ref-mcgill_tukey_larsen_12-1) McGill, Robert; [Tukey, John W.](/source/John_W._Tukey); Larsen, Wayne A. (February 1978). "Variations of Box Plots". *[The American Statistician](/source/The_American_Statistician)*. **32** (1): 12–16. [doi](/source/Doi_(identifier)):[10.2307/2683468](https://doi.org/10.2307%2F2683468). [JSTOR](/source/JSTOR_(identifier)) [2683468](https://www.jstor.org/stable/2683468).

1. **[^](#cite_ref-Rboxplotstats_13-0)** ["R: Box Plot Statistics"](http://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/boxplot.stats.html). *R manual*. Retrieved 26 June 2011.

1. **[^](#cite_ref-Hubert2008_14-0)** [Hubert, M.](/source/Mia_Hubert); Vandervieren, E. (2008). "An adjusted box plot for skewed distribution". *Computational Statistics and Data Analysis*. **52** (12): 5186–5201. [CiteSeerX](/source/CiteSeerX_(identifier)) [10.1.1.90.9812](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.9812). [doi](/source/Doi_(identifier)):[10.1016/j.csda.2007.11.008](https://doi.org/10.1016%2Fj.csda.2007.11.008).

## Further reading

- [Tukey, John W.](/source/John_Tukey) (1977). [*Exploratory Data Analysis*](https://archive.org/details/exploratorydataa00tuke_0). [Addison-Wesley](/source/Addison-Wesley). [ISBN](/source/ISBN_(identifier)) [9780201076165](https://en.wikipedia.org/wiki/Special:BookSources/9780201076165).

- Benjamini, Y. (1988). "Opening the Box of a Boxplot". *The American Statistician*. **42** (4): 257–262. [doi](/source/Doi_(identifier)):[10.2307/2685133](https://doi.org/10.2307%2F2685133). [JSTOR](/source/JSTOR_(identifier)) [2685133](https://www.jstor.org/stable/2685133).

- [Rousseeuw, P. J.](/source/Peter_Rousseeuw); Ruts, I.; [Tukey, J. W.](/source/John_Tukey) (1999). "The Bagplot: A Bivariate Boxplot". *The American Statistician*. **53** (4): 382–387. [doi](/source/Doi_(identifier)):[10.2307/2686061](https://doi.org/10.2307%2F2686061). [JSTOR](/source/JSTOR_(identifier)) [2686061](https://www.jstor.org/stable/2686061).

## External links

Wikimedia Commons has media related to [Box plots](https://commons.wikimedia.org/wiki/Category:Box_plots).

- [Beeswarm Boxplot](http://www.r-statistics.com/2011/03/beeswarm-boxplot-and-plotting-it-with-r/) - superimposing a frequency-jittered stripchart on top of a box plot


---
Adapted from the Wikipedia article [Box plot](https://en.wikipedia.org/wiki/Box_plot) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Box_plot?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
