# Scatter plot

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Scatter_plot
> Markdown URL: https://mediated.wiki/source/Scatter_plot.md
> Source: https://en.wikipedia.org/wiki/Scatter_plot
> Source revision: 1353646712
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Plot using the dispersal of scattered dots to show the relationship between variables

Not to be confused with [Correlogram](/source/Correlogram) or [Scatter matrix](/source/Scatter_matrix).

An example scatter diagram plotting a quality characteristic against a given input

A **scatter plot**, also called a **scatterplot**, **scatter graph**, **scatter chart**, **scattergram**, or **scatter diagram**,[1] is a type of [plot](/source/Plot_(graphics)) or [mathematical diagram](/source/Mathematical_diagram) using [Cartesian coordinates](/source/Cartesian_coordinate_system) to display values for typically two [variables](/source/Variable_(mathematics)) for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the [vertical axis](/source/Vertical_axis).[2] The scatter diagram is one of the [seven basic tools of quality](/source/Seven_basic_tools_of_quality) control.
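As a rough illustration of how a coded point can carry one additional variable, the sketch below linearly maps a third variable onto marker sizes. The helper name `scale_to_marker_sizes`, the size range, and all data values are assumptions made for this example and do not come from any particular plotting library.

```python
def scale_to_marker_sizes(values, min_size=20.0, max_size=200.0):
    """Linearly map a third variable onto marker sizes (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: give every point the midpoint size.
        return [(min_size + max_size) / 2.0] * len(values)
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


# Hypothetical data: x and y give the position, z is the extra variable.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
z = [10.0, 20.0, 30.0, 40.0]

sizes = scale_to_marker_sizes(z)
points = list(zip(x, y, sizes))  # one (x, y, size) triple per observation
```

Each triple describes a single dot: two values fix its position and the third fixes its size. The same idea applies to color or shape, with a color scale or a lookup table of marker shapes in place of the size range.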

## History

See also: [Data and information visualization § History](/source/Data_and_information_visualization#History)

According to Michael Friendly and Daniel Denis, the defining characteristic distinguishing scatter plots from line charts is the representation of specific observations of bivariate data where one variable is plotted on the horizontal axis and the other on the vertical axis. The two variables are often abstracted from a physical representation like the spread of bullets on a target or a geographic or celestial projection.[3][4]

While [Edmund Halley](/source/Edmund_Halley) created a bivariate plot of temperature and pressure in 1686, he omitted the specific data points used to demonstrate the relationship. For this reason, Friendly and Denis consider his visualization different from an actual scatter plot, and they attribute the first scatter plot to [John Herschel](/source/John_Herschel). In 1833, Herschel plotted the angle between the central star in the constellation Virgo and [Gamma Virginis](/source/Gamma_Virginis) over time to find how the angle changes, not through calculation but with freehand drawing and human judgment.[3]

[Sir Francis Galton](/source/Sir_Francis_Galton) extended and popularized the scatter plot and many other statistical tools to pursue a scientific basis for eugenics.[5] When, in 1886, Galton published a scatter plot and correlation ellipse of the height of parents and children, he extended Herschel's mere plotting of data points by binning and averaging adjacent cells to create a smoother visualization.[3] Karl Pearson, R. A. Fisher, and other statisticians and [eugenicists](/source/Eugenicists) built on Galton's work and formalized correlations and significance testing.[5]

## Overview

Waiting time between eruptions and the duration of the eruption for the [Old Faithful Geyser](/source/Old_Faithful_Geyser) in [Yellowstone National Park](/source/Yellowstone_National_Park), [Wyoming](/source/Wyoming), USA. This chart suggests there are generally two types of eruptions: short-wait-short-duration, and long-wait-long-duration.

A 3D scatter plot allows the visualization of multivariate data. This scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space and they are displayed using glyphs and coloured using another scalar variable.[6]

A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it, or when both continuous variables are independent. If a [parameter](/source/Parameter) exists that is systematically incremented and/or decremented by the experimenter, it is called the *control parameter* or [independent variable](/source/Independent_variable) and is customarily plotted along the horizontal axis. The measured or [dependent variable](/source/Dependent_variable) is customarily plotted along the vertical axis. If no dependent variable exists, either variable can be plotted on either axis, and the scatter plot will illustrate only the degree of [correlation](/source/Correlation) (not [causation](/source/Causality)) between the two variables.[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)*]

A scatter plot can suggest various kinds of correlations between variables with a certain [confidence interval](/source/Confidence_interval). For example, when comparing weight and height, weight would be on the y-axis and height would be on the x-axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower left to upper right, it indicates a positive [correlation](/source/Correlation) between the variables being studied. If the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line of [best fit](/source/Curve_fitting) (also called a "trendline") can be drawn to study the relationship between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as [linear regression](/source/Linear_regression) and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very useful for seeing how well two comparable data sets agree and for revealing nonlinear relationships between variables. The ability to do this can be enhanced by adding a smooth line such as [LOESS](/source/Local_regression).[7] Furthermore, if the data are represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)*]
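The sign of a correlation and the line of best fit described above can be computed directly from the plotted pairs. The following is a minimal sketch, assuming ordinary least squares for the linear fit and the Pearson coefficient as the correlation measure; the function names and the height and weight values are invented for illustration.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy, mean_x, mean_y) for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation: > 0 rising, < 0 falling, near 0 uncorrelated."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line y = slope * x + intercept."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: height (cm) on the x-axis, weight (kg) on the y-axis.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 67.0, 75.0, 83.0]

r = pearson_r(heights, weights)
slope, intercept = least_squares_line(heights, weights)
```

Because the slope and intercept come from closed-form sums, the linear fit completes in a fixed number of passes over the data. A nonlinear relationship has no comparable general formula, which is one reason a smoother is often drawn instead.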

The scatter diagram is one of the [seven basic tools](/source/Seven_Basic_Tools_of_Quality) of [quality control](/source/Quality_control).[8]

Scatter charts can be built in the form of [bubble](/source/Bubble_chart), marker, and/or [line charts](/source/Line_chart).[9]

## Example

Scatterplot showing the relationship between speed and stopping distance for cars driven at various speeds (n = 50).

For example, to display a link between a person's lung capacity and how long that person could hold their breath, a researcher would choose a group of people to study, then measure each one's lung capacity (first variable) and how long that person could hold their breath (second variable). The researcher would then plot the data in a scatter plot, assigning "lung capacity" to the horizontal axis and "time holding breath" to the vertical axis.[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)*]

A person with a lung capacity of 400 [cl](/source/Centilitre) who held their breath for 21.7 s would be represented by a single dot on the scatter plot at the point (400, 21.7) in the [Cartesian coordinates](/source/Cartesian_coordinate_system). The scatter plot of all the people in the study would enable the researcher to compare the two variables in the data set visually and would help to determine what kind of relationship there might be between them.[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)*]
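The mapping from each person to a single dot can be sketched without any plotting library by placing points on a character grid. In the sketch below, only the pair (400, 21.7) comes from the example above; the other four observations, the function name `ascii_scatter`, and the grid dimensions are assumptions for illustration.

```python
def ascii_scatter(points, width=40, height=10):
    """Render (x, y) pairs as a text grid; a rough stand-in for a real plot."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 so identical values do not cause division by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


# (lung capacity in cl, time holding breath in s); all but (400, 21.7) are made up.
study = [(320, 15.2), (350, 17.9), (400, 21.7), (450, 25.3), (500, 30.1)]

chart = ascii_scatter(study)
print(chart)
```

The horizontal position of each `*` reflects lung capacity and the vertical position reflects breath-holding time, so a rising band of marks from lower left to upper right would suggest a positive relationship in this invented sample.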

## Scatter plot matrices

For a set of data variables (dimensions) $X_1, X_2, \ldots, X_k$, the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view with multiple scatterplots in a matrix format. For $k$ variables, the scatterplot matrix will contain $k$ rows and $k$ columns. A plot located on the intersection of the $i$th row and $j$th column is a plot of variables $X_i$ versus $X_j$.[10] This means that each row and column is one dimension, and each cell plots a scatter plot of two dimensions.[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)*]
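The row and column layout can be sketched by enumerating every (row, column) cell and pairing the corresponding variables. This example assumes that "versus" places the row variable on the vertical axis and the column variable on the horizontal axis; the function name and the three small data columns are hypothetical.

```python
from itertools import product


def scatter_matrix_cells(columns):
    """Map each (row, col) cell to the list of (x, y) points it would plot.

    `columns` maps a variable name to its list of values (all equal length).
    """
    names = list(columns)
    cells = {}
    for i, j in product(range(len(names)), repeat=2):
        xs = columns[names[j]]  # column variable on the horizontal axis (assumed)
        ys = columns[names[i]]  # row variable on the vertical axis (assumed)
        cells[(i, j)] = list(zip(xs, ys))
    return cells


# Hypothetical data set with k = 3 variables and three observations each.
data = {"X1": [1, 2, 3], "X2": [2, 4, 6], "X3": [9, 7, 8]}

cells = scatter_matrix_cells(data)
print(len(cells))  # k * k cells
```

With three variables the grid has nine cells. Cell (i, j) and cell (j, i) show the same pair with the axes swapped, and each diagonal cell pairs a variable with itself, which is why many implementations use the diagonal for a label or a histogram instead.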

A **generalized scatter plot matrix**[11] offers a range of displays of paired combinations of categorical and quantitative variables. A [mosaic plot](/source/Mosaic_plot), fluctuation diagram, or faceted [bar chart](/source/Bar_chart) may be used to display two categorical variables. Other plots are used for one categorical and one quantitative variable.

Visualization of 3D data along with the correspondent scatterplot matrix

## See also

- [Data and information visualization](/source/Data_and_information_visualization)

- [Rug plot](/source/Rug_plot)

- [Bar graph](/source/Bar_graph)

- [Line chart](/source/Line_chart)

- [List of mathematical art software](/source/List_of_mathematical_art_software)

- [Scagnostics](/source/Scagnostics)

- [Dot plot (statistics)](/source/Dot_plot_(statistics))

- [Parity plot](/source/Parity_plot)

## References

1. Jarrell, Stephen B. (1994). *Basic Statistics* (Special pre-publication ed.). Dubuque, Iowa: Wm. C. Brown Pub. p. 492. [ISBN](/source/ISBN_(identifier)) [978-0-697-21595-6](https://en.wikipedia.org/wiki/Special:BookSources/978-0-697-21595-6). "When we search for a relationship between two quantitative variables, a standard graph of the available data pairs (X,Y), called a *scatter diagram*, frequently helps..."

2. Utts, Jessica M. (2005). *Seeing Through Statistics* (3rd ed.). Thomson Brooks/Cole. pp. 166–167. [ISBN](/source/ISBN_(identifier)) [0-534-39402-7](https://en.wikipedia.org/wiki/Special:BookSources/0-534-39402-7).

3. Friendly, Michael; Denis, Dan (2005). "The early origins and development of the scatterplot". *Journal of the History of the Behavioral Sciences*. **41** (2): 103–130. [doi](/source/Doi_(identifier)):[10.1002/jhbs.20078](https://doi.org/10.1002%2Fjhbs.20078). [PMID](/source/PMID_(identifier)) [15812820](https://pubmed.ncbi.nlm.nih.gov/15812820).

4. ["The early origins and development of the scatterplot"](https://www.datavis.ca/papers/friendly-scat.pdf) (PDF). [Archived](https://web.archive.org/web/20100613104833/http://www.datavis.ca/papers/friendly-scat.pdf) (PDF) from the original on 2010-06-13. Retrieved 2024-06-12.

5. Louçã, Francisco (2009). ["Emancipation Through Interaction — How Eugenics and Statistics Converged and Diverged"](http://www.jstor.org/stable/25650625). *Journal of the History of Biology*. **42** (4): 649–684. [doi](/source/Doi_(identifier)):[10.1007/s10739-008-9167-7](https://doi.org/10.1007%2Fs10739-008-9167-7). [hdl](/source/Hdl_(identifier)):[10400.5/25980](https://hdl.handle.net/10400.5%2F25980). [ISSN](/source/ISSN_(identifier)) [0022-5010](https://search.worldcat.org/issn/0022-5010). [JSTOR](/source/JSTOR_(identifier)) [25650625](https://www.jstor.org/stable/25650625). [PMID](/source/PMID_(identifier)) [20481126](https://pubmed.ncbi.nlm.nih.gov/20481126).

6. [Visualizations that have been created with VisIt](https://wci.llnl.gov/codes/visit/gallery.html) at wci.llnl.gov. Last updated: November 8, 2007.

7. [Cleveland, William](/source/William_S._Cleveland) (1993). [*Visualizing data*](https://archive.org/details/visualizingdata00will). Murray Hill, N.J.: AT&T Bell Laboratories. Published by Hobart Press, Summit, N.J. [ISBN](/source/ISBN_(identifier)) [978-0963488404](https://en.wikipedia.org/wiki/Special:BookSources/978-0963488404).

8. Tague, Nancy R. (2004). ["Seven Basic Quality Tools"](http://www.asq.org/learn-about-quality/seven-basic-quality-tools/overview/overview.html). *The Quality Toolbox*. [Milwaukee, Wisconsin](/source/Milwaukee%2C_Wisconsin): [American Society for Quality](/source/American_Society_for_Quality). p. 15. Retrieved 2010-02-05.

9. ["Scatter Chart – AnyChart JavaScript Chart Documentation"](https://web.archive.org/web/20160201084227/http://docs.anychart.com/7.9.0/Basic_Charts_Types/Scatter_Chart). AnyChart. Archived from [the original](http://docs.anychart.com/7.9.0/Basic_Charts_Types/Scatter_Chart) on 1 February 2016. Retrieved 3 February 2016.

10. [Scatter Plot Matrix](http://www.itl.nist.gov/div898/handbook/eda/section3/scatplma.htm) at itl.nist.gov.

11. Emerson, John W.; Green, Walton A.; Schoerke, Barret; Crowley, Jason (2013). "The Generalized Pairs Plot". *Journal of Computational and Graphical Statistics*. **22** (1): 79–91. [doi](/source/Doi_(identifier)):[10.1080/10618600.2012.694762](https://doi.org/10.1080%2F10618600.2012.694762). [S2CID](/source/S2CID_(identifier)) [28344569](https://api.semanticscholar.org/CorpusID:28344569).

## Further reading

- Cattaneo, Matias D.; Crump, Richard K.; Farrell, Max H.; Feng, Yingjie (2024). "[On Binscatter](https://www.aeaweb.org/articles?id=10.1257/aer.20221576)". *American Economic Review*. **114** (5): 1488–1514.

## External links

- Media related to [Scatterplots](https://commons.wikimedia.org/wiki/Category:Scatterplots) at Wikimedia Commons

- [What is a scatter plot?](http://www.psychwiki.com/wiki/What_is_a_scatterplot%3F) [Archived](https://web.archive.org/web/20200807004431/http://www.psychwiki.com/wiki/What_is_a_scatterplot%3F) 2020-08-07 at the [Wayback Machine](/source/Wayback_Machine)

- [Correlation scatter-plot matrix for ordered-categorical data](https://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/) – Explanation and R code

- [Density scatter plot for large datasets](https://www.r-bloggers.com/ggplot2-for-big-data/) (hundreds of millions of points)

- [Importance of Scatter Plots](https://onlinelibrary.wiley.com/doi/10.1016/j.pmrj.2016.10.018) - essential in correlation and regression

- [Interactive scatter plot tool](https://makegraph.me/tools/scatter-plot) (MakeGraph.me)

---
Adapted from the Wikipedia article [Scatter plot](https://en.wikipedia.org/wiki/Scatter_plot) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Scatter_plot?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
