# Exploratory data analysis

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Exploratory_data_analysis
> Markdown URL: https://mediated.wiki/source/Exploratory_data_analysis.md
> Source: https://en.wikipedia.org/wiki/Exploratory_data_analysis
> Source revision: 1351606903
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Approach of analyzing data sets in statistics

In [statistics](/source/Statistics), **exploratory data analysis** (EDA) or **exploratory analytics** is an approach of [analyzing](/source/Data_analysis) [data sets](/source/Data_set) to summarize their main characteristics, often using [statistical graphics](/source/Statistical_graphics) and other [data visualization](/source/Data_visualization) methods. A [statistical model](/source/Statistical_model) may or may not be used, but EDA is primarily for seeing what the data can tell beyond formal modeling. It thereby contrasts with traditional hypothesis testing, in which a model is supposed to be selected before the data are seen. [John Tukey](/source/John_Tukey) promoted exploratory data analysis from 1970 onward to encourage statisticians to explore the data and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from [initial data analysis (IDA)](/source/Data_analysis#Initial_data_analysis),[1][2] which focuses more narrowly on checking the assumptions required for model fitting and hypothesis testing, handling missing values, and transforming variables as needed. EDA encompasses IDA.

## Overview

Tukey defined data analysis in 1961 as: "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."[3]

Exploratory data analysis is a technique for investigating a dataset and summarizing its main characteristics. A main advantage of EDA is that it presents the results of the analysis as visualizations of the data.

Tukey's championing of EDA encouraged the development of [statistical computing](/source/Computational_statistics) packages, especially [S](/source/S_(programming_language)) at [Bell Labs](/source/Bell_Labs).[4] The S programming language inspired the systems [S-PLUS](/source/S-PLUS) and [R](/source/R_(programming_language)). This family of statistical-computing environments featured vastly improved dynamic visualization capabilities, which allowed statisticians to identify [outliers](/source/Outlier), [trends](/source/Trend_estimation) and [patterns](/source/Pattern_recognition) in data that merited further study.

Tukey's EDA was related to two other developments in [statistical theory](/source/Statistical_theory): [robust statistics](/source/Robust_statistics) and [nonparametric statistics](/source/Nonparametric_statistics), both of which tried to reduce the sensitivity of statistical inferences to errors in formulating [statistical models](/source/Statistical_model). Tukey promoted the use of the [five number summary](/source/Five_number_summary) of numerical data, which consists of the two [extremes](/source/Extreme_value) ([maximum](/source/Maximum) and [minimum](/source/Minimum)), the [median](/source/Median), and the [quartiles](/source/Quartile), because the median and quartiles, being functions of the [empirical distribution](/source/Empirical_distribution_function), are defined for all distributions, unlike the [mean](/source/Mean_value) and [standard deviation](/source/Standard_deviation). Moreover, the quartiles and median are more robust to [skewed](/source/Skewness) or [heavy-tailed distributions](/source/Heavy-tailed_distribution) than traditional summaries (the mean and standard deviation). The packages [S](/source/S_(programming_language)), [S-PLUS](/source/S-PLUS), and [R](/source/R_(programming_language)) included routines using [resampling statistics](/source/Resampling_(statistics)), such as Quenouille and Tukey's [jackknife](/source/Resampling_(statistics)#Jackknife) and [Efron](/source/Bradley_Efron)'s [bootstrap](/source/Bootstrapping_(statistics)), which are nonparametric and robust (for many problems).
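The robustness of the five-number summary can be illustrated with a minimal Python sketch. The two samples are hypothetical, the function name `five_number_summary` is chosen here for illustration, and the "inclusive" quartile convention is an assumption (textbooks and packages differ on how quartiles are interpolated).

```python
from statistics import mean, quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Quartile conventions vary; "inclusive" is one reasonable choice
    # and is an assumption of this sketch, not a rule from the article.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])


# Hypothetical samples: the second replaces one value with a wild outlier.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
contaminated = [1, 2, 3, 4, 5, 6, 7, 8, 900]

print("clean:       ", five_number_summary(clean), "mean =", mean(clean))
print("contaminated:", five_number_summary(contaminated), "mean =", mean(contaminated))
```

In this toy case only the maximum of the summary changes when the outlier is introduced, while the mean is pulled far away from the bulk of the data.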

Exploratory data analysis, robust statistics, nonparametric statistics, and the development of statistical programming languages facilitated statisticians' work on scientific and engineering problems. Such problems included the fabrication of semiconductors and the understanding of communications networks, both of which were of interest to Bell Labs. These statistical developments, all championed by Tukey, were designed to complement the [analytic](/source/Analytic_function) theory of [testing statistical hypotheses](/source/Statistical_hypothesis_testing), particularly the [Laplacian](/source/Pierre-Simon_Laplace) tradition's emphasis on [exponential families](/source/Exponential_family).[5]

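Additionally, there are arguments for visualizing data during EDA before modeling in order to avoid misleading conclusions, as illustrated by [Anscombe's Quartet](/source/Anscombe's_Quartet), in which four datasets with nearly identical summary statistics look very different when plotted.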
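As a rough illustration of why summary statistics alone can mislead, the sketch below computes the means and the Pearson correlation for the four datasets of Anscombe's quartet. The values are those of Anscombe's 1973 paper and are not reproduced in this article; the helper names `pearson_r` and `summarize` are chosen here for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Anscombe's quartet; datasets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (mean of x, mean of y, correlation), each rounded to 2 decimals."""
    return (round(mean(xs), 2), round(mean(ys), 2), round(pearson_r(xs, ys), 2))


for name, (xs, ys) in QUARTET.items():
    print(name, summarize(xs, ys))
```

The four summaries agree to two decimal places, so the differences between the datasets only become apparent once each one is plotted.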

## Development

Data science process flowchart

[John W. Tukey](/source/John_W._Tukey) wrote the book *Exploratory Data Analysis* in 1977.[6] Tukey held that too much emphasis in statistics was placed on [statistical hypothesis testing](/source/Statistical_hypothesis_testing) (confirmatory data analysis); more emphasis needed to be placed on using [data](/source/Data) to suggest hypotheses to test. In particular, he held that confusing the two types of analyses and employing them on the same set of data can lead to [systematic bias](/source/Systematic_error) owing to the issues inherent in [testing hypotheses suggested by the data](/source/Testing_hypotheses_suggested_by_the_data).

The objectives of EDA are to:

- Enable unexpected discoveries in the data

- Suggest hypotheses about the [causes](/source/Causality) of observed [phenomena](/source/Phenomenon)

- Assess assumptions on which [statistical inference](/source/Statistical_inference) will be based

- Support the selection of appropriate statistical tools and techniques

- Provide a basis for further data collection through [surveys](/source/Survey_sampling) or [experiments](/source/Design_of_experiments)[7]

Many EDA techniques have been adopted into [data mining](/source/Data_mining). They are also being taught to young students as a way to introduce them to statistical thinking.[8]

## Techniques and tools

There are a number of tools that are useful for EDA, but EDA is characterized more by the attitude taken than by particular techniques.[9]

Typical [graphical techniques](/source/Statistical_graphics) used in EDA are:

- [Box plot](/source/Box_plot)

- [Histogram](/source/Histogram)

- [Multi-vari chart](/source/Multi-vari_chart)

- [Run chart](/source/Run_chart)

- [Pareto chart](/source/Pareto_chart)

- [Scatter plot](/source/Scatter_plot) (2D/3D)

- [Stem-and-leaf plot](/source/Stemplot)

- [Parallel coordinates](/source/Parallel_coordinates)

- [Odds ratio](/source/Odds_ratio)

- [Targeted projection pursuit](/source/Targeted_projection_pursuit)

- [Heat map](/source/Heat_map)

- [Bar chart](/source/Bar_chart)

- Horizon graph

- Glyph-based visualization methods such as PhenoPlot[10] and [Chernoff faces](/source/Chernoff_face)

- Projection methods such as grand tour, guided tour and manual tour

- Interactive versions of these plots

[Dimensionality reduction](/source/Dimensionality_reduction):

- [Multidimensional scaling](/source/Multidimensional_scaling)

- [Principal component analysis](/source/Principal_component_analysis) (PCA)

- [Multilinear PCA](/source/Multilinear_principal_component_analysis)

- [Nonlinear dimensionality reduction](/source/Nonlinear_dimensionality_reduction) (NLDR)

- [Iconography of correlations](/source/Iconography_of_correlations)

Typical [quantitative](/source/Quantity) techniques are:

- [Median polish](/source/Median_polish)

- [Trimean](/source/Trimean)

- [Ordination](/source/Ordination_(statistics))
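Two of the quantitative techniques listed above, the trimean and median polish, are simple enough to sketch in a few lines of Python. This is a minimal illustration under stated assumptions: the trimean uses the "inclusive" quartile convention, and the median polish runs a fixed number of sweeps (`n_iter`, a parameter introduced here) rather than testing for convergence as production implementations typically do. The example table is hypothetical.

```python
from statistics import median, quantiles


def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    # Quartile convention is an assumption of this sketch.
    q1, q2, q3 = quantiles(sorted(values), n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def median_polish(table, n_iter=10):
    """Decompose a two-way table as overall + row effect + column effect + residual."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Row sweep: move each row's median out of the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        # Re-centre the column effects on the overall term.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Column sweep: move each column's median out of the residuals.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        # Re-centre the row effects on the overall term.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
    return overall, row_eff, col_eff, resid


# Hypothetical, perfectly additive table: the residuals should vanish.
table = [[10, 15, 20],
         [11, 16, 21],
         [12, 17, 22]]
overall, row_eff, col_eff, resid = median_polish(table)
print("overall:", overall)
print("row effects:", row_eff)
print("column effects:", col_eff)
print("residuals:", resid)
print("trimean of 1..9:", trimean(range(1, 10)))
```

At every step the identity `table[i][j] == overall + row_eff[i] + col_eff[j] + resid[i][j]` is preserved; large entries left in `resid` point to cells that an additive row-plus-column description does not explain.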

## History

Many EDA ideas can be traced back to earlier authors, for example:

- [Francis Galton](/source/Francis_Galton) emphasized [order statistics](/source/Order_statistic) and [quantiles](/source/Quantile).

- [Arthur Lyon Bowley](/source/Arthur_Lyon_Bowley) used precursors of the stemplot and [five-number summary](/source/Five-number_summary). Bowley actually used a "[seven-figure summary](/source/Seven-number_summary)", including the extremes, [deciles](/source/Decile) and [quartiles](/source/Quartile), along with the median: in his *Elementary Manual of Statistics* (3rd edn., 1920), p. 62,[11] he defines "the maximum and minimum, median, quartiles and two deciles" as the "seven positions".

- [Andrew Ehrenberg](/source/Andrew_S._C._Ehrenberg) articulated a philosophy of [data reduction](/source/Data_reduction) (see his book of the same name).

The [Open University](/source/Open_University) course *Statistics in Society* (MDST 242) took the above ideas and merged them with [Gottfried Noether](/source/Gottfried_Noether)'s work, which introduced [statistical inference](/source/Statistical_inference) via coin-tossing and the [median test](/source/Median_test).

## Example

Findings from EDA are orthogonal to the primary analysis task. To illustrate, consider an example from Cook et al. where the analysis task is to find the variables which best predict the tip that a dining party will give to the waiter.[12] The variables available in the data collected for this task are: the tip amount, total bill, payer gender, smoking/non-smoking section, time of day, day of the week, and size of the party. The primary analysis task is approached by fitting a regression model where the tip rate is the response variable. The fitted model is

- (tip rate) = 0.18 - 0.01 × (party size)

which says that as the size of the dining party increases by one person (leading to a higher bill), the tip rate will decrease by 0.01 (one percentage point), on average.
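To make the fitted model above concrete, the following minimal sketch evaluates it for a few party sizes. The coefficients are those quoted in the text; the function and constant names are chosen here for illustration, and the range of party sizes is an arbitrary choice.

```python
# Coefficients of the fitted model quoted in the text.
INTERCEPT = 0.18
SLOPE_PER_PERSON = -0.01


def predicted_tip_rate(party_size):
    """Predicted tip rate (tip divided by bill) for a party of the given size."""
    return INTERCEPT + SLOPE_PER_PERSON * party_size


# Illustrative party sizes only; the model says nothing about sizes
# outside the range observed in the original data.
for size in range(1, 7):
    print(f"party of {size}: predicted tip rate = {predicted_tip_rate(size):.2f}")
```

Each additional diner lowers the predicted rate by one percentage point, which is all the model has to say about the data.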

However, exploring the data reveals other interesting features not described by this model.

- Histogram of tip amounts where the bins cover $1 increments. The distribution of values is skewed right and unimodal, as is common in distributions of small, non-negative quantities.

- Histogram of tip amounts where the bins cover $0.10 increments. An interesting phenomenon is visible: peaks occur at the whole-dollar and half-dollar amounts, which is caused by customers picking round numbers as tips. This behavior is common to other types of purchases too, like gasoline.

- Scatterplot of tips vs. bill. Points below the line correspond to tips that are lower than expected (for that bill amount), and points above the line are higher than expected. We might expect to see a tight, positive linear association, but instead see [variation that increases with tip amount](/source/Heteroscedasticity). In particular, there are more points far away from the line in the lower right than in the upper left, indicating that more customers are very cheap than very generous.

- Scatterplot of tips vs. bill separated by payer gender and smoking section status. Smoking parties have a lot more variability in the tips that they give. Males tend to pay the (few) higher bills, and the female non-smokers tend to be very consistent tippers (with three conspicuous exceptions shown in the sample).
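The effect of bin width described in the two histogram captions can be sketched as follows. The tip amounts are hypothetical values invented for this illustration (they are not the data from Cook et al.), and amounts are stored as integer cents to avoid floating-point binning errors. The names `histogram`, `show` and `TIPS_CENTS` are likewise illustrative.

```python
from collections import Counter


def histogram(amounts_cents, bin_width_cents):
    """Count amounts per bin; keys are the left edges of the bins, in cents."""
    counts = Counter((a // bin_width_cents) * bin_width_cents for a in amounts_cents)
    return dict(sorted(counts.items()))


def show(hist):
    """Print a crude text histogram, one row per non-empty bin."""
    for left_edge, count in hist.items():
        print(f"${left_edge / 100:5.2f} | {'#' * count}")


# Hypothetical tips in cents, deliberately clustered on round amounts.
TIPS_CENTS = [100, 150, 200, 200, 250, 300, 300, 300, 348, 400, 500]

print("Bins of $1.00:")
show(histogram(TIPS_CENTS, 100))
print("Bins of $0.10:")
show(histogram(TIPS_CENTS, 10))
```

With the wide bins the clustering on whole and half dollars is absorbed into each bar; with the narrow bins it shows up as isolated spikes, which is the kind of feature the second caption describes.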

What is learned from the plots is different from what is illustrated by the regression model, even though the experiment was not designed to investigate any of these other trends. The patterns found by exploring the data suggest hypotheses about tipping that may not have been anticipated in advance, and which could lead to interesting follow-up experiments where the hypotheses are formally stated and tested by collecting new data.

## Software

- [JMP](/source/JMP_(statistical_software)), an EDA package from [SAS Institute](/source/SAS_Institute).

- [KNIME](/source/KNIME) (Konstanz Information Miner), an open-source data exploration platform based on Eclipse.

- [Minitab](/source/Minitab), an EDA and general statistics package widely used in industrial and corporate settings.

- [Orange](/source/Orange_(software)), an [open-source](/source/Open-source_software) [data mining](/source/Data_mining) and [machine learning](/source/Machine_learning) software suite.

- [Python](/source/Python_(programming_language)), an open-source programming language widely used in data mining and machine learning.

- Matplotlib and Seaborn, Python libraries widely used for plotting and data visualization in EDA.

- [R](/source/R_(programming_language)), an open-source programming language for statistical computing and graphics. Together with Python, it is one of the most popular languages for data science.

- [TinkerPlots](/source/TinkerPlots), EDA software for upper elementary and middle school students.

- [Weka](/source/Weka_(machine_learning)), an open-source data mining package that includes visualization and EDA tools such as [targeted projection pursuit](/source/Targeted_projection_pursuit).

## See also

- [Anscombe's quartet](/source/Anscombe's_quartet), on importance of exploration

- [Data dredging](/source/Data_dredging)

- [Predictive analytics](/source/Predictive_analytics)

- [Structured data analysis (statistics)](/source/Structured_data_analysis_(statistics))

- [Configural frequency analysis](/source/Configural_frequency_analysis)

- [Descriptive statistics](/source/Descriptive_statistics)

## References

1. Chatfield, C. (1995). *Problem Solving: A Statistician's Guide* (2nd ed.). Chapman and Hall. [ISBN](/source/ISBN_(identifier)) [978-0-412-60630-4](https://en.wikipedia.org/wiki/Special:BookSources/978-0-412-60630-4).

2. Baillie, Mark; Le Cessie, Saskia; Schmidt, Carsten Oliver; Lusa, Lara; Huebner, Marianne; Topic Group "Initial Data Analysis" of the STRATOS Initiative (2022). ["Ten simple rules for initial data analysis"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8870512). *PLOS Computational Biology*. **18** (2) e1009819. [Bibcode](/source/Bibcode_(identifier)):[2022PLSCB..18E9819B](https://ui.adsabs.harvard.edu/abs/2022PLSCB..18E9819B). [doi](/source/Doi_(identifier)):[10.1371/journal.pcbi.1009819](https://doi.org/10.1371%2Fjournal.pcbi.1009819). [PMC](/source/PMC_(identifier)) [8870512](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8870512). [PMID](/source/PMID_(identifier)) [35202399](https://pubmed.ncbi.nlm.nih.gov/35202399).

3. Tukey, John (July 1961). ["The Future of Data Analysis"](http://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711).

4. Becker, Richard A., [*A Brief History of S*](https://web.archive.org/web/20150723044213/http://www2.research.att.com/areas/stat/doc/94.11.ps), Murray Hill, New Jersey: AT&T Bell Laboratories, archived from [the original](http://www2.research.att.com/areas/stat/doc/94.11.ps) (PS) on 2015-07-23, retrieved 2015-07-23. "... we wanted to be able to interact with our data, using Exploratory Data Analysis (Tukey, 1971) techniques."

5. Morgenthaler, Stephan; Fernholz, Luisa T. (2000). ["Conversation with John W. Tukey and Elizabeth Tukey, Luisa T. Fernholz and Stephan Morgenthaler"](https://doi.org/10.1214%2Fss%2F1009212675). *Statistical Science*. **15** (1): 79–94. [doi](/source/Doi_(identifier)):[10.1214/ss/1009212675](https://doi.org/10.1214%2Fss%2F1009212675).

6. Tukey, John W. (1977). [*Exploratory Data Analysis*](/source/Exploratory_Data_Analysis). Pearson. [ISBN](/source/ISBN_(identifier)) [978-0-201-07616-5](https://en.wikipedia.org/wiki/Special:BookSources/978-0-201-07616-5).

7. Behrens (1997). ["Principles and Procedures of Exploratory Data Analysis"](https://web.archive.org/web/20170808064326/cll.stanford.edu/~willb/course/behrens97pm.pdf). American Psychological Association.

8. Konold, C. (1999). "Statistics goes to school". *Contemporary Psychology*. **44** (1): 81–82. [doi](/source/Doi_(identifier)):[10.1037/001949](https://doi.org/10.1037%2F001949).

9. Tukey, John W. (1980). "We need both exploratory and confirmatory". *The American Statistician*. **34** (1): 23–25. [doi](/source/Doi_(identifier)):[10.1080/00031305.1980.10482706](https://doi.org/10.1080%2F00031305.1980.10482706).

10. Sailem, Heba Z.; Sero, Julia E.; Bakal, Chris (2015-01-08). ["Visualizing cellular imaging data using PhenoPlot"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4354266). *Nature Communications*. **6** (1): 5825. [Bibcode](/source/Bibcode_(identifier)):[2015NatCo...6.5825S](https://ui.adsabs.harvard.edu/abs/2015NatCo...6.5825S). [doi](/source/Doi_(identifier)):[10.1038/ncomms6825](https://doi.org/10.1038%2Fncomms6825). [ISSN](/source/ISSN_(identifier)) [2041-1723](https://search.worldcat.org/issn/2041-1723). [PMC](/source/PMC_(identifier)) [4354266](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4354266). [PMID](/source/PMID_(identifier)) [25569359](https://pubmed.ncbi.nlm.nih.gov/25569359).

11. Bowley, A. L. *Elementary Manual of Statistics* (3rd edn., 1920). [https://archive.org/details/cu31924013702968/page/n5](https://archive.org/details/cu31924013702968/page/n5)

12. [Cook, D.](/source/Dianne_Cook_(statistician)) and [Swayne, D.F.](/source/Deborah_F._Swayne) (with A. Buja, D. Temple Lang, H. Hofmann, H. Wickham, M. Lawrence) (2007). *Interactive and Dynamic Graphics for Data Analysis: With R and GGobi*. Springer. ISBN 978-0387717616.

## Bibliography

- [Andrienko, N](/source/Natalia_Andrienko) & Andrienko, G (2005) *Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach*. Springer. [ISBN](/source/ISBN_(identifier)) [3-540-25994-5](https://en.wikipedia.org/wiki/Special:BookSources/3-540-25994-5)

- [Cook, D.](/source/Dianne_Cook_(statistician)) and [Swayne, D.F.](/source/Deborah_F._Swayne) (with A. Buja, D. Temple Lang, H. Hofmann, H. Wickham, M. Lawrence) (2007-12-12). *Interactive and Dynamic Graphics for Data Analysis: With R and GGobi*. Springer. [ISBN](/source/ISBN_(identifier)) [978-0-387-71761-6](https://en.wikipedia.org/wiki/Special:BookSources/978-0-387-71761-6).

- Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1985). [*Exploring Data Tables, Trends and Shapes*](https://archive.org/details/exploringdatatab0000unse). Wiley. [ISBN](/source/ISBN_(identifier)) [978-0-471-09776-1](https://en.wikipedia.org/wiki/Special:BookSources/978-0-471-09776-1).

- Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1983). *Understanding Robust and Exploratory Data Analysis*. Wiley. [ISBN](/source/ISBN_(identifier)) [978-0-471-09777-8](https://en.wikipedia.org/wiki/Special:BookSources/978-0-471-09777-8).

- Inselberg, Alfred (2009). *Parallel Coordinates: Visual Multidimensional Geometry and its Applications*. London New York: Springer. [ISBN](/source/ISBN_(identifier)) [978-0-387-68628-8](https://en.wikipedia.org/wiki/Special:BookSources/978-0-387-68628-8).

- Leinhardt, G., Leinhardt, S., *[Exploratory Data Analysis: New Tools for the Analysis of Empirical Data](https://journals.sagepub.com/doi/pdf/10.3102/0091732X008001085)*, Review of Research in Education, Vol. 8 (1980), pp. 85–157.

- [Martinez, W. L.](/source/Wendy_L._Martinez); Martinez, A. R. & Solka, J. (2010). *Exploratory Data Analysis with MATLAB, second edition*. Chapman & Hall/CRC. [ISBN](/source/ISBN_(identifier)) [978-1-4398-1220-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4398-1220-4).

- Theus, M., Urbanek, S. (2008), Interactive Graphics for Data Analysis: Principles and Examples, CRC Press, Boca Raton, FL, [ISBN](/source/ISBN_(identifier)) [978-1-58488-594-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-58488-594-8)

- Tucker, L; MacCallum, R. (1993). [*Exploratory Factor Analysis*](http://www.unc.edu/~rcm/book/factornew.htm).

- Tukey, John Wilder (1977). [*Exploratory Data Analysis*](https://archive.org/details/exploratorydataa00tuke_0). Addison-Wesley. [ISBN](/source/ISBN_(identifier)) [978-0-201-07616-5](https://en.wikipedia.org/wiki/Special:BookSources/978-0-201-07616-5).

- Velleman, P. F.; Hoaglin, D. C. (1981). [*Applications, Basics and Computing of Exploratory Data Analysis*](https://archive.org/details/applicationsbasi00vell). Duxbury Press. [ISBN](/source/ISBN_(identifier)) [978-0-87150-409-8](https://en.wikipedia.org/wiki/Special:BookSources/978-0-87150-409-8).

- Young, F. W. Valero-Mora, P. and Friendly M. (2006) [*Visual Statistics: Seeing your data with Dynamic Interactive Graphics*](http://www.uv.es/visualstats/Book). Wiley [ISBN](/source/ISBN_(identifier)) [978-0-471-68160-1](https://en.wikipedia.org/wiki/Special:BookSources/978-0-471-68160-1)

- Jambu M. (1991) [*Exploratory and Multivariate Data Analysis*](http://www.sciencedirect.com/science/book/9780123800909). Academic Press [ISBN](/source/ISBN_(identifier)) [0123800900](https://en.wikipedia.org/wiki/Special:BookSources/0123800900)

- S. H. C. DuToit, A. G. W. Steyn, R. H. Stumpf (1986) [*Graphical Exploratory Data Analysis*](https://link.springer.com/book/10.1007%2F978-1-4612-4950-4). Springer [ISBN](/source/ISBN_(identifier)) [978-1-4612-9371-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4612-9371-2)

## External links

- [Carnegie Mellon University – free online course on Probability and Statistics, with a module on EDA](https://oli.cmu.edu/courses/free-open/statistics-course-details/)

- [Exploratory data analysis chapter: engineering statistics handbook](https://www.itl.nist.gov/div898/handbook/eda/eda.htm)

---
Adapted from the Wikipedia article [Exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Exploratory_data_analysis?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.