# Frequency (statistics)

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Frequency_(statistics)
> Markdown URL: https://mediated.wiki/source/Frequency_(statistics).md
> Source: https://en.wikipedia.org/wiki/Frequency_(statistics)
> Source revision: 1347825996
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Number of occurrences in an experiment or study

For other uses, see [Frequency (disambiguation)](/source/Frequency_(disambiguation)).

In [statistics](/source/Statistics), the **frequency** or **absolute frequency** of an [event](/source/Event_(probability_theory)) $i$ is the number $n_i$ of times the observation has occurred or been recorded in an [experiment](/source/Experiment) or study.[1]: 12–19 The **relative frequency** is the ratio of absolute frequency to the [sample size](/source/Sample_size). The **cumulative frequency** is the total of the absolute frequencies of all events at or below a certain point in an ordered list of events.[1]: 17–19 These frequencies are often depicted graphically or in tabular form. They may be used as [estimators](/source/Estimator) of [empirical probabilities](/source/Empirical_probability) or [cumulative distribution functions](/source/Cumulative_distribution_function), for instance.

## Formulation

The relative frequency of an event is the absolute frequency [normalized](/source/Normalizing_constant) by the total number of events:

$$f_i = \frac{n_i}{N} = \frac{n_i}{\sum_j n_j}.$$

The values of $f_i$ for all events $i$ can be plotted to produce a frequency distribution.

In the case when $n_i = 0$ for certain $i$, [pseudocounts](/source/Pseudocount) can be added.
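As a minimal sketch of the formula above, the following Python snippet computes relative frequencies from raw observations and optionally adds a pseudocount to every category. The function name `relative_frequencies`, its parameters, and the choice of adding the same pseudocount to each category (additive smoothing) are assumptions made for illustration; the text does not prescribe a particular scheme.

```python
from collections import Counter

def relative_frequencies(observations, categories=None, pseudocount=0.0):
    """Return {event: f_i} with f_i = (n_i + a) / (N + a * K).

    a is the pseudocount and K the number of categories (illustrative
    additive smoothing; a = 0 gives the plain ratio n_i / N).
    """
    counts = Counter(observations)
    if categories is None:
        categories = sorted(counts)
    # Counter returns 0 for categories that were never observed.
    total = sum(counts[c] for c in categories) + pseudocount * len(categories)
    return {c: (counts[c] + pseudocount) / total for c in categories}

obs = ["a", "b", "a", "c", "a", "b"]
cats = ["a", "b", "c", "d"]          # "d" never occurs, so n_d = 0
plain = relative_frequencies(obs, cats)
smoothed = relative_frequencies(obs, cats, pseudocount=1.0)
print(plain)
print(smoothed)
```

With a pseudocount of 1, the unobserved category `"d"` receives a small non-zero relative frequency while the values still sum to 1.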

## Visualization

*Figure: different ways of depicting frequency distributions: a [histogram](/source/Histogram) of travel time to work (US 2000 census); a [bar chart](/source/Bar_chart) with 'Country' as the [categorical variable](/source/Categorical_variable) for a discrete data set; a horizontal [3D](/source/Three-dimensional_space) bar chart; and a pie chart of world population by country.*

A **frequency distribution** shows a summarized grouping of data divided into mutually exclusive classes, together with the number of occurrences in each class. It is a way of presenting unorganized data, for example the results of an election, the incomes of people in a certain region, the sales of a product within a certain period, or the student loan amounts of graduates. Graphs that can be used with frequency distributions include [histograms](/source/Histogram), [line charts](/source/Line_chart), [bar charts](/source/Bar_chart) and [pie charts](/source/Pie_chart). Frequency distributions are used for both qualitative and quantitative data.

### Construction

1. Decide the number of classes. Too many or too few classes may fail to reveal the basic shape of the data set and make the frequency distribution difficult to interpret. The number of classes may be estimated by the formula $C = 1 + 3.3 \log_{10} n$, or by the [square-root choice](/source/Histogram#Square-root_choice) formula $C = \sqrt{n}$, where *n* is the total number of observations in the data. (The latter will be much too large for large data sets such as population statistics.) These formulas are not hard rules, however, and the number of classes they give may not always suit the data being dealt with.

1. Calculate the range of the data (Range = Max – Min) by finding the minimum and maximum data values. Range will be used to determine the class interval or class width.

1. Decide the width of the classes, denoted by *h* and obtained by $h = \frac{\text{range}}{\text{number of classes}}$ (assuming the class intervals are the same for all classes).

Generally the class interval or class width is the same for all classes. Taken together, the classes must cover at least the distance from the lowest value (minimum) in the data to the highest (maximum) value. Equal class intervals are preferred in frequency distributions, while unequal class intervals (for example logarithmic intervals) may be necessary in certain situations to produce a good spread of observations between the classes and to avoid a large number of empty, or almost empty, classes.[2]

1. Decide the individual class limits and select a suitable starting point for the first class. The starting point is arbitrary; it may be less than or equal to the minimum value. Usually the first class starts before the minimum value in such a way that the midpoint (the average of the lower and upper class limits of the first class) is properly placed.

1. Take an observation and mark a vertical bar (|) against the class to which it belongs. A running tally is kept until the last observation.

1. Find the frequencies, relative frequency, cumulative frequency etc. as required.
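The construction steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a canonical procedure: the function name `build_frequency_table` is hypothetical, the class count from $C = 1 + 3.3 \log_{10} n$ is rounded up, the first class starts exactly at the minimum, and the maximum value is placed in the last class.

```python
import math

def build_frequency_table(data, num_classes=None):
    """Equal-width frequency table.

    Returns rows of (lower, upper, count, relative, cumulative).
    """
    n = len(data)
    if num_classes is None:
        # Step 1: C = 1 + 3.3 log10(n); rounding up is an assumption.
        num_classes = math.ceil(1 + 3.3 * math.log10(n))
    # Step 2: range of the data.
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all observations are equal; class width would be zero")
    # Step 3: common class width h = range / number of classes.
    h = (hi - lo) / num_classes
    # Steps 4-5: start the first class at the minimum and tally each value.
    counts = [0] * num_classes
    for x in data:
        k = min(int((x - lo) / h), num_classes - 1)  # maximum goes in last class
        counts[k] += 1
    # Step 6: frequencies, relative frequencies, cumulative frequencies.
    rows, cumulative = [], 0
    for k, count in enumerate(counts):
        cumulative += count
        rows.append((lo + k * h, lo + (k + 1) * h, count, count / n, cumulative))
    return rows

data = [2, 3, 5, 7, 8, 9, 10, 12, 13, 15, 18, 20]
table = build_frequency_table(data)
for lower, upper, count, rel, cum in table:
    print(f"[{lower:5.1f}, {upper:5.1f})  n={count}  f={rel:.3f}  cum={cum}")
```

For these 12 observations the formula gives five classes of width 3.6. A different starting point or class count would give a different, equally valid table.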

The following are some commonly used methods of depicting frequency:[3]

### Histograms

Main article: [Histogram](/source/Histogram)

A histogram is a representation of tabulated frequencies, shown as adjacent [rectangles](/source/Rectangle) or [squares](/source/Square) (in some situations), erected over discrete intervals (bins), with an area proportional to the frequency of the observations in the interval. The height of a rectangle is equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of observations. A histogram may also be [normalized](/source/Normalization_(statistics)) to display relative frequencies. It then shows the proportion of cases that fall into each of several [categories](/source/Categorization), with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping [intervals](/source/Interval_(mathematics)) of a variable. The categories (intervals) must be adjacent, and often are chosen to be of the same size.[4] The rectangles of a histogram are drawn so that they touch each other to indicate that the original variable is continuous.[5]
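To make the relationship between height, width and area concrete, here is a small Python sketch that computes frequency densities for bins of unequal width. The bin edges and counts are made-up illustrative values, and `frequency_density` is a hypothetical helper name.

```python
def frequency_density(edges, counts):
    """Bar heights: frequency divided by the width of each interval."""
    widths = [b - a for a, b in zip(edges, edges[1:])]
    return [c / w for c, w in zip(counts, widths)]

edges = [0, 10, 20, 40]   # three bins; the last is twice as wide
counts = [5, 10, 10]      # absolute frequencies (illustrative data)

heights = frequency_density(edges, counts)
widths = [b - a for a, b in zip(edges, edges[1:])]

# Total area equals the number of observations.
area = sum(h * w for h, w in zip(heights, widths))

# Normalized histogram: heights divided by N, so the total area is 1.
n = sum(counts)
normalized = [h / n for h in heights]
normalized_area = sum(h * w for h, w in zip(normalized, widths))

print(heights, area, normalized_area)
```

The second and third bins hold the same number of observations, but the third bar is half as tall because its interval is twice as wide.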

### Bar graphs

A **bar chart** or **bar graph** is a [chart](/source/Chart) with [rectangular](/source/Rectangle) bars with [lengths](/source/Length) proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column bar chart.

### Frequency distribution table

A [frequency distribution](/source/Frequency_distribution) table is an arrangement of the values that one or more variables take in a [sample](/source/Sampling_(statistics)). Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the [distribution](/source/Statistical_distribution) of values in the sample.

This is an example of a univariate (=single [variable](/source/Variable_(mathematics))) frequency table. The frequency of each response to a survey question is depicted.

| Rank | Degree of agreement | Number |
|------|---------------------|--------|
| 1 | Strongly agree | 22 |
| 2 | Agree somewhat | 30 |
| 3 | Not sure | 20 |
| 4 | Disagree somewhat | 15 |
| 5 | Strongly disagree | 15 |

A different tabulation scheme aggregates values into bins such that each bin encompasses a range of values. For example, the heights of the students in a class could be organized into the following frequency table.

| Height range | Number of students | Cumulative number |
|--------------|--------------------|-------------------|
| less than 5.0 feet | 25 | 25 |
| 5.0–5.5 feet | 35 | 60 |
| 5.5–6.0 feet | 20 | 80 |
| 6.0–6.5 feet | 20 | 100 |
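The cumulative column of the height table is a running sum of the class frequencies. A minimal Python sketch, using the counts from the table above (the variable names are illustrative):

```python
from itertools import accumulate

height_ranges = ["less than 5.0 feet", "5.0–5.5 feet", "5.5–6.0 feet", "6.0–6.5 feet"]
counts = [25, 35, 20, 20]

# Cumulative frequency: total of all classes at or below each class.
cumulative = list(accumulate(counts))

# Cumulative relative frequency: cumulative count divided by the sample size.
n = sum(counts)
cumulative_relative = [c / n for c in cumulative]

for label, count, cum in zip(height_ranges, counts, cumulative):
    print(f"{label:20s} {count:4d} {cum:4d}")
```

Dividing the cumulative counts by the sample size gives the cumulative relative frequencies, the last of which is always 1.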

### Joint frequency distributions

Bivariate joint frequency distributions are often presented as (two-way) [contingency tables](/source/Contingency_tables):

Two-way contingency table with marginal frequencies

|       | Dance | Sports | TV | Total |
|-------|-------|--------|----|-------|
| Men   | 2     | 10     | 8  | 20    |
| Women | 16    | 6      | 8  | 30    |
| Total | 18    | 16     | 16 | 50    |

The total row and total column report the marginal frequencies or [marginal distribution](/source/Marginal_distribution), while the body of the table reports the joint frequencies.[6]
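As an illustration, the marginal frequencies can be recovered from the joint frequencies by summing over rows and columns. The sketch below uses the counts from the contingency table above; the nested-dictionary layout and variable names are assumptions chosen for readability.

```python
# Joint frequencies: the body of the two-way table.
joint = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}

# Row marginals: sum each row across its columns.
row_totals = {row: sum(cells.values()) for row, cells in joint.items()}

# Column marginals: sum each column across the rows.
columns = list(next(iter(joint.values())))
col_totals = {col: sum(cells[col] for cells in joint.values()) for col in columns}

grand_total = sum(row_totals.values())

# Joint relative frequencies: each cell divided by the grand total.
joint_relative = {
    row: {col: n / grand_total for col, n in cells.items()}
    for row, cells in joint.items()
}

print(row_totals, col_totals, grand_total)
```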

## Interpretation

Under the [frequency interpretation](/source/Frequentist_probability) of [probability](/source/Probability), it is assumed that the source is [ergodic](/source/Ergodicity), i.e., as the length of a series of trials increases without bound, the fraction of experiments in which a given event occurs will approach a fixed value, known as the **limiting relative frequency**.[7][8]
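The idea of a limiting relative frequency can be illustrated, though not proved, by simulation. The following sketch tracks the relative frequency of "success" in a growing series of simulated trials; the function name, the fixed seed, the checkpoints, and the use of Python's pseudo-random generator as a stand-in for a real experiment are all assumptions made for illustration.

```python
import random

def running_relative_frequency(p, trials, seed=0):
    """Relative frequency of success after 10, 100, ... simulated trials."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    checkpoints = (10, 100, 1_000, 10_000, 100_000)
    successes = 0
    result = {}
    for t in range(1, trials + 1):
        if rng.random() < p:           # one trial with success probability p
            successes += 1
        if t in checkpoints:
            result[t] = successes / t
    return result

freqs = running_relative_frequency(p=0.5, trials=100_000)
for t, f in freqs.items():
    print(f"{t:>7d} trials: relative frequency = {f:.4f}")
```

In a typical run the relative frequency fluctuates noticeably for small numbers of trials and settles close to 0.5 as the series grows.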

This interpretation is often contrasted with [Bayesian probability](/source/Bayesian_probability).

The term *frequentist* was first used by [M. G. Kendall](/source/Maurice_Kendall) in 1949, to contrast with [Bayesians](/source/Bayesian_probability), whom he called "non-frequentists".[9][10] He observed

- 3....we may broadly distinguish two main attitudes. One takes probability as 'a degree of rational belief', or some similar idea...the second defines probability in terms of frequencies of occurrence of events, or by relative proportions in 'populations' or 'collectives'; (p. 101)

- ...

- 12. It might be thought that the differences between the frequentists and the non-frequentists (if I may call them such) are largely due to the differences of the domains which they purport to cover. (p. 104)

- ...

- *I assert that this is not so* ... The essential distinction between the frequentists and the non-frequentists is, I think, that the former, in an effort to avoid anything savouring of matters of opinion, seek to define probability in terms of the objective properties of a population, real or hypothetical, whereas the latter do not. [emphasis in original]

## Applications

Managing and operating on frequency-tabulated data is much simpler than operating on raw data. There are simple algorithms to calculate the median, mean, standard deviation, etc. from these tables.
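A minimal sketch of such algorithms, assuming an ungrouped frequency table that maps each distinct numeric value to its count. The helper names and the example counts are hypothetical, and the standard deviation shown is the population form (dividing by $N$).

```python
import math

def value_at(sorted_items, position):
    """Value at 0-indexed `position` of the sorted data, without expanding it."""
    running = 0
    for value, count in sorted_items:
        running += count
        if position < running:
            return value
    raise IndexError("position is outside the data")

def summarize(freq):
    """Return (mean, median, population standard deviation)."""
    items = sorted(freq.items())
    n = sum(count for _, count in items)
    mean = sum(value * count for value, count in items) / n
    # Median: average of the two middle positions (they coincide if n is odd).
    median = (value_at(items, (n - 1) // 2) + value_at(items, n // 2)) / 2
    variance = sum(count * (value - mean) ** 2 for value, count in items) / n
    return mean, median, math.sqrt(variance)

freq = {0: 3, 1: 5, 2: 8, 3: 4}   # value -> absolute frequency (made-up data)
mean, median, std = summarize(freq)
print(mean, median, std)
```

Each statistic is computed in a single pass over the distinct values, so the cost depends on the number of classes rather than on the number of raw observations.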

[Statistical hypothesis testing](/source/Statistical_hypothesis_testing) is founded on the assessment of differences and similarities between frequency distributions. This assessment involves measures of [central tendency](/source/Measures_of_central_tendency) or [averages](/source/Average), such as the [mean](/source/Mean) and [median](/source/Median), and measures of variability or [statistical dispersion](/source/Statistical_dispersion), such as the [standard deviation](/source/Standard_deviation) or [variance](/source/Variance).

A frequency distribution is said to be [skewed](/source/Skewness) when its mean and median are significantly different, or more generally when it is [asymmetric](/source/Symmetric_distribution). The [kurtosis](/source/Kurtosis) of a frequency distribution is a measure of the proportion of extreme values (outliers), which appear at either end of the [histogram](/source/Histogram). If the distribution is more outlier-prone than the [normal distribution](/source/Normal_distribution) it is said to be leptokurtic; if less outlier-prone it is said to be platykurtic.
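As a hedged illustration, skewness and kurtosis can be estimated from a frequency table using the standard moment-based definitions, $g_1 = m_3 / m_2^{3/2}$ and excess kurtosis $m_4 / m_2^2 - 3$, where $m_k$ is the $k$-th central moment. These particular estimators are an assumption of this sketch (other definitions exist), and the function and variable names are hypothetical.

```python
def central_moment(freq, k):
    """k-th central moment of a {value: count} frequency table."""
    n = sum(freq.values())
    mean = sum(value * count for value, count in freq.items()) / n
    return sum(count * (value - mean) ** k for value, count in freq.items()) / n

def skewness(freq):
    return central_moment(freq, 3) / central_moment(freq, 2) ** 1.5

def excess_kurtosis(freq):
    # Zero for a normal distribution; positive values indicate a
    # leptokurtic shape, negative values a platykurtic one.
    return central_moment(freq, 4) / central_moment(freq, 2) ** 2 - 3

symmetric = {1: 1, 2: 2, 3: 1}          # made-up symmetric table
right_skewed = {1: 6, 2: 3, 3: 1}       # made-up table with a longer right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The symmetric table has zero skewness and negative excess kurtosis (platykurtic), while the second table has positive skewness.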

[Letter frequency](/source/Letter_frequency) distributions are also used in [frequency analysis](/source/Frequency_analysis_(cryptanalysis)) to crack [ciphers](/source/Cipher), and to compare the relative frequencies of letters in different languages, such as Greek and Latin.
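A letter frequency distribution is simply a relative frequency table over an alphabet. The sketch below counts only the 26 unaccented Latin letters and ignores case, which is a simplifying assumption; other alphabets would need a different filter. The function name is hypothetical.

```python
from collections import Counter

def letter_frequencies(text):
    """Relative frequency of each letter a-z, most common first."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    return {letter: n / total for letter, n in counts.most_common()}

freqs = letter_frequencies("Hello, World")
print(freqs)
```

In frequency analysis, a table like this computed from a ciphertext would be compared with the known letter frequencies of the suspected language.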

## See also

- [Aperiodic frequency](/source/Aperiodic_frequency)

- [Count data](/source/Count_data)

- [Cross tabulation](/source/Cross_tabulation)

- [Cumulative distribution function](/source/Cumulative_distribution_function)

- [Cumulative frequency analysis](/source/Cumulative_frequency_analysis)

- [Empirical distribution function](/source/Empirical_distribution_function)

- [Law of large numbers](/source/Law_of_large_numbers)

- [Multiset *multiplicity*](/source/Multiset), analogous to frequency in multiset theory

- [Probability density function](/source/Probability_density_function)

- [Probability interpretations](/source/Probability_interpretations)

- [Statistical regularity](/source/Statistical_regularity)

- [Word frequency](/source/Word_frequency)

## References

1. ^ [***a***](#cite_ref-Kenney_1-0) [***b***](#cite_ref-Kenney_1-1) Kenney, J. F.; Keeping, E. S. (1962). [*Mathematics of Statistics, Part 1*](https://books.google.com/books?id=UdlLAAAAMAAJ) (3rd ed.). Princeton, NJ: [Van Nostrand Reinhold](/source/John_Wiley_%26_Sons).

1. **[^](#cite_ref-2)** Manikandan, S (1 January 2011). ["Frequency distribution"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117575). *Journal of Pharmacology & Pharmacotherapeutics*. **2** (1): 54–55. [doi](/source/Doi_(identifier)):[10.4103/0976-500X.77120](https://doi.org/10.4103%2F0976-500X.77120). [ISSN](/source/ISSN_(identifier)) [0976-500X](https://search.worldcat.org/issn/0976-500X). [PMC](/source/PMC_(identifier)) [3117575](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117575). [PMID](/source/PMID_(identifier)) [21701652](https://pubmed.ncbi.nlm.nih.gov/21701652).

1. **[^](#cite_ref-3)** Carlson, K. and Winquist, J. (2014) *An Introduction to Statistics*. SAGE Publications, Inc. Chapter 1: Introduction to Statistics and Frequency Distributions

1. **[^](#cite_ref-4)** Howitt, D. and Cramer, D. (2008) *Statistics in Psychology*. Prentice Hall

1. **[^](#cite_ref-5)** Charles Stangor (2011) "Research Methods For The Behavioral Sciences". Wadsworth, Cengage Learning. [ISBN](/source/ISBN_(identifier)) [9780840031976](https://en.wikipedia.org/wiki/Special:BookSources/9780840031976).

1. **[^](#cite_ref-6)** Stat Trek, Statistics and Probability Glossary, *s.v.* [Joint frequency](http://stattrek.com/statistics/dictionary.aspx?definition=Joint_frequency)

1. **[^](#cite_ref-Mises_7-0)** von Mises, Richard (1939) *Probability, Statistics, and Truth* (in German) (English translation, 1981: Dover Publications; 2 Revised edition. [ISBN](/source/ISBN_(identifier)) [0486242145](https://en.wikipedia.org/wiki/Special:BookSources/0486242145)) (p.14)

1. **[^](#cite_ref-Gilles_8-0)** *The frequency theory* Chapter 5; in Donald Gilles, *Philosophical theories of probability* (2000), Psychology Press. [ISBN](/source/ISBN_(identifier)) [9780415182751](https://en.wikipedia.org/wiki/Special:BookSources/9780415182751), p. 88.

1. **[^](#cite_ref-9)** [Earliest Known Uses of Some of the Words of Probability & Statistics](http://www.leidenuniv.nl/fsw/verduin/stathist/1stword.htm)

1. **[^](#cite_ref-10)** [Kendall, Maurice George](/source/Maurice_Kendall) (1949). "On the Reconciliation of Theories of Probability". *Biometrika*. **36** (1/2). Biometrika Trust: 101–116. [doi](/source/Doi_(identifier)):[10.1093/biomet/36.1-2.101](https://doi.org/10.1093%2Fbiomet%2F36.1-2.101). [JSTOR](/source/JSTOR_(identifier)) [2332534](https://www.jstor.org/stable/2332534).


---
Adapted from the Wikipedia article [Frequency (statistics)](https://en.wikipedia.org/wiki/Frequency_(statistics)) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Frequency_(statistics)?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.