Freedman–Diaconis Rule

In statistics, the Freedman–Diaconis rule can be used to select the width of the bins to be used in a histogram. It is named after David A. Freedman and Persi Diaconis. For a set of empirical measurements sampled from some probability distribution, the Freedman–Diaconis rule is designed roughly to minimize the integral of the squared difference between the histogram (i.e., relative frequency density) and the density of the theoretical probability distribution. The general equation for the rule is:
:\text{Bin width} = 2\,\frac{\operatorname{IQR}(x)}{\sqrt[3]{n}}
where \operatorname{IQR}(x) is the interquartile range of the data and n is the number of observations in the sample x.
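
As an illustration, the rule can be sketched in a few lines of Python using only the standard library. The function name freedman_diaconis_width is hypothetical, and the quartiles are computed with the "inclusive" interpolation method, which is an assumption: other quartile conventions give slightly different interquartile ranges and therefore slightly different bin widths.

```python
import statistics


def freedman_diaconis_width(x):
    """Bin width 2 * IQR(x) / n^(1/3) for a sequence of numbers x.

    Hypothetical helper for illustration; quartiles use the
    "inclusive" method of statistics.quantiles (an assumed convention).
    """
    data = sorted(x)
    n = len(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return 2.0 * iqr / n ** (1.0 / 3.0)


# Small example: eight evenly spaced observations.
sample = [1, 2, 3, 4, 5, 6, 7, 8]
width = freedman_diaconis_width(sample)
```

For this sample the quartiles are 2.75 and 6.25, so the interquartile range is 3.5; with n = 8 the cube root of n is 2, and the bin width is 2 × 3.5 / 2 = 3.5. A bin count, if one is needed, can then be taken as the data range divided by this width, rounded up.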


Other approaches

With the factor 2 replaced by approximately 2.59, the Freedman–Diaconis rule asymptotically matches ''Scott's normal reference rule'' for data sampled from a normal distribution. Another approach is ''Sturges' rule'': choose a bin width large enough that there are about 1+\log_2 n non-empty bins (Scott, 2009). This works well for ''n'' under 200, but was found to be inaccurate for large ''n''. For a discussion and an alternative approach, see Birgé and Rozenholc.
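
The relationship between the two rules, and the bin count suggested by Sturges' rule, can be checked numerically. The sketch below assumes the usual statement of Scott's rule as a bin width of about 3.49 standard deviations divided by the cube root of ''n'' (a figure not given in the text above), and it rounds the Sturges count up to an integer, which is also an assumption. The names sturges_bin_count and scott_like_factor are hypothetical.

```python
import math
from statistics import NormalDist


def sturges_bin_count(n):
    """Approximate number of bins 1 + log2(n), rounded up (assumed rounding)."""
    return math.ceil(1 + math.log2(n))


# Interquartile range of a standard normal distribution (about 1.349 sigma).
std_normal = NormalDist()
normal_iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)

# Replacing the factor 2 by 2.59 gives a width of 2.59 * IQR / n^(1/3).
# For normal data this is roughly 3.49 * sigma / n^(1/3), the assumed
# form of Scott's normal reference rule.
scott_like_factor = 2.59 * normal_iqr

# Sturges' bin count grows only logarithmically with the sample size.
sturges_counts = {n: sturges_bin_count(n) for n in (64, 1024, 1_000_000)}
```

Because the Sturges count grows with the logarithm of ''n'' while the Freedman–Diaconis bin count grows with its cube root, the two diverge as the sample gets larger, which is consistent with the remark that Sturges' rule becomes inaccurate for large ''n''.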

