Seven-number Summary
In descriptive statistics, the seven-number summary is a collection of seven summary statistics, and is an extension of the five-number summary. There are three similar, common forms. As with the five-number summary, it can be represented by a modified box plot, adding hatch-marks on the "whiskers" for two of the additional numbers. Seven-number summary. The following percentiles are (approximately) evenly spaced under a normally distributed variable:
1. the 2nd percentile (better: 2.15%)
2. the 9th percentile (better: 8.87%)
3. the 25th percentile, or lower quartile (first quartile)
4. the 50th percentile, or median (middle value, or second quartile)
5. the 75th percentile, or upper quartile (third quartile)
6. the 91st percentile (better: 91.13%)
7. the 98th percentile (better: 97.85%)
The middle three values – the lower quartile, median, and upper quartile – are the usual statistics from the five-number summary and are the standard values for the box in a box ...
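As a rough illustration of the seven-number summary described above, the following Python sketch computes the seven percentiles of a data set and checks that the percentile levels are about evenly spaced in standard-normal z-units. The names `SEVEN_LEVELS`, `percentile`, and `seven_number_summary` are hypothetical, and the linear-interpolation rule for percentiles is an assumption (one of several common conventions), not something the text specifies.

```python
from statistics import NormalDist

# Percentile levels from the text (the "better" values), in percent.
SEVEN_LEVELS = (2.15, 8.87, 25.0, 50.0, 75.0, 91.13, 97.85)


def percentile(sorted_values, p):
    """Percentile by linear interpolation between order statistics.

    Assumption: position = (n - 1) * p / 100 on the sorted sample.
    """
    if not sorted_values:
        raise ValueError("need at least one observation")
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def seven_number_summary(data):
    """Return the seven percentiles of `data`, lowest first."""
    ordered = sorted(data)
    return [percentile(ordered, p) for p in SEVEN_LEVELS]


# z-scores of the seven levels under a standard normal distribution,
# and the gaps between neighbouring z-scores.
z_scores = [NormalDist().inv_cdf(p / 100.0) for p in SEVEN_LEVELS]
gaps = [b - a for a, b in zip(z_scores, z_scores[1:])]

summary = seven_number_summary(range(101))
```

Each entry of `gaps` should come out close to 0.674, the z-score of the upper quartile, which is what "evenly spaced" means here.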
Descriptive Statistics
A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and analysing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics) by its aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory, and is frequently nonparametric. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in papers reporting on human subjects, typically a table is included giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment o ...
Sample Minimum
In statistics, the sample maximum and sample minimum, also called the largest observation and smallest observation, are the values of the greatest and least elements of a sample. They are basic summary statistics, used in descriptive statistics such as the five-number summary and Bowley's seven-figure summary and the associated box plot. The minimum and the maximum value are the first and last order statistics (often denoted X(1) and X(n) respectively, for a sample size of n). If the sample has outliers, they necessarily include the sample maximum or sample minimum, or both, depending on whether they are extremely high or low. However, the sample maximum and minimum need not be outliers, if they are not unusually far from other observations. Robustness. The sample maximum and minimum are the least robust statistics: they are maximally sensitive to outliers. This can either be an advantage or a drawback: if extreme values are real (not measurement errors), ...
Three-point Estimation
The three-point estimation technique is used in management and information systems applications for the construction of an approximate probability distribution representing the outcome of future events, based on very limited information. While the distribution used for the approximation might be a normal distribution, this is not always so. For example, a triangular distribution might be used, depending on the application. In three-point estimation, three figures are produced initially for every distribution that is required, based on prior experience or best-guesses:
* a = the best-case estimate
* m = the most likely estimate
* b = the worst-case estimate
These are then combined to yield either a full probability distribution, for later combination with distributions obtained similarly for other variables, or summary descriptors of the distribution, such as the mean, standard deviation or percentage points of the distribution. The accuracy attributed to the results deriv ...
John Tukey
John Wilder Tukey (June 16, 1915 – July 26, 2000) was an American mathematician and statistician, best known for the development of the fast Fourier transform (FFT) algorithm and the box plot. The Tukey range test, the Tukey lambda distribution, the Tukey test of additivity, and the Teichmüller–Tukey lemma all bear his name. He is also credited with coining the term 'bit' and the first published use of the word 'software'. Biography. Tukey was born in New Bedford, Massachusetts in 1915, to a father who taught Latin and a mother who was a private tutor. He was mainly taught by his mother and attended regular classes only for certain subjects like French. Tukey obtained a BA in 1936 and MSc in 1937 in chemistry, from Brown University, before moving to Princeton University, where in 1939 he received a PhD in mathematics after completing a doctoral dissertation titled "On denumerability in topology". During World War II, Tukey worked at the Fire Control Research Office and collaborated with ...
Skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal distribution, negative skew commonly indicates that the tail is on the left side of the distribution, and positive skew indicates that the tail is on the right. In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule. For example, a zero value means that the tails on both sides of the mean balance out overall; this is the case for a symmetric distribution, but can also be true for an asymmetric distribution where one tail is long and thin, and the other is short but fat. Introduction. Consider the two distributions in the figure just below. Within each graph, the values on the right side of the distribution taper differently from the values on the left side. These tapering sides are called tai ...
Midsummary
In statistics, the mid-range or mid-extreme is a measure of central tendency of a sample defined as the arithmetic mean of the maximum and minimum values of the data set:
M = \frac{\max x + \min x}{2}.
The mid-range is closely related to the range, a measure of statistical dispersion defined as the difference between maximum and minimum values. The two measures are complementary in the sense that if one knows the mid-range and the range, one can find the sample maximum and minimum values. The mid-range is rarely used in practical statistical analysis, as it lacks efficiency as an estimator for most distributions of interest, because it ignores all intermediate points, and lacks robustness, as outliers change it significantly. Indeed, for many distributions it is one of the least efficient and least robust statistics. However, it finds some use in special cases: it is the maximally efficient estimator for the center of a uniform distribution, trimmed mid-ranges address robustness, and as an L-esti ...
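The complementarity of mid-range and range noted above can be sketched in a few lines of Python: given the mid-range M and the range R, the minimum is M - R/2 and the maximum is M + R/2. The function names below are hypothetical and chosen only for this example.

```python
def mid_range(data):
    """Arithmetic mean of the sample maximum and sample minimum."""
    values = list(data)
    return (max(values) + min(values)) / 2


def value_range(data):
    """Difference between the sample maximum and sample minimum."""
    values = list(data)
    return max(values) - min(values)


def extremes_from(mid, rng):
    """Recover (minimum, maximum) from the mid-range and the range."""
    return mid - rng / 2, mid + rng / 2


sample = [3, 7, 8, 15, 21]
m = mid_range(sample)    # (21 + 3) / 2
r = value_range(sample)  # 21 - 3
low, high = extremes_from(m, r)
```

Note that the three intermediate observations play no part in any of these quantities, which is the reason the text gives for the mid-range's low efficiency.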
Interdecile Range
In statistics, the interdecile range is the difference between the first and the ninth deciles (10% and 90%). The interdecile range is a measure of statistical dispersion of the values in a set of data, similar to the range and the interquartile range, and can be computed from the (non-parametric) seven-number summary. Despite its simplicity, the interdecile range of a sample drawn from a normal distribution can be divided by 2.56 to give a reasonably efficient estimator of the standard deviation of that distribution. This is derived from the fact that the lower (respectively, upper) decile of a normal distribution with arbitrary variance is equal to the mean minus (respectively, plus) 1.28 times the standard deviation. A more efficient estimator is given by instead taking the 7% trimmed range (the difference between the 7th and 93rd percentiles) and dividing by 3 (corresponding to 86% of the data falling within ±1.5 standard deviations of the mean in a normal distribution ...
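The estimator described above can be tried out with Python's standard `statistics` module. This is a minimal sketch under stated assumptions: the function names are hypothetical, the deciles use the default interpolation method of `statistics.quantiles`, and the simulated sample (mean 10, standard deviation 2, fixed seed) is made up purely for illustration.

```python
from statistics import NormalDist, quantiles


def interdecile_range(data):
    """Difference between the ninth and the first decile."""
    deciles = quantiles(data, n=10)  # nine cut points: D1 ... D9
    return deciles[8] - deciles[0]


def sigma_from_interdecile(data):
    """Estimate a normal standard deviation as interdecile range / 2.56."""
    return interdecile_range(data) / 2.56


# The divisor 2.56 is twice the z-score of the 90th percentile (about 1.28).
divisor = 2 * NormalDist().inv_cdf(0.9)

# Simulated normal sample; the parameters are illustrative assumptions.
sample = NormalDist(mu=10.0, sigma=2.0).samples(20_000, seed=1)
estimate = sigma_from_interdecile(sample)
```

With a sample this large, `estimate` should land close to the true value of 2.0, though the exact figure depends on the seed and on the quantile convention.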
Sample Maximum
In statistics, the sample maximum and sample minimum, also called the largest observation and smallest observation, are the values of the greatest and least elements of a sample. They are basic summary statistics, used in descriptive statistics such as the five-number summary and Bowley's seven-figure summary and the associated box plot. The minimum and the maximum value are the first and last order statistics (often denoted X(1) and X(n) respectively, for a sample size of n). If the sample has outliers, they necessarily include the sample maximum or sample minimum, or both, depending on whether they are extremely high or low. However, the sample maximum and minimum need not be outliers, if they are not unusually far from other observations. Robustness. The sample maximum and minimum are the least robust statistics: they are maximally sensitive to outliers. This can either be an advantage or a drawback: if extreme values are real (not measurement errors), ...
Decile
In descriptive statistics, a decile is any of the nine values that divide the sorted data into ten equal parts, so that each part represents 1/10 of the sample or population. A decile is one possible form of a quantile; others include the quartile and percentile. A decile rank arranges the data in order from lowest to highest and is done on a scale of one to ten where each successive number corresponds to an increase of 10 percentage points. Special usage: the decile mean. A moderately robust measure of central tendency, known as the decile mean, can be computed by making use of a sample's deciles D_1 to D_9 (D_1 = 10th percentile, D_2 = 20th percentile and so on). It is calculated as follows:
DM = \frac{\sum_{i=1}^{9} D_i}{9}
Apart from serving as an alternative for the mean and the truncated mean, it also forms the basis for robust measures of skewness and kurtosis, and even a normality test. See also
* Summary statistics
* Socio-economic decile
In the New Zealand education system, decile is a ...
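A small Python sketch of the decile mean follows. It assumes the decile mean is the plain arithmetic mean of the nine deciles D_1 to D_9, and it takes the deciles from `statistics.quantiles` with its default interpolation method; other decile conventions would give slightly different values. The function name `decile_mean` is hypothetical.

```python
from statistics import quantiles


def decile_mean(data):
    """Arithmetic mean of the nine deciles D1 ... D9 (assumed definition)."""
    deciles = quantiles(data, n=10)  # returns the nine cut points
    return sum(deciles) / len(deciles)


# For the integers 1..99 the deciles fall on 10, 20, ..., 90.
dm_uniform = decile_mean(range(1, 100))

# A single extreme value barely moves the decile mean, unlike the plain mean.
with_outlier = list(range(1, 100)) + [10_000]
dm_outlier = decile_mean(with_outlier)
plain_mean = sum(with_outlier) / len(with_outlier)
```

Comparing `dm_outlier` with `plain_mean` shows the moderate robustness the text refers to: the outlier shifts the ordinary mean far more than it shifts the decile mean.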
Summary Statistics
In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in
* a measure of location, or central tendency, such as the arithmetic mean
* a measure of statistical dispersion like the standard mean absolute deviation
* a measure of the shape of the distribution like skewness or kurtosis
* if more than one variable is measured, a measure of statistical dependence such as a correlation coefficient
A common collection of order statistics used as summary statistics is the five-number summary, sometimes extended to a seven-number summary, and the associated box plot. Entries in an analysis of variance table can also be regarded as summary statistics. Examples. Location. Common measures of location, or central tendency, are the arithmetic mean, median, mode, and interquartile mean. Spread. Common measures of ...
Non-parametric Statistics
Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated. Definitions. The term "nonparametric statistics" has been imprecisely defined in the following two ways, among others:
Applications and purpose. Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels o ...