Contents

1 Scope
  1.1 Mathematical statistics
2 Overview
3
  3.1 Sampling
  3.2 Experimental and observational studies
    3.2.1 Experiments
    3.2.2 Observational study
4 Types of data
5 Terminology and theory of inferential statistics
  5.1 Statistics, estimators and pivotal quantities
  5.2
6 Misuse
  6.1 Misinterpretation: correlation
7 History of statistical science
8 Applications
  8.1 Applied statistics, theoretical statistics and mathematical statistics
  8.2
9 Specialized disciplines
10 See also
11 References
12 Further reading
13 External links

Scope[edit]

Some definitions are:

The Merriam-Webster dictionary defines statistics as "a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data."[6]
Planning the research, including finding the number of replicates of the study, uses the following information: preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company.
Type I errors, where the null hypothesis is falsely rejected, giving a "false positive".
Type II errors, where the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative".
A least squares fit: in red the points to be fitted, in blue the fitted line.

Many statistical methods seek to minimize the residual sum of squares, and these are called "methods of least squares", in contrast to least absolute deviations. The latter gives equal weight to small and big errors, while the former gives more weight to large errors. The residual sum of squares is also differentiable, which provides a handy property for doing regression.
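As a minimal sketch (with illustrative synthetic data, not taken from any particular study), the least-squares line for a set of points can be computed in closed form, precisely because the residual sum of squares is differentiable and its minimum is found by setting derivatives to zero:

```python
import random

# Hypothetical data: points scattered around the line y = 2x + 1.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + random.gauss(0, 0.3) for x in xs]

# Closed-form ordinary least squares for a line y = a*x + b.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

# Residual sum of squares of the fitted line.
rss = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
print(a, b, rss)
```

With this data, the estimated slope and intercept land close to the true values 2 and 1; a least-absolute-deviations fit has no such closed form and must be found by iterative optimization.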
Confidence intervals: the red line is the true value for the mean in this example, the blue lines are random confidence intervals for 100 realizations.

Most studies only sample part of a population, so results do not fully represent the whole population. Any estimates obtained from the sample only approximate the population value.
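A small simulation (with illustrative parameters, not those of the figure) shows what "random confidence intervals" means: across repeated samples, a 95% interval covers the true mean in roughly 95% of realizations:

```python
import math
import random
import statistics

random.seed(1)
TRUE_MEAN, SIGMA, N, TRIALS = 5.0, 2.0, 30, 1000
Z = 1.96  # approximate 95% two-sided normal quantile

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = statistics.mean(sample)
    half = Z * SIGMA / math.sqrt(N)  # known-variance interval, for simplicity
    if m - half <= TRUE_MEAN <= m + half:
        covered += 1

coverage = covered / TRIALS
print(coverage)  # close to 0.95
```

Each interval is random because it is computed from a random sample; the true mean is fixed, and it is the interval that either captures it or misses it.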
In this graph the black line is the probability distribution for the test statistic, the critical region is the set of values to the right of the observed data point (observed value of the test statistic) and the p-value is represented by the green area.

The standard approach[23] is to test a null hypothesis against an alternative hypothesis. A critical region is the set of values of the estimator that leads to refuting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance), and the probability of type II error is the probability that the estimator does not belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.

Referring to statistical significance does not necessarily mean that the overall result is significant in real-world terms. For example, in a large study of a drug it may be shown that the drug has a statistically significant but very small beneficial effect, such that the drug is unlikely to help the patient noticeably.

While in principle the acceptable level of statistical significance may be subject to debate, the p-value is the smallest significance level that allows the test to reject the null hypothesis. This is logically equivalent to saying that the p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic. Therefore, the smaller the p-value, the lower the probability of committing a type I error.

Some problems are usually associated with this framework (see criticism of hypothesis testing):

A difference that is highly statistically significant can still be of no practical significance, but it is possible to properly formulate tests to account for this.
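The relationship between the critical region and the type I error rate can be sketched with a simulation (illustrative numbers, assuming normally distributed data with known variance): when the null hypothesis is true, a test that rejects at the 5% level commits a type I error about 5% of the time:

```python
import math
import random
import statistics

random.seed(2)
CRITICAL_Z = 1.96  # two-sided 5% critical value for a z-test
N, TRIALS = 50, 2000

false_positives = 0
for _ in range(TRIALS):
    # The null hypothesis is true: the mean really is 0, known sd 1.
    sample = [random.gauss(0, 1) for _ in range(N)]
    z = statistics.mean(sample) * math.sqrt(N)  # standardized test statistic
    if abs(z) > CRITICAL_Z:  # statistic falls in the critical region
        false_positives += 1

rate = false_positives / TRIALS
print(rate)  # close to 0.05
```

Shrinking the critical region (a larger critical value) lowers the type I error rate but raises the type II error rate, which is the trade-off the power of a test describes.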
One response involves going beyond reporting only the significance level to include the p-value when reporting whether a hypothesis is rejected or accepted. The p-value, however, does not indicate the size or importance of the observed effect and can also seem to exaggerate the importance of minor differences in large studies. A better and increasingly common approach is to report confidence intervals. Although these are produced from the same calculations as those of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.

Fallacy of the transposed conditional, also known as the prosecutor's fallacy: criticisms arise because the hypothesis-testing approach forces one hypothesis (the null hypothesis) to be favored, since what is being evaluated is the probability of the observed result given the null hypothesis, not the probability of the null hypothesis given the observed result. An alternative to this approach is offered by Bayesian inference, although it requires establishing a prior probability.[27]

Rejecting the null hypothesis does not automatically prove the alternative hypothesis.

As with everything in inferential statistics, it relies on sample size, and therefore under fat tails p-values may be seriously mis-computed.[clarification needed]

Examples[edit]

Some well-known statistical tests and procedures are:
Misuse[edit]
Main article: Misuse of statistics
Who says so? (Does he/she have an axe to grind?)
How does he/she know? (Does he/she have the resources to know the facts?)
What's missing? (Does he/she give us a complete picture?)
Did someone change the subject? (Does he/she offer us the right answer to the wrong problem?)
Does it make sense? (Is his/her conclusion logical and consistent with what we already know?)

The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.

Misinterpretation: correlation[edit]
The concept of correlation is particularly noteworthy for the
potential confusion it can cause. Statistical analysis of a data set
often reveals that two variables (properties) of the population under
consideration tend to vary together, as if they were connected. For
example, a study of annual income that also looks at age of death
might find that poor people tend to have shorter lives than affluent
people. The two variables are said to be correlated; however, they may
or may not be the cause of one another. The correlation phenomena
could be caused by a third, previously unconsidered phenomenon, called
a lurking variable or confounding variable. For this reason, there is
no way to immediately infer the existence of a causal relationship
between the two variables. (See Correlation does not imply causation.)
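A small simulation (with hypothetical variables, for illustration only) shows how a lurking variable Z can induce a correlation between X and Y even though neither causes the other:

```python
import random
import statistics

random.seed(3)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]  # lurking (confounding) variable
x = [zi + random.gauss(0, 1) for zi in z]   # X depends on Z, not on Y
y = [zi + random.gauss(0, 1) for zi in z]   # Y depends on Z, not on X

def corr(a, b):
    """Pearson product-moment correlation coefficient."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    da = sum((ai - ma) ** 2 for ai in a) ** 0.5
    db = sum((bi - mb) ** 2 for bi in b) ** 0.5
    return num / (da * db)

r = corr(x, y)
print(r)  # substantially positive despite no direct causal link
```

Here the theoretical correlation between X and Y is 0.5, produced entirely by their shared dependence on Z; controlling for Z (for example, via partial correlation) would remove it.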
History of statistical science[edit]

Gerolamo Cardano, the earliest pioneer on the mathematics of probability.

Karl Pearson, a founder of mathematical statistics.

Main article: History of statistics

The modern field of statistics emerged in the late 19th and early 20th century in three stages.[37] The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson.
Statistical computing[edit]

gretl, an example of an open-source statistical package.

Main article: Computational statistics
The rapid and sustained increases in computing power starting from the
second half of the 20th century have had a substantial impact on the
practice of statistical science. Early statistical models were almost
always from the class of linear models, but powerful computers,
coupled with suitable numerical algorithms, caused an increased
interest in nonlinear models (such as neural networks) as well as the
creation of new types, such as generalized linear models and
multilevel models.
Increased computing power has also led to the growing popularity of
computationally intensive methods based on resampling, such as
permutation tests and the bootstrap, while techniques such as Gibbs
sampling have made use of Bayesian models more feasible. The computer
revolution has implications for the future of statistics with new
emphasis on "experimental" and "empirical" statistics. A large number of both general- and special-purpose statistical software packages are now available. Examples of software capable of complex statistical computation include Mathematica, SAS, SPSS, and R.
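The resampling methods mentioned above are simple to express in code. As a minimal sketch (with illustrative synthetic data), the bootstrap estimates the standard error of a statistic by repeatedly resampling the observed data with replacement:

```python
import random
import statistics

random.seed(4)
data = [random.gauss(10, 2) for _ in range(40)]  # hypothetical observed sample

B = 2000  # number of bootstrap resamples
boot_means = []
for _ in range(B):
    resample = [random.choice(data) for _ in data]  # draw with replacement
    boot_means.append(statistics.mean(resample))

se = statistics.stdev(boot_means)  # bootstrap standard error of the mean
print(se)
```

For the sample mean this estimate tracks the textbook formula sd/sqrt(n), but the same recipe applies unchanged to statistics (the median, a correlation, a regression coefficient) for which no simple formula exists, which is why cheap computing made the method practical.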
In number theory, scatter plots of data generated by a distribution
function may be transformed with familiar tools used in statistics to
reveal underlying patterns, which may then lead to hypotheses.
Methods of statistics including predictive methods in forecasting are
combined with chaos theory and fractal geometry to create video works
that are considered to have great beauty.[citation needed]
Specialized disciplines[edit] Main article: List of fields of application of statistics Statistical techniques are used in a wide range of types of scientific and social research, including: biostatistics, computational biology, computational sociology, network biology, social science, sociology and social research. Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:
In addition, there are particular types of statistical analysis that have also developed their own specialised terminology and methodology:

Bootstrap / Jackknife resampling
Multivariate statistics
Statistical classification
Structured data analysis (statistics)
Structural equation modelling
Survey methodology
Survival analysis
See also[edit]

Main article: Outline of statistics

Abundance estimation
Foundations and major areas of statistics
Foundations of statistics
List of statisticians
Official statistics
Multivariate analysis of variance

References[edit]

^ a b Dodge, Y. (2006). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 0-19-920613-9.
^ Romijn, Jan-Willem (2014). "Philosophy of statistics". Stanford
Encyclopedia of Philosophy.
^ Lund Research Ltd. "Descriptive and Inferential Statistics".
statistics.laerd.com. Retrieved 2014-03-23.
^ "What Is the Difference Between Type I and Type II Hypothesis
Testing Errors?". About.com Education. Retrieved 2015-11-27.
^ "How to Calculate Descriptive Statistics". Answers Consulting.
2018-02-03.
^ "Definition of STATISTICS". www.merriam-webster.com. Retrieved
2016-05-28.
^ "Essay on Statistics: Meaning and Definition of Statistics".
Economics Discussion. 2014-12-02. Retrieved 2016-05-28.
^ Moses, Lincoln E. (1986) Think and Explain with Statistics,
Addison-Wesley, ISBN 978-0-201-15619-5. pp. 1–3
^ Hays, William Lee (1973)
Further reading[edit] Barbara Illowsky; Susan Dean (2014). Introductory Statistics. OpenStax
CNX. ISBN 9781938168208.
David W. Stockburger, Introductory Statistics: Concepts, Models, and
Applications, 3rd Web Ed. Missouri State University.
Stephen Jones, 2010.
External links[edit]

StatSoft, Inc. (2013). Electronic Statistics Textbook (Electronic Version). Tulsa, OK: StatSoft.