science Science is a systematic discipline that builds and organises knowledge in the form of testable hypotheses and predictions about the universe. Modern science is typically divided into twoor threemajor branches: the natural sciences, which stu ...

, randomized experiments are the

experiment An experiment is a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried. Experiments provide insight into cause-and-effect by demonstrating what outcome occurs whe ...

s that allow the greatest reliability and validity of statistical estimates of treatment effects. Randomization-based inference is especially important in

experimental design The design of experiments (DOE), also known as experiment design or experimental design, is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation. ...

and in

survey sampling In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term " survey" may refer to many different types or techniques of observation. In survey sampling it most oft ...

Overview

In the statistical theory of

design of experiments The design of experiments (DOE), also known as experiment design or experimental design, is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation. ...

, randomization involves randomly allocating the experimental units across the treatment groups. For example, if an experiment compares a new drug against a standard drug, then the patients should be allocated to either the new drug or to the standard drug control using randomization. Randomized experimentation is ''not'' haphazard. Randomization reduces

bias Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is inaccurate, closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individ ...

by equalising other factors that have not been explicitly accounted for in the experimental design (according to the

law of large numbers In probability theory, the law of large numbers is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the law o ...

). Randomization also produces ignorable designs, which are valuable in

model A model is an informative representation of an object, person, or system. The term originally denoted the plans of a building in late 16th-century English, and derived via French and Italian ultimately from Latin , . Models can be divided in ...

-based

statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of ...

, especially Bayesian or

likelihood A likelihood function (often simply called the likelihood) measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the j ...

-based. In the design of experiments, the simplest design for comparing treatments is the "completely randomized design". Some "restriction on randomization" can occur with blocking and experiments that have hard-to-change factors; additional restrictions on randomization can occur when a full randomization is infeasible or when it is desirable to reduce the

variance In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...

of estimators of selected effects. Randomization of treatment in

clinical trials Clinical trials are prospective biomedical or behavioral research studies on human subject research, human participants designed to answer specific questions about biomedical or behavioral interventions, including new treatments (such as novel v ...

pose ethical problems. In some cases, randomization reduces the therapeutic options for both physician and patient, and so randomization requires clinical equipoise regarding the treatments.

Online randomized controlled experiments

Web sites can run randomized controlled experiments to create a feedback loop. Key differences between offline experimentation and online experiments include: * Logging: user interactions can be logged reliably. * Number of users: large sites, such as Amazon, Bing/Microsoft, and Google run experiments, each with over a million users. * Number of concurrent experiments: large sites run tens of overlapping, or concurrent, experiments. * Robots, whether web crawlers from valid sources or malicious internet bots. * Ability to ramp-up experiments from low percentages to higher percentages. * Speed / performance has significant impact on key metrics. * Ability to use the pre-experiment period as an A/A test to reduce variance.

History

A controlled experiment appears to have been suggested in the Old Testament's

Book of Daniel The Book of Daniel is a 2nd-century BC biblical apocalypse with a 6th-century BC setting. It is ostensibly a narrative detailing the experiences and Prophecy, prophetic visions of Daniel, a Jewish Babylonian captivity, exile in Babylon ...

. King Nebuchadnezzar proposed that some Israelites eat "a daily amount of food and wine from the king's table." Daniel preferred a

vegetarian Vegetarianism is the practice of abstaining from the Eating, consumption of meat (red meat, poultry, seafood, insects as food, insects, and the flesh of any other animal). It may also include abstaining from eating all by-products of animal slau ...

diet, but the official was concerned that the king would "see you looking worse than the other young men your age? The king would then have my head because of you." Daniel then proposed the following controlled experiment: "Test your servants for ten days. Give us nothing but vegetables to eat and water to drink. Then compare our appearance with that of the young men who eat the royal food, and treat your servants in accordance with what you see". (Daniel 1, 12– 13). Randomized experiments were institutionalized in psychology and education in the late eighteen-hundreds, following the invention of randomized experiments by C. S. Peirce. Outside of psychology and education, randomized experiments were popularized by R.A. Fisher in his book ''

Statistical Methods for Research Workers ''Statistical Methods for Research Workers'' is a classic book on statistics, written by the statistician R. A. Fisher. It is considered by some to be one of the 20th century's most influential books on statistical methods, together with his '' T ...

'', which also introduced additional principles of experimental design.

Statistical interpretation

The Rubin Causal Model provides a common way to describe a randomized experiment. While the Rubin Causal Model provides a framework for defining the causal parameters (i.e., the effects of a randomized treatment on an outcome), the analysis of experiments can take a number of forms. The model assumes that there are two potential outcomes for each unit in the study: the outcome if the unit receives the treatment and the outcome if the unit does not receive the treatment. The difference between these two potential outcomes is known as the treatment effect, which is the causal effect of the treatment on the outcome. Most commonly, randomized experiments are analyzed using ANOVA,

student's t-test Student's ''t''-test is a statistical test used to test whether the difference between the response of two groups is statistically significant or not. It is any statistical hypothesis test in which the test statistic follows a Student's ''t''- ...

, regression analysis, or a similar

statistical test A statistical hypothesis test is a method of statistical inference used to decide whether the data provide sufficient evidence to reject a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. ...

. The model also accounts for potential confounding factors, which are factors that could affect both the treatment and the outcome. By controlling for these confounding factors, the model helps to ensure that any observed treatment effect is truly causal and not simply the result of other factors that are correlated with both the treatment and the outcome. The Rubin Causal Model is a useful a framework for understanding how to estimate the causal effect of the treatment, even when there are confounding variables that may affect the outcome. This model specifies that the causal effect of the treatment is the difference in the outcomes that would have been observed for each individual if they had received the treatment and if they had not received the treatment. In practice, it is not possible to observe both potential outcomes for the same individual, so statistical methods are used to estimate the causal effect using data from the experiment.

Empirical evidence that randomization makes a difference

Empirically differences between randomized and non-randomized studies, and between adequately and inadequately randomized trials have been difficult to detect.

Directed acyclic graph (DAG) explanation of randomization

Randomization is the cornerstone of many scientific claims. To randomize, means that we can eliminate the confounding factors. Say we study the effect of A on B. Yet, there are many unobservables U that potentially affect B and confound our estimate of the finding. To explain these kinds of issues, statisticians or econometricians nowadays use

directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ...

References

* * * * * {{Statistics, collection, state=collapsed Design of experiments Experiments