Overmatching

Matching is a statistical technique used to evaluate the effect of a treatment by comparing treated and non-treated units in an observational study or quasi-experiment (i.e. when the treatment is not randomly assigned). The goal of matching is to reduce bias in the estimated treatment effect in an observational-data study by finding, for every treated unit, one or more non-treated units with similar observable characteristics, so that the covariates are balanced between the two groups. By matching treated units to similar non-treated units, matching enables a comparison of outcomes among treated and non-treated units that estimates the effect of the treatment while reducing bias due to confounding.
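
The pairing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a single numeric covariate, 1:1 nearest-neighbour matching with replacement, and an optional caliper. The names `Unit`, `match_nearest`, `att_estimate` and `naive_difference`, as well as the toy data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    covariate: float  # one observed characteristic (hypothetical, e.g. age)
    outcome: float
    treated: bool


def match_nearest(units, caliper=None):
    """Pair each treated unit with the closest control on the covariate.

    Matching is done with replacement; pairs farther apart than the
    caliper (if one is given) are dropped.
    """
    treated = [u for u in units if u.treated]
    controls = [u for u in units if not u.treated]
    if not controls:
        raise ValueError("no control units to match against")
    pairs = []
    for t in treated:
        best = min(controls, key=lambda c: abs(c.covariate - t.covariate))
        if caliper is None or abs(best.covariate - t.covariate) <= caliper:
            pairs.append((t, best))
    return pairs


def att_estimate(pairs):
    """Mean treated-minus-control outcome difference over matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    return sum(t.outcome - c.outcome for t, c in pairs) / len(pairs)


def naive_difference(units):
    """Unmatched comparison: mean treated outcome minus mean control outcome."""
    treated = [u.outcome for u in units if u.treated]
    controls = [u.outcome for u in units if not u.treated]
    return sum(treated) / len(treated) - sum(controls) / len(controls)


# Toy data (made up): the last control has no comparable treated unit.
units = [
    Unit(30, 12.0, True), Unit(45, 15.0, True), Unit(60, 20.0, True),
    Unit(31, 10.0, False), Unit(44, 13.0, False), Unit(58, 17.0, False),
    Unit(75, 25.0, False),
]

pairs = match_nearest(units, caliper=5)
print("matched estimate:", att_estimate(pairs))
print("naive estimate:  ", naive_difference(units))
```

In this toy data the unmatched control with a covariate of 75 drags the naive comparison in the opposite direction from the matched estimate. Real applications typically match on several covariates and check covariate balance after matching.
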
Propensity score matching, an early matching technique, was developed as part of the Rubin causal model, but has been shown to increase model dependence, bias, and inefficiency, and is no longer recommended compared to other matching methods. Matching has been promoted by Donald Rubin. It was prominently criticized in economics by LaLonde (1986), who compared estimates of treatment effects from an experiment to comparable estimates produced with matching methods and reported that the matching-based estimates were biased. Dehejia and Wahba (1999) reevaluated LaLonde's critique and argued that matching is a good solution. Similar critiques have been raised in political science and sociology journals.


Analysis

When the outcome of interest is binary, the most general tool for the analysis of matched data is conditional logistic regression, as it handles strata of arbitrary size and continuous or binary treatments (predictors) and can control for covariates. In particular cases, simpler tests such as the paired difference test, the McNemar test and the Cochran-Mantel-Haenszel test are available. When the outcome of interest is continuous, the average treatment effect is estimated. Matching can also be used to "pre-process" a sample before analysis via another technique, such as regression analysis.
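
As an illustration of the simpler tests mentioned above, the following sketch computes the McNemar test for 1:1 matched pairs with a binary outcome, using only the Python standard library. The function name `mcnemar` and the counts in the example are hypothetical. Here `b` is assumed to be the number of pairs in which only the treated unit had the outcome and `c` the number in which only the control did; concordant pairs do not enter the test.

```python
from math import comb


def mcnemar(b, c):
    """McNemar test from the two discordant-pair counts.

    Returns (chi-square statistic without continuity correction,
             exact two-sided binomial p-value,
             matched-pair odds ratio b / c).
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs")
    statistic = (b - c) ** 2 / n
    # Under the null hypothesis each discordant pair is equally likely to
    # fall in either direction, so min(b, c) follows Binomial(n, 1/2).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    odds_ratio = b / c if c else float("inf")
    return statistic, p_exact, odds_ratio


# Hypothetical counts: 15 pairs where only the treated unit had the outcome,
# 5 pairs where only the control did.
statistic, p_value, odds_ratio = mcnemar(15, 5)
print(statistic, p_value, odds_ratio)
```

For 1:1 matched pairs with a binary treatment and no further covariates, the ratio `b / c` is also the odds ratio that conditional logistic regression would estimate, which is one way to see the simple test as a special case of the general tool.
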


Overmatching

Overmatching is matching on an apparent mediator that is actually a result of the exposure. If the mediator itself is stratified, the relation of the exposure to the disease is highly likely to be obscured. Overmatching thus causes statistical bias. For example, matching the control group by gestation length and/or the number of multiple births when estimating perinatal mortality and birthweight after in vitro fertilization (IVF) is overmatching, since IVF itself increases the risk of premature birth and multiple birth. It may be regarded as a sampling bias that decreases the external validity of a study, because the controls become more similar to the cases in regard to exposure than the general population is.
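
The obscuring effect of stratifying on a mediator can be seen in a small simulation. The sketch below is a deliberately simplified, hypothetical setup rather than a model of the IVF example: the exposure raises the probability of a mediator, the outcome depends only on the mediator, and all probabilities are made up. The function names are illustrative.

```python
import random


def simulate(n=20000, seed=0):
    """Generate (exposure, mediator, outcome) triples.

    The exposure affects the outcome only through the mediator.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5                   # exposure
        m = rng.random() < (0.8 if x else 0.2)   # mediator, caused by exposure
        y = rng.random() < (0.5 if m else 0.1)   # outcome, caused by mediator
        rows.append((x, m, y))
    return rows


def risk(rows, exposed):
    """Proportion with the outcome among units with the given exposure."""
    group = [y for x, _, y in rows if x == exposed]
    return sum(group) / len(group)


def crude_rd(rows):
    """Risk difference, exposed minus unexposed, ignoring the mediator."""
    return risk(rows, True) - risk(rows, False)


def stratified_rd(rows):
    """Risk difference within mediator strata, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[1] == level]
        total += len(stratum) / len(rows) * crude_rd(stratum)
    return total


rows = simulate()
print("crude risk difference:     ", round(crude_rd(rows), 3))
print("stratified on the mediator:", round(stratified_rd(rows), 3))
```

With these assumed probabilities the true effect of the exposure on the risk of the outcome is 0.24 (0.42 versus 0.18). The crude comparison recovers roughly that value, while the comparison within mediator strata is close to zero, because holding the mediator fixed removes the very pathway through which the exposure acts.
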


See also

* Propensity score matching

