In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. Paul R. Rosenbaum and Donald Rubin introduced the technique in 1983.

The possibility of bias arises because a difference in the treatment outcome (such as the average treatment effect) between treated and untreated groups may be caused by a factor that predicts treatment rather than the treatment itself. In randomized experiments, the randomization enables unbiased estimation of treatment effects; for each covariate, randomization implies that treatment groups will be balanced on average, by the law of large numbers. Unfortunately, for observational studies, the assignment of treatments to research subjects is typically not random. Matching attempts to reduce the treatment assignment bias, and mimic randomization, by creating a sample of units that received the treatment that is comparable on all observed covariates to a sample of units that did not receive the treatment.

The "propensity" describes how likely a unit is to have been treated, given its covariate values. The stronger the confounding of treatment and covariates, and hence the stronger the bias in the naive estimate of the treatment effect, the better the covariates predict whether a unit is treated or not. By having units with similar propensity scores in both treatment and control, such confounding is reduced.

For example, one may be interested to know the consequences of smoking. An observational study is required since it is unethical to randomly assign people to the treatment "smoking". The treatment effect estimated by simply comparing those who smoked to those who did not smoke would be biased by any factors that predict smoking (e.g., gender and age). PSM attempts to control for these biases by making the groups receiving treatment and non-treatment comparable with respect to the control variables.
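The bias in the smoking example can be made concrete with a small deterministic sketch. The population below is entirely hypothetical (the counts, outcome values, and function names are assumptions chosen for illustration): smoking lowers the outcome by exactly 1.0 within each age group, yet the naive comparison reports a much larger gap because age predicts both smoking and the outcome.

```python
# Hypothetical population cells: (age_group, smoker, count, mean_outcome).
# Smoking lowers the outcome by exactly 1.0 in each age group, but older
# people both smoke more often and have a lower baseline outcome.
CELLS = [
    ("old", 1, 80, 4.0),
    ("old", 0, 20, 5.0),
    ("young", 1, 20, 7.0),
    ("young", 0, 80, 8.0),
]


def group_mean(cells, smoker, age=None):
    """Count-weighted mean outcome for smokers or non-smokers."""
    rows = [c for c in cells if c[1] == smoker and (age is None or c[0] == age)]
    total = sum(count for _, _, count, _ in rows)
    return sum(count * mean for _, _, count, mean in rows) / total


def naive_difference(cells):
    """Compare all smokers with all non-smokers, ignoring age."""
    return group_mean(cells, 1) - group_mean(cells, 0)


def stratified_difference(cells):
    """Compare within each age group, then weight by group size."""
    ages = sorted({c[0] for c in cells})
    sizes = {a: sum(c[2] for c in cells if c[0] == a) for a in ages}
    n = sum(sizes.values())
    return sum(
        sizes[a] / n * (group_mean(cells, 1, a) - group_mean(cells, 0, a))
        for a in ages
    )


if __name__ == "__main__":
    print(f"naive:      {naive_difference(CELLS):+.2f}")
    print(f"stratified: {stratified_difference(CELLS):+.2f}")
```

Comparing within age groups recovers the effect built into the data; matching on a propensity score pursues the same goal when there are too many covariates to stratify on directly.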

Overview

PSM is for cases of causal inference and simple selection bias in non-experimental settings in which: (i) few units in the non-treatment comparison group are comparable to the treatment units; and (ii) selecting a subset of comparison units similar to the treatment unit is difficult because units must be compared across a high-dimensional set of pretreatment characteristics.

In normal matching, single characteristics that distinguish treatment and control groups are matched in an attempt to make the groups more alike. But if the two groups do not have substantial overlap, then substantial error may be introduced. For example, if only the worst cases from the untreated "comparison" group are compared to only the best cases from the treatment group, the result may be regression toward the mean, which may make the comparison group look better or worse than reality.

PSM employs a predicted probability of group membership (e.g., treatment versus control group) based on observed predictors, usually obtained from logistic regression, to create a counterfactual group. Propensity scores may be used for matching or as covariates, alone or with other matching variables or covariates.
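As a rough illustration of how a predicted probability of group membership can be obtained from logistic regression, the following sketch fits the model by plain batch gradient ascent using only the standard library. The function names, learning rate, and iteration count are assumptions made for this example; in practice a statistics package would normally be used instead.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, z, lr=0.1, n_iter=5000):
    """Fit Pr(Z=1 | x) = sigmoid(b0 + b . x) by batch gradient ascent.

    X is a list of covariate rows, z a list of 0/1 treatment indicators.
    Returns the coefficient list [b0, b1, ..., bk].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, zi in zip(X, z):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            resid = zi - p  # gradient of the log-likelihood w.r.t. the linear term
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for each covariate row."""
    return [
        sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X
    ]
```

For example, with one covariate, `beta = fit_logistic([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]], [0, 0, 1, 0, 1, 1])` followed by `propensity_scores(...)` on the same rows yields scores that rise with the covariate, since treated units tend to have larger values.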

General procedure

1. Estimate propensity scores, e.g. with logistic regression:
* Dependent variable: ''Z'' = 1 if the unit participated (i.e. is a member of the treatment group); ''Z'' = 0 if the unit did not participate (i.e. is a member of the control group).
* Choose appropriate confounders (variables hypothesized to be associated with both treatment and outcome).
* Obtain an estimate of the propensity score: the predicted probability ''p'' or the log odds, log[''p''/(1 − ''p'')].

2. Match each participant to one or more nonparticipants on propensity score, using one of these methods:
* Nearest neighbor matching
* Optimal full matching: match each participant to unique nonparticipant(s) so as to minimize the total distance in propensity scores between participants and their matched nonparticipants. This method can be combined with other matching techniques.
* Caliper matching: comparison units within a certain width of the propensity score of the treated units get matched, where the width is generally a fraction of the standard deviation of the propensity score
* Mahalanobis metric matching in conjunction with PSM
* Stratification matching
* Difference-in-differences matching (kernel and local linear weights)
* Exact matching

3. Check that covariates are balanced across treatment and comparison groups within strata of the propensity score.
* Use standardized differences or graphs to examine distributions.
* If covariates are not balanced, return to step 1 or 2 and modify the procedure.

4. Estimate effects based on the new sample.
* Typically: a weighted mean of within-match average differences in outcomes between participants and nonparticipants.
* Use analyses appropriate for non-independent matched samples if more than one nonparticipant is matched to each participant.
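Steps 2 to 4 can be sketched as follows, assuming propensity scores have already been estimated. This is one simple variant only (greedy 1:1 nearest-neighbor matching without replacement, with a caliper), not the optimal matching described above. All function names are hypothetical, and the balance measure uses population variances for simplicity.

```python
from statistics import mean, pvariance


def caliper_match(scores, treated, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    scores: propensity score per unit; treated: 0/1 flag per unit.
    Returns (treated_index, control_index) pairs whose scores differ by
    at most `caliper`; treated units with no such control stay unmatched.
    """
    available = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not available:
            continue
        # Closest remaining control; sorting makes tie-breaking deterministic.
        j = min(sorted(available), key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def standardized_difference(x, pairs):
    """Standardized mean difference of covariate x in the matched sample."""
    xt = [x[i] for i, _ in pairs]
    xc = [x[j] for _, j in pairs]
    pooled_sd = ((pvariance(xt) + pvariance(xc)) / 2) ** 0.5
    return (mean(xt) - mean(xc)) / pooled_sd if pooled_sd > 0 else 0.0


def matched_effect(outcome, pairs):
    """Mean within-pair outcome difference (treated minus control)."""
    return mean([outcome[i] - outcome[j] for i, j in pairs])
```

A standardized difference near zero for each covariate suggests the matched groups are balanced. Greedy matching depends on the order in which treated units are processed, which is one reason optimal matching is sometimes preferred.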

Formal definitions

Basic settings

The basic case is of two treatments (numbered 1 and 0), with ''N'' independent and identically distributed subjects. Each subject ''i'' would respond to the treatment with r_{1i} and to the control with r_{0i}. The quantity to be estimated is the average treatment effect:

E[r_1] - E[r_0].

The variable Z_i indicates if subject ''i'' got treatment (Z_i = 1) or control (Z_i = 0). Let X_i be a vector of observed pretreatment measurements (or covariates) for the ''i''th subject. The observations of X_i are made prior to treatment assignment, but the features in X_i may not include all (or any) of the ones used to decide on the treatment assignment. The numbering of the units (i.e., ''i'' = 1, ..., ''N'') is assumed not to contain any information beyond what is contained in X_i. The following sections will omit the ''i'' index while still discussing the stochastic behavior of some subject.

Strongly ignorable treatment assignment

Let some subject have a vector of covariates ''X'', and some potential outcomes ''r''₀ and ''r''₁ under control and treatment, respectively. Treatment assignment is said to be strongly ignorable (i.e., conditionally unconfounded) if the potential outcomes are independent of treatment (''Z'') conditional on background variables ''X''. This can be written compactly as:

(r_0, r_1) \perp Z \mid X

where \perp denotes statistical independence.

Balancing score

A balancing score ''b''(''X'') is a function of the observed covariates ''X'' such that the conditional distribution of ''X'' given ''b''(''X'') is the same for treated (''Z'' = 1) and control (''Z'' = 0) units:

Z \perp X \mid b(X).

The most trivial balancing score is b(X) = X.

Propensity score

A propensity score is the probability of a unit (e.g., person, classroom, school) being assigned to a particular treatment given a set of observed covariates. Propensity scores are used to reduce selection bias by equating groups based on these covariates.

Suppose that we have a binary treatment indicator ''Z'', a response variable ''r'', and background observed covariates ''X''. The propensity score is defined as the conditional probability of treatment given background variables:

e(x) \ \stackrel{\mathrm{def}}{=} \ \Pr(Z=1 \mid X=x).
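When ''X'' takes only a few discrete values, the definition can be applied directly by taking the share of treated units at each value. The following is a minimal sketch under that assumption; the function name is hypothetical, and with continuous or high-dimensional covariates a model would be needed instead.

```python
from collections import defaultdict


def empirical_propensity(x_values, z_values):
    """Estimate e(x) = Pr(Z=1 | X=x) as the treated share at each x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, z in zip(x_values, z_values):
        total[x] += 1
        treated[x] += z
    return {x: treated[x] / total[x] for x in total}
```

For instance, if three of four "old" units and one of four "young" units are treated, the estimated scores are 0.75 and 0.25 respectively.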

In the context of causal inference and survey methodology, propensity scores are estimated (via methods such as logistic regression, random forests, or others), using some set of covariates. These propensity scores are then used as estimators for weights to be used with inverse probability weighting methods.
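A sketch of how estimated propensity scores can serve as weights is given below. It uses the normalized (Hájek-style) form of the inverse-probability-weighted estimator of the average treatment effect, which is one common choice among several. The function name is an assumption, and the scores are assumed to lie strictly between 0 and 1.

```python
def ipw_ate(outcome, treated, scores):
    """Normalized inverse-probability-weighted estimate of the ATE.

    Treated units get weight 1/e, controls get weight 1/(1 - e), where e is
    the unit's propensity score (assumed strictly between 0 and 1).
    """
    wt = [z / e for z, e in zip(treated, scores)]
    wc = [(1 - z) / (1 - e) for z, e in zip(treated, scores)]
    mean_t = sum(w * y for w, y in zip(wt, outcome)) / sum(wt)
    mean_c = sum(w * y for w, y in zip(wc, outcome)) / sum(wc)
    return mean_t - mean_c
```

Units that were unlikely to receive the treatment they actually got are weighted up, so each weighted group resembles the full population. Scores very close to 0 or 1 produce extreme weights and unstable estimates.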

Main theorems

The following were first presented, and proven, by Rosenbaum and Rubin in 1983:
* The propensity score e(x) is a balancing score.
* Any score that is "finer" than the propensity score is a balancing score (i.e., e(X) = f(b(X)) for some function ''f''). The propensity score is the coarsest balancing score function, as it takes a (possibly) multidimensional object (X_i) and transforms it into one dimension (although other balancing scores also exist), while b(X) = X is the finest one.
* If treatment assignment is strongly ignorable given ''X'', then:
:* It is also strongly ignorable given any balancing function. Specifically, given the propensity score:
::: (r_0, r_1) \perp Z \mid e(X).
:* For any value of a balancing score, the difference between the treatment and control means of the samples at hand (i.e., \bar{r}_1 - \bar{r}_0), based on subjects that have the same value of the balancing score, can serve as an unbiased estimator of the average treatment effect: E[r_1] - E[r_0].
* Using sample estimates of balancing scores can produce sample balance on ''X''.
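The first result follows in two lines from the definition of e(X) and the law of iterated expectations. The derivation below is a sketch of the standard argument, written in the notation of this section, and is not a reproduction of the original proof.

```latex
\begin{align}
\Pr(Z = 1 \mid X, e(X)) &= \Pr(Z = 1 \mid X) = e(X), \\
\Pr(Z = 1 \mid e(X)) &= E\left[\Pr(Z = 1 \mid X) \mid e(X)\right]
                      = E\left[e(X) \mid e(X)\right] = e(X).
\end{align}
```

Because ''Z'' is binary, its conditional distribution is fully described by the probability that ''Z'' = 1. Both conditional probabilities equal e(X), so knowing ''X'' adds nothing about ''Z'' once e(X) is known, which is the statement Z \perp X \mid e(X).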

Relationship to sufficiency

If we think of the value of ''Z'' as a parameter of the population that impacts the distribution of ''X'', then the balancing score serves as a sufficient statistic for ''Z''. Furthermore, the above theorems indicate that the propensity score is a minimal sufficient statistic if thinking of ''Z'' as a parameter of ''X''. Lastly, if treatment assignment ''Z'' is strongly ignorable given ''X'', then the propensity score is a minimal sufficient statistic for the joint distribution of (r_0, r_1).

Graphical test for detecting the presence of confounding variables

Judea Pearl has shown that there exists a simple graphical test, called the back-door criterion, which detects the presence of confounding variables. To estimate the effect of treatment, the background variables ''X'' must block all back-door paths in the graph. This blocking can be done either by adding the confounding variable as a control in regression, or by matching on the confounding variable.

Disadvantages

PSM has been shown to increase model "imbalance, inefficiency, model dependence, and bias," which is not the case with most other matching methods. The insights behind the use of matching still hold but should be applied with other matching methods; propensity scores also have other productive uses in weighting and doubly robust estimation.

Like other matching procedures, PSM estimates an average treatment effect from observational data. The key advantage of PSM was, at the time of its introduction, that by using a linear combination of covariates for a single score, it balances treatment and control groups on a large number of covariates without losing a large number of observations. If units in the treatment and control were balanced on a large number of covariates one at a time, large numbers of observations would be needed to overcome the "dimensionality problem", whereby the introduction of a new balancing covariate increases the minimum necessary number of observations in the sample geometrically.

One disadvantage of PSM is that it only accounts for observed (and observable) covariates and not latent characteristics. Factors that affect assignment to treatment and outcome but that cannot be observed cannot be accounted for in the matching procedure. As the procedure only controls for observed variables, any hidden bias due to latent variables may remain after matching. Another issue is that PSM requires large samples, with substantial overlap between treatment and control groups.

General concerns with matching have also been raised by Judea Pearl, who has argued that hidden bias may actually increase because matching on observed variables may unleash bias due to dormant unobserved confounders. Similarly, Pearl has argued that bias reduction can only be assured (asymptotically) by modelling the qualitative causal relationships between treatment, outcome, observed and unobserved covariates. Confounding occurs when the experimenter is unable to control for alternative, non-causal explanations for an observed relationship between independent and dependent variables. Such control should satisfy the "back-door criterion" of Pearl.
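The overlap requirement mentioned above is often checked before matching. One simple rule of thumb, sketched here as an assumption and not as a method this article prescribes, keeps only units whose propensity score falls inside the range covered by both groups (the minimum-maximum common support region). The function name is hypothetical.

```python
def common_support(scores, treated):
    """Indices of units whose score lies in the min-max overlap region.

    The region runs from the larger of the two group minima to the smaller
    of the two group maxima; an empty list means the groups do not overlap.
    """
    t = [s for s, z in zip(scores, treated) if z == 1]
    c = [s for s, z in zip(scores, treated) if z == 0]
    low, high = max(min(t), min(c)), min(max(t), max(c))
    return [i for i, s in enumerate(scores) if low <= s <= high]
```

If few units survive this filter, any matched estimate describes only a narrow slice of the population, which is one way the large-sample and overlap concerns show up in practice.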

Implementations in statistics packages

* R: propensity score matching is available as part of the MatchIt, optmatch, or other packages.
* SAS: The PSMatch procedure and the macro OneToManyMTCH match observations based on a propensity score.
* Stata: several commands implement propensity score matching, including the user-written psmatch2. Stata version 13 and later also offers the built-in command teffects psmatch.
* SPSS: A dialog box for Propensity Score Matching is available from the IBM SPSS Statistics menu (Data/Propensity Score Matching), and allows the user to set the match tolerance, randomize case order when drawing samples, prioritize exact matches, sample with or without replacement, set a random seed, and maximize performance by increasing processing speed and minimizing memory usage.
* Python: PsmPy, a library for propensity score matching in Python.
