Controlling For A Variable

In causal models, controlling for a variable means binning data according to measured values of the variable. This is typically done so that the variable can no longer act as a confounder in, for example, an observational study or experiment.

When estimating the effect of explanatory variables on an outcome by regression, controlled-for variables are included as inputs in order to separate their effects from those of the explanatory variables.

A limitation of controlling for variables is that a causal model is needed to identify important confounders (the backdoor criterion is used for this identification). Without one, a possible confounder might go unnoticed. A related problem is that controlling for a variable that is not a real confounder may turn other variables (possibly ones not taken into account) into confounders when they were not confounders before. In other cases, controlling for a non-confounding variable may cause underestimation of the true causal effect of the explanatory variables on an outcome (e.g. when controlling for a mediator or its descendant). Counterfactual reasoning mitigates the influence of confounders without this drawback.
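The binning idea can be illustrated with a small simulation. This is a minimal sketch under assumed, made-up parameters: a binary confounder `z` raises both the chance of treatment `x` and the outcome `y`, while `x` itself has no effect on `y`. All names and numbers are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

random.seed(0)

def simulate(n=20000):
    """Generate (z, x, y) rows where z confounds x and y; x has no true effect."""
    rows = []
    for _ in range(n):
        z = 1 if random.random() < 0.5 else 0
        # Treatment is more likely when z == 1.
        x = 1 if random.random() < (0.8 if z else 0.2) else 0
        # The outcome depends on z only, not on x.
        y = 2.0 * z + random.gauss(0, 1)
        rows.append((z, x, y))
    return rows

def diff_in_means(rows):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return mean(treated) - mean(control)

def stratified_diff(rows):
    """Bin rows by z, compare within each bin, then average by bin size."""
    total = len(rows)
    estimate = 0.0
    for level in sorted({z for z, _, _ in rows}):
        stratum = [r for r in rows if r[0] == level]
        estimate += len(stratum) / total * diff_in_means(stratum)
    return estimate

rows = simulate()
naive = diff_in_means(rows)       # biased: mixes the effect of z into x
adjusted = stratified_diff(rows)  # compares like with like within each bin
print(f"naive: {naive:.2f}, controlling for z: {adjusted:.2f}")
```

Under these assumptions the naive comparison is about 1.2 in expectation even though `x` does nothing, while the stratified estimate is close to zero.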


Experiments

Experiments attempt to assess the effect of manipulating one or more independent variables on one or more dependent variables. To ensure the measured effect is not influenced by external factors, other variables must be held constant. The variables held constant during an experiment are referred to as control variables.

For example, if an outdoor experiment were conducted to compare how different wing designs of a paper airplane (the independent variable) affect how far it can fly (the dependent variable), one would want to run every trial under the same weather conditions, so that weather does not affect the result. In this case, the control variables may be wind speed, wind direction and precipitation. If the experiment began when it was sunny with no wind and the weather then changed, one would want to postpone the remaining trials until the control variables (the wind and precipitation level) were the same as when the experiment began.

In controlled experiments of medical treatment options on humans, researchers randomly assign individuals to a treatment group or control group. This is done to reduce the confounding effect of irrelevant variables that are not being studied, such as the placebo effect.
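Random assignment can be sketched in a few lines. The example below is only an illustration with invented data: each hypothetical participant has an `age` that the experimenter does not manipulate, and shuffling before splitting tends to leave both groups with a similar age profile.

```python
import random
from statistics import mean

random.seed(42)

# Hypothetical participants: only an age covariate is recorded.
ages = [random.uniform(20, 80) for _ in range(10000)]

# Randomly assign participant indices to two equally sized groups.
indices = list(range(len(ages)))
random.shuffle(indices)
half = len(indices) // 2
treatment = indices[:half]
control = indices[half:]

# With random assignment the groups should be similar on average,
# including on covariates that were never measured.
age_gap = mean(ages[i] for i in treatment) - mean(ages[i] for i in control)
print(f"difference in mean age between groups: {age_gap:.2f} years")
```

The balance holds only on average and improves with sample size; a single small trial can still end up with unbalanced groups by chance.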


Observational studies

In an observational study, researchers have no control over the values of the independent variables, such as who receives the treatment. Instead, they must control for variables using statistics.

Observational studies are used when controlled experiments may be unethical or impractical. For instance, if a researcher wished to study the effect of unemployment (the independent variable) on health (the dependent variable), institutional review boards would consider it unethical to randomly assign some participants to have jobs and others not to. Instead, the researcher has to draw a sample that includes some employed people and some unemployed people. However, there could be factors that affect both whether someone is employed and how healthy they are. Part of any observed association between the independent variable (employment status) and the dependent variable (health) could be due to these outside, spurious factors rather than indicating a true link between them. This can be problematic even in a true random sample. By controlling for the extraneous variables, the researcher can come closer to understanding the true effect of the independent variable on the dependent variable.

In this context the extraneous variables can be controlled for by using multiple regression. The regression uses as independent variables not only the one or ones whose effects on the dependent variable are being studied, but also any potential confounding variables, thus avoiding omitted variable bias. "Confounding variables" in this context means other factors that influence not only the dependent variable (the outcome) but also the main independent variable.
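Omitted variable bias can be made concrete with simulated data. The sketch below assumes a hypothetical confounder `c` that drives both the explanatory variable `x` and the outcome `y`; the true coefficient of `x` is set to 1.0. The helpers `solve` and `ols` are small illustrative implementations written for this example, not functions from any library.

```python
import random

def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(columns, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[j] * r[k] for r in design) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(design, y)) for j in range(p)]
    return solve(xtx, xty)

random.seed(1)
n = 5000
c = [random.gauss(0, 1) for _ in range(n)]       # confounder
x = [ci + random.gauss(0, 1) for ci in c]        # explanatory variable, driven by c
y = [1.0 * xi + 2.0 * ci + random.gauss(0, 1) for xi, ci in zip(x, c)]

short_coef = ols([x], y)[1]     # c omitted: the coefficient absorbs part of c's effect
long_coef = ols([x, c], y)[1]   # c included as a control variable
print(f"without control: {short_coef:.2f}, with control: {long_coef:.2f}")
```

With these made-up parameters the regression that omits `c` attributes roughly 2.0 to `x` in expectation, while including `c` recovers a value near the true 1.0.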


OLS Regressions and control variables

The simplest examples of control variables in regression analysis come from Ordinary Least Squares (OLS) estimators. The OLS framework assumes the following:

* Linear relationship: OLS statistical models are linear, so the relationship between the explanatory variables and the mean of Y must be linear.
* Homoscedasticity: the variance of the errors must be equal or similar across the data (homogeneity of variances).
* Independence / no autocorrelation: the error term of one observation cannot be influenced by the error terms of other observations.
* Normality of errors: the errors are jointly normal and uncorrelated, which implies that the error terms $\epsilon_i$ are independently and identically distributed (iid). This implies that the unobservables between different groups or observations are independent.
* No multicollinearity: independent variables must not be highly correlated with each other. For regressions using matrix notation, the design matrix $X$ must have full column rank, i.e. $X^{\top} X$ is invertible.

Accordingly, a control variable can be interpreted as a linear explanatory variable that affects the mean value of Y (assumption 1), but which is not the primary variable of investigation, and which also satisfies the other assumptions above.
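As an illustration of these assumptions, a regression with one variable of interest and one control variable can be written as follows. The symbols $D_i$ (variable of interest), $C_i$ (control variable) and the coefficient names are notation assumed for this sketch, not taken from the text above.

```latex
% Linear model for observation i: D_i is the variable of interest,
% C_i is a control variable, \epsilon_i is the error term.
Y_i = \beta_0 + \beta_1 D_i + \gamma C_i + \epsilon_i

% Stacking all observations, with X holding a column of ones,
% the D_i and the C_i, the OLS estimator of the coefficient vector is
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} Y
```

The second expression makes clear why the no-multicollinearity assumption matters: the estimator is only defined when $X^{\top} X$ is invertible.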


Example

Consider a study about whether getting older affects someone's life satisfaction. (Some researchers perceive a "U-shape": life satisfaction appears to decline first and then rise after middle age.) To identify the control variables needed here, one could ask what other variables determine not only someone's life satisfaction but also their age. Many other variables determine life satisfaction, but no other variable determines how old someone is (as long as they remain alive): all people keep getting older at the same rate, whatever their other characteristics. So no control variables are needed here.

To determine the needed control variables, it can be useful to construct a directed acyclic graph.
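A directed acyclic graph can be encoded as a mapping from each variable to its direct causes. The sketch below uses a simplified heuristic (listing shared ancestors of exposure and outcome), which is not the full backdoor criterion. The nodes `income`, `health` and `education` and all edges are hypothetical, added only to make the graphs non-trivial.

```python
def ancestors(parents, node, blocked=frozenset()):
    """Return all ancestors of node, never passing through blocked nodes."""
    seen = set()
    stack = [p for p in parents.get(node, ()) if p not in blocked]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(p for p in parents.get(current, ()) if p not in blocked)
    return seen

def common_causes(parents, exposure, outcome):
    """Variables that cause the exposure and also cause the outcome
    along some path that does not run through the exposure."""
    return ancestors(parents, exposure) & ancestors(
        parents, outcome, blocked={exposure}
    )

# Hypothetical graph for the age example: nothing points into "age".
age_dag = {
    "life_satisfaction": ["age", "income", "health"],
    "income": ["age"],
    "health": ["age"],
}
print(common_causes(age_dag, "age", "life_satisfaction"))

# Hypothetical graph for the employment example: "education" affects both.
job_dag = {
    "health": ["employment", "education"],
    "employment": ["education"],
}
print(common_causes(job_dag, "employment", "health"))
```

In the first graph the result is empty, matching the conclusion that no control variables are needed. The hypothetical `income` and `health` nodes there lie on paths from age to life satisfaction, so they would be mediators rather than confounders.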


See also

* Scientific control
* Mixed model
* Age adjustment


Further reading

* Freedman, David; Pisani, Robert; Purves, Roger (2007). Statistics. W. W. Norton & Company. ISBN 978-0393929720. https://books.google.com/books?id=mviJQgAACAAJ