Dependent and independent variables are
variables in
mathematical modeling
A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used in the natural sciences (such as physics, b ...
,
statistical model
A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repres ...
ing and
experimental science
An experiment is a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried. Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when ...
s. Dependent variables receive this name because, in an experiment, their values are studied under the supposition or demand that they depend, by some law or rule (e.g., by a
mathematical function
In mathematics, a function from a set to a set assigns to each element of exactly one element of .; the words map, mapping, transformation, correspondence, and operator are often used synonymously. The set is called the domain of the functi ...
), on the values of other variables. Independent variables, in turn, are not seen as depending on any other variable in the scope of the experiment in question. In this sense, some common independent variables are
time
Time is the continued sequence of existence and events that occurs in an apparently irreversible succession from the past, through the present, into the future. It is a component quantity of various measurements used to sequence events, to ...
,
space
Space is the boundless three-dimensional extent in which objects and events have relative position and direction. In classical physics, physical space is often conceived in three linear dimensions, although modern physicists usually consider ...
,
density
Density (volumetric mass density or specific mass) is the substance's mass per unit of volume. The symbol most often used for density is ''ρ'' (the lower case Greek letter rho), although the Latin letter ''D'' can also be used. Mathematical ...
,
mass
Mass is an intrinsic property of a body. It was traditionally believed to be related to the quantity of matter in a physical body, until the discovery of the atom and particle physics. It was found that different atoms and different elementar ...
,
fluid flow rate, and previous values of some observed value of interest (e.g. human population size) to predict future values (the dependent variable).
Of the two, it is always the dependent variable whose
variation is being studied, by altering inputs, also known as regressors in a
statistical context. In an experiment, any variable that can be attributed a value without attributing a value to any other variable is called an independent variable.
Models
A model is an informative representation of an object, person or system. The term originally denoted the plans of a building in late 16th-century English, and derived via French and Italian ultimately from Latin ''modulus'', a measure.
Models c ...
and
experiment
An experiment is a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried. Experiments provide insight into Causality, cause-and-effect by demonstrating what outcome oc ...
s test the effects that the independent variables have on the dependent variables. Sometimes, even if their influence is not of direct interest, independent variables may be included for other reasons, such as to account for their potential
confounding
In statistics, a confounder (also confounding variable, confounding factor, extraneous determinant or lurking variable) is a variable that influences both the dependent variable and independent variable, causing a spurious association. Con ...
effect.
Mathematics
In mathematics, a
function
Function or functionality may refer to:
Computing
* Function key, a type of key on computer keyboards
* Function model, a structured representation of processes in a system
* Function object or functor or functionoid, a concept of object-oriente ...
is a rule for taking an input (in the simplest case, a number or set of numbers)
[Carlson, Robert. A concrete introduction to real analysis. CRC Press, 2006. p.183] and providing an output (which may also be a number).
[ A symbol that stands for an arbitrary input is called an independent variable, while a symbol that stands for an arbitrary output is called a dependent variable.][Stewart, James. Calculus. Cengage Learning, 2011. Section 1.1] The most common symbol for the input is , and the most common symbol for the output is ; the function itself is commonly written .[
It is possible to have multiple independent variables or multiple dependent variables. For instance, in ]multivariable calculus
Multivariable calculus (also known as multivariate calculus) is the extension of calculus in one variable to calculus with functions of several variables: the differentiation and integration of functions involving several variables, rather th ...
, one often encounters functions of the form , where is a dependent variable and and are independent variables. Functions with multiple outputs are often referred to as vector-valued functions
A vector-valued function, also referred to as a vector function, is a mathematical function of one or more variables whose range is a set of multidimensional vectors or infinite-dimensional vectors. The input of a vector-valued function could ...
.
Modeling
In mathematical modeling
A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used in the natural sciences (such as physics, b ...
, the dependent variable is studied to see if and how much it varies as the independent variables vary. In the simple stochastic
Stochastic (, ) refers to the property of being well described by a random probability distribution. Although stochasticity and randomness are distinct in that the former refers to a modeling approach and the latter refers to phenomena themselv ...
linear model
In statistics, the term linear model is used in different ways according to the context. The most common occurrence is in connection with regression models and the term is often taken as synonymous with linear regression model. However, the term ...
the term is the th value of the dependent variable and is the th value of the independent variable. The term is known as the "error" and contains the variability of the dependent variable not explained by the independent variable.
With multiple independent variables, the model is , where is the number of independent variables.
The linear regression model is now discussed. To use linear regression, a scatter plot
A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. ...
of data is generated with as the independent variable and as the dependent variable. This is also called a bivariate dataset, . The simple linear regression model takes the form of , for . In this case, are independent random variables. This occurs when the measurements do not influence each other. Through propagation of independence, the independence of implies independence of , even though each has a different expectation value. Each has an expectation value of 0 and a variance of .
Expectation of Proof:
:
The line of best fit for the bivariate dataset takes the form and is called the regression line. and correspond to the intercept and slope, respectively.
Simulation
In simulation
A simulation is the imitation of the operation of a real-world process or system over time. Simulations require the use of Conceptual model, models; the model represents the key characteristics or behaviors of the selected system or proc ...
, the dependent variable is changed in response to changes in the independent variables.
Statistics
In an experiment
An experiment is a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried. Experiments provide insight into Causality, cause-and-effect by demonstrating what outcome oc ...
, the variable manipulated by an experimenter is something that is proven to work, called an independent variable. The dependent variable is the event expected to change when the independent variable is manipulated.['' Random House Webster's Unabridged Dictionary.'' Random House, Inc. 2001. Page 534, 971. .]
In data mining tools (for multivariate statistics
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable.
Multivariate statistics concerns understanding the different aims and background of each of the dif ...
and machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
), the dependent variable is assigned a ''role'' as (or in some tools as ''label attribute''), while an independent variable may be assigned a role as ''regular variable''. Known values for the target variable are provided for the training data set and test data
Test data is data which has been specifically identified for use in tests, typically of a computer program.
Background
Some data may be used in a confirmatory way, typically to verify that a given set of input to a given function produces some e ...
set, but should be predicted for other data. The target variable is used in supervised learning
Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...
algorithms but not in unsupervised learning.
Statistics synonyms
Depending on the context, an independent variable is sometimes called a "predictor variable", "regressor", "covariate", "manipulated variable", "explanatory variable", "exposure variable" (see reliability theory
Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specifi ...
), "risk factor
In epidemiology, a risk factor or determinant is a variable associated with an increased risk of disease or infection.
Due to a lack of harmonization across disciplines, determinant, in its more widely accepted scientific meaning, is often use ...
" (see medical statistics
Medical statistics deals with applications of statistics to medicine and the health sciences, including epidemiology, public health, forensic medicine, and clinical research. Medical statistics has been a recognized branch of statistics in the Un ...
), "feature
Feature may refer to:
Computing
* Feature (CAD), could be a hole, pocket, or notch
* Feature (computer vision), could be an edge, corner or blob
* Feature (software design) is an intentional distinguishing characteristic of a software item ...
" (in machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
and pattern recognition
Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphi ...
) or "input variable".[Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. (entry for "independent variable")][Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. (entry for "regression")]
In econometrics
Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. M. Hashem Pesaran (1987). "Econometrics," '' The New Palgrave: A Dictionary of Economics'', v. 2, p. 8 p. 8 ...
, the term "control variable" is usually used instead of "covariate".
"Explanatory variable" is preferred by some authors over "independent variable" when the quantities treated as independent variables may not be statistically independent or independently manipulable by the researcher.[Everitt, B.S. (2002) Cambridge Dictionary of Statistics, CUP. ][Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. ] If the independent variable is referred to as an "explanatory variable" then the term "response variable" is preferred by some authors for the dependent variable.
From the Economics community, the independent variables are also called exogenous
In a variety of contexts, exogeny or exogeneity () is the fact of an action or object originating externally. It contrasts with endogeneity or endogeny, the fact of being influenced within a system.
Economics
In an economic model, an exogeno ...
.
Depending on the context, a dependent variable is sometimes called a "response variable", "regressand", "criterion", "predicted variable", "measured variable", "explained variable", "experimental variable", "responding variable", "outcome variable", "output variable", "target" or "label". In economics endogenous variables are usually referencing the target.
"Explained variable" is preferred by some authors over "dependent variable" when the quantities treated as "dependent variables" may not be statistically dependent.[Ash Narayan Sah (2009) Data Analysis Using Microsoft Excel, New Delhi. ] If the dependent variable is referred to as an "explained variable" then the term "predictor variable" is preferred by some authors for the independent variable.
Variables may also be referred to by their form: continuous
Continuity or continuous may refer to:
Mathematics
* Continuity (mathematics), the opposing concept to discreteness; common examples include
** Continuous probability distribution or random variable in probability and statistics
** Continuous ...
or categorical, which in turn may be binary/dichotomous, nominal categorical, and ordinal categorical, among others.
An example is provided by the analysis of trend in sea level by . Here the dependent variable (and variable of most interest) was the annual mean sea level at a given location for which a series of yearly values were available. The primary independent variable was time. Use was made of a covariate consisting of yearly values of annual mean atmospheric pressure at sea level. The results showed that inclusion of the covariate allowed improved estimates of the trend against time to be obtained, compared to analyses which omitted the covariate.
Other variables
A variable may be thought to alter the dependent or independent variables, but may not actually be the focus of the experiment. So that the variable will be kept constant or monitored to try to minimize its effect on the experiment. Such variables may be designated as either a "controlled variable", "control variable
A control variable (or scientific constant) in scientific experimentation is an experimental element which is constant (controlled) and unchanged throughout the course of the investigation. Control variables could strongly influence experimenta ...
", or "fixed variable".
Extraneous variables, if included in a regression analysis
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
as independent variables, may aid a researcher with accurate response parameter estimation, prediction, and goodness of fit
The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measure ...
, but are not of substantive interest to the hypothesis
A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. For a hypothesis to be a scientific hypothesis, the scientific method requires that one can test it. Scientists generally base scientific hypotheses on previous obse ...
under examination. For example, in a study examining the effect of post-secondary education on lifetime earnings, some extraneous variables might be gender, ethnicity, social class, genetics, intelligence, age, and so forth. A variable is extraneous only when it can be assumed (or shown) to influence the dependent variable. If included in a regression, it can improve the fit of the model. If it is excluded from the regression and if it has a non-zero covariance
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the ...
with one or more of the independent variables of interest, its omission will bias
Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group, ...
the regression's result for the effect of that independent variable of interest. This effect is called confounding
In statistics, a confounder (also confounding variable, confounding factor, extraneous determinant or lurking variable) is a variable that influences both the dependent variable and independent variable, causing a spurious association. Con ...
or omitted variable bias
In statistics, omitted-variable bias (OVB) occurs when a statistical model leaves out one or more relevant variables. The bias results in the model attributing the effect of the missing variables to those that were included.
More specifically, OV ...
; in these situations, design changes and/or controlling for a variable statistical control is necessary.
Extraneous variables are often classified into three types:
#Subject variables, which are the characteristics of the individuals being studied that might affect their actions. These variables include age, gender, health status, mood, background, etc.
#Blocking variables or experimental variables are characteristics of the persons conducting the experiment which might influence how a person behaves. Gender, the presence of racial discrimination, language, or other factors may qualify as such variables.
#Situational variables are features of the environment in which the study or research was conducted, which have a bearing on the outcome of the experiment in a negative way. Included are the air temperature, level of activity, lighting, and time of day.
In modelling, variability that is not covered by the independent variable is designated by and is known as the " residual", "side effect", "error
An error (from the Latin ''error'', meaning "wandering") is an action which is inaccurate or incorrect. In some usages, an error is synonymous with a mistake. The etymology derives from the Latin term 'errare', meaning 'to stray'.
In statistics ...
", "unexplained share", "residual variable", "disturbance", or "tolerance".
Examples
* Effect of fertilizer on plant growths:
: In a study measuring the influence of different quantities of fertilizer on plant growth, the independent variable would be the amount of fertilizer used. The dependent variable would be the growth in height or mass of the plant. The controlled variables would be the type of plant, the type of fertilizer, the amount of sunlight the plant gets, the size of the pots, etc.
* Effect of drug dosage on symptom severity:
: In a study of how different doses of a drug affect the severity of symptoms, a researcher could compare the frequency and intensity of symptoms when different doses are administered. Here the independent variable is the dose and the dependent variable is the frequency/intensity of symptoms.
* Effect of temperature on pigmentation:
: In measuring the amount of color removed from beetroot samples at different temperatures, temperature is the independent variable and amount of pigment removed is the dependent variable.
*Effect of sugar added in a coffee:
: The taste varies with the amount of sugar added in the coffee. Here, the sugar is the independent variable, while the taste is the dependent variable.
See also
*Abscissa and ordinate
In common usage, the abscissa refers to the (''x'') coordinate and the ordinate refers to the (''y'') coordinate of a standard two-dimensional graph.
The distance of a point from the y-axis, scaled with the x-axis, is called abscissa or x coo ...
*Blocking (statistics)
In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) that are similar to one another. Blocking can be used to tackle the problem of pseudoreplication.
Use
Blocking reduces un ...
*Latent variable
In statistics, latent variables (from Latin: present participle of ''lateo'', “lie hidden”) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or me ...
versus observable variable
In physics, an observable is a physical quantity that can be measured. Examples include position and momentum. In systems governed by classical mechanics, it is a real-valued "function" on the set of all possible system states. In quantum phys ...
Notes
References
{{Differential equations topics
Design of experiments
Regression analysis
Mathematical terminology
Independence (probability theory)