Exploratory Causal Analysis

Causal analysis is the field of experimental design and statistical analysis pertaining to establishing cause and effect. Exploratory causal analysis (ECA), also known as data causality or causal discovery, is the use of statistical algorithms to infer associations in observed data sets that are potentially causal under strict assumptions. ECA is a type of causal inference distinct from causal modeling and treatment effects in randomized controlled trials. It is exploratory research, usually preceding more formal causal research in the same way exploratory data analysis often precedes statistical hypothesis testing in data analysis.


Motivation

Data analysis is primarily concerned with causal questions. For example, did the fertilizer cause the crops to grow? Or, can a given sickness be prevented? Or, why is my friend depressed? The potential outcomes and regression analysis techniques handle such queries when data is collected using designed experiments. Data collected in observational studies require different techniques for causal inference because of issues such as confounding. Causal inference techniques used with experimental data require additional assumptions to produce reasonable inferences with observational data. The difficulty of causal inference under such circumstances is often summed up as "correlation does not imply causation".


Overview

ECA postulates that there exist data analysis procedures, performed on specific subsets of variables within a larger set, whose outputs might be indicative of causality between those variables. For example, if we assume every relevant covariate in the data is observed, then propensity score matching can be used to find the causal effect between two observational variables. Granger causality can also be used to detect causality between two observational variables under different, but similarly strict, assumptions. The two broad approaches to developing such procedures are using operational definitions of causality or verification by "truth" (i.e., explicitly ignoring the problem of defining causality and showing that a given algorithm identifies causal relationships in scenarios where those relationships are known to exist, e.g., using synthetic data).
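A minimal sketch of propensity score matching, assuming a single observed confounder and synthetic data whose true treatment effect is known (2.0), so it also illustrates verification by "truth". The helper names `fit_logistic` and `att_by_matching` are hypothetical; the propensity model is a plain logistic regression fitted by Newton's method, and matching is one-nearest-neighbour with replacement on the estimated score. Real analyses typically add caliper rules and balance diagnostics that are omitted here.

```python
import numpy as np


def fit_logistic(X, t, iters=25):
    """Fit P(t=1 | X) by Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def att_by_matching(y, t, scores):
    """Average treatment effect on the treated via 1-NN matching (with replacement)."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    # For each treated unit, pick the control unit with the closest propensity score.
    dist = np.abs(scores[treated][:, None] - scores[control][None, :])
    matches = control[dist.argmin(axis=1)]
    return float(np.mean(y[treated] - y[matches]))


# Synthetic data: z confounds both treatment t and outcome y; true effect = 2.0.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-z))).astype(int)
y = 2.0 * t + 3.0 * z + rng.normal(size=n)

X = np.column_stack([np.ones(n), z])
scores = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
naive = float(y[t == 1].mean() - y[t == 0].mean())
att = att_by_matching(y, t, scores)
print(f"naive difference: {naive:.2f}, matched estimate: {att:.2f}")
```

The naive difference in means is biased upward because treated units tend to have larger `z`. Matching on the estimated score removes most of that bias only because `z` is the sole confounder and it is observed, which is exactly the strict assumption stated above.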


Operational definitions of causality

Clive Granger created the first operational definition of causality in 1969. Granger made the definition of probabilistic causality proposed by Norbert Wiener operational as a comparison of variances. Some authors prefer using ECA techniques developed using operational definitions of causality because they believe this may help in the search for causal mechanisms.
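Granger's comparison of variances can be sketched as follows: regress a target series on its own past, then on its own past plus the past of a candidate cause, and compare the residual variances. This is a minimal sketch assuming linear autoregressions fitted by least squares with a fixed lag; the function names are hypothetical, and it reports a log variance ratio without the significance test (e.g. an F-test) that a real analysis would add.

```python
import numpy as np


def lagged(series, lag):
    """Columns series[t-1], ..., series[t-lag] for t = lag, ..., len(series)-1."""
    n = len(series)
    return np.column_stack([series[lag - k : n - k] for k in range(1, lag + 1)])


def residual_variance(target, regressors):
    """Variance of OLS residuals of target on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(target)), regressors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(np.var(target - X @ beta))


def granger_log_ratio(cause, effect, lag=2):
    """log(restricted / unrestricted residual variance); > 0 means cause's past helps."""
    target = effect[lag:]
    own_past = lagged(effect, lag)
    both_pasts = np.column_stack([own_past, lagged(cause, lag)])
    return float(np.log(residual_variance(target, own_past)
                        / residual_variance(target, both_pasts)))


# Synthetic example: x drives y with a one-step delay; y does not drive x.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

print(granger_log_ratio(x, y), granger_log_ratio(y, x))
```

Here adding the past of `x` should shrink the residual variance of `y` noticeably, while adding the past of `y` barely changes the residual variance of `x`, which is the asymmetry the variance comparison is meant to expose.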


Verification by "truth"

Peter Spirtes, Clark Glymour, and Richard Scheines introduced the idea of explicitly not providing a definition of causality. Spirtes and Glymour introduced the PC algorithm for causal discovery in 1990. Many recent causal discovery algorithms follow the Spirtes-Glymour approach to verification.
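The verification idea can be sketched as a small harness: simulate data from a known causal graph, run a discovery algorithm, and score the recovered edges against the truth. This sketch assumes a linear-Gaussian structural equation model whose nodes are indexed in a topological order (`weights[i, j]` is the effect of node i on node j). The helper names are hypothetical, and only the undirected skeleton is scored; the "algorithm output" in the example is a made-up edge set for illustration.

```python
import numpy as np


def simulate_linear_sem(weights, n, seed=0):
    """Draw n samples; weights must be strictly upper triangular (topological order)."""
    rng = np.random.default_rng(seed)
    d = weights.shape[0]
    data = np.zeros((n, d))
    for j in range(d):
        # Columns >= j are still zero, so only already-generated parents contribute.
        data[:, j] = data @ weights[:, j] + rng.normal(size=n)
    return data


def skeleton(weights):
    """Undirected edge set implied by a weighted adjacency matrix."""
    rows, cols = np.nonzero(weights)
    return {tuple(sorted((int(i), int(j)))) for i, j in zip(rows, cols)}


def skeleton_scores(true_edges, found_edges):
    """Precision, recall and number of mismatched edges (skeleton Hamming distance)."""
    hits = len(true_edges & found_edges)
    precision = hits / len(found_edges) if found_edges else 1.0
    recall = hits / len(true_edges) if true_edges else 1.0
    return precision, recall, len(true_edges ^ found_edges)


# Ground truth: the chain 0 -> 1 -> 2.
W = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.8],
              [0.0, 0.0, 0.0]])
data = simulate_linear_sem(W, n=500)
# Hypothetical algorithm output: found 0-1 but linked 0-2 instead of 1-2.
print(skeleton_scores(skeleton(W), {(0, 1), (0, 2)}))  # (0.5, 0.5, 2)
```

In practice `data` would be passed to the algorithm under test and its output scored in place of the hand-written edge set.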


Techniques

There are many surveys of causal discovery techniques. This section lists some well-known techniques.


Bivariate (or "pairwise")

* Granger causality (there is also the Scholarpedia entry)
* transfer entropy
* convergent cross mapping
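Transfer entropy from X to Y measures how much the current value of X reduces uncertainty about the next value of Y beyond what Y's own current value already provides. The sketch below is a plug-in (frequency-count) estimator for discrete series with a history length of 1, using only the standard library; `transfer_entropy` is a hypothetical helper, and continuous data would first need binning or a dedicated estimator such as those in JIDT.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of TE(source -> target) with history length 1:
    sum of p(y1, y0, x0) * log2[p(y1 | y0, x0) / p(y1 | y0)].
    """
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    total = sum(triples.values())
    now_pairs = Counter()  # counts of (y0, x0)
    own_pairs = Counter()  # counts of (y1, y0)
    own_now = Counter()    # counts of y0
    for (y1, y0, x0), c in triples.items():
        now_pairs[(y0, x0)] += c
        own_pairs[(y1, y0)] += c
        own_now[y0] += c
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_full = c / now_pairs[(y0, x0)]            # p(y1 | y0, x0)
        p_own = own_pairs[(y1, y0)] / own_now[y0]   # p(y1 | y0)
        te += (c / total) * math.log2(p_full / p_own)
    return te


# Synthetic example: y copies x with a one-step delay.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(10_000)]
y = [0] + x[:-1]
print(round(transfer_entropy(x, y), 3), round(transfer_entropy(y, x), 3))
```

The first value should be close to 1 bit (x fully determines the next y) and the second close to 0, illustrating the directional asymmetry that pairwise techniques look for. Plug-in estimates are biased upward for short series, so small positive values alone are weak evidence.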


Multivariate

* causation entropy
* PC algorithm
* FCI algorithm
* LiNGAM

Many of these techniques are discussed in the tutorials provided by the Center for Causal Discovery (CCD).
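The first (skeleton) phase of the PC algorithm can be sketched as follows: start from a complete undirected graph and delete the edge between two variables whenever some set of their current neighbours, of increasing size, renders them conditionally independent. This minimal sketch assumes linear-Gaussian data, so conditional independence is judged with a Fisher z-test on partial correlations; the function names and the `alpha` default are illustrative choices, and the orientation phase (colliders and propagation rules) is omitted.

```python
import itertools
import math
from statistics import NormalDist

import numpy as np


def independent(corr, i, j, cond, n, alpha):
    """Fisher z-test of zero partial correlation between i and j given cond."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    stat = math.sqrt(n - len(cond) - 3) * math.atanh(r)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return p_value > alpha


def pc_skeleton(data, alpha=0.001):
    """Return undirected edges kept by the PC skeleton search, plus separating sets."""
    n, d = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(d)) - {i} for i in range(d)}
    sepsets = {}
    level = 0
    # Continue while some node still has enough neighbours to form a conditioning set.
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):
                candidates = sorted(adj[i] - {j})
                for cond in itertools.combinations(candidates, level):
                    if independent(corr, i, j, cond, n, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        level += 1
    edges = {(i, j) for i in range(d) for j in adj[i] if i < j}
    return edges, sepsets


# Synthetic chain x -> y -> z: x and z are independent given y.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
edges, sepsets = pc_skeleton(np.column_stack([x, y, z]))
print(edges, sepsets)
```

On this chain the search should keep the edges 0-1 and 1-2 and record {1} as the separating set of 0 and 2; the recorded separating sets are what the omitted orientation phase would use to detect colliders.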


Use-case examples


Social science

The PC algorithm has been applied to several different social science data sets.


Medicine

The PC algorithm has been applied to medical data. Granger causality has been applied to fMRI data. CCD tested their tools using biomedical data.


Physics

ECA is used in physics to understand the causal mechanisms of physical systems, e.g., in geophysics using the PC-stable algorithm (a variant of the original PC algorithm) and in dynamical systems using pairwise asymmetric inference (a variant of convergent cross mapping).


Criticism

There is debate over whether the relationships found in data using causal discovery are actually causal. Judea Pearl has emphasized that causal inference requires a causal model developed by "intelligence" through an iterative process of testing assumptions and fitting data. Responses to this criticism point out that the assumptions used for developing ECA techniques may not hold for a given data set, and that any causal relationships discovered during ECA are contingent on these assumptions holding true.


Software Packages


Comprehensive toolkits


Tetrad is an open-source, GUI-based Java program that provides a collection of causal discovery algorithms. The algorithm library used by Tetrad is also available as a command-line tool, Python API, and R wrapper.
Java Information Dynamics Toolkit (JIDT) is an open-source Java library for performing information-theoretic causal discovery (e.g., transfer entropy, conditional transfer entropy, etc.). Examples of using the library in MATLAB, GNU Octave, Python, R, Julia and Clojure are provided in the documentation.


pcalg is an R package that provides some of the same causal discovery algorithms provided in Tetrad.


Specific Techniques


Granger causality

* R package
* Python package

Convergent cross mapping

* R package


LiNGAM

* MATLAB/GNU Octave package

There is also a collection of tools and data maintained by the Causality Workbench team and the CCD team.

