Conditional logistic regression is an extension of logistic regression that allows one to take into account stratification and matching. Its main field of application is observational studies and in particular epidemiology. It was devised in 1978 by Norman Breslow, Nicholas Day, Katherine Halvorsen, Ross L. Prentice and C. Sabai. It is the most flexible and general procedure for matched data.
Motivation
Observational studies use stratification or matching as a way to control for confounding. Several tests for matched data existed before conditional logistic regression, as shown in related tests. However, they did not allow for the analysis of continuous predictors with arbitrary stratum size. All of those procedures also lack the flexibility of conditional logistic regression, in particular the ability to control for covariates.
Logistic regression can take stratification into account by having a different constant term for each stratum. Let $Y_{i\ell} \in \{0, 1\}$ denote the label (e.g. case status) of the $\ell$th observation of the $i$th stratum and $X_{i\ell} \in \mathbb{R}^p$ the values of the corresponding predictors. Then, the likelihood of one observation is

$$\mathbb{P}(Y_{i\ell} = 1 \mid X_{i\ell}) = \frac{\exp(\alpha_i + \boldsymbol\beta^\top X_{i\ell})}{1 + \exp(\alpha_i + \boldsymbol\beta^\top X_{i\ell})}$$

where $\alpha_i$ is the constant term for the $i$th stratum and $\boldsymbol\beta$ is the vector of regression coefficients. While this works satisfactorily for a limited number of strata, pathological behavior occurs when the strata are small. When the strata are pairs, the number of parameters grows with the number of observations $N$ (it equals $\frac{N}{2} + p$: one constant term per pair plus the $p$ coefficients). The asymptotic results on which maximum likelihood estimation is based are therefore not valid and the estimation is biased. In fact, it can be shown that the unconditional analysis of matched pair data results in an estimate of the odds ratio which is the square of the correct, conditional one.
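The squaring of the odds ratio can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes a single binary exposure, made-up counts of discordant pairs (`n10` pairs with an exposed case and unexposed control, `n01` pairs the other way round), and a hand-written Newton-Raphson fitter named `fit_logistic_newton`. All of these names and numbers are illustrative. Concordant pairs are left out because they carry no information about the exposure coefficient in either analysis.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=50):
    """Unpenalized logistic regression fitted by plain Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical discordant pairs (illustrative counts, not real data).
n10, n01 = 20, 10
n_pairs = n10 + n01
case_exposure = np.r_[np.ones(n10), np.zeros(n01)]
control_exposure = 1.0 - case_exposure

# Stack cases first, then controls; observation j and j + n_pairs form a pair.
exposure = np.concatenate([case_exposure, control_exposure])
y = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])
stratum = np.tile(np.arange(n_pairs), 2)

# Unconditional model: one constant term (dummy column) per pair + exposure.
X = np.column_stack([np.eye(n_pairs)[stratum], exposure])
coef = fit_logistic_newton(X, y)
or_unconditional = np.exp(coef[-1])

# Conditional estimate for 1:1 matching with a binary exposure:
# the ratio of the two kinds of discordant pairs.
or_conditional = n10 / n01

print(or_conditional, or_unconditional)
```

With these counts the conditional odds ratio is `n10 / n01`, while the unconditional fit with one constant term per pair converges to its square.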
Conditional likelihood
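The conditional likelihood approach deals with the above pathological behavior by conditioning on the number of cases in each stratum, thereby eliminating the need to estimate the strata parameters. In the case where the strata are pairs, where the first observation is a case and the second is a control, this can be seen as follows, with $Y_{i\ell}$ the label and $X_{i\ell}$ the predictors of the $\ell$th observation of the $i$th stratum, $\alpha_i$ the stratum constant and $\boldsymbol\beta$ the coefficient vector:

$$\begin{aligned}
&\mathbb{P}(Y_{i1} = 1, Y_{i2} = 0 \mid X_{i1}, X_{i2}, Y_{i1} + Y_{i2} = 1) \\
&\quad= \frac{\mathbb{P}(Y_{i1} = 1 \mid X_{i1})\,\mathbb{P}(Y_{i2} = 0 \mid X_{i2})}{\mathbb{P}(Y_{i1} = 1 \mid X_{i1})\,\mathbb{P}(Y_{i2} = 0 \mid X_{i2}) + \mathbb{P}(Y_{i1} = 0 \mid X_{i1})\,\mathbb{P}(Y_{i2} = 1 \mid X_{i2})} \\
&\quad= \frac{\exp(\alpha_i + \boldsymbol\beta^\top X_{i1})}{\exp(\alpha_i + \boldsymbol\beta^\top X_{i1}) + \exp(\alpha_i + \boldsymbol\beta^\top X_{i2})} \\
&\quad= \frac{\exp(\boldsymbol\beta^\top X_{i1})}{\exp(\boldsymbol\beta^\top X_{i1}) + \exp(\boldsymbol\beta^\top X_{i2})}
\end{aligned}$$

The common factor $1 / \big[(1 + \exp(\alpha_i + \boldsymbol\beta^\top X_{i1}))(1 + \exp(\alpha_i + \boldsymbol\beta^\top X_{i2}))\big]$ cancels in the second step and $\exp(\alpha_i)$ cancels in the third, so the stratum constant no longer appears. With similar computations, the conditional likelihood of a stratum of size $m$, with the $k$ first observations being the cases, is

$$\mathbb{P}\Big(Y_{ij} = 1 \text{ for } j \le k,\ Y_{ij} = 0 \text{ for } k < j \le m \,\Big|\, X_{i1}, \dots, X_{im}, \sum_{j=1}^{m} Y_{ij} = k\Big) = \frac{\exp\big(\sum_{j=1}^{k} \boldsymbol\beta^\top X_{ij}\big)}{\sum_{J \subseteq \{1, \dots, m\},\ |J| = k} \exp\big(\sum_{j \in J} \boldsymbol\beta^\top X_{ij}\big)}$$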
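As an illustration, the conditional likelihood of one stratum can be evaluated by brute force. The numerator is the exponential of the summed linear predictors of the cases, and the denominator sums the same quantity over every subset with as many members as there are cases. The sketch below assumes NumPy arrays for the predictors. The function name `stratum_conditional_likelihood`, the coefficient values and the data are hypothetical, chosen only to check the formula against the matched-pair expression.

```python
from itertools import combinations

import numpy as np

def stratum_conditional_likelihood(X, case_idx, beta):
    """Probability that exactly the rows in case_idx are the cases,
    given the predictors X of the stratum and the number of cases."""
    scores = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    m, k = len(scores), len(case_idx)
    numerator = np.exp(scores[list(case_idx)].sum())
    # Enumerate every size-k subset of the m observations.
    denominator = sum(
        np.exp(scores[list(subset)].sum())
        for subset in combinations(range(m), k)
    )
    return numerator / denominator

beta = np.array([0.7, -0.2])  # hypothetical coefficients

# Matched pair: row 0 is the case, row 1 the control.
X_pair = np.array([[1.0, 2.0], [0.0, 1.5]])
pair_value = stratum_conditional_likelihood(X_pair, (0,), beta)
closed_form = np.exp(X_pair[0] @ beta) / np.exp(X_pair @ beta).sum()

# Stratum of size 4 with 2 cases: the probabilities of all possible
# case assignments must sum to one.
X_stratum = np.array([[1.0, 0.3], [0.0, 1.2], [1.0, -0.5], [0.0, 0.0]])
total = sum(
    stratum_conditional_likelihood(X_stratum, subset, beta)
    for subset in combinations(range(4), 2)
)
```

The enumeration visits a binomial number of subsets, so this direct form is only practical for small strata. It is meant to make the formula concrete, not to serve as an estimation routine.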