Conditional logistic regression is an extension of logistic regression that allows one to take into account stratification and matching. Its main field of application is observational studies, in particular epidemiology. It was devised in 1978 by Norman Breslow, Nicholas Day, Katherine Halvorsen, Ross L. Prentice and C. Sabai. It is the most flexible and general procedure for matched data.


Motivation

Observational studies use stratification or matching as a way to control for confounding. Several tests for matched data existed before conditional logistic regression, as listed under Related tests. However, they did not allow for the analysis of continuous predictors with arbitrary stratum size. All of those procedures also lack the flexibility of conditional logistic regression, in particular the possibility to control for covariates.

Logistic regression can take stratification into account by having a different constant term for each stratum. Let Y_{i\ell}\in\{0,1\} denote the label (e.g. case status) of the \ell-th observation of the i-th stratum and X_{i\ell}\in\mathbb{R}^p the values of the corresponding predictors. Then, the likelihood of one observation is

: \mathbb{P}(Y_{i\ell}=1 \mid X_{i\ell})=\frac{\exp(\alpha_i+\boldsymbol\beta^\top X_{i\ell})}{1+\exp(\alpha_i+\boldsymbol\beta^\top X_{i\ell})}

where \alpha_i is the constant term for the i-th stratum. While this works satisfactorily for a limited number of strata, pathological behavior occurs when the strata are small. When the strata are pairs, the number of parameters grows with the number of observations N (it equals \frac{N}{2}+p). The asymptotic results on which maximum likelihood estimation is based are therefore not valid, and the estimation is biased. In fact, it can be shown that the unconditional analysis of matched pair data results in an estimate of the odds ratio which is the square of the correct, conditional one.
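
The squared odds ratio can be reproduced numerically. The following is a minimal sketch, assuming matched pairs with a single binary exposure and hypothetical counts (30 pairs where only the case is exposed, 10 where only the control is exposed). All function names are illustrative, and the golden-section optimizer is just one simple choice that works because both log likelihoods are concave. For each value of \beta the unconditional fit maximizes over the pair-specific intercept \alpha_i, which is exactly the nuisance estimation that causes the bias.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def golden_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:              # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

def pair_loglik(alpha, beta, x_case, x_control):
    """Unconditional log likelihood of one pair: the case has Y=1, the control Y=0."""
    return (log_sigmoid(alpha + beta * x_case)
            + log_sigmoid(-(alpha + beta * x_control)))

def profile_loglik(beta, pair_counts):
    """Unconditional log likelihood with each pair's intercept maximized out."""
    total = 0.0
    for (x_case, x_control), count in pair_counts.items():
        alpha_hat = golden_max(
            lambda a: pair_loglik(a, beta, x_case, x_control), -20.0, 20.0)
        total += count * pair_loglik(alpha_hat, beta, x_case, x_control)
    return total

def conditional_loglik(beta, pair_counts):
    """Conditional log likelihood for pairs: the intercepts cancel out."""
    return sum(count * log_sigmoid(beta * (x_case - x_control))
               for (x_case, x_control), count in pair_counts.items())

# Hypothetical data: (exposure of case, exposure of control) -> number of pairs.
pair_counts = {(1, 0): 30, (0, 1): 10}

beta_unconditional = golden_max(lambda b: profile_loglik(b, pair_counts), -10.0, 10.0)
beta_conditional = golden_max(lambda b: conditional_loglik(b, pair_counts), -10.0, 10.0)

or_unconditional = math.exp(beta_unconditional)
or_conditional = math.exp(beta_conditional)
print(or_conditional, or_unconditional)
```

With these counts the conditional estimate should come out close to 30/10 = 3 and the unconditional one close to 3^2 = 9, up to optimizer tolerance.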


Conditional likelihood

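The conditional likelihood approach deals with the above pathological behavior by conditioning on the number of cases in each stratum, thereby eliminating the need to estimate the strata parameters. In the case where the strata are pairs, where the first observation is a case and the second is a control, this can be seen as follows:

: \begin{aligned}
& \mathbb{P}(Y_{i1}=1,Y_{i2}=0 \mid X_{i1},X_{i2},Y_{i1}+Y_{i2}=1) \\
& =\frac{\mathbb{P}(Y_{i1}=1 \mid X_{i1})\,\mathbb{P}(Y_{i2}=0 \mid X_{i2})}{\mathbb{P}(Y_{i1}=1 \mid X_{i1})\,\mathbb{P}(Y_{i2}=0 \mid X_{i2})+\mathbb{P}(Y_{i1}=0 \mid X_{i1})\,\mathbb{P}(Y_{i2}=1 \mid X_{i2})} \\[6pt]
& =\frac{\frac{\exp(\alpha_i+\boldsymbol\beta^\top X_{i1})}{1+\exp(\alpha_i+\boldsymbol\beta^\top X_{i1})}\times\frac{1}{1+\exp(\alpha_i+\boldsymbol\beta^\top X_{i2})}}{\frac{\exp(\alpha_i+\boldsymbol\beta^\top X_{i1})}{1+\exp(\alpha_i+\boldsymbol\beta^\top X_{i1})}\times\frac{1}{1+\exp(\alpha_i+\boldsymbol\beta^\top X_{i2})}+\frac{1}{1+\exp(\alpha_i+\boldsymbol\beta^\top X_{i1})}\times\frac{\exp(\alpha_i+\boldsymbol\beta^\top X_{i2})}{1+\exp(\alpha_i+\boldsymbol\beta^\top X_{i2})}} \\[6pt]
& =\frac{\exp(\boldsymbol\beta^\top X_{i1})}{\exp(\boldsymbol\beta^\top X_{i1})+\exp(\boldsymbol\beta^\top X_{i2})}.
\end{aligned}

With similar computations, the conditional likelihood of a stratum of size m, with the k first observations being the cases, is

: \mathbb{P}\Big(Y_{ij}=1\text{ for }j\leq k,\ Y_{ij}=0\text{ for }k<j\leq m \,\Big|\, X_{i1},\dots,X_{im},\ \sum_{j=1}^m Y_{ij}=k\Big)=\frac{\exp\big(\sum_{j=1}^k \boldsymbol\beta^\top X_{ij}\big)}{\sum_{J\in\mathcal{C}_k^m}\exp\big(\sum_{j\in J}\boldsymbol\beta^\top X_{ij}\big)},

where \mathcal{C}_k^m is the set of all subsets of size k of the set \{1,\dots,m\}. The full conditional log likelihood is then simply the sum of the log likelihoods for each stratum. The estimator is defined as the \boldsymbol\beta that maximizes the conditional log likelihood.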
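
The stratum-level formula translates directly into code. The sketch below is an illustration only: the function names and the toy data are hypothetical, and the denominator is enumerated by brute force over all subsets of size k, which is practical only for small strata. Cases are identified by their labels rather than by position, which gives the same value as the formula with the cases listed first.

```python
import math
from itertools import combinations

def stratum_conditional_loglik(beta, X, y):
    """Conditional log likelihood of one stratum.

    beta: list of p coefficients
    X:    list of m predictor vectors (each of length p)
    y:    list of m labels in {0, 1}
    """
    # Linear predictor beta^T X_ij for every observation (no intercept needed).
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    k = sum(y)  # number of cases, the quantity being conditioned on
    log_numerator = sum(s for s, label in zip(scores, y) if label == 1)
    # Denominator: sum over all subsets J of size k, computed with log-sum-exp.
    subset_sums = [sum(scores[j] for j in J)
                   for J in combinations(range(len(X)), k)]
    largest = max(subset_sums)
    log_denominator = largest + math.log(
        sum(math.exp(s - largest) for s in subset_sums))
    return log_numerator - log_denominator

def conditional_loglik(beta, strata):
    """Full conditional log likelihood: sum over strata."""
    return sum(stratum_conditional_loglik(beta, X, y) for X, y in strata)

# Hypothetical toy data: one matched pair and one 1:2 matched set, p = 1.
strata = [
    ([[2.0], [1.0]], [1, 0]),
    ([[0.3], [1.2], [0.7]], [0, 1, 0]),
]
beta = [0.5]
total = conditional_loglik(beta, strata)
print(total)
```

For the matched pair, the value returned by `stratum_conditional_loglik` agrees with the logarithm of the closed-form pair expression derived above. An estimator would then maximize `conditional_loglik` over `beta` with any general-purpose optimizer.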


Implementation

Conditional logistic regression is available in R as the function clogit in the survival package. It is in the survival package because the log likelihood of a conditional logistic model is the same as the log likelihood of a Cox model with a particular data structure.
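
The link with the Cox model can be checked by hand in the simplest setting. The sketch below is written in Python rather than R and does not call the survival package. It assumes a scalar predictor, exactly one case per stratum, and a data structure in which every subject receives the same dummy time with the case status used as the event indicator. That data structure, the function names and the numbers are assumptions made for illustration. Under these assumptions the risk set at the single event time is the whole stratum, so the Cox partial log likelihood equals the conditional log likelihood of the stratum.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log likelihood of one stratum, assuming at most one event."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: everyone still under observation at the event time.
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            log_denominator = math.log(
                sum(math.exp(beta * x[j]) for j in risk_set))
            total += beta * x[i] - log_denominator
    return total

def conditional_loglik_one_case(beta, y, x):
    """Conditional log likelihood of a stratum that contains exactly one case."""
    case_score = sum(beta * x_j for x_j, y_j in zip(x, y) if y_j == 1)
    return case_score - math.log(sum(math.exp(beta * x_j) for x_j in x))

# Hypothetical 1:3 matched set: first subject is the case.
y = [1, 0, 0, 0]
x = [2.0, 0.5, 1.0, -1.0]
times = [1.0] * len(y)   # same dummy time for everyone (illustrative assumption)
beta = 0.7

cox_value = cox_partial_loglik(beta, times, y, x)
clr_value = conditional_loglik_one_case(beta, y, x)
print(cox_value, clr_value)
```

Strata with several cases correspond to tied event times, which this sketch does not handle.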


Related tests

* Paired difference test: tests the association between a binary outcome and a continuous predictor while taking pairing into account.
* Cochran-Mantel-Haenszel test: tests the association between a binary outcome and a binary predictor while taking into account stratification with arbitrary stratum size. When its conditions of application are met, it is identical to the conditional logistic regression score test.
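
The identity between the Cochran-Mantel-Haenszel test and the conditional logistic regression score test can be illustrated in a special case. The sketch below assumes 1:1 matched pairs, a single binary predictor, and the Cochran-Mantel-Haenszel statistic computed without continuity correction. The counts and function names are hypothetical. For a pair with predictor difference d between case and control, the conditional log likelihood is the log of the logistic function of \beta d, so at \beta = 0 the score contribution is d/2 and the information contribution is d^2/4.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel chi-square, no continuity correction.

    Each table is (a, b, c, d) with
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        n_cases, n_controls = a + b, c + d
        n_exposed, n_unexposed = a + c, b + d
        # Observed minus expected number of exposed cases in this stratum.
        diff_sum += a - n_cases * n_exposed / n
        # Hypergeometric variance of the number of exposed cases.
        var_sum += (n_cases * n_controls * n_exposed * n_unexposed
                    / (n ** 2 * (n - 1)))
    return diff_sum ** 2 / var_sum

def clr_score_statistic_pairs(pairs):
    """Score statistic at beta = 0 for pairs (x_case, x_control)."""
    score = sum((x_case - x_control) / 2 for x_case, x_control in pairs)
    information = sum((x_case - x_control) ** 2 / 4
                      for x_case, x_control in pairs)
    return score ** 2 / information

# Hypothetical matched pairs, coded as (exposure of case, exposure of control).
pairs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 5 + [(0, 0)] * 15
# One 2x2 table per pair: one case and one control in every stratum.
tables = [(x_case, 1 - x_case, x_control, 1 - x_control)
          for x_case, x_control in pairs]

cmh_value = cmh_statistic(tables)
score_value = clr_score_statistic_pairs(pairs)
print(cmh_value, score_value)
```

Pairs in which case and control share the same exposure contribute nothing to either statistic, so both depend only on the 30 and 10 discordant pairs.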

