Applications
Logistic regression is used in various fields, including machine learning, most medical fields, and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Many other medical scales used to assess the severity of a patient have been developed using logistic regression. Logistic regression may be used to predict the risk of developing a given disease (e.g. diabetes or coronary heart disease), based on observed characteristics of the patient.

Example
Problem
A group of 20 students spends between 0 and 6 hours studying for an exam. How does the number of hours spent studying affect the probability of the student passing the exam? The reason for using logistic regression for this problem is that the values of the dependent variable, pass and fail, while represented by "1" and "0", are not cardinal numbers.
Model

In this example, the probability of passing the exam is modelled with the standard logistic function of the hours of studying ''x'':

: p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} = \frac{1}{1 + e^{-(x - \mu)/s}}

where ''β''0 and ''β''1 are the parameters to be fitted, ''μ'' = −''β''0/''β''1 is the number of hours of studying at which the probability of passing equals 1/2, and ''s'' = 1/''β''1 measures how quickly the probability changes with hours of studying.
Fit
The usual measure of goodness of fit for a logistic regression uses logistic loss (or log loss), the negative log-likelihood. For a given set of parameter values, the log-likelihood ''ℓ'' is the sum over all observations of the log of the probability that the model assigns to the observed outcome; the best fit is the one that maximizes ''ℓ'' (equivalently, minimizes the log loss).

Parameter estimation
Since ''ℓ'' is nonlinear in ''β''0 and ''β''1, determining their optimum values will require numerical methods. Note that one method of maximizing ''ℓ'' is to require the derivatives of ''ℓ'' with respect to ''β''0 and ''β''1 to be zero:

: 0 = \frac{\partial \ell}{\partial \beta_0} = \sum_{k=1}^{K} (y_k - p_k)
: 0 = \frac{\partial \ell}{\partial \beta_1} = \sum_{k=1}^{K} (y_k - p_k)\, x_k

and the maximization procedure can be accomplished by solving the above two equations for ''β''0 and ''β''1, which, again, will generally require the use of numerical methods. The values of ''β''0 and ''β''1 which maximize ''ℓ'' and ''L'' using the above data are found to be approximately:

: \beta_0 \approx -4.1
: \beta_1 \approx 1.5

which yields a value for ''μ'' and ''s'' of:

: \mu = -\beta_0 / \beta_1 \approx 2.7
: s = 1 / \beta_1 \approx 0.67
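The maximization of ''ℓ'' can be carried out with any general-purpose numerical optimizer. The following sketch assumes SciPy is available and uses a small illustrative data set (placeholder values, not the exact 20-student data of this example); it minimizes the negative log-likelihood to obtain ''β''0 and ''β''1:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data: hours studied and pass (1) / fail (0).
# Placeholder values, not the exact data of the worked example.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
y = np.array([0,   0,   0,   0,   1,   0,   1,   1,   1,   1  ])

def neg_log_likelihood(beta):
    """Negative Bernoulli log-likelihood, -l(beta0, beta1)."""
    b0, b1 = beta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    p = np.clip(p, 1e-12, 1 - 1e-12)          # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Maximize l by minimizing -l, starting from beta0 = beta1 = 0.
result = minimize(neg_log_likelihood, x0=np.zeros(2))
b0, b1 = result.x
print(f"beta0 ~ {b0:.2f}, beta1 ~ {b1:.2f}")
print(f"mu ~ {-b0 / b1:.2f} hours, s ~ {1 / b1:.2f}")
```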
Predictions

The ''β''0 and ''β''1 coefficients may be entered into the logistic regression equation to estimate the probability of passing the exam. For example, for a student who studies 2 hours, entering the value ''x'' = 2 into the equation gives an estimated probability of passing the exam of about 0.25:

: t = \beta_0 + 2\beta_1 \approx -4.1 + 2(1.5) = -1.1
: p = \frac{1}{1 + e^{1.1}} \approx 0.25

Similarly, for a student who studies 4 hours, the estimated probability of passing the exam is about 0.87:

: t = \beta_0 + 4\beta_1 \approx -4.1 + 4(1.5) = 1.9
: p = \frac{1}{1 + e^{-1.9}} \approx 0.87

The same calculation gives the estimated probability of passing the exam for other values of hours studying.
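Using the rounded estimates above (a minimal sketch; the exact output depends on the precise fitted coefficients), the predicted probabilities can be reproduced directly:

```python
import math

beta0, beta1 = -4.1, 1.5   # rounded estimates from the worked example above

def prob_pass(hours):
    """Probability of passing given hours of study, via the logistic function."""
    t = beta0 + beta1 * hours           # log-odds
    return 1.0 / (1.0 + math.exp(-t))   # convert log-odds to a probability

for hours in (1, 2, 3, 4, 5):
    print(f"{hours} h: p = {prob_pass(hours):.2f}")
# 2 hours gives p ~ 0.25 and 4 hours gives p ~ 0.87, as in the text.
```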
Model evaluation

The logistic regression analysis gives the following output. By the Wald test, the output indicates that hours studying is significantly associated with the probability of passing the exam. Rather than the Wald method, the recommended method to calculate the ''p''-value for logistic regression is the likelihood-ratio test (LRT), discussed below.

Generalizations
This simple model is an example of binary logistic regression: it has one explanatory variable and a binary dependent variable which can take one of two categorical values.

Background
Definition of the logistic function
An explanation of logistic regression can begin with an explanation of the standard logistic function. The standard logistic function ''σ'' is a sigmoid function, which takes any real input ''t'' and outputs a value between zero and one:

: \sigma(t) = \frac{1}{1 + e^{-t}}

Definition of the inverse of the logistic function
We can now define the logit (log-odds) function ''g'' as the inverse ''σ''−1 of the standard logistic function. It is easy to see that it satisfies:

: g(p(x)) = \sigma^{-1}(p(x)) = \ln \frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x

and equivalently, after exponentiating both sides, we have the odds:

: \frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}
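The inverse relationship between the logit and the standard logistic function, and the conversion between log-odds and odds, can be checked numerically with a minimal sketch using only the standard library:

```python
import math

def sigmoid(t):
    """Standard logistic function: maps log-odds t to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    """Log-odds (logit) function: the inverse of the standard logistic function."""
    return math.log(p / (1.0 - p))

# logit and sigmoid undo each other
for t in (-2.0, 0.0, 1.5):
    assert abs(logit(sigmoid(t)) - t) < 1e-9

# exponentiating the logit recovers the odds p / (1 - p)
p = 0.8
assert abs(math.exp(logit(p)) - p / (1.0 - p)) < 1e-9
```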
Interpretation of these terms

In the above equations, the terms are as follows:
* ''g'' is the logit function. The equation for ''g''(''p''(''x'')) illustrates that the logit (i.e., log-odds or natural logarithm of the odds) is equivalent to the linear regression expression.
* ln denotes the natural logarithm.
* ''p''(''x'') is the probability that the dependent variable equals a case, given some linear combination ''β''0 + ''β''1''x'' of the predictors.
* ''β''0 is the intercept from the linear regression equation, and ''β''1''x'' is the regression coefficient ''β''1 multiplied by some value of the predictor.

Definition of the odds
The odds of the dependent variable equaling a case (given some linear combination of the predictors) is equivalent to the exponential function of the linear regression expression. This illustrates how the logit serves as a link function between the probability and the linear regression expression. Given that the logit ranges between negative and positive infinity, it provides an adequate criterion upon which to conduct linear regression, and the logit is easily converted back into the odds. So we define the odds of the dependent variable equaling a case (given some linear combination of the predictors) as follows:

: \text{odds} = e^{\beta_0 + \beta_1 x}

The odds ratio
For a continuous independent variable the odds ratio can be defined as:

: \mathrm{OR} = \frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}

This exponential relationship provides an interpretation for ''β''1: the odds multiply by ''e''''β''1 for every one-unit increase in ''x''.
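The cancellation of ''β''0 and of the starting value of ''x'' can be verified directly; the sketch below uses arbitrary illustrative coefficients and confirms that the odds ratio for a one-unit increase equals ''e''''β''1 regardless of ''x'':

```python
import math

beta0, beta1 = -1.7, 0.8   # arbitrary illustrative coefficients

def odds(x):
    """Odds p / (1 - p) implied by the logistic model at x."""
    return math.exp(beta0 + beta1 * x)

for x in (-3.0, 0.0, 2.5):
    odds_ratio = odds(x + 1) / odds(x)
    assert abs(odds_ratio - math.exp(beta1)) < 1e-9
print(f"odds ratio = e^beta1 ~ {math.exp(beta1):.3f}")
```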
Multiple explanatory variables

If there are multiple explanatory variables, the above expression ''β''0 + ''β''1''x'' can be revised to

: \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m = \beta_0 + \sum_{i=1}^{m} \beta_i x_i.

Then when this is used in the equation relating the log-odds of a success to the values of the predictors, the linear regression will be a multiple regression with ''m'' explanators; the parameters ''βi'' for all ''i'' = 0, 1, 2, ..., ''m'' are all estimated. Again, the more traditional equations are:

: \log_b \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m

and

: p = \frac{1}{1 + b^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m)}}

where usually ''b'' = ''e''.

Definition
The basic setup of logistic regression is as follows. We are given a dataset containing ''N'' points. Each point ''i'' consists of a set of ''m'' input variables ''x''1,''i'' ... ''x''''m,i'' (also called independent variables, explanatory variables, predictor variables, features, or attributes), and a binary outcome variable ''Y''''i'' (also known as a dependent variable, response variable, output variable, or class), which can take only the two possible values 0 (often meaning "no" or "failure") or 1 (often meaning "yes" or "success").

Many explanatory variables, two categories
The above example of binary logistic regression on one explanatory variable can be generalized to binary logistic regression on any number of explanatory variables ''x1, x2,...'' and any number of categorical values ''y'' = 0, 1, 2, .... To begin with, we may consider a logistic model with ''M'' explanatory variables, ''x1'', ''x2'' ... ''xM'' and, as in the example above, two categorical values (''y'' = 0 and 1). For the simple binary logistic regression model, we assumed a linear relationship between the predictor variable and the log-odds (also called logit) of the event that ''y'' = 1. This linear relationship may be extended to the case of ''M'' explanatory variables:

: t = \log_b \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_M x_M

where ''t'' is the log-odds and ''βi'' are parameters of the model. An additional generalization has been introduced in which the base of the model (''b'') is not restricted to the Euler number ''e''. In most applications, the base of the logarithm is usually taken to be ''e''. However, in some cases it can be easier to communicate results by working in base 2 or base 10. For a more compact notation, we will specify the explanatory variables and the ''β'' coefficients as (''M'' + 1)-dimensional vectors:

: \boldsymbol{x} = \{x_0, x_1, x_2, \dots, x_M\}
: \boldsymbol{\beta} = \{\beta_0, \beta_1, \beta_2, \dots, \beta_M\}

with an added explanatory variable ''x0'' = 1. The logit may now be written as:

: t = \sum_{m=0}^{M} \beta_m x_m = \boldsymbol{\beta} \cdot \boldsymbol{x}

Solving for the probability ''p'' that ''y'' = 1 yields:

: p(\boldsymbol{x}) = \frac{b^{\boldsymbol{\beta} \cdot \boldsymbol{x}}}{1 + b^{\boldsymbol{\beta} \cdot \boldsymbol{x}}} = \frac{1}{1 + b^{-\boldsymbol{\beta} \cdot \boldsymbol{x}}} = S_b(t),

where ''Sb'' is the sigmoid function with base ''b''. The above formula shows that once the ''βm'' are fixed, we can easily compute either the log-odds that ''y'' = 1 for a given observation, or the probability that ''y'' = 1 for a given observation. The main use-case of a logistic model is to be given an observation x, and estimate the probability ''p''(x) that ''y'' = 1. The optimum beta coefficients may again be found by maximizing the log-likelihood. For ''K'' measurements, defining x''k'' as the explanatory vector of the ''k''-th measurement, and ''yk'' as the categorical outcome of that measurement, the log-likelihood may be written in a form very similar to the simple case above:

: \ell = \sum_{k=1}^{K} \left( y_k \log_b p(\boldsymbol{x}_k) + (1 - y_k) \log_b (1 - p(\boldsymbol{x}_k)) \right)

As in the simple example above, finding the optimum ''β'' parameters will require numerical methods. One useful technique is to equate the derivatives of the log-likelihood with respect to each of the ''β'' parameters to zero, yielding a set of equations which will hold at the maximum of the log-likelihood:

: \frac{\partial \ell}{\partial \beta_m} = 0 = \sum_{k=1}^{K} \left( y_k - p(\boldsymbol{x}_k) \right) x_{mk}

where ''xmk'' is the value of the ''xm'' explanatory variable from the ''k''-th measurement.

Consider an example with ''M'' = 2 explanatory variables, ''b'' = 10, and coefficients ''β''0 = −3, ''β''1 = 1, and ''β''2 = 2 which have been determined by the above method. To be concrete, the model is:

: t = \log_{10} \frac{p}{1-p} = -3 + x_1 + 2 x_2
: p = \frac{1}{1 + 10^{-(-3 + x_1 + 2 x_2)}}

where ''p'' is the probability of the event that ''y'' = 1. This can be interpreted as follows:
* ''β''0 = −3 is the ''y''-intercept. It is the log-odds of the event that ''y'' = 1, when the predictors ''x''1 = ''x''2 = 0. By exponentiating, we can see that when ''x''1 = ''x''2 = 0 the odds of the event that ''y'' = 1 are 1-to-1000, or 10^−3. Similarly, the probability of the event that ''y'' = 1 when ''x''1 = ''x''2 = 0 can be computed as 1/(1000 + 1) = 1/1001.
* ''β''1 = 1 means that increasing ''x''1 by 1 increases the log-odds by 1. So if ''x''1 increases by 1, the odds that ''y'' = 1 increase by a factor of 10^1 = 10. Note that the probability of ''y'' = 1 has also increased, but it has not increased by as much as the odds have increased.
* ''β''2 = 2 means that increasing ''x''2 by 1 increases the log-odds by 2. So if ''x''2 increases by 1, the odds that ''y'' = 1 increase by a factor of 10^2 = 100. Note how the effect of ''x''2 on the log-odds is twice as great as the effect of ''x''1, but the effect on the odds is 10 times greater. But the effect on the probability of ''y'' = 1 is not as much as 10 times greater; it's only the effect on the odds that is 10 times greater. A short numerical sketch of this model is given after this list.
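The following sketch evaluates this two-variable, base-10 model for a few observations, reproducing the odds and probabilities discussed above (the coefficients are those of the example; the function names are illustrative):

```python
# Base-10 logistic model from the example: t = -3 + x1 + 2*x2
beta = [-3.0, 1.0, 2.0]   # [beta0, beta1, beta2]
base = 10.0

def log_odds(x1, x2):
    """Base-10 log-odds t for an observation (x1, x2), with x0 = 1 implied."""
    return beta[0] + beta[1] * x1 + beta[2] * x2

def prob(x1, x2):
    """Probability that y = 1: the base-b sigmoid of the log-odds."""
    t = log_odds(x1, x2)
    return 1.0 / (1.0 + base ** (-t))

# At x1 = x2 = 0 the odds are 10**-3, i.e. 1-to-1000:
print(prob(0, 0))              # ~ 1/1001 ~ 0.000999
# Increasing x1 by 1 multiplies the odds by 10; increasing x2 by 1, by 100:
print(prob(1, 0), prob(0, 1))  # ~ 0.0099 and ~ 0.0909
```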
Multinomial logistic regression: Many explanatory variables and many categories

In the above cases of two categories (binomial logistic regression), the categories were indexed by "0" and "1", and we had two probability distributions: the probability that the outcome was in category 1 was given by ''p''(x) and the probability that the outcome was in category 0 was given by 1 − ''p''(x). The sum of both probabilities is equal to unity, as they must be. In general, if we have ''M'' + 1 explanatory variables (including ''x0'') and ''N'' + 1 categories, we will need ''N'' + 1 separate probability distributions, one for each category, indexed by ''n'', which describe the probability that the categorical outcome ''y'' for explanatory vector x will be in category ''y'' = ''n''. It will also be required that the sum of these probabilities over all categories be equal to unity. Using the mathematically convenient base ''e'', these probabilities are:

: p_n(\boldsymbol{x}) = \frac{e^{\boldsymbol{\beta}_n \cdot \boldsymbol{x}}}{1 + \sum_{u=1}^{N} e^{\boldsymbol{\beta}_u \cdot \boldsymbol{x}}} \quad \text{for } n = 1, 2, \dots, N

: p_0(\boldsymbol{x}) = 1 - \sum_{n=1}^{N} p_n(\boldsymbol{x}) = \frac{1}{1 + \sum_{u=1}^{N} e^{\boldsymbol{\beta}_u \cdot \boldsymbol{x}}}

Each of the probabilities except ''p''0(x) will have its own set of regression coefficients ''βn''. It can be seen that, as required, the sum of the ''pn''(x) over all categories ''n'' is unity. Note that the selection of ''p''0(x) to be defined in terms of the other probabilities is artificial. Any of the probabilities could have been selected to be so defined. This special value of ''n'' is termed the "pivot index", and the log-odds (''tn'') are expressed in terms of the pivot probability and are again expressed as a linear combination of the explanatory variables:

: t_n = \ln \frac{p_n(\boldsymbol{x})}{p_0(\boldsymbol{x})} = \boldsymbol{\beta}_n \cdot \boldsymbol{x}

Note also that for the simple case of ''N'' = 1, the two-category case is recovered, with ''p''(x) = ''p''1(x) and 1 − ''p''(x) = ''p''0(x). The log-likelihood that a particular set of ''K'' measurements or data points will be generated by the above probabilities can now be calculated. Indexing each measurement by ''k'', let the ''k''-th set of measured explanatory variables be denoted by x''k'' and their categorical outcomes be denoted by ''yk'', which can be equal to any integer in [0, ''N'']. The log-likelihood is then:

: \ell = \sum_{k=1}^{K} \sum_{n=0}^{N} \Delta(n, y_k) \ln p_n(\boldsymbol{x}_k)

where Δ(''n'', ''yk'') is an indicator function which equals unity if ''yk'' = ''n'' and zero otherwise.
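A minimal sketch of these formulas (illustrative coefficients; pivot category ''n'' = 0) computes the probability of each category for one explanatory vector and checks that the probabilities sum to one:

```python
import numpy as np

# Illustrative coefficients for N = 2 non-pivot categories and M = 2 explanatory
# variables (plus x0 = 1); each row is the beta vector for one category n = 1..N.
betas = np.array([[ 0.5, 1.0, -0.5],   # beta_1
                  [-1.0, 0.3,  0.8]])  # beta_2

def category_probs(x):
    """Probabilities p_0 .. p_N for explanatory vector x (x[0] must be 1)."""
    scores = np.exp(betas @ x)          # e^{beta_n . x} for n = 1..N
    denom = 1.0 + scores.sum()          # 1 + sum over the non-pivot categories
    p0 = 1.0 / denom                    # pivot-category probability
    return np.concatenate(([p0], scores / denom))

x = np.array([1.0, 0.2, 1.5])           # x0 = 1, x1 = 0.2, x2 = 1.5
p = category_probs(x)
print(p, p.sum())                       # the probabilities sum to 1
```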
Interpretations

There are various equivalent specifications and interpretations of logistic regression, which fit into different types of more general models, and allow different generalizations.

As a generalized linear model
The particular model used by logistic regression, which distinguishes it from standard linear regression and from other types of regression analysis used for binary-valued outcomes, is the way the probability of a particular outcome is linked to the linear predictor function:

: \operatorname{logit}(p_i) = \ln \frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i}

As a latent-variable model
The logistic model has an equivalent formulation as a latent-variable model. This formulation is common in the theory of discrete choice models, and makes it easier to compare logistic regression to the closely related probit model.

Two-way latent-variable model
Yet another formulation uses two separate latent variables:

: Y_i^{0\ast} = \boldsymbol{\beta}_0 \cdot \mathbf{X}_i + \varepsilon_0
: Y_i^{1\ast} = \boldsymbol{\beta}_1 \cdot \mathbf{X}_i + \varepsilon_1

where

: \varepsilon_0 \sim \operatorname{EV}_1(0,1), \qquad \varepsilon_1 \sim \operatorname{EV}_1(0,1)

where ''EV''1(0,1) is a standard type-1 extreme value distribution: i.e.

: \Pr(\varepsilon_0 = x) = \Pr(\varepsilon_1 = x) = e^{-x} e^{-e^{-x}}

Then

: Y_i = \begin{cases} 1 & \text{if } Y_i^{1\ast} > Y_i^{0\ast}, \\ 0 & \text{otherwise.} \end{cases}

This model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable. The reason for this separation is that it makes it easy to extend logistic regression to multi-outcome categorical variables, as in the multinomial logit model. In such a model, it is natural to model each possible outcome using a different set of regression coefficients. It is also possible to motivate each of the separate latent variables as the theoretical utility associated with making the associated choice, and thus motivate logistic regression in terms of utility theory.
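The connection to the logistic model can be checked by simulation: the difference of two independent standard type-1 extreme value (Gumbel) variables follows a standard logistic distribution, so the probability that the "1" latent variable exceeds the "0" latent variable is the logistic function of the difference of the two linear predictors. A minimal sketch with an illustrative linear-predictor difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
t = 0.7                        # illustrative value of (beta_1 - beta_0) . X_i

# Two independent standard type-1 extreme value (Gumbel) errors per observation
eps0 = rng.gumbel(loc=0.0, scale=1.0, size=n)
eps1 = rng.gumbel(loc=0.0, scale=1.0, size=n)

# Fraction of draws in which the "1" latent variable wins
empirical = np.mean(t + eps1 > eps0)
logistic = 1.0 / (1.0 + np.exp(-t))
print(empirical, logistic)     # the two values agree to roughly three decimals
```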
Example

: As an example, consider a province-level election where the choice is between a right-of-center party, a left-of-center party, and a secessionist party (e.g. the Parti Québécois, which wants Quebec to secede from Canada). We would then use three latent variables, one for each choice, each of which can be interpreted as the theoretical utility associated with making that choice.

As a "log-linear" model
Yet another formulation combines the two-way latent variable formulation above with the original formulation higher up without latent variables, and in the process provides a link to one of the standard formulations of the multinomial logit. Here, instead of writing the logit of the probabilities ''pi'' as a linear predictor, we separate the linear predictor into two, one for each of the two outcomes:

: \ln \Pr(Y_i = 0) = \boldsymbol{\beta}_0 \cdot \mathbf{X}_i - \ln Z
: \ln \Pr(Y_i = 1) = \boldsymbol{\beta}_1 \cdot \mathbf{X}_i - \ln Z

Two separate sets of regression coefficients have been introduced, just as in the two-way latent variable model, and the two equations appear in a form that writes the logarithm of the associated probability as a linear predictor, with an extra term −ln ''Z'' at the end that serves as a normalizing factor, ensuring that the probabilities sum to one.

As a single-layer perceptron
The model has an equivalent formulation

: p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i})}}

This functional form is commonly called a single-layer perceptron or single-layer artificial neural network.
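Seen this way, fitting the model amounts to training a single sigmoid unit by gradient ascent on the log-likelihood (equivalently, gradient descent on the log loss). A minimal sketch with simulated illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: 200 observations, 2 features, plus an intercept column x0 = 1
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
true_beta = np.array([-0.5, 1.2, -0.8])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta = np.zeros(3)
lr = 0.5
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # forward pass of the single sigmoid unit
    grad = X.T @ (y - p) / len(y)          # gradient of the mean log-likelihood
    beta += lr * grad                      # gradient-ascent step
print(beta)   # roughly recovers the coefficients used to generate the data
```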
In terms of binomial data

A closely related model assumes that each ''i'' is associated not with a single Bernoulli trial but with ''ni'' independent identically distributed trials, where the observation ''Yi'' is the number of successes observed (the sum of the individual Bernoulli-distributed random variables), and hence follows a binomial distribution:

: Y_i \sim \operatorname{Bin}(n_i, p_i), \quad \text{for } i = 1, \dots, n

Model fitting
Maximum likelihood estimation (MLE)
The regression coefficients are usually estimated using maximum likelihood estimation. Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximize the likelihood function, so an iterative process must be used instead; for example Newton's method. This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until no more improvement is made, at which point the process is said to have converged.

Iteratively reweighted least squares (IRLS)
Binary logistic regression (''y'' = 0 or ''y'' = 1) can, for example, be calculated using ''iteratively reweighted least squares'' (IRLS), which is equivalent to maximizing the log-likelihood of a Bernoulli distributed process using Newton's method.
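A compact sketch of the IRLS / Newton iteration for the binary case, using simulated illustrative data (the variable ''w'' collects the ''β'' coefficients):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data with an intercept column x0 = 1
X = np.hstack([np.ones((300, 1)), rng.normal(size=(300, 2))])
true_w = np.array([0.3, -1.0, 2.0])
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(X @ true_w)))).astype(float)

w = np.zeros(X.shape[1])
for _ in range(25):                        # a few Newton steps usually suffice
    p = 1.0 / (1.0 + np.exp(-(X @ w)))     # current fitted probabilities
    s = p * (1 - p)                        # Bernoulli variances = IRLS weights
    # Newton / IRLS update: solve (X^T S X) dw = X^T (y - p)
    dw = np.linalg.solve(X.T @ (s[:, None] * X), X.T @ (y - p))
    w += dw
    if np.max(np.abs(dw)) < 1e-10:         # converged
        break
print(w)   # approximately recovers true_w (up to sampling noise)
```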