
Logistic regression is a statistical technique for analyzing the relationship of an outcome or dependent variable to one or more predictors or independent variables when the dependent variable is (1) dichotomous, having only two categories, for example, the presence or absence of symptoms, or the use or nonuse of tobacco; (2) unordered polytomous, a nominal scale variable with three or more categories, for example, type of contraception (none, pill, condom, intrauterine device) used in response to services provided by a family planning clinic; or (3) ordered polytomous, an ordinal scale variable with three or more categories, for example, whether a patient's condition deteriorates, remains the same, or improves in response to a cancer treatment. Here, the basic logistic regression model for dichotomous outcomes is examined, noting its extension to polytomous outcomes and its conceptual roots in both log-linear analysis and the general linear model. Next, consideration is given to methods for assessing the goodness of fit and predictive utility of the overall model, and to the calculation and interpretation of logistic regression coefficients and associated inferential statistics to evaluate the importance of individual predictors in the model. Throughout, the discussion assumes an interest in prediction, regardless of whether causality is implied; hence the language of “outcomes” and “predictors” is preferred to the language of “dependent” and “independent” variables.

The equation for the logistic regression model with a dichotomous outcome is logit(Y) = a + b1X1 + b2X2 + … + bKXK, where Y is the dichotomous outcome; logit(Y) is the natural logarithm of the odds of Y, a transformation of Y to be discussed in more detail momentarily; and there are k = 1, 2, …, K predictors Xk with associated coefficients bk, plus a constant or intercept, a, which represents the value of logit(Y) when all the Xk are equal to 0. If the two categories of the outcome are coded 1 and 0, respectively, and P1 is the probability of being in the category coded as 1, and P0 is the probability of being in the category coded as 0, then the odds of being in Category 1 is P1/P0 = P1/(1 − P1) (since the probability of being in one category is one minus the probability of being in the other category). Logit(Y) is the natural logarithm of the odds, ln[P1/(1 − P1)], where ln represents the natural logarithm transformation.
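The relationships among probability, odds, and logit described above can be sketched in a few lines of Python. The coefficient values here (a = −1.5, b1 = 0.8) are hypothetical, chosen only to illustrate the arithmetic; they do not come from any model discussed in the text:

```python
import math

def logit(p1):
    """Natural logarithm of the odds of being in the category coded 1."""
    return math.log(p1 / (1 - p1))

def inverse_logit(z):
    """Map a logit back to a probability (the logistic function)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted model with one predictor: logit(Y) = a + b1*X1
a, b1 = -1.5, 0.8
x1 = 2.0

z = a + b1 * x1        # predicted logit for a case with X1 = 2.0
p1 = inverse_logit(z)  # predicted probability of Category 1
odds = p1 / (1 - p1)   # odds of Category 1; equals exp(z)
```

Note that exponentiating the logit recovers the odds, which is why exp(bk) is interpreted as the multiplicative change in the odds for a one-unit increase in Xk.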

Polytomous Logistic Regression Models

When the outcome is polytomous, logistic regression can be implemented by splitting the outcome into a set of dichotomous variables. This is done by means of contrasts that identify a reference category (or set of categories) with which to compare each of the other categories (or sets of categories). For a nominal outcome, the most commonly used model is called the baseline category logit model. In this model, the outcome is divided into a set of dummy variables, each representing one of the categories of the outcome, with one of the categories designated as the reference category, in the same way that dummy coding is used for nominal predictors in linear regression. If there are M categories in the outcome, then logit(Ym) = ln(Pm/P0) = am + b1mX1 + b2mX2 + … + bKmXK, where P0 is the probability of being in the reference category and Pm is the probability of being in Category m = 1, 2, …, M − 1, given that the case is either in Category m or in the reference category. A total of (M − 1) equations or logit functions is thus estimated, each with its own intercept am and logistic regression coefficients bkm, representing the relationship of the predictors to logit(Ym).
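The (M − 1) logit functions can be converted back into category probabilities by exponentiating each logit and normalizing, with the reference category assigned a logit of 0. The sketch below illustrates this for a hypothetical model with M = 3 categories and two predictors; the intercepts and coefficients are invented for illustration, not taken from the source:

```python
import math

def baseline_category_probs(x, intercepts, coefs):
    """
    Category probabilities for a baseline-category logit model.
    intercepts[m-1] and coefs[m-1] hold a_m and (b_1m, ..., b_Km) for
    Category m = 1, ..., M-1; the reference Category 0 has logit 0.
    """
    logits = [0.0]  # reference category
    for a_m, b_m in zip(intercepts, coefs):
        logits.append(a_m + sum(b * xk for b, xk in zip(b_m, x)))
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]  # P0, P1, ..., P_{M-1}

# Hypothetical model: M = 3 categories, K = 2 predictors
probs = baseline_category_probs(
    x=[1.0, 0.5],
    intercepts=[0.2, -0.4],           # a_1, a_2
    coefs=[[0.5, -0.3], [0.1, 0.8]],  # (b_11, b_21), (b_12, b_22)
)
```

Taking ln(Pm/P0) for any returned pair recovers the corresponding linear predictor am + b1mX1 + b2mX2, confirming the equivalence between the set of logit equations and the normalized probabilities.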

...
