
Logistic Regression

Logistic regression is a statistical technique used in research designs that call for analyzing the relationship of an outcome or dependent variable to one or more predictors or independent variables when the dependent variable is either (a) dichotomous, having only two categories, for example, whether one uses illicit drugs (no or yes); (b) unordered polytomous, which is a nominal scale variable with three or more categories, for example, political party identification (Democrat, Republican, other, or none); or (c) ordered polytomous, which is an ordinal scale variable with three or more categories, for example, level of education completed (e.g., less than elementary school, elementary school, high school, an undergraduate degree, or a graduate degree). Here, the basic logistic regression model for dichotomous outcomes is examined, noting its extension to polytomous outcomes and its conceptual roots in both loglinear analysis and the general linear model. Next, consideration is given to methods for assessing the goodness of fit and predictive utility of the overall model, and calculation and interpretation of logistic regression coefficients and associated inferential statistics to evaluate the importance of individual predictors in the model. The discussion throughout the entry assumes an interest in prediction, regardless of whether causality is implied; hence, the language of “outcomes” and “predictors” is preferred to the language of “dependent” and “independent” variables.

The equation for the logistic regression model with a dichotomous outcome is

$$\operatorname{logit}(Y) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K$$

where Y is the dichotomous outcome; logit(Y) is the natural logarithm of the odds of Y, a transformation of Y to be discussed in more detail momentarily; and there are k = 1, 2,…, K predictors Xk with associated coefficients βk, plus a constant or intercept α, which represents the value of logit(Y) when all of the Xk are equal to zero. If the two categories of the outcome are coded 1 and 0, respectively, and P1 is the probability of being in the category coded as 1, and P0 is the probability of being in the category coded as 0, then the odds of being in category 1 are

$$\text{odds} = \frac{P_1}{P_0} = \frac{P_1}{1 - P_1}$$

(because the probability of being in one category is one minus the probability of being in the other category). Logit(Y) is the natural logarithm of the odds,

$$\operatorname{logit}(Y) = \ln\left(\frac{P_1}{P_0}\right) = \ln\left(\frac{P_1}{1 - P_1}\right)$$

where ln represents the natural logarithm transformation.
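The odds and logit transformations above can be sketched in a few lines of Python, along with the inverse (logistic) transformation that maps a logit back to a probability; the probability value 0.8 is an arbitrary illustration, not taken from the entry:

```python
import math

def logit(p1):
    """Natural log of the odds P1/P0, where P0 = 1 - P1."""
    return math.log(p1 / (1.0 - p1))

def inverse_logit(z):
    """Map a logit value back to the probability P1 (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))

# If P1 = 0.8, then P0 = 0.2, the odds are 0.8/0.2 = 4,
# and logit(Y) = ln(4) ~= 1.3863.
odds = 0.8 / (1 - 0.8)
print(round(odds, 4))                        # 4.0
print(round(logit(0.8), 4))                  # 1.3863
print(round(inverse_logit(logit(0.8)), 4))   # 0.8
```

The round trip through `inverse_logit` illustrates why the logit is a convenient transformation: it carries probabilities (bounded by 0 and 1) onto the whole real line, where the linear equation for logit(Y) is unconstrained.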

Polytomous Logistic Regression Models

When the outcome is polytomous, logistic regression can be implemented by splitting the outcome into a set of dichotomous variables. This is done by means of contrasts, which identify a reference category (or set of categories) with which to compare each of the other categories (or sets of categories). For a nominal outcome, the most commonly used model is called the baseline category logit model. In this model, the outcome is divided into a set of dummy variables, each representing one of the categories of the outcome, with one of the categories designated as the reference category, in the same way that dummy coding is used for nominal predictors in linear regression. If there are M categories in the outcome, then

$$\operatorname{logit}(Y_m) = \ln\left(\frac{P_m}{P_0}\right) = \alpha_m + \beta_{1,m} X_1 + \beta_{2,m} X_2 + \cdots + \beta_{K,m} X_K$$

where P0 is the probability of being in the reference category and Pm is the probability of being in category m = 1, 2,…, M − 1, given that the case is either in category m or in the reference category. A total of M − 1 equations or logit functions are thus estimated, each with its own intercept αm and logistic regression coefficients βk, m, representing the relationship of the predictors to logit(Ym).
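The baseline category logit model can be sketched as follows: given the M − 1 intercepts αm and (for simplicity) a single predictor with slopes βm, the category probabilities follow from exponentiating each logit function and normalizing so that all M probabilities sum to one. The coefficient values here are hypothetical, chosen only for illustration:

```python
import math

def baseline_category_probs(x, intercepts, slopes):
    """
    Baseline category logit model with one predictor x.
    intercepts[m] and slopes[m] define the m-th logit function,
    ln(P_m / P_0) = alpha_m + beta_m * x, for m = 1, ..., M-1.
    Returns [P_0, P_1, ..., P_{M-1}].
    """
    # Odds of each non-reference category versus the reference category.
    odds_vs_ref = [math.exp(a + b * x) for a, b in zip(intercepts, slopes)]
    # Normalize: P_0 * (1 + sum of odds) = 1.
    denom = 1.0 + sum(odds_vs_ref)
    p_ref = 1.0 / denom
    return [p_ref] + [o / denom for o in odds_vs_ref]

# A three-category outcome (M = 3) needs M - 1 = 2 logit functions.
probs = baseline_category_probs(x=1.0, intercepts=[0.5, -0.2], slopes=[0.3, 0.1])
print(round(sum(probs), 6))  # 1.0
```

Printing the sum confirms that the M estimated probabilities form a proper distribution; each logistic regression coefficient βk,m is interpreted relative to the reference category, just as each dummy-coded contrast in linear regression is.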

...
