
A categorical variable consists of a set of non-overlapping categories. Categorical data are counts for those categories. The measurement scale of a categorical variable is ordinal if the categories exhibit a natural ordering, such as opinion variables with categories from “strongly disagree” to “strongly agree.” The measurement scale is nominal if there is no inherent ordering. The types of possible analysis for categorical data depend on the measurement scale.

Types of Analysis

When the subjects measured are cross-classified on two or more categorical variables, the table of counts for the various combinations of categories is a contingency table. The information in a contingency table can be summarized and further analyzed through appropriate measures of association and models, as discussed below. These measures and models differ according to the nature of the classification variables (nominal or ordinal).
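As a minimal sketch of this cross-classification, the following Python builds a two-way contingency table from raw records. The variable names, category labels, and six records are hypothetical and serve only as an illustration.

```python
from collections import Counter

def contingency_table(records, row_levels, col_levels):
    """Count subjects in each (row, column) category combination."""
    counts = Counter(records)
    # A fixed level order keeps empty cells in the table as explicit zeros.
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical records: (smoking status, opinion) for six subjects.
records = [
    ("smoker", "agree"), ("smoker", "disagree"), ("nonsmoker", "agree"),
    ("nonsmoker", "agree"), ("smoker", "agree"), ("nonsmoker", "disagree"),
]
row_levels = ["smoker", "nonsmoker"]
col_levels = ["disagree", "agree"]

table = contingency_table(records, row_levels, col_levels)
row_totals = [sum(row) for row in table]        # marginal counts for rows
col_totals = [sum(col) for col in zip(*table)]  # marginal counts for columns
```

Each cell of `table` is the count for one combination of categories, and the row and column totals are the marginal counts on which many measures of association are built.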

Most studies distinguish between one or more response variables and a set of explanatory variables. When the main focus is on the association and interaction structure among a set of response variables, such as whether two variables are conditionally independent given values for the other variables, loglinear models are useful, as described in a later section. More commonly, research questions focus on effects of explanatory variables on a categorical response variable. Those explanatory variables might be categorical, quantitative, or of both types. Logistic regression models are then of particular interest. Initially such models were developed for binary (success-failure) response variables. They describe the logit, which is log[P(Y = 1)/P(Y = 2)], using the equation

$$\log\left[\frac{P(Y = 1)}{P(Y = 2)}\right] = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p,$$

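where Y is the binary response variable and x1, …, xp are the explanatory variables. The logistic regression model was later extended to nominal and ordinal response variables. For a nominal response Y with J categories, the model simultaneously describes

$$\log\left[\frac{P(Y = 1)}{P(Y = J)}\right], \; \log\left[\frac{P(Y = 2)}{P(Y = J)}\right], \; \ldots, \; \log\left[\frac{P(Y = J-1)}{P(Y = J)}\right].$$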
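A short sketch may help show how a set of logits for a nominal response determines the response probabilities. It assumes that the last category serves as the baseline and that each logit is a linear function of a single explanatory variable; the function name and all parameter values are hypothetical.

```python
import math

def baseline_category_probs(linear_predictors):
    """Convert J-1 baseline-category logits into J response probabilities.

    linear_predictors[j] is log[P(Y = j+1) / P(Y = J)], with the last
    category J as the baseline (its own logit is 0 by definition).
    """
    expo = [math.exp(eta) for eta in linear_predictors] + [1.0]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical parameters for J = 3 categories and one explanatory variable x.
alphas = [0.5, -0.2]
betas = [1.0, 0.3]
x = 2.0

logits = [a + b * x for a, b in zip(alphas, betas)]
probs = baseline_category_probs(logits)
```

With a single logit (J = 2) the same function reduces to ordinary binary logistic regression, so the binary model is a special case of this sketch.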

For ordinal responses, a popular model uses explanatory variables to predict a logit defined in terms of a cumulative probability,

$$\log\left[\frac{P(Y \le j)}{P(Y > j)}\right], \quad j = 1, \ldots, J - 1.$$
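The cumulative logit idea can be sketched as follows. The code assumes the common proportional-odds form, in which each cumulative logit has its own intercept (cutpoint) but all share one slope; that form, the function name, and the numerical values are assumptions made for illustration and are not spelled out in the text above.

```python
import math

def cumulative_logit_probs(cutpoints, beta, x):
    """Category probabilities from a cumulative logit model.

    Assumes logit[P(Y <= j)] = cutpoints[j-1] + beta * x for j = 1..J-1,
    with increasing cutpoints so the cumulative probabilities increase in j.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in cutpoints]
    cumulative.append(1.0)  # P(Y <= J) = 1
    # P(Y = j) is the difference of successive cumulative probabilities.
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs

# Hypothetical values: J = 4 ordered categories, one explanatory variable.
cutpoints = [-1.0, 0.0, 1.5]
probs = cumulative_logit_probs(cutpoints, beta=0.8, x=0.5)
```

Because the model is defined through P(Y ≤ j), it uses the ordering of the categories, which the baseline-category approach for nominal responses ignores.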

For categorical data, the binomial and multinomial distributions play the central role that the normal distribution does for quantitative data. Models for categorical data assuming the binomial or multinomial were unified with standard regression and analysis of variance (ANOVA) models for quantitative data assuming normality through the introduction of the generalized linear model (GLM). This very wide class of models can incorporate data assumed to come from any of a variety of standard distributions (such as the normal, binomial, and Poisson). The GLM relates a function of the mean (such as the log or logit of the mean) to explanatory variables with a linear predictor. Certain GLMs for counts, such as Poisson regression models, relate naturally to loglinear and logistic models for binomial and multinomial responses.
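To make the link-function idea concrete, here is a minimal sketch of one GLM: a Poisson regression with log link and a single explanatory variable, fitted by Newton-Raphson in plain Python. The data, the function name `fit_poisson_loglinear`, and the fixed iteration count are illustrative assumptions; in practice an established statistics package would be used.

```python
import math

def fit_poisson_loglinear(x, y, iterations=25):
    """Fit log(mu_i) = a + b * x_i by Newton-Raphson (equivalently IRLS)."""
    a, b = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector: derivative of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix for the canonical log link.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical counts observed at five values of an explanatory variable.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2, 3, 6, 7, 12]
a_hat, b_hat = fit_poisson_loglinear(x, y)
fitted = [math.exp(a_hat + b_hat * xi) for xi in x]
```

The linear predictor `a + b * x` models the log of the mean rather than the mean itself, which is what distinguishes this GLM from ordinary linear regression. At convergence the fitted means reproduce the observed total count, a property of the likelihood equations under the log link.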

More recently, methods for categorical data have been extended to include clustered data, for which observations within each cluster are allowed to be correlated. A very important special case is that of repeated measurements, such as in a longitudinal study in which each subject provides a cluster of observations taken at different times. One way this is done is to introduce a random effect in the model to represent each cluster, thus extending the GLM to a generalized linear mixed model, where "mixed" refers to the model containing both random effects and the usual sorts of fixed effects.
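The role of the random effect can be sketched with a small simulation. It assumes a random-intercept logistic model in which each cluster draws one normally distributed effect that is shared by all of its observations; the model form, function name, and parameter values are hypothetical choices for illustration.

```python
import math
import random

def simulate_cluster(alpha, beta, x_values, sigma, rng):
    """Simulate one cluster's binary responses under a random-intercept logit model.

    Assumes logit P(Y_t = 1 | u) = alpha + u + beta * x_t, with u ~ N(0, sigma^2)
    drawn once per cluster and shared by all of its observations.
    """
    u = rng.gauss(0.0, sigma)
    responses = []
    for x in x_values:
        p = 1.0 / (1.0 + math.exp(-(alpha + u + beta * x)))
        responses.append(1 if rng.random() < p else 0)
    return u, responses

rng = random.Random(0)  # fixed seed so the sketch is reproducible
times = [0, 1, 2, 3]    # hypothetical measurement occasions
clusters = [simulate_cluster(-0.5, 0.3, times, sigma=1.5, rng=rng) for _ in range(200)]
```

Here `alpha` and `beta` are fixed effects common to all clusters, while `u` varies from cluster to cluster. Because every observation in a cluster shares the same `u`, responses from the same subject tend to be more alike than responses from different subjects, which is the within-cluster correlation the mixed model is meant to capture.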

...
