The basic idea underlying latent class (LC) analysis is a very simple one: Some of the parameters of a postulated statistical model differ across unobserved subgroups. These subgroups form the categories of a categorical latent variable (see LATENT VARIABLE). This basic idea has several seemingly unrelated applications, the most important of which are clustering, scaling, density estimation, and random-effects modeling. Outside the social sciences, LC models are often referred to as finite mixture models.

LC analysis was introduced in 1950 by Lazarsfeld, who used the technique as a tool for building typologies (or clustering) based on dichotomous observed variables. More than 20 years later, Goodman (1974) made the model applicable in practice by developing an algorithm for obtaining maximum likelihood estimates of the model parameters. He also proposed extensions for polytomous manifest variables and multiple latent variables and did important work on the issue of model identification. During the same period, Haberman (1979) showed the connection between LC models and log-linear models for frequency tables with missing (unknown) cell counts. Many important extensions of the classical LC model have been proposed since then, such as models containing (continuous) covariates, local dependencies, ordinal variables, several latent variables, and repeated measures. A general framework for categorical data analysis with discrete latent variables was proposed by Hagenaars (1990) and extended by Vermunt (1997).

Although in the social sciences, LC and finite mixture models are conceived primarily as tools for categorical data analysis, they can be useful in several other areas as well. One of these is density estimation, in which one makes use of the fact that a complicated density can be approximated as a finite mixture of simpler densities. LC analysis can also be used as a probabilistic cluster analysis tool for continuous observed variables, an approach that offers many advantages over traditional cluster techniques such as K-means clustering (see LATENT PROFILE MODEL). Another application area is dealing with unobserved heterogeneity, for example, in regression analysis with dependent observations (see NONPARAMETRIC RANDOM-EFFECTS MODEL).

The Classical LC Model for Categorical Indicators

Let X represent the latent variable and Y_l one of the L observed or manifest variables, where 1 ≤ l ≤ L. Moreover, let C be the number of latent classes and D_l the number of levels of Y_l. A particular latent class is enumerated by the index x, x = 1, 2, …, C, and a particular value of Y_l by y_l, y_l = 1, 2, …, D_l. The vector notation Y and y is used to refer to a complete response pattern.
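In this notation, the classical LC model is usually written as a finite mixture in which the indicators are assumed to be mutually independent within each latent class (local independence). The equation below is the standard textbook form, stated here as an assumption because the model equation itself does not appear in the text above:

```latex
P(\mathbf{Y} = \mathbf{y})
  = \sum_{x=1}^{C} P(X = x) \prod_{l=1}^{L} P(Y_l = y_l \mid X = x)
```

Here P(X = x) is the size of latent class x, and P(Y_l = y_l | X = x) is the class-specific probability of response y_l on indicator Y_l.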

To make things more concrete, let us consider the following small data set obtained from the 1987 General Social Survey:

Y1  Y2  Y3  Frequency  P(X = 1|Y = y)  P(X = 2|Y = y)
1   1   1   696        .998            .002
1   1   2   68         .929            .071
1   2   1   275        .876            .124
1   2   2   130        .168            .832
2   1   1   34         .848            .152
2   1   2   19         .138            .862
2   2   1   125        .080            .920
2   2   2   366        .002            .998

The three dichotomous indicators Y1, Y2, and Y3 are the responses to the statements “allow anti-religionists to speak” (1 = allowed, 2 = not allowed), “allow anti-religionists to teach” (1 = allowed, 2 = not allowed), and “remove anti-religious books from the library” (1 = do not remove, 2 = remove). By means of LC analysis, it is possible to identify subgroups with different degrees of tolerance toward anti-religionists.
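To illustrate how such subgroups might be identified, the following is a minimal sketch that fits a two-class LC model to the frequencies in the table above with a plain EM algorithm. It assumes NumPy, a two-class solution, and local independence of the indicators; the function name `fit_lc` and all variable names are hypothetical, and this is not necessarily the procedure that produced the posterior probabilities in the table.

```python
import numpy as np

# Response patterns (Y1, Y2, Y3), recoded 0/1 for levels 1/2, with the
# observed frequencies from the table above.
patterns = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
                     [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
freq = np.array([696, 68, 275, 130, 34, 19, 125, 366], dtype=float)


def fit_lc(patterns, freq, n_classes=2, n_iter=500, seed=0):
    """Illustrative EM for a latent class model with dichotomous indicators."""
    rng = np.random.default_rng(seed)
    n_items = patterns.shape[1]
    class_size = np.full(n_classes, 1.0 / n_classes)  # P(X = x)
    # P(Y_l = 2 | X = x), random start so the classes can separate
    p_level2 = rng.uniform(0.2, 0.8, size=(n_classes, n_items))
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: P(X = x, Y = y) under local independence
        cond = np.where(patterns[None, :, :] == 1,
                        p_level2[:, None, :], 1.0 - p_level2[:, None, :])
        joint = class_size[:, None] * cond.prod(axis=2)  # (classes, patterns)
        marginal = joint.sum(axis=0)                     # P(Y = y)
        posterior = joint / marginal                     # P(X = x | Y = y)
        loglik_trace.append(float(freq @ np.log(marginal)))
        # M-step: update parameters from expected class-by-pattern counts
        weights = posterior * freq
        class_total = weights.sum(axis=1)
        class_size = class_total / freq.sum()
        p_level2 = (weights @ patterns) / class_total[:, None]
    # posterior comes from the last E-step; rows are response patterns
    return class_size, p_level2, posterior.T, loglik_trace


class_size, p_level2, posterior, loglik_trace = fit_lc(patterns, freq)
print("class sizes:", np.round(class_size, 3))
print("posterior P(X = x | Y = y) per pattern:")
print(np.round(posterior, 3))
```

The printed posterior probabilities can be compared with the last two columns of the table. Because class labels are arbitrary, the two columns may appear in swapped order.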

...
