
Introduction

Latent class analysis is a statistical tool for classifying objects or individuals according to their values on a set of observed, i.e. manifest, variables. Like cluster analysis, it is aimed at identifying clusters of individuals or objects that are in some sense ‘similar’. To distinguish its terminology from that of cluster analysis, the groups of individuals are called ‘classes’ or ‘latent classes’ in latent class analysis (LCA) rather than clusters.

Unlike cluster analysis, LCA does not group objects by means of a measure of similarity or distance between each pair of objects to be classified. There is also no need to define a criterion of cluster distance (or similarity), nor to select one of the various clustering algorithms (e.g. agglomerative, centroid method, etc.). Instead, latent class analysis classifies objects according to the probabilities of their values on all observed variables (the feature patterns of the objects). This distinction yields two important indications for choosing between cluster analysis and latent class analysis.

First, cluster analysis is to be preferred when the number of objects to be classified is small, and LCA when it is large; a sample size of about N = 50 or N = 100 may serve as a rough criterion for applying one or the other. This is because in LCA the probability distributions of all manifest variables have to be parameterized (and estimated) for each latent class (which requires large numbers of observations), whereas in cluster analysis the distance between each object and every other object has to be measured (which is more tractable for smaller sets of objects).
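The trade-off described above can be made concrete with a small counting sketch. It assumes an unrestricted latent class model for categorical variables in which the manifest variables are treated as independent within each class (a standard assumption, not stated in the text above). The function names are hypothetical and chosen only for illustration.

```python
from math import comb


def lca_free_parameters(n_classes, n_categories):
    """Count free parameters of an unrestricted latent class model.

    Assumption (not from the text): manifest variables are independent
    within each class, so each class needs (k - 1) free response
    probabilities per variable with k categories.
    n_categories is a list with the number of categories per variable.
    """
    class_size_params = n_classes - 1  # class sizes sum to 1
    conditional_params = n_classes * sum(k - 1 for k in n_categories)
    return class_size_params + conditional_params


def pairwise_distances(n_objects):
    """Number of distinct object pairs a distance-based clustering must measure."""
    return comb(n_objects, 2)


# Illustrative values only: 10 yes-no items, 3 latent classes, 50 objects.
n_params = lca_free_parameters(3, [2] * 10)
n_pairs = pairwise_distances(50)
print(n_params, n_pairs)
```

With few objects, the parameters to be estimated are numerous relative to the observations available, while the number of pairwise distances stays manageable; with many objects, the pair count grows quadratically whereas the parameter count does not depend on N.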

Second, LCA is better suited for categorical or ordinal data (where each manifest variable has a small number of values, e.g. yes-no responses or rating scale responses to some questionnaire items), whereas cluster analysis is better suited for metric variables (where some distance measure between the objects, like the Euclidean distance, is unproblematic).

But there is a third difference between cluster analysis and LCA that is significant for specifying submodels and extended models: cluster analysis is aimed at identifying a manifest classification, i.e. each object is assigned to one and only one group or cluster (which is also true for borderline cases that have the same distance to two or more clusters and can therefore be assigned to a single cluster only with high uncertainty). LCA, in contrast, assumes a latent grouping variable, so that each object belongs to each latent class with a certain (assignment) probability. This distinction may seem merely academic, but it has enormous practical implications. The most prominent of them is the possibility of defining specific statistical models for each latent class. Whereas cluster analysis can only clump together objects that are more or less similar, LCA is capable of identifying classes of objects that can be described by different statistical models. Latent class analysis belongs to the family of (discrete) mixture distribution models, whereas cluster analysis does not. But before going into the details of different variants of LCA, a brief introduction to the assumptions and ideas of LCA is given.
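As a rough illustration of what an assignment probability means, the following sketch computes the probability of each latent class for one response pattern by Bayes' rule. It assumes dichotomous (yes-no) items that are independent within each class, and the class sizes and response probabilities are made-up values, not estimates from any data set. The function name is hypothetical.

```python
def posterior_class_probabilities(pattern, class_sizes, item_probs):
    """Assignment probabilities of one response pattern to each latent class.

    pattern     : list of 0/1 responses, one per item
    class_sizes : list of class probabilities (summing to 1)
    item_probs  : per class, a list of P(response = 1) for each item
    Assumption (not from the text): items are independent within a class.
    """
    joint = []
    for size, probs in zip(class_sizes, item_probs):
        likelihood = 1.0
        for response, p in zip(pattern, probs):
            likelihood *= p if response == 1 else 1.0 - p
        joint.append(size * likelihood)  # P(class) * P(pattern | class)
    total = sum(joint)  # mixture probability of the pattern
    return [j / total for j in joint]


# Made-up two-class model with three yes-no items.
class_sizes = [0.6, 0.4]
item_probs = [
    [0.9, 0.8, 0.7],  # class 1: high probability of "yes"
    [0.2, 0.3, 0.1],  # class 2: low probability of "yes"
]
posterior = posterior_class_probabilities([1, 1, 0], class_sizes, item_probs)
print(posterior)
```

The object with pattern (yes, yes, no) is not placed in a single class; it belongs to both classes with probabilities that sum to one, and the denominator `total` is the mixture distribution that makes LCA a discrete mixture model.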

...
