
In broad terms, clustering, or cluster analysis, refers to the process of organizing objects into groups whose members are similar with respect to a similarity or distance criterion. As such, a cluster is a collection of similar objects that are distant from the objects of other clusters. Unlike most classification techniques that aim to assign new observations to one of the many existing groups, clustering is an exploratory procedure that attempts to group objects based on their similarities or distances without relying on any assumptions regarding the number of groups.

Applications of clustering are many; consequently, different techniques have been developed to address the varying analytical objectives. There are applications (such as market research) in which clustering can be used to group objects (customers) based on their behaviors (purchasing patterns). In other applications (such as biology), clustering can be used to classify objects (plants) based on their characteristics (features).

Depending on the application and the nature of data at hand, three general types of data are typically used in clustering. First, data can be displayed in the form of an O × C matrix, where C characteristics are observed on O objects. Second, data can be in the form of an O × O similarity or distance matrix, where each entry represents a measure of similarity or distance between the two corresponding objects. Third, data might represent presumed group membership of objects where different observers may place an object in the same or different groups. Regardless of data type, the aim of clustering is to partition the objects into G groups where the structure and number of the resulting natural clusters will be determined empirically. Oftentimes, the input data are converted into a similarity matrix before objects are partitioned into groups according to one of the many clustering algorithms.
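As a minimal sketch of that conversion step, the following Python snippet turns an O × C data matrix into an O × O distance matrix. Euclidean distance is assumed here purely for illustration, since the text does not prescribe a particular measure, and the function name `distance_matrix` and the sample values are hypothetical.

```python
import math

def distance_matrix(data):
    """Return the O x O Euclidean distance matrix for an O x C data matrix.

    `data` is a list of O rows, each holding C numeric characteristics.
    Euclidean distance is an assumption made for this illustration.
    """
    n = len(data)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(data[i], data[j])
            dist[i][j] = d  # the matrix is symmetric,
            dist[j][i] = d  # so fill both halves at once
    return dist

# Hypothetical data: O = 4 objects observed on C = 2 characteristics.
data = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0], [6.0, 8.0]]
D = distance_matrix(data)
```

Each entry `D[i][j]` holds the distance between objects `i` and `j`, so the diagonal is zero and the matrix is symmetric. A similarity matrix could be derived from it by any decreasing transformation of distance.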

It is usually impossible to construct and evaluate all clustering possibilities of a given set of objects, since there are many different ways of measuring similarity or dissimilarity among a set of objects. Moreover, similarity and dissimilarity measures can be univariate or multivariate in nature, depending on whether one or more characteristics of the objects in question are included in calculations. As such, it is impractical to talk about an optimal clustering technique; however, there are two classes of techniques (hierarchical and nonhierarchical) that are often used in practice for clustering.

Hierarchical techniques proceed in a sequential fashion, producing an increasing or decreasing number of nested arrangements of objects. Such techniques can be agglomerative, whereby individual objects start as single clusters and thereafter similar clusters are merged to form progressively fewer larger clusters. As the number of clusters decreases, so do their similarities, eventually leading to the single most dissimilar cluster that includes all objects. In contrast, hierarchical techniques can be divisive, whereby a single cluster of all objects is first partitioned into two clusters of similar objects and thereafter the resulting clusters are further partitioned into two new similar clusters. As the number of clusters increases, so do their similarities, eventually leading to the set of most similar clusters that consists of one object per cluster. With hierarchical techniques, the criterion for merging or partitioning interim clusters can be based on the distance (linkage) between their nearest objects, their furthest objects, the average distance among all of their objects, or more sophisticated distance measures such as those based on Ward's or centroid methods. The results of both agglomerative and divisive clustering techniques are often displayed via a two-dimensional graph (tree) called a “dendrogram.”
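The agglomerative procedure described above can be sketched in a few lines of Python. This is a deliberately naive illustration rather than a reference implementation. It assumes Euclidean distance, supports only the nearest-object ("single"), furthest-object ("complete"), and average linkage criteria, and uses hypothetical names (`linkage_distance`, `agglomerate`) and made-up sample points.

```python
import math

def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b, given as lists of object indices."""
    pairs = [dist[i][j] for i in a for j in b]
    if method == "single":    # distance between the nearest objects
        return min(pairs)
    if method == "complete":  # distance between the furthest objects
        return max(pairs)
    if method == "average":   # mean distance over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage method: {method}")

def agglomerate(points, method="single"):
    """Merge the two closest clusters repeatedly until one remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]  # one object per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage_distance(clusters[x], clusters[y], dist, method)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges

# Hypothetical data: four objects measured on a single characteristic.
points = [(0.0,), (1.0,), (5.0,), (7.0,)]
merges = agglomerate(points, "single")
```

With these sample points, the distances at which clusters merge grow from step to step, which mirrors the observation that similarity falls as the number of clusters decreases. The final merge distance also depends on the linkage criterion chosen. A divisive procedure would run in the opposite direction and is not shown here.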

...
