
A statistic that quantifies the extent to which population units within clusters are similar to one another (i.e. the degree of homogeneity within clusters) is called the intraclass correlation coefficient (ICC) and is often denoted by the Greek letter rho (ρ).

When population units are grouped or clustered into larger units that are themselves easier to identify and sample (e.g., children grouped into classrooms or elderly citizens grouped into nursing homes), one- or two-stage cluster sampling becomes an appealing, cost-effective, and practical sampling strategy. These benefits are often counterbalanced by a loss in efficiency and precision in estimates derived from cluster samples. This loss is due in large part to the fact that, for many outcomes of interest, units within clusters tend to be more similar to each other than units in the general population are.

The computation of ρ essentially provides a measure of homogeneity for elements within a cluster relative to the overall population variance, as seen by the following equation:

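$$\rho = \frac{\sum_{i=1}^{C}\sum_{j=1}^{M}\sum_{k \neq j}(y_{ij}-\bar{y})(y_{ik}-\bar{y})}{(CM-1)(M-1)S^2} \tag{1}$$

where $C$ is the number of clusters in the population, $M$ is the number of elements within each cluster, $y_{ij}$ is the measurement for the $j$th element in the $i$th cluster, $y_{ik}$ is the measurement for the $k$th element in the $i$th cluster, and $S^2$ is the finite population variance defined by

$$S^2 = \frac{1}{CM-1}\sum_{i=1}^{C}\sum_{j=1}^{M}(y_{ij}-\bar{y})^2 \tag{2}$$

where $\bar{y}$ is the population mean. The population mean for the $i$th cluster is denoted $\bar{y}_i$.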

Note that Equation 1 is equivalent to a simpler formula containing values easily obtained from an ANOVA table, accounting for clustering as follows:

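$$\rho = 1 - \frac{M}{M-1}\cdot\frac{SSW}{SST} \tag{3}$$

where $SSW = \sum_{i=1}^{C}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2$ is the sum of squares within clusters (deviations from each cluster mean $\bar{y}_i$) and $SST = \sum_{i=1}^{C}\sum_{j=1}^{M}(y_{ij}-\bar{y})^2$ is the total sum of squares about the population mean $\bar{y}$.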
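The equivalence of the two formulas can be checked with a short derivation. The sketch below assumes equal cluster sizes $M$, writes $\bar{y}$ for the population mean and $\bar{y}_i$ for the mean of cluster $i$, and uses the standard ANOVA identity $M\sum_i(\bar{y}_i-\bar{y})^2 = SST - SSW$ together with $SST = (CM-1)S^2$:

```latex
\begin{aligned}
\sum_{i=1}^{C}\sum_{j=1}^{M}\sum_{k\neq j}(y_{ij}-\bar{y})(y_{ik}-\bar{y})
  &= \sum_{i=1}^{C}\left[\Bigl(\sum_{j=1}^{M}(y_{ij}-\bar{y})\Bigr)^{2}
     - \sum_{j=1}^{M}(y_{ij}-\bar{y})^{2}\right] \\
  &= M^{2}\sum_{i=1}^{C}(\bar{y}_i-\bar{y})^{2} - SST \\
  &= M\,(SST - SSW) - SST \\
  &= (M-1)\,SST - M\,SSW .
\end{aligned}
```

Dividing by $(CM-1)(M-1)S^2 = (M-1)\,SST$ gives $\rho = 1 - \frac{M}{M-1}\cdot\frac{SSW}{SST}$.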

From Equation 3 and the fact that $0 \le SSW/SST \le 1$, it follows that $-1/(M-1) \le \rho \le 1$. If there is complete duplication within each cluster (SSW = 0), then the ICC takes on the highest possible value of 1 to indicate complete homogeneity within clusters; on the other hand, if the heterogeneity within clusters is consistent with that of the overall population (SSW = SST), then the ICC will assume its smallest value of $-1/(M-1)$. Cluster sampling will be more efficient than simple random sampling with the same overall sample size whenever $-1/(M-1) \le \rho < 0$ and less efficient when the ICC is positive, increasingly so as it approaches 1.
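The two extremes described above can be checked numerically. The following is a minimal sketch, not the entry's own procedure: the function name `icc_from_clusters` and both toy populations are hypothetical, and equal cluster sizes are assumed so that the ANOVA form of the ICC applies.

```python
from typing import Sequence


def icc_from_clusters(clusters: Sequence[Sequence[float]]) -> float:
    """Population ICC via rho = 1 - M/(M-1) * SSW/SST (equal cluster sizes)."""
    M = len(clusters[0])
    if M < 2 or any(len(c) != M for c in clusters):
        raise ValueError("all clusters must share the same size M >= 2")

    values = [y for c in clusters for y in c]
    grand_mean = sum(values) / len(values)

    # Total sum of squares about the population mean.
    sst = sum((y - grand_mean) ** 2 for y in values)
    if sst == 0:
        raise ValueError("SST is zero, so the ICC is undefined")

    # Within-cluster sum of squares about each cluster mean.
    ssw = 0.0
    for c in clusters:
        cluster_mean = sum(c) / M
        ssw += sum((y - cluster_mean) ** 2 for y in c)

    return 1.0 - (M / (M - 1)) * (ssw / sst)


# Hypothetical toy populations (C = 3 clusters, M = 3 elements each).
duplicated = [[1, 1, 1], [5, 5, 5], [9, 9, 9]]   # complete duplication: SSW = 0
same_means = [[1, 5, 9], [5, 9, 1], [9, 1, 5]]   # identical cluster means: SSW = SST

print(icc_from_clusters(duplicated))   # 1.0
print(icc_from_clusters(same_means))   # -0.5, i.e. -1/(M-1) with M = 3
```

In the first population every cluster repeats a single value, so the ICC reaches its maximum of 1; in the second, all variation lies within clusters and the ICC reaches its minimum of −1/(M−1).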

Consider the following example to illustrate the computation of the ICC. Researchers are interested in determining the average fruit and vegetable intake of staff members of a nationally franchised health club in preparation for a campaign to promote exercise and diet among its members. The population consists of five franchised health clubs that each have eight staff members. The fruit and vegetable intake for each population member is provided in Table 1. In this scenario the number of clusters is five (C = 5), and the number of elements per cluster is eight (M = 8).

From the SSW and SST obtained with these results the ICC is computed using Equation 3 as follows:

$$\rho = 1 - \frac{8}{7}\cdot\frac{SSW}{SST}$$

This large, positive ICC indicates that, on average, the staff members within each cluster tend to consume similar amounts of fruits and vegetables per day. In other words, the clusters are extremely homogeneous. The homogeneity within clusters also means that a one-stage cluster sample would be less efficient than a simple random sample of the same size (i.e. the design effect [deff] would be greater than 1).
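The link between the ICC and the design effect can be sketched in a few lines. The text above does not give a formula for deff, so this example assumes the commonly cited approximation deff ≈ 1 + (M − 1)ρ for one-stage cluster sampling with equal cluster sizes and a large number of clusters. The function name `approx_design_effect` and the ρ values are hypothetical and are not taken from the health club example.

```python
def approx_design_effect(rho: float, M: int) -> float:
    """Approximate design effect for one-stage cluster sampling.

    Assumes equal cluster sizes M and uses deff ~= 1 + (M - 1) * rho,
    an approximation that is not stated in the text itself.
    """
    if M < 2:
        raise ValueError("M must be at least 2")
    if not (-1.0 / (M - 1) <= rho <= 1.0):
        raise ValueError("rho must lie between -1/(M-1) and 1")
    return 1.0 + (M - 1) * rho


M = 8  # elements per cluster, as in the health club example
for rho in (-0.1, 0.0, 0.5, 1.0):  # illustrative ICC values only
    deff = approx_design_effect(rho, M)
    verdict = "less efficient" if deff > 1 else "not less efficient"
    print(f"rho = {rho:+.1f}: deff ~= {deff:.2f} ({verdict} than SRS)")
```

Under this approximation any positive ICC gives deff > 1, a zero ICC gives deff = 1, and a negative ICC gives deff < 1, which matches the efficiency comparison described earlier.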

...
