
The issue of multiple comparisons has created considerable controversy within epidemiology. The fundamental questions are which procedure to use and whether probabilities associated with multiple tests should be adjusted to control Type I errors. The latter topic appears the most contentious.

It is helpful to make a distinction between multiple comparisons, which usually involve comparison of multiple groups or treatment arms on one dependent variable, and multiple testing, which usually involves the comparison of two (or more) groups on multiple dependent variables. Although both procedures raise many of the same questions, they differ in some important ways. Multiple comparison procedures are more formalized than those for multiple testing.

A Type I error, also referred to as alpha (α), is the probability that a statistical test will incorrectly reject a true null hypothesis (H0). In most cases, α is set at .05 or .01. When we test multiple differences (whether between groups or for different dependent variables), we can speak of Type I errors in two different ways. The per-comparison error rate is the probability of an error on each of our comparisons, taken separately. The experiment-wise, or family-wise, error rate is the probability of making at least one Type I error in a whole set, or family, of comparisons. An important argument in epidemiology concerns which of these error rates is the appropriate one.
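The distinction between the two error rates can be made concrete with a small sketch (not part of the original entry): under the simplifying assumption that the k comparisons are independent and each is run at the per-comparison rate α, the family-wise rate is 1 − (1 − α)^k, which grows quickly with k.

```python
# Sketch: how the family-wise Type I error rate grows with the number of
# comparisons, assuming independent tests each run at alpha = .05.

def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

With α = .05, the family-wise rate is already about .40 for 10 independent comparisons, which is the motivation for the corrective procedures discussed below.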

Multiple Comparisons

One approach to making multiple comparisons is to define a set of linear contrasts that focus specifically on important questions of interest. Normally, these questions are defined before the start of an experiment and relate directly to its purpose. For a clinical trial with two control groups and several treatment groups, we might, for example, create a contrast of the mean of the control groups versus the mean of the combined treatment groups. Or we might ask whether the mean of the most invasive medical procedure is significantly different from the mean of the least invasive procedure. We usually test only a few contrasts, both to control the family-wise error rate and because other potential contrasts are not central to our analysis. Generally, though not always, researchers will use some variant of a Bonferroni procedure (described below) to control the family-wise error rate over the several contrasts.
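A linear contrast of the kind described above can be sketched in a few lines; the group means below are hypothetical and chosen only for illustration. The defining property is that the contrast weights sum to zero, so the contrast value compares the (averaged) control means against the (averaged) treatment means.

```python
# Hypothetical sketch of the control-vs-treatment contrast described in
# the text: two control groups versus three combined treatment groups.

control_means = [10.0, 11.0]        # hypothetical group means
treatment_means = [8.0, 7.5, 7.0]   # hypothetical group means

# Weights: +1/2 for each control group, -1/3 for each treatment group.
weights = [1 / len(control_means)] * len(control_means) + \
          [-1 / len(treatment_means)] * len(treatment_means)
means = control_means + treatment_means

assert abs(sum(weights)) < 1e-12  # a valid contrast's weights sum to zero
contrast = sum(w * m for w, m in zip(weights, means))
print(contrast)  # positive value: controls average higher than treatments
```

The same machinery handles the other contrast mentioned in the text (most versus least invasive procedure) by setting the weights to +1 and −1 on those two groups and 0 elsewhere.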

An alternative approach to multiple comparisons is to use a procedure such as the Tukey HSD ("honestly significant difference") test. (There are many such tests, but the Tukey is a convenient stand-in for the others. These tests are discussed in any text on statistical methods and can be computed by most software programs.) The Tukey is a range test based on the Studentized range statistic. It modifies the critical value of the test statistic depending on the number of group means being compared. Other tests differ in how that critical value is determined, often in a sequential manner. The Tukey procedure performs all possible comparisons between pairs of groups and places the groups into homogeneous subsets, in this example on the basis of their means. For instance, in an experiment with six groups there might be three homogeneous subsets, such that within each subset the group means do not differ significantly from one another. These homogeneous subsets may overlap; for instance, μ1 = μ2 = μ3; μ3 = μ4 = μ5; μ5 = μ6. The presence of overlapping subsets often confuses users, but it is inherent in the test. In addition, detection of homogeneous but overlapping subsets is seldom the goal of a statistical analysis, and it may be difficult to use this information. A third difficulty is posed by multiple outcomes; in fact, most researchers using multiple comparison procedures either do not measure more than one dependent variable, or they consider different dependent variables to be distinct and treat them separately.

The final way of making comparisons among groups is to use some sort of Bonferroni correction. The Bonferroni inequality states that the probability of the occurrence of one or more events cannot exceed the sum of their individual probabilities. If the probability of a Type I error for one contrast is α, and we create k contrasts, the probability of at least one Type I error cannot exceed kα. So if we run each contrast at α′ = α/k, the family-wise error rate will never exceed α. The Bonferroni procedure and the sequential tests based on it are widely applicable.
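The Bonferroni rule, and one sequential variant based on it (Holm's step-down procedure, named here as an example; the entry does not single out a specific sequential test), can be sketched as follows. The p-values are hypothetical and chosen only to show that the sequential version can reject hypotheses the plain Bonferroni correction does not.

```python
# Sketch of the Bonferroni correction described above, plus Holm's
# step-down procedure as one example of a sequential test based on it.

def bonferroni(p_values, alpha=0.05):
    """Reject H0 wherever p <= alpha / k (the corrected per-comparison rate)."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm's sequential variant: compare the sorted p-values against
    progressively more lenient thresholds alpha/k, alpha/(k-1), ..., alpha."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject

p = [0.001, 0.012, 0.020, 0.040]  # hypothetical p-values for k = 4 contrasts
print(bonferroni(p))  # every p compared against 0.05 / 4 = 0.0125
print(holm(p))        # Holm rejects at least as many hypotheses
```

Both procedures hold the family-wise error rate at or below α; Holm's does so while being uniformly more powerful than the plain Bonferroni correction.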

...
