
Pairwise Comparisons

Pairwise comparisons are methods for analyzing multiple population means in pairs to determine whether they are significantly different from one another. This entry explores the concept of pairwise comparisons, various approaches, and key considerations when performing such comparisons.

Concept

Because population parameters (e.g., the population mean) are unknown, practitioners collect samples in order to make statistical inferences about those parameters. Many different statistical methods have been developed for determining whether a difference exists between population means. Perhaps the best known is Student's t test, which is typically used to infer from two samples whether the corresponding population means differ (or, in the case of only one sample, whether the population mean differs from some fixed constant). As an example, a researcher might be interested in performing a hypothesis test to determine whether a particular reading improvement program for children is more effective than the traditional approach in a particular school district. The researcher might draw one sample of children to act as the control group and a similar sample to act as the treatment group. Gauging the effectiveness of the intervention with some measurement tool (e.g., words read per minute or reading comprehension), the researcher might use a t test to determine whether there is a significant difference between the control and treatment group means.
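As a rough illustration of the two-group comparison described above, the following sketch computes the pooled-variance t statistic for two samples. The function name `pooled_t_statistic` and the scores are hypothetical, invented only for this example; equal population variances are assumed, as in the classical Student's t test.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, df) for a two-sample t test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (assumes equal population variances).
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_b) - mean(sample_a)) / std_err
    return t, df


# Hypothetical words-read-per-minute scores, invented for illustration only.
control = [90, 95, 100, 105, 110]
treatment = [100, 105, 110, 115, 120]

t_stat, dof = pooled_t_statistic(control, treatment)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom")
```

The resulting statistic would then be compared with a t distribution on `dof` degrees of freedom to decide whether the difference is significant at the chosen level.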

When there are more than two means to be compared, however, the t test becomes less useful. Consider a similar situation in which the researcher wishes to determine whether there is a difference in effectiveness among several intervention programs. For example, the researcher might want to compare the effectiveness of several different programs that are implemented in each of, say, five school districts in a particular city. The researcher could, of course, simply perform multiple t tests, comparing each of the intervention programs with all of the others. There are at least two critical problems with this approach.

The first is that the number of t tests that must be performed increases dramatically as the number of treatment groups increases. In the case of five treatment levels and a single factor, there are only

$$\binom{5}{2} = \frac{5!}{2!\,3!} = 10$$

different t tests that would need to be performed. When there are multiple factors with multiple treatment levels, however, this number quickly becomes large. With the increase in computing power, this obstacle is not as meaningful as it has been in the past.
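The count of pairwise tests can be checked directly by enumerating the pairs. This is a minimal sketch; the district labels are placeholders introduced here, not names from the entry.

```python
from itertools import combinations
from math import comb

# Placeholder labels for the five school districts in the example.
districts = ["A", "B", "C", "D", "E"]

# Every unordered pair of districts corresponds to one t test.
pairs = list(combinations(districts, 2))
n_tests = len(pairs)

print(n_tests, "pairwise tests:", pairs)

# The same count from the binomial coefficient, for several numbers of groups.
counts = {k: comb(k, 2) for k in (3, 5, 10, 20)}
print(counts)
```

Because the count is k(k − 1)/2 for k groups, it grows quadratically, which is why the number of tests becomes large as groups or factors are added.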

The second difficulty is more substantive: performing multiple t tests to compare several means greatly increases the false alarm rate (i.e., the probability of making a Type I error, which is rejecting the null hypothesis when it is, in fact, true). In the above example, suppose the level of significance is α = .05. Then, for a single test, the probability of making a Type I error is .05, and the probability of not making a Type I error is .95. Performing 10 t tests, each with α = .05 and treated as independent, causes the probability of committing at least one Type I error to increase to $1 - .95^{10} = 0.4013$. That is, simply using t tests to determine whether any of five population means differ, each at significance level .05, results in around a 40% chance of committing at least one Type I error.
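The arithmetic above can be reproduced with a few lines of code. This is a sketch under the same simplifying assumption that the tests are independent; the function name `familywise_error_rate` is introduced here for illustration.

```python
from math import comb


def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05

# Five groups give comb(5, 2) = 10 pairwise tests.
rate_five_groups = familywise_error_rate(alpha, comb(5, 2))
print(f"5 groups, 10 tests: {rate_five_groups:.4f}")

# How the rate grows as the number of groups increases.
for k in range(2, 8):
    m = comb(k, 2)
    print(f"{k} groups, {m:2d} tests: {familywise_error_rate(alpha, m):.4f}")
```

With two groups there is a single test and the rate equals α; each additional group adds more comparisons and pushes the rate higher.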

...
