
Robust statistics represent an alternative approach to parameter estimation, differing from nonrobust statistics (sometimes called classical statistics) in the degree to which they are affected by violations of model assumptions. Whereas nonrobust statistics are greatly affected by small violations of their underlying assumptions, robust statistics are only slightly affected by such violations. Statisticians have focused primarily on designing statistics that are robust to violations of normality, due to both the frequency of nonnormality (e.g., via outliers) and its unwanted impact on commonly used statistics that assume normality (e.g., standard error of the mean). Nevertheless, robust statistics also exist that minimize the impact of violations other than nonnormality (e.g., heteroscedasticity).

The Deleterious Effects of Relaxed Assumptions

In evaluating the robustness of any inferential statistic, one should consider both efficiency and bias. Efficiency, closely related to the concept of statistical power and Type II error (not rejecting a false null hypothesis), refers to the stability of a statistic over repeated sampling (i.e., the spread of its sampling distribution). Bias, closely related to the concept of Type I error (rejecting a true null hypothesis), refers to the accuracy of a statistic over repeated sampling (i.e., the difference between the mean of its sampling distribution and the estimated population parameter). When the distributional assumptions that underlie parametric statistics are met, robust statistics are designed to differ minimally from nonrobust statistics in either their efficiency or bias. When these assumptions are relaxed, however, robust statistics are designed to outperform nonrobust statistics in their efficiency, bias, or both.
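Both properties can be examined directly by simulation. The following Python sketch (not from the original entry) draws repeated samples from a normal distribution and from a contaminated "mixed normal" distribution, then compares the sample mean with a 20% trimmed mean, used here purely as an illustrative robust estimator; the specific distributions and trimming proportion are assumptions chosen for illustration.

import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(0)

def mixed_normal(n):
    # 90% N(0, 1) contaminated by 10% N(0, 10): a heavy-tailed mixture
    contaminated = rng.random(n) < 0.10
    return np.where(contaminated, rng.normal(0, 10, n), rng.normal(0, 1, n))

def sampling_distribution(estimator, sampler, n=25, reps=10_000):
    estimates = np.array([estimator(sampler(n)) for _ in range(reps)])
    # Mean of the estimates vs. the true location (0 here) reflects bias;
    # their standard deviation (standard error) reflects efficiency.
    return estimates.mean(), estimates.std(ddof=1)

for name, sampler in [("normal", lambda n: rng.normal(0, 1, n)),
                      ("mixed normal", mixed_normal)]:
    for est_name, est in [("mean", np.mean),
                          ("20% trimmed mean", lambda x: trim_mean(x, 0.2))]:
        bias, se = sampling_distribution(est, sampler)
        print(f"{name:>13s} | {est_name:>17s} | bias ~ {bias:+.3f} | SE ~ {se:.3f}")

Under the normal distribution the two estimators behave nearly identically, whereas under the contaminated distribution the trimmed mean's standard error is markedly smaller, which is the sense in which a robust estimator retains efficiency when assumptions are relaxed.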

Figure 1. A normal distribution (a; black line) and a mixed normal distribution (b; dashed line)
Source: Adapted from Wilcox (1998).
Notes: a. μ = 0, σ = 1. b. Distribution 1: μ = 0, σ = 1, weight = .9; Distribution 2: μ = 0, σ = 10, weight = .1.

For an example of how relatively minor nonnormality can greatly decrease efficiency when one relies on conventional, nonrobust statistics, consider sampling data from the mixed (dashed) distribution shown in Figure 1. This heavy-tailed distribution—designed to simulate a realistic sampling scenario in which one normally distributed population contaminates another (μ = 0, σ = 1) with outliers—differs only slightly from the normal distribution shown in black. Such nonnormality would almost surely go undetected by a researcher, even with large n and when tested explicitly. And yet it substantially reduces the efficiency of classical statistics (e.g., Student's t, F ratio) because of their reliance on a nonrobust estimate of population dispersion: the sample variance. For instance, when sampling 25 subjects from the normal distribution shown in Figure 1 and from an identical distribution one unit apart, a researcher has a 96% chance of correctly rejecting the null hypothesis via an independent-samples t test. If the same researcher, using the same sample sizes and statistics, were to sample subjects from two of the heavy-tailed distributions shown in Figure 1—also spaced one unit apart—that researcher would have only a 28% chance of correctly rejecting the null. Modern robust statistics, on the other hand, possess high power with both normal and heavy-tailed distributions.
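A rough simulation along these lines can be written in a few lines of Python. The sketch below (an illustration added here, not part of the original entry) estimates the power of the independent-samples t test with n = 25 per group and a one-unit shift, first for normal populations and then for the mixed normal of Figure 1; it also runs a trimmed-means (Yuen-type) test via SciPy's trim argument (available in SciPy 1.7+), as one example of a robust alternative. Exact power values will vary by run; the article's figures of roughly .96 and .28 are the reference points.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

def mixed_normal(n, loc=0.0):
    # 90% N(loc, 1) contaminated by 10% N(loc, 10), as in Figure 1's dashed curve
    contaminated = rng.random(n) < 0.10
    return loc + np.where(contaminated, rng.normal(0, 10, n), rng.normal(0, 1, n))

def power(sampler, n=25, shift=1.0, reps=5_000, alpha=0.05, trim=0.0):
    rejections = 0
    for _ in range(reps):
        a, b = sampler(n, 0.0), sampler(n, shift)
        # trim=0 gives Student's t; trim=0.2 gives a trimmed-means (Yuen) test
        if ttest_ind(a, b, trim=trim).pvalue < alpha:
            rejections += 1
    return rejections / reps

normal = lambda n, loc=0.0: rng.normal(loc, 1, n)
print("t test, normal populations:     ", power(normal))        # ~ .96
print("t test, mixed normal populations:", power(mixed_normal))  # ~ .28
print("Yuen test, mixed normal:         ", power(mixed_normal, trim=0.2))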

...
