
Maximum likelihood (ML) is the most widely used approach for statistical inference. Although it has the advantage of employing straightforward calculations, the ML approach lacks robustness, giving rise to spurious results and misleading conclusions. Researchers in epidemiology and a variety of other experimental and health sciences are becoming increasingly aware of this issue and need to be informed about the available alternatives for more reliable inference.

Concept of Robustness

What is robustness? Although it is intuitively clear what robustness should be, there is no unique statistical definition, in part because of the diverse aspects of robustness. The generally accepted notion is that a robust statistical procedure should be insensitive to changes not involving the parameters, but sensitive to changes in model parameters. For example, the ML approach is the most powerful for detecting changes in the parameters under the model. However, it is generally sensitive to model assumptions, yielding biased estimates and incorrect inference when the study data depart from the model. A robust procedure aims to provide good power under the model, while still yielding reliable estimates when data drift away from the model.

To elucidate the basic idea, consider a relatively simple problem of comparing two independent groups. The most common procedure is the t test, developed from ML under the assumption of normally distributed data. This procedure compares the two sample means for evidence of group differences. If the data are normally distributed for both groups, the difference statistic between the two group means has a t distribution, providing the basis for inference (i.e., p values and confidence intervals). In many applications, however, data deviate from the normal model. Such departures from normality can affect both the estimate and the inference. For example, the difference statistic may severely over- or underestimate the true group difference in the presence of outliers, giving rise to biased estimates. In other applications, the difference statistic may be unbiased, but skewness and sparseness in the data distribution may seriously distort the sampling distribution of the statistic, making inference based on the t distribution incorrect. Thus, a robust procedure must address either one or both issues.
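The outlier sensitivity described above can be illustrated with a small sketch. The code below computes Welch's two-sample t statistic by hand on hypothetical data (the group values are invented for illustration) and shows how a single extreme observation shifts the estimated mean difference.

```python
import statistics

def t_statistic(x, y):
    """Welch's two-sample t statistic: difference in sample means
    divided by its estimated standard error."""
    nx, ny = len(x), len(y)
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / ((vx / nx + vy / ny) ** 0.5)

# Hypothetical data: two groups drawn around the same center
g1 = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
g2 = [5.0, 4.9, 5.1, 5.2, 4.8, 5.0, 5.1, 4.9]

t_clean = t_statistic(g1, g2)  # near zero: no group difference

# Replace one observation in group 1 with an outlier: the sample
# mean of g1 is pulled upward, biasing the difference estimate
g1_outlier = g1[:-1] + [25.0]
mean_shift = statistics.mean(g1_outlier) - statistics.mean(g1)
```

Note that the outlier inflates not only the mean difference but also the sample variance, so the resulting t statistic can be misleading in either direction; both the estimate and its sampling distribution are affected.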

Robustness Approaches

A common cause of bias in the estimate is outliers (observations that are exceptionally large or small). Although the sample mean is easy to interpret and work with, it is sensitive to such outlying observations. The common approach to addressing the effect of outliers is the use of order statistics. By ordering the observations from the smallest to the largest, we can define estimates that are not influenced, or are less influenced, by outliers. For example, the trimmed mean is the sample mean calculated after removing a certain percentage of observations from the smallest and largest ends of the ordered sample. Alternatively, one may downweight such outliers to lessen their effect. For example, the winsorized mean is the sample mean after replacing a fraction of the lowest and highest values by the next values counting inward from the extremes, respectively. The sample median is yet another common robust estimate based on the order statistics. Thus, for comparing two groups, we can also form a difference statistic by using any of these robust estimates.
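The three order-statistic estimates above can be sketched in a few lines. This is a minimal illustration on invented data, not a reference implementation; the trimming proportion of 10% per tail is an arbitrary choice for the example.

```python
import statistics

def trimmed_mean(data, prop=0.1):
    """Mean after dropping a proportion `prop` of observations
    from each end of the ordered sample."""
    xs = sorted(data)
    k = int(len(xs) * prop)
    return statistics.mean(xs[k:len(xs) - k])

def winsorized_mean(data, prop=0.1):
    """Mean after replacing the `prop` most extreme values in each
    tail with the next value counting inward from the extremes."""
    xs = sorted(data)
    k = int(len(xs) * prop)
    xs[:k] = [xs[k]] * k                 # raise the lowest k values
    xs[len(xs) - k:] = [xs[-k - 1]] * k  # lower the highest k values
    return statistics.mean(xs)

data = [4.8, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 5.3, 5.4, 50.0]

m_plain = statistics.mean(data)       # pulled toward the outlier 50.0
m_trim = trimmed_mean(data, 0.1)      # drops 4.8 and 50.0 before averaging
m_wins = winsorized_mean(data, 0.1)   # replaces 4.8 with 4.9, 50.0 with 5.4
m_med = statistics.median(data)       # unaffected by the extreme value
```

With 10% trimming on ten observations, one value is removed (trimmed mean) or replaced (winsorized mean) in each tail, so all three robust estimates stay near 5 while the plain mean is dragged toward the outlier.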

...
