Traditional methods for estimating regression models can be unduly influenced by a small subset of the data. For example, using ordinary least squares (OLS) regression to model economic growth in 15 industrialized democracies, Peter Lange and Geoffrey Garrett find that the interaction between Left governments and organized labor is positively and significantly correlated with economic performance. In his analysis of the data, however, Bruce Western illustrates that these findings are largely determined by the Norwegian case. Using robust estimation techniques to account for this observation reveals much greater uncertainty about the influence of the interaction effect on economic performance. Because OLS estimation minimizes the sum of squared residuals, it reduces large residuals at the expense of degrading the fit of the remaining observations. The coefficient estimates it generates can thus be strongly influenced by even a single large residual, as was the case in the model estimated by Lange and Garrett. As with OLS estimation, traditional maximum likelihood methods may also be influenced by unusual points. The likelihood often depends on means and variances, whose estimated values can be largely determined by points lying outside the majority of the data. Thus, when implementing any regression method based on the mean and variance, the analyst must be wary of unusual observations. In this entry, various ways of dealing with this problem are discussed.
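The sensitivity of OLS to a single large residual can be illustrated with a minimal sketch. The data below are hypothetical (15 cases lying exactly on a line, with one case then perturbed) and are not the Lange and Garrett data; the helper name `ols_fit` is likewise an assumption made for illustration.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data: 15 cases lying exactly on y = 1 + 2x.
x = np.arange(15, dtype=float)
y = 1.0 + 2.0 * x
clean = ols_fit(x, y)

# Perturb the outcome of a single case at the edge of the sample.
y_contaminated = y.copy()
y_contaminated[-1] += 30.0
contaminated = ols_fit(x, y_contaminated)

print("slope, all cases on the line:", round(float(clean[1]), 3))
print("slope, one case perturbed:   ", round(float(contaminated[1]), 3))
```

In this sketch, changing one of the 15 outcome values moves the estimated slope away from 2, because the squared-error criterion penalizes the one large residual heavily and tilts the line toward it.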

Observations can be unusual in two important ways. First, an observation may be an outlier if the values of its explanatory variables are typical of the sample but the value of the outcome variable is atypically large or small. Second, an observation may be a leverage point if the values of its explanatory variables are demonstrably different from the rest of the data. An influential observation is an outlier or leverage point whose inclusion in the analysis substantially alters the estimates of the statistics of interest, including parameter estimates and predictions, the estimated variance of these values, and the goodness-of-fit statistics. These influential observations can result from “bad” data (such as data that have been recorded incorrectly), improper modeling of an outcome variable with a heavy-tailed distribution, or models that fail to describe the data well for certain values of the predictor.
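The distinction between outliers, leverage points, and influence can be made concrete with standard regression diagnostics. The sketch below is an illustration under stated assumptions: the text above does not name specific diagnostics, so hat values (for leverage) and Cook's distance (for influence) are choices made here, and the data and the function name `influence_diagnostics` are hypothetical.

```python
import numpy as np

def influence_diagnostics(x, y):
    """Return hat values, OLS residuals, and Cook's distances."""
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    # Diagonal of the hat matrix X (X'X)^-1 X': leverage of each case.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)  # residual variance estimate
    # Cook's distance combines residual size and leverage.
    cooks = (resid**2 / (p * s2)) * h / (1.0 - h) ** 2
    return h, resid, cooks

# Hypothetical data: ten ordinary cases near y = 1 + 2x ...
x = np.append(np.arange(10, dtype=float), [4.5, 30.0])
y = 1.0 + 2.0 * x
y[:10] += 0.1 * (-1.0) ** np.arange(10)  # small deterministic wiggle
# ... plus an outlier (index 10: typical x, atypical y) and a
# leverage point (index 11: atypical x, but on the same line).
y[10] += 8.0

h, resid, cooks = influence_diagnostics(x, y)
print("largest hat value at index:        ", int(np.argmax(h)))
print("largest absolute residual at index:", int(np.argmax(np.abs(resid))))
print("Cook's distance (outlier, leverage):", cooks[10], cooks[11])
```

In this constructed example the leverage point has by far the largest hat value, yet its Cook's distance is small because it agrees with the trend in the other cases. An unusual position in the explanatory variable makes an observation potentially influential, not necessarily influential.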

Even if only a small fraction of the data (or even a single observation) is influential, estimation strategies that assume all data are modeled correctly may produce erroneous results. To avoid producing incorrect estimates, regression analyses must account for influential observations. A popular strategy for dealing with these observations is to diagnose and remove them and then fit the regression model to the remaining “good” data. This technique is acceptable if the observation has been recorded incorrectly and the true value cannot be recovered, if the observation arises from a different population than the other data, or if there is a theoretical justification for excluding the observation from the analysis. It is problematic, in contrast, if the influential point emerges because of heavy-tailed distributions or inadequate models. In these cases, an observation cannot be discarded simply because it fails to fit the model, as removing this data point provides unwarranted support to the incorrect model. Robust regression instead provides a compromise between deleting influential observations and allowing them to violate the assumptions of traditional regression estimators. When data conform to conventional assumptions, robust and nonrobust methods provide similar estimates of statistics of interest. Robust estimators, however, are resistant to the effects of influential observations and retain their efficiency when data are nonnormal.
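As an illustration of this compromise, the sketch below compares OLS with one common robust estimator, a Huber M-estimator fitted by iteratively reweighted least squares. The choice of estimator is an assumption made here, since the text above does not name a specific method. The tuning constant 1.345, the MAD-based scale estimate, the function name `huber_irls`, and the data are also illustrative choices rather than anything taken from the entry.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via IRLS (a sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal data.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale < 1e-12:  # essentially exact fit; nothing to reweight
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, c/u for large ones.
        w = c / np.maximum(u, c)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical data near y = 1 + 2x, with one case perturbed.
x = np.arange(15, dtype=float)
y = 1.0 + 2.0 * x + 0.5 * np.sin(x)  # deterministic wiggle stands in for noise
y[-1] += 30.0
X = np.column_stack([np.ones_like(x), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_irls(X, y)
print("OLS slope:  ", round(float(beta_ols[1]), 3))
print("Huber slope:", round(float(beta_huber[1]), 3))
```

The perturbed case is kept in the sample rather than deleted, but it receives a weight below 1, so it pulls the fitted line much less than it does under OLS. Cases with small residuals keep a weight of 1, which is why the two estimators behave similarly when no case is unusual.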

...
