
Outliers are atypical, infrequent observations that differ markedly from the bulk of the observations in location, scale, or distributional pattern. An observed outlier may be caused by an error in measurement or processing, influenced by a disruptive event (such as a strike, a natural disaster, or a political or economic crisis), or generated by a different mechanism. Although an outlier is not necessarily “wrong,” its effect on inference procedures can be substantial: a small number of outliers may have a disproportionate influence on the estimated correlation coefficient or on the slope of the regression line (see Figure 1); the efficiency of otherwise optimal statistical methods may be reduced; and the resulting inferences from the analysis may be unreliable or even invalid.
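The effect of a single outlier on the regression slope and the correlation coefficient can be illustrated with a short Python sketch that fits ordinary least squares to a small data set before and after adding one point. The data values and the helper names `ols_slope` and `pearson_r` are hypothetical, chosen for illustration only; they are not the data behind Figure 1.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: ten points lying close to the line y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 19.0, 20.8]

# The same data plus one high-leverage point far below the trend.
xs_out = xs + [20]
ys_out = ys + [5.0]

print(f"slope: {ols_slope(xs, ys):.3f} -> {ols_slope(xs_out, ys_out):.3f}")
print(f"r:     {pearson_r(xs, ys):.3f} -> {pearson_r(xs_out, ys_out):.3f}")
```

On these hypothetical data, the one added point pulls the fitted slope from about 2 down to about 0.26 and the correlation from nearly 1 down to about 0.23, even though the other ten observations are unchanged.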

Outlier detection is important for effective data analysis and modeling. Various methods, such as histograms, boxplots, and scatterplots, can be used to detect outliers. Detected outliers should not simply be excluded from the data set. It is important to determine whether they represent a purely random phenomenon or whether they indicate some misspecification in the systematic part of the model. In some cases, an outlier may be corrected through error control in measurement or recording. When the data distribution is highly asymmetric, an outlier may become an ordinary observation after a data transformation such as taking logarithms.
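The boxplot criterion and the effect of a transformation on skewed data can be sketched in a few lines of Python. This sketch assumes the common boxplot convention of Tukey's fences (values beyond Q1 − 1.5·IQR or Q3 + 1.5·IQR are flagged) and the "inclusive" quartile definition from Python's `statistics` module; other software may compute quartiles slightly differently. The data and the helper names `tukey_fences` and `flag_outliers` are hypothetical. Consistent with the text, the code only flags candidate outliers for investigation; it does not delete them.

```python
import math
import statistics


def tukey_fences(values, k=1.5):
    """Return the boxplot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values lying outside the boxplot fences (flagged, not removed)."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical right-skewed data, e.g., counts or incomes.
data = [1, 2, 2, 3, 4, 5, 7, 9, 12, 50]

raw_flags = flag_outliers(data)
log_flags = flag_outliers([math.log(v) for v in data])

print("flagged on raw scale:", raw_flags)  # [50]
print("flagged on log scale:", log_flags)  # []
```

On the raw scale the value 50 lies far above the upper fence, but after a logarithmic transformation it falls inside the fences, which is the situation the paragraph describes for highly asymmetric distributions.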

Figure 1 The Effects of Outliers on the Slope Coefficients of Linear Regression

In most cases, outliers are the most interesting observations in the data set, since they may reveal unusual and interesting phenomena. A thorough investigation of outliers helps achieve a better understanding of the data structure and greater confidence in data modeling. To control the excessive influence of outliers, resistant methods (such as weighted-median polish) may be used in exploratory data analysis to help identify data structure, and robust methods (such as robust regression) may be used in confirmatory data analysis to produce efficient parameter estimates that are not dominated by a few atypical observations. Some geovisualization techniques, such as brushing and linking, are also useful for exploring outliers.
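Robust regression covers a family of estimators; the text does not name a specific one. As one illustration, the Python sketch below uses the Theil–Sen estimator, which takes the median of all pairwise slopes and tolerates up to roughly 29% contaminated points, and compares it with ordinary least squares on hypothetical data containing one gross outlier. The choice of Theil–Sen, the data, and the helper names `ols_fit` and `theil_sen_fit` are assumptions for illustration; Huber-type M-estimation and other robust methods would serve the same purpose.

```python
import itertools
import statistics


def ols_fit(xs, ys):
    """Ordinary least-squares (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_fit(xs, ys):
    """Theil-Sen (slope, intercept): median of pairwise slopes, then median residual offset."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in itertools.combinations(zip(xs, ys), 2)
        if x2 != x1  # pairs with equal x have no defined slope
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept


# Hypothetical data near y = 2x + 1, plus one outlier at (20, 5).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 19.0, 20.8, 5.0]

ols_slope, ols_intercept = ols_fit(xs, ys)
ts_slope, ts_intercept = theil_sen_fit(xs, ys)
print(f"OLS slope:       {ols_slope:.3f}")
print(f"Theil-Sen slope: {ts_slope:.3f}")
```

Here the least-squares slope is dragged far below the trend of the bulk of the data, while the Theil–Sen slope stays close to 2, because the ten pairwise slopes involving the outlier are too few to move the median.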

Shuming Bao
