Winsorize

Neil J. Salkind

doi:10.4135/9781412961288


Edited by:
Neil J. Salkind
In: Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781412961288.n502
Subject: Research Design

Winsorization is one of several methods for handling outliers in a distribution of data. Researchers also sometimes Winsorize to give the distribution more desirable statistical properties. To Winsorize, one converts the values of data points considered to be high outliers to the value of the highest data point not considered to be an outlier. For example, if a survey item on respondents’ numbers of sexual partners in the past year yielded answers of 0, 1, 2, 3, 5, 7, 10 (with multiple people giving each of the previous responses), and 100, then Winsorization would convert the 100 to 10. In this sense, the extreme high values are reduced in magnitude (or “reined in”) to a value that is still at the high end of the distribution but not as extreme. An analogous procedure can be applied to values considered to be outliers at the low end of the distribution; these values would be increased to the lowest value not considered an outlier. A possible advantage of Winsorizing is that it preserves the information that a case had among the highest (or lowest) values in a distribution while protecting against some of the harmful effects of outliers. An alternative to Winsorizing is trimming, in which outlier values are removed rather than converted to other values; recommendations of one technique over the other will be discussed. Winsorizing and trimming both fall within a field known as robust statistics.
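
The survey example can be sketched in a few lines of Python. This is only an illustration: the number of times each response is repeated, the cutoff of 10 for what counts as an outlier, and the function names `winsorize_high` and `trim_high` are all assumptions made here, not part of the entry.

```python
from statistics import mean


def winsorize_high(values, cutoff):
    """Convert every value above `cutoff` to the highest value not above it."""
    ceiling = max(v for v in values if v <= cutoff)
    return [min(v, ceiling) for v in values]


def trim_high(values, cutoff):
    """Remove every value above `cutoff` instead of converting it."""
    return [v for v in values if v <= cutoff]


# Hypothetical responses patterned on the survey example (repeat counts assumed).
responses = [0, 0, 1, 1, 2, 2, 3, 3, 5, 5, 7, 7, 10, 10, 100]

winsorized = winsorize_high(responses, cutoff=10)  # the 100 becomes 10
trimmed = trim_high(responses, cutoff=10)          # the 100 is dropped

print("original mean:  ", mean(responses))
print("winsorized mean:", mean(winsorized))
print("trimmed mean:   ", mean(trimmed))
print("cases kept:", len(winsorized), "after Winsorizing,", len(trimmed), "after trimming")
```

Winsorizing keeps all 15 cases and records the extreme respondent as being at the top of the distribution, whereas trimming leaves 14 cases.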

There appear to be at least two reasons for Winsorizing (or trimming). One, as is generally known, is that an outlier can exert a disproportionate influence on the statistical analyses one conducts (e.g., a Pearson correlation). The other is concerned less with extreme outliers than with statistical efficiency, that is, with ensuring that estimates of a population parameter (e.g., the mean) from repeated samples tend to take on similar values. Before Winsorizing or trimming, of course, one should conduct the usual diagnostics on apparent outliers to determine, for example, whether they result from faulty data recording or are legitimate values from a phenomenon that can generate extreme values.
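
To illustrate the first reason, the sketch below computes a Pearson correlation on a small made-up data set, then adds one extreme case, then Winsorizes that case. The data, the helper `pearson_r`, and the choice to pull the extreme case back to the largest value among the remaining cases are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with essentially no linear relation.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 2, 7, 4, 5, 3]

# One additional case that is extreme on both variables.
x_out = x + [50]
y_out = y + [60]

# Winsorize the extreme case to the highest value among the other cases.
x_win = [min(v, max(x)) for v in x_out]
y_win = [min(v, max(y)) for v in y_out]

r_clean = pearson_r(x, y)
r_outlier = pearson_r(x_out, y_out)
r_winsorized = pearson_r(x_win, y_win)

print(f"without the extreme case: r = {r_clean:.2f}")
print(f"with the extreme case:    r = {r_outlier:.2f}")
print(f"after Winsorizing:        r = {r_winsorized:.2f}")
```

In this constructed example a single case moves the correlation from near zero to near one, and Winsorizing that case brings it back toward the value seen without it.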

A practical example of Winsorization from the literature is Craig A. Anderson and Kathryn B. Anderson's study of the relation between hot temperatures and violent crime indices, with U.S. cities as the units of analysis. These researchers used 10% Winsorizations (converting the top 10% of values to the 90th percentile and the bottom 10% to the 10th percentile) on their temperature, crime, and demographic control variables. They found that, with very limited exceptions, the Winsorized variables no longer had outliers. Results from their correlational analyses of the Winsorized variables were highly similar to their non-Winsorized results.
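
A percentage-based Winsorization of this kind might be sketched as follows. The temperatures are invented, the function name `winsorize_symmetric` is hypothetical, and the rank-based rule used to locate the 10th and 90th percentiles is only one of several conventions; the entry does not say which percentile definition the study used.

```python
def winsorize_symmetric(values, proportion=0.10):
    """Winsorize the same proportion of cases at each end of the distribution.

    Uses a simple rank-based rule: with n cases, the k = floor(n * proportion)
    lowest values are raised to the (k+1)-th lowest value, and the k highest
    values are lowered to the (k+1)-th highest value.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and below 0.5")
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * proportion + 1e-9)  # small tolerance guards against float rounding
    low, high = ordered[k], ordered[n - k - 1]
    # Clamp each value into [low, high], keeping the original case order.
    return [min(max(v, low), high) for v in values]


# Hypothetical average temperatures for 20 cities.
temps = [64, 104, 58, 71, 31, 66, 60, 73, 55, 69,
         62, 90, 67, 61, 75, 63, 70, 65, 72, 68]

temps_w = winsorize_symmetric(temps, proportion=0.10)
print(min(temps), max(temps))      # 31 104
print(min(temps_w), max(temps_w))  # 58 75
```

With 20 cases and a 10% proportion, two values at each end are converted (31 and 55 become 58; 90 and 104 become 75) and the remaining 16 are left unchanged.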

In Winsorizing, it is helpful to have the task built into the statistical package being used. There is a module for Stata (StataCorp, College Station, TX) called WINSOR that will Winsorize a variable in the data set. In general practice, Winsorizing can be done symmetrically or asymmetrically (i.e., to both sides of a distribution or to only one), but this software module requires that it be done symmetrically, based on percentage. That is, the user specifies the percentage to be Winsorized, and that many data points are altered at each extreme of the distribution of the specified variable. So although having this module built into the statistical package is convenient, it imposes restrictions on how the procedure is performed. Also, according to a search of PsycINFO articles, SPSS® (IBM® SPSS® Statistics was formerly called PASW® Statistics) is used more than ten times as often as Stata in psychological research. In SPSS®, however, there is no built-in module, and the conversion must be done essentially manually.
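
When a package imposes symmetric Winsorizing or offers no built-in routine, the conversion can be scripted by hand. The following is a minimal sketch of an asymmetric, count-based version. It is independent of the WINSOR module and of SPSS and does not reproduce either program's behavior; the function name `winsorize_counts`, its parameters, and the scores are hypothetical.

```python
def winsorize_counts(values, n_low=0, n_high=0):
    """Winsorize the `n_low` lowest and `n_high` highest cases.

    The two counts may differ, so the procedure can be one-sided
    (one count set to 0) or two-sided with unequal amounts.
    """
    n = len(values)
    if n_low < 0 or n_high < 0 or n_low + n_high >= n:
        raise ValueError("counts must be non-negative and leave at least one case")
    ordered = sorted(values)
    floor_value = ordered[n_low]             # lowest value left unchanged
    ceiling_value = ordered[n - n_high - 1]  # highest value left unchanged
    return [min(max(v, floor_value), ceiling_value) for v in values]


# Hypothetical scores with one low and two high extreme values.
scores = [2, 14, 15, 15, 16, 17, 18, 19, 45, 60]

high_only = winsorize_counts(scores, n_high=2)
both_sides = winsorize_counts(scores, n_low=1, n_high=2)

print(high_only)   # [2, 14, 15, 15, 16, 17, 18, 19, 19, 19]
print(both_sides)  # [14, 14, 15, 15, 16, 17, 18, 19, 19, 19]
```

Working with counts rather than a single percentage leaves the analyst free to decide, after the usual outlier diagnostics, how many cases to convert at each end.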

...
