Winsorize

Winsorization is one method, among others, of handling the problem of outliers in a distribution of data. In addition, researchers sometimes Winsorize to give the distribution more desirable statistical properties. To Winsorize, one converts the values of data points that are outlying on the high end to the value of the highest data point not considered to be an outlier. For example, if a survey item on respondents’ numbers of sexual partners in the past year yielded answers of 0, 1, 2, 3, 5, 7, 10 (with multiple people giving each of the previous responses), and 100, then Winsorization would convert the 100 to 10. In this sense, the outlying high values are reduced in magnitude (or “reined in”) to a value that is still at the high end of the distribution but not as extreme. An analogous procedure can also be done with values that are considered to be outliers at the low end of the distribution; these values would be increased to the lowest value not considered an outlier. A possible advantage of Winsorizing is that it preserves the information that a case had among the highest (or lowest) values in a distribution but protects against some of the harmful effects of outliers. An alternative to Winsorizing is trimming, in which outlier values are removed instead of converted to other values; recommendations of one technique, vis-à-vis the other, will be discussed. Winsorizing and trimming both fall within a field known as robust statistics.
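The survey example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `winsorize_to_bounds` is hypothetical, each response appears once here for brevity (the example in the text has several respondents per value), and the cutoff of 10 reflects the analyst's judgment that 100 is the only outlier.

```python
from statistics import mean

def winsorize_to_bounds(values, lower=None, upper=None):
    """Pull values above `upper` down to `upper` and values below `lower` up to `lower`.

    Hypothetical helper for illustration; the bounds are chosen by the analyst
    as the most extreme values NOT considered outliers.
    """
    result = []
    for v in values:
        if upper is not None and v > upper:
            v = upper
        if lower is not None and v < lower:
            v = lower
        result.append(v)
    return result

# Simplified version of the survey responses (one respondent per value).
responses = [0, 1, 2, 3, 5, 7, 10, 100]

# Winsorizing: the outlier 100 is converted to 10, the highest non-outlier.
winsorized = winsorize_to_bounds(responses, upper=10)

# Trimming, for comparison: the outlier is removed instead of converted.
trimmed = [v for v in responses if v <= 10]

print(winsorized)        # [0, 1, 2, 3, 5, 7, 10, 10]
print(trimmed)           # [0, 1, 2, 3, 5, 7, 10]
print(mean(responses))   # 16
print(mean(winsorized))  # 4.75
print(mean(trimmed))     # 4
```

Note that the Winsorized list keeps all eight cases, with the extreme respondent still at the top of the distribution, whereas the trimmed list has only seven.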

There appear to be at least two reasons for Winsorizing (or trimming). One, as is generally known, is that the presence of an outlier can exert a disproportionate influence on other statistical analyses one conducts (e.g., a Pearson correlation). Another rationale is not so much concerned with extreme outliers but with trying to ensure that estimates of a population parameter (e.g., the mean) from repeated samplings tend to take on similar values (statistical efficiency). Before engaging in Winsorization or trimming, of course, one should conduct the usual diagnostics on apparent outliers to determine, for example, whether they are the result of faulty data recording or are legitimate values from a phenomenon that can generate extreme values.
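The first rationale can be illustrated with a small sketch. The data below are entirely hypothetical and constructed for the purpose: six cases with a perfect negative relation, plus one case that is extreme on both variables. The cutoff of 6 (the highest non-outlying value) is likewise an assumption made for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: the last case is an outlier on both variables.
x_raw = [1, 2, 3, 4, 5, 6, 30]
y_raw = [6, 5, 4, 3, 2, 1, 30]

# Winsorize the high end of each variable to the highest non-outlier (6).
x_wins = [min(v, 6) for v in x_raw]
y_wins = [min(v, 6) for v in y_raw]

r_raw = pearson_r(x_raw, y_raw)     # dominated by the single extreme case
r_wins = pearson_r(x_wins, y_wins)  # extreme case reined in

print(round(r_raw, 3), round(r_wins, 3))
```

In this constructed example the raw correlation is strongly positive, driven almost entirely by one case, while the Winsorized correlation is negative, in line with the pattern among the other six cases.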

A practical example of Winsorization from the literature is Craig A. Anderson and Kathryn B. Anderson's study of the relation of hot temperatures to violent crime indices, with U.S. cities as the units of analysis. These researchers used 10% Winsorizations (converting the top 10% of values to the 90th percentile and the bottom 10% to the 10th percentile) on their temperature, crime, and demographic control variables. They found that, with very limited exceptions, the Winsorized variables no longer had outliers. Results from their correlational analyses of the Winsorized variables were highly similar to their non-Winsorized results.
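A percentage-based Winsorization of this kind might be sketched as follows. This is not the procedure the study's authors ran; it is a minimal count-based version, assuming that "10%" means the lowest and highest `floor(0.10 * n)` cases are replaced by the nearest retained value. Percentile definitions differ across packages, so other software may give slightly different cutoffs. The function name and the data are hypothetical.

```python
import math

def winsorize_fraction(values, fraction=0.10):
    """Symmetric Winsorization: replace the lowest and highest `fraction`
    of cases with the nearest value that is kept.

    Count-based sketch (an assumption, not a package's exact algorithm).
    """
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    n = len(values)
    # Small tolerance guards against floating-point error in fraction * n.
    k = math.floor(fraction * n + 1e-9)
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low_cut = ordered[k]           # smallest value that is kept
    high_cut = ordered[n - k - 1]  # largest value that is kept
    # Clamp every case to [low_cut, high_cut], preserving the original order.
    return [min(max(v, low_cut), high_cut) for v in values]

# Hypothetical variable measured on 10 units, with an outlier at each end.
raw = [-40, 12, 15, 18, 20, 21, 23, 25, 27, 95]
print(winsorize_fraction(raw, 0.10))
# [12, 12, 15, 18, 20, 21, 23, 25, 27, 27]
```

With 10 cases and a 10% fraction, one case at each end is altered; the remaining eight are untouched.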

In Winsorizing, it is helpful to have the task built into the statistical package being used. There is a module for Stata (StataCorp, College Station, TX) called WINSOR that will Winsorize a variable in the data set. Whereas Winsorizing can be done symmetrically or asymmetrically in general practice (i.e., to one or both sides of a distribution), this software module requires that it be done symmetrically, based on percentage. That is, the user specifies the percentage to be Winsorized, and that percentage of data points is altered at each extreme of the distribution on the specified variable. So although having this module built into the statistical package is convenient, it imposes restrictions on how the procedure is performed. Also, according to a search of PsycINFO articles, SPSS® (IBM® SPSS® Statistics was formerly called PASW® Statistics) is used over ten times as often as Stata in psychological research. In SPSS®, however, there is no built-in module, and the conversion needs to be done essentially manually.
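The distinction between symmetric and asymmetric Winsorizing can be made concrete with a short sketch. It does not reproduce the behavior of the WINSOR module or of any other package; the function name, its parameters, and the data are hypothetical, and the count-based rule (altering `floor(fraction * n)` cases on each side) is an assumption.

```python
import math

def winsorize_sides(values, lower_fraction=0.0, upper_fraction=0.0):
    """Winsorize each tail by its own fraction (hypothetical helper).

    Equal fractions give symmetric Winsorizing; setting one fraction
    to 0 Winsorizes only one side of the distribution.
    """
    n = len(values)
    # Number of cases to alter in each tail (tolerance for float error).
    k_low = math.floor(lower_fraction * n + 1e-9)
    k_high = math.floor(upper_fraction * n + 1e-9)
    if k_low < 0 or k_high < 0 or k_low + k_high >= n:
        raise ValueError("fractions must be non-negative and leave at least one case unaltered")
    ordered = sorted(values)
    low_cut = ordered[k_low]            # lowest value left unaltered
    high_cut = ordered[n - k_high - 1]  # highest value left unaltered
    return [min(max(v, low_cut), high_cut) for v in values]

# Hypothetical scores with a single high outlier.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# Asymmetric: only the high end is Winsorized.
upper_only = winsorize_sides(scores, upper_fraction=0.10)
# Symmetric: 10% altered at each end, as a percentage-based module would require.
both_sides = winsorize_sides(scores, lower_fraction=0.10, upper_fraction=0.10)

print(upper_only)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 9]
print(both_sides)  # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
```

In the symmetric case the lowest score is changed from 1 to 2 even though it is not an outlier, which shows the kind of restriction a symmetric-only procedure imposes.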

...
