Data Snooping

Neil J. Salkind

doi:10.4135/9781412961288

Edited by:
Neil J. Salkind
In: Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781412961288.n102
Subject: Research Design
Keywords: hypothesis testing; null hypothesis

The term data snooping, also known as data dredging or data fishing, describes the situation in which a particular data set is analyzed repeatedly without an a priori hypothesis of interest. The practice, although common, is problematic because it can produce a significant finding (e.g., rejection of a null hypothesis) that is nothing more than a chance artifact of the repeated analyses. The bias introduced by data snooping grows with the number of times a data set is analyzed in the hope of a significant finding. Any empirical research that is based on experimentation and observation can be affected by data snooping.

Data Snooping and Multiple Hypothesis Testing

A hypothesis test is conducted at a significance level, denoted α, corresponding to the probability of incorrectly rejecting a true null hypothesis (the so-called Type I error). Data snooping essentially involves performing a large number of hypothesis tests on a particular data set in the hope that one of the tests will be significant. Performing many tests in this way raises the actual probability of at least one Type I error well above the nominal level α. In other words, the burden of proof for finding a significant result is substantially reduced, and the results are potentially misleading. For example, if 100 independent hypothesis tests are conducted on a data set at a significance level of 5%, about 5 of the 100 tests would be expected to yield significant results by chance alone, even if every null hypothesis were, in fact, true. The probability that at least one of the 100 tests is significant is 1 − 0.95^100, or about 0.994. Any conclusions of statistical significance at the 5% level based on an analysis such as this are misleading because the data-snooping process has essentially ensured that something significant will be found. This means that if new data are obtained, it is unlikely that the “significant” results found via the data-snooping process would be replicated.
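The arithmetic in the 100-test example can be checked with a short Python sketch. It is an illustration only, not part of the original entry. It assumes that the tests are independent and that every null hypothesis is true, so that each p value is uniformly distributed between 0 and 1. The function names, the number of replications, and the random seed are arbitrary choices made for this sketch.

```python
import random


def expected_false_positives(n_tests, alpha):
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false rejections among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate(n_tests, alpha, n_reps, seed=0):
    """Monte Carlo check: draw uniform p values under the null hypothesis.

    Returns the average number of rejections per replication and the
    fraction of replications with at least one rejection.
    """
    rng = random.Random(seed)
    total_rejections = 0
    reps_with_rejection = 0
    for _ in range(n_reps):
        # Under a true null, a p value falls below alpha with probability alpha.
        rejections = sum(rng.random() < alpha for _ in range(n_tests))
        total_rejections += rejections
        reps_with_rejection += rejections > 0
    return total_rejections / n_reps, reps_with_rejection / n_reps


if __name__ == "__main__":
    print(expected_false_positives(100, 0.05))
    print(prob_at_least_one(100, 0.05))
    print(simulate(100, 0.05, n_reps=2000))
```

The closed-form values are about 5 expected false positives and a probability of about 0.994 of at least one; the simulated values should land close to both.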

Data-Snooping Examples

Example 1

An investigator obtains data to investigate the impact of a treatment on the mean of a response variable of interest without a predefined view (alternative hypothesis) of the direction (positive or negative) of the possible effect of the treatment. Data snooping would occur in this situation if, after analyzing the data, the investigator observes that the treatment appears to have a negative effect on the response variable and then uses a one-sided alternative hypothesis corresponding to the treatment having a negative effect. In this situation, a two-sided alternative hypothesis, corresponding to the investigator's a priori ignorance of the direction of the treatment effect, would be appropriate. Data snooping in this example halves the p value relative to the two-sided test, resulting in a greater chance of declaring a significant effect of the treatment. To avoid problems of this nature, many journals require that two-sided alternatives be used for hypothesis tests.
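The halving of the p value in Example 1 can be illustrated with a minimal Python sketch. It assumes a test statistic that follows a standard normal distribution under the null hypothesis (a z test). The observed value z = -1.8 is hypothetical and was chosen only because it falls between the one-sided and two-sided 5% cutoffs.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z):
    """p value for the alternative 'the treatment has some effect'."""
    return 2.0 * STANDARD_NORMAL.cdf(-abs(z))


def lower_one_sided_p(z):
    """p value for the alternative 'the treatment has a negative effect'."""
    return STANDARD_NORMAL.cdf(z)


if __name__ == "__main__":
    z_observed = -1.8  # hypothetical test statistic, for illustration only
    print(two_sided_p(z_observed))        # above 0.05: not significant
    print(lower_one_sided_p(z_observed))  # below 0.05: "significant"
```

With this hypothetical statistic, the same data are not significant at the 5% level under the two-sided alternative but become significant once the direction is chosen after looking at the data.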

Example 2

A data set containing information on a response variable and six explanatory variables is analyzed, without any a priori hypotheses of interest, by fitting each of the 64 multiple linear regression models obtained from the different combinations of the six explanatory variables (2^6 = 64, counting the model with no explanatory variables). Only the statistically significant associations are then reported. The effect of data snooping in this example would be more severe than in Example 1 because the data are being analyzed many more times (more hypothesis tests are performed), meaning that one would expect to see a number of significant associations simply due to chance.
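The count of 64 models in Example 2 can be reproduced by enumerating every subset of the six explanatory variables, as in the Python sketch below. The variable names x1 to x6 are hypothetical. The count of slope tests assumes that every coefficient in every fitted model is tested, which the entry does not state explicitly.

```python
from itertools import combinations

# Hypothetical names for the six explanatory variables.
PREDICTORS = ["x1", "x2", "x3", "x4", "x5", "x6"]


def all_subsets(names):
    """Return every subset of the predictors, including the empty one."""
    subsets = []
    for size in range(len(names) + 1):
        subsets.extend(combinations(names, size))
    return subsets


models = all_subsets(PREDICTORS)
n_models = len(models)

# Assumption: one significance test per slope coefficient in each model.
n_slope_tests = sum(len(model) for model in models)

if __name__ == "__main__":
    print(n_models)       # 64 candidate models
    print(n_slope_tests)  # 192 slope tests across those models
```

Because the models share the same data and overlapping variables, these tests are not independent, so the independent-test arithmetic from the previous section gives only a rough sense of how many chance findings to expect.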

...
