Significance, Statistical
Statistical significance refers to a difference between two measurements that results from more than randomness. Every research project, experiment, or study that counts, measures, quantifies, or otherwise collects data must ultimately compare those data with some other measurement or standard. If a difference between two measurements, or between a measurement and some standard, is detected, that difference is statistically significant if it reflects an actual difference between the two rather than mere random variation.
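To see why an observed difference is not, by itself, evidence of a real one, consider the following minimal sketch (not part of the original entry; the population values and sample sizes are hypothetical, and NumPy is assumed to be available). It draws repeated pairs of samples from the same population and shows that their means still differ purely by chance.

```python
# Illustrative sketch (hypothetical values): two groups drawn from the SAME
# population almost never have identical sample means, so some observed
# difference is expected from random variation alone.
import numpy as np

rng = np.random.default_rng(42)

# 1,000 pairs of samples (n = 30 each) from one common population.
diffs = [rng.normal(50, 10, 30).mean() - rng.normal(50, 10, 30).mean()
         for _ in range(1000)]

print(f"largest chance difference observed: {max(abs(d) for d in diffs):.2f}")
print(f"typical (standard deviation of) chance difference: {np.std(diffs):.2f}")
```

Statistical significance testing asks whether an observed difference is larger than this kind of chance variation can plausibly explain.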
For example, a psychologist might be interested in determining which of two treatments is more effective in treating depression. Having settled on some means of measuring each treatment's effectiveness, the researcher might then administer the different treatments to selected individuals as part of a designed experiment. Of course, any measure of a treatment's effectiveness, no matter how objective, is only an estimate of the actual effect. This estimate will vary from the true amount because of many factors, including idiosyncrasies in the testing process, errors in subjective judgment, flaws in measurement, or any number of other sources. Because of the inherent variability in the estimation of any unknown population parameter (in this case, the true effectiveness of each treatment), the researcher comparing the two measures must determine whether the difference between the results is caused only by variability in the estimation process or by an actual difference between the treatments. If the latter is true, then the difference in the measurements is said to be statistically significant. Although there are other methods, this determination is made most frequently by means of hypothesis testing. After briefly discussing the history of the topic, this entry discusses hypothesis testing and multiple comparisons and then objections to statistical significance testing.
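As a concrete illustration of such a comparison (not part of the original entry), the sketch below simulates improvement scores for two hypothetical treatment groups and applies an independent-samples t test, one common way of judging whether the observed difference exceeds what estimation variability alone would produce. The group means, spreads, and sample sizes are invented, and NumPy and SciPy are assumed to be available.

```python
# Illustrative sketch: comparing two hypothetical depression treatments with a
# two-sample t test. All data are simulated; values are not from the entry.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated improvement scores for two treatment groups (hypothetical values).
treatment_a = rng.normal(loc=8.0, scale=4.0, size=30)   # true mean improvement 8
treatment_b = rng.normal(loc=5.0, scale=4.0, size=30)   # true mean improvement 5

# Is the observed difference in means larger than random variation in the
# estimates would plausibly produce?
t_stat, p_value = stats.ttest_ind(treatment_a, treatment_b)

print(f"mean A = {treatment_a.mean():.2f}, mean B = {treatment_b.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p value (conventionally below .05) is taken as evidence that the
# difference is statistically significant rather than random variation.
```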
History
Although the formal development of hypothesis testing would not begin until 1925, less formal, ad hoc testing for statistical significance was being done around the turn of the 20th century. In 1908, William Gosset, commonly known as "Student," developed his t test for the mean of a normally distributed population with unknown standard deviation, and before that, in 1900, Karl Pearson published his work on chi-square tests of significance for frequency distributions. Perhaps the earliest example of testing for statistical significance is a 1710 paper by John Arbuthnot entitled "An Argument for Divine Providence Taken From the Constant Regularity of the Births of Both Sexes," in which he examined London birth records and concluded that there was good reason to think the birth rate of males was higher than that of females (i.e., significantly higher). It was not until 1925, however, that R. A. Fisher began the formal development of testing for statistical significance. His work, along with that of Jerzy Neyman and Egon Pearson a few years later, is the foundation of what is known today as hypothesis testing.
Hypothesis Testing
Fisher and others writing on this topic at that time were influenced by the view, largely advanced by Karl Popper, that scientific theories must be falsifiable. To that end, the chief purpose of hypothesis testing is not to determine the actual size of the difference between two measurements but rather to demonstrate that the difference exists (i.e., is not zero) given some observed data. Specifically, hypothesis testing requires two hypotheses: the null hypothesis (often written H0) and the alternative hypothesis (often written Ha or H1). The null hypothesis is a straw man: it is the claim that the researcher is attempting to falsify by experimentation. The alternative hypothesis is a statement of what the researcher believes to be the true state of affairs.
For instance, if a sociologist performs research to determine whether after-school programs reduce the likelihood that participants will be involved in violent crime, the appropriate null hypothesis is that such programs do not reduce that likelihood, whereas one alternative hypothesis might be that these programs do, in fact, reduce such crimes. Similarly, an educational researcher might want to determine whether preschool attendance increases test scores in at-risk children. That researcher's null hypothesis would be that preschool does not increase test scores, whereas the alternative hypothesis might state that it does.
In practice, the null hypothesis generally involves an equality, whereas the alternative hypothesis involves some form of inequality. Two types of alternative hypotheses are typically used: one-sided and two-sided. Although the null hypothesis states the simple, specific equality that the researcher seeks to disprove, a one-sided alternative hypothesis gives the direction in which the true value differs from the hypothesized value. A one-sided alternative can be right-tailed, indicating that the true value of the population parameter under consideration is greater than the value hypothesized in H0, or left-tailed, indicating that the true value is less than the hypothesized value. For example, if a particular null hypothesis states that the true mean of a given population is, say, 5, then the corresponding right-tailed alternative hypothesis is that the true mean is greater than 5, whereas the corresponding left-tailed alternative is that the true mean is less than 5. A two-tailed alternative hypothesis differs only in that it does not indicate direction (e.g., the true mean is not equal to 5). These hypotheses must be chosen before the data are collected; if the researcher allows the data to influence the choice of hypotheses, the stated significance level of the test will no longer be accurate.
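The sketch below (not part of the original entry) makes the mean-of-5 example concrete: it tests the null hypothesis that a population mean equals 5 against the two-sided and both one-sided alternatives with a one-sample t statistic. The sample is simulated with hypothetical values, and NumPy and SciPy are assumed to be available.

```python
# Illustrative sketch: H0: mu = 5 tested against two-sided and one-sided
# alternatives. Data are simulated; values are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.8, scale=2.0, size=25)   # hypothetical sample

mu0 = 5.0                                          # value stated in H0
n = sample.size
df = n - 1
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

p_two_sided    = 2 * stats.t.sf(abs(t_stat), df)   # Ha: mu != 5
p_right_tailed = stats.t.sf(t_stat, df)            # Ha: mu > 5
p_left_tailed  = stats.t.cdf(t_stat, df)           # Ha: mu < 5

print(f"t = {t_stat:.2f}")
print(f"two-sided p = {p_two_sided:.4f}")
print(f"right-tailed p = {p_right_tailed:.4f}")
print(f"left-tailed p = {p_left_tailed:.4f}")
```

Note that the alternative (two-sided, right-tailed, or left-tailed) is fixed before the data are examined; only the corresponding p value is then used to judge significance.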
...