Null Hypothesis Significance Testing

Neil J. Salkind

doi:10.4135/9781412952644


Edited by: Neil J. Salkind
In: Encyclopedia of Measurement and Statistics
Chapter DOI: https://doi.org/10.4135/9781412952644.n316
Subject: Quantitative/Statistical Research, Test & Measurement
Keywords: hypothesis testing; inferences; null hypothesis; significance testing


Null hypothesis significance testing (NHST) dominates experimental and correlational methods in psychological research. Investigators are typically concerned with demonstrating the existence of an effect, that is, systematic variation in the data that can be distinguished from random noise, sampling error, or variation due to uncontrolled or nuisance variables. The null hypothesis is often, but does not have to be, identified with chance, and a p value is computed to express how improbable the observed empirical data are under the assumption that the null hypothesis is true. When this probability falls below the conventional value of .05, it is concluded that the null hypothesis is false and that it is safe to presume the presence of a systematic source of variation. This inference is not strictly logical because modus tollens is not valid when stated probabilistically: From the statement “If the null hypothesis is true, then extreme data are improbable,” it does not follow that “If the data are extreme, the null hypothesis is improbable.” NHST is a method of inductive rather than deductive inference, however, and researchers nevertheless take the rejection of the null hypothesis to indicate the presence of an effect. In the long run, the argument goes, decisions reached by NHST will generate knowledge faster than would guessing or doing nothing.

Variants of NHST have been developed by various, and sometimes warring, schools of statistical thought. These schools differ in the assumptions they make about the nature of the data and the hypotheses and about how to make inferences. The following illustrations of possible inference strategies begin with information- and assumption-rich scenarios and proceed to the more degraded scenarios typical of most psychological research.

Full-Suite Analysis

Suppose extensive testing has revealed that average self-esteem scores are μ = 68 for women and μ = 72 for men, and that the standard deviation within each gender is σ = 20. A sample of 200 scores with a mean of 71 is drawn from one of the two populations. The null hypothesis H0 is that women were sampled, and the alternative hypothesis H1 is that men were sampled. Analysis begins with the calculation of the probability of obtaining a mean of 71 or higher if H0 is true. Writing M for the sample mean and n for the sample size, the z score for the sample mean is

$$z = \frac{M - \mu}{\sigma/\sqrt{n}} = \frac{71 - 68}{20/\sqrt{200}} = 2.12,$$

and the probability of a score at least this extreme is .017.
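The calculation above can be checked numerically. The following is a minimal sketch, assuming a normal sampling distribution for the mean with known σ; the function names are illustrative and do not come from the entry.

```python
from math import sqrt
from statistics import NormalDist


def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z score of a sample mean, given a known population mean and SD."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


def upper_tail_p(z):
    """Probability of a standard normal value at least as large as z."""
    return 1.0 - NormalDist().cdf(z)


# H0: the sample of 200 scores (mean 71) was drawn from women (mu = 68, sigma = 20).
z_h0 = z_for_sample_mean(sample_mean=71, pop_mean=68, pop_sd=20, n=200)
p_h0 = upper_tail_p(z_h0)

print(f"z = {z_h0:.2f}, p = {p_h0:.3f}")
```

Rounded to the precision used in the text, this reproduces z = 2.12 and p = .017.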

Evaluation of the data under the alternative hypothesis H1 yields z = −.71, p = .24, where p is now the probability of obtaining a mean of 71 or lower if men were sampled. That is, the data are not improbable under the assumption that men were sampled. The likelihood ratio (LR) of the two p values, p(D|H1)/p(D|H0), is 14.12, meaning that it is more than 14 times more likely that a sample of men rather than women would yield data of the kind found in the empirical sample. But how likely is it that the sample consisted of men? It is necessary to be explicit about the prior probability of sampling men. A simple intuition is that women and men were equally likely to be sampled, that is, p(H0) = p(H1) = .5. The sum of the products of these prior probabilities and their respective p values is the overall probability of the observed data. Here, p(D) = p(H0)p(D|H0) + p(H1)p(D|H1) = .13. This probability is critical for the calculation of the probability of the null hypothesis given the observed data. Bayes' theorem gives p(H0|D) as p(H0)p(D|H0)/p(D) = .07. Because the prior probabilities of the two hypotheses are the same, the ratio of the two posterior probabilities is the same as the LR. It can now be said that the sample is more than 14 times more likely to comprise men than women.

The assumption of equal priors was just that, an assumption. Suppose the researcher knew that self-esteem scores were collected at four different sites, only one of which comprised men, so that p(H0) = .75 and p(H1) = .25. Now p(H0|D) = .18, meaning that it is only 4.7 times more likely for the sample data to come from men than from women. Although the prior probability that men were sampled was low, the evidence is still strong enough to reject the null hypothesis that women were sampled and to accept the alternative.
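The Bayesian step can be retraced in a few lines. This is a sketch under stated assumptions: it plugs in the rounded p values reported in the text (.017 and .24) so that the results match the entry's figures, and the function and variable names are illustrative rather than part of the entry.

```python
def posterior_h0(prior_h0, p_d_given_h0, p_d_given_h1):
    """Posterior probability of H0 via Bayes' theorem for two exhaustive hypotheses."""
    prior_h1 = 1.0 - prior_h0
    # Overall probability of the data: sum of prior-weighted p values.
    p_d = prior_h0 * p_d_given_h0 + prior_h1 * p_d_given_h1
    return prior_h0 * p_d_given_h0 / p_d


# Rounded p values taken from the text (assumption: rounding as reported).
P_D_GIVEN_H0 = 0.017  # probability of the data if women were sampled
P_D_GIVEN_H1 = 0.24   # probability of the data if men were sampled

likelihood_ratio = P_D_GIVEN_H1 / P_D_GIVEN_H0

# Equal priors, then the four-site scenario with p(H0) = .75.
results = {}
for prior_h0 in (0.5, 0.75):
    post_h0 = posterior_h0(prior_h0, P_D_GIVEN_H0, P_D_GIVEN_H1)
    odds_for_h1 = (1.0 - post_h0) / post_h0  # posterior odds of men vs. women
    results[prior_h0] = (post_h0, odds_for_h1)
    print(f"p(H0) = {prior_h0:.2f}: p(H0|D) = {post_h0:.2f}, "
          f"posterior odds for H1 = {odds_for_h1:.1f}")
```

With equal priors the posterior odds equal the likelihood ratio; with p(H0) = .75 they are one third of it, which is why the odds drop from about 14 to about 4.7.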

...
