
Statistical Testing: Overview

Statistical testing is common in clinical settings as a method for drawing inferences about an unknown population value based on a sample of subjects. A clinician may wish, for instance, to test the hypothesis that a new surgical technique reduces the probability of an adverse postoperative outcome or that, on average, a new drug reduces blood glucose in diabetics. Many types of statistical tests exist, and selection of an appropriate test is guided by the type of data collected, the statistics for which inferences are desired, the number of groups under study, and the sample size.

General Testing Procedure

Implementation of a statistical test begins with the specification of distinct hypotheses; a null hypothesis (denoted by H0) is assumed to be true, and the test is performed to evaluate the evidence against H0 in favor of an alternative hypothesis (denoted by HA). Typically, the hypothesis that the researcher may wish to show is specified as HA. For example, H0 for the study on the new surgical technique could be stated as “The probabilities of adverse outcome for patients assigned to the new treatment and patients assigned to the standard of care are equal.” HA would correspondingly be stated as “Adverse outcome probabilities are unequal for the two treatment techniques.”

The investigator collects a sample of data from the population of interest, and a test statistic is derived using the sample responses. The test statistic is defined as a quantity that summarizes the sample in such a way that a decision to accept H0 (in preference to HA) or to reject H0 (in favor of HA) can be made based on all possible values of the quantity. The set of values corresponding to the decision to accept H0 is called the acceptance region, while the set of values corresponding to the decision to reject H0 is called the rejection region.
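As an illustration of a test statistic and its rejection region, the surgical-technique example above can be sketched as a two-proportion z test. The counts used here are hypothetical, and the pooled-variance z statistic is one common choice rather than the only one.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z statistic for H0: the two outcome probabilities are equal.

    Uses the pooled proportion to estimate the common probability
    under H0, as in the standard two-proportion z test.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # estimate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical data: 12/100 adverse outcomes under the new technique,
# 25/100 under the standard of care.
z = two_proportion_z(12, 100, 25, 100)

# For a two-sided test at significance level 0.05, the rejection region
# is |z| > 1.96; values with |z| <= 1.96 fall in the acceptance region.
reject_h0 = abs(z) > 1.96
```

Here the single number z summarizes the two samples, and the decision between H0 and HA depends only on which region z lands in.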

The test statistic has a certain probability distribution—the sampling distribution—under the assumption that H0 is true (i.e., the probability distribution of test statistics arising from many repeated samples). The shape of the sampling distribution depends on the type of test being implemented.
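The sampling distribution can be made concrete by simulation: repeatedly drawing samples under H0 and computing the test statistic each time. The sketch below (an assumed one-sample z statistic with known σ, not a method prescribed by the text) shows that the resulting distribution is approximately standard normal.

```python
import random
import statistics

random.seed(0)

def z_statistic(n, mu0, sigma):
    """One simulated test statistic: the standardized sample mean,
    drawn under H0 (true mean equals mu0)."""
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    return (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)

# Empirical sampling distribution: the statistic from many repeated samples
stats = [z_statistic(n=30, mu0=0.0, sigma=1.0) for _ in range(5000)]

# Under H0 this distribution is approximately standard normal
approx_mean = statistics.mean(stats)   # close to 0
approx_sd = statistics.stdev(stats)    # close to 1
```

A different test (e.g., a t test or a chi-square test) would produce a sampling distribution of a different shape, which is why the distribution depends on the test being implemented.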

As the relevant population quantity is unknown (thereby necessitating testing), the potential exists for the test to produce an incorrect conclusion. The conclusion to reject H0 when indeed H0 is true is known as a Type I error or false positive. On the other hand, accepting H0 when HA is true is called a Type II error or false negative. The probability of committing a Type I error (often referred to as the significance level) is commonly denoted by α, while the probability of committing a Type II error is commonly denoted by β. The significance level corresponds to the range of the rejection region and is specified by the experimenter prior to testing. Many journals require α = .05 for testing.
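The link between the significance level and the Type I error rate can be checked by simulation: if H0 is true and the test rejects whenever the statistic falls in the rejection region for α = .05, then roughly 5% of repeated experiments should produce a false positive. The setup below is a hypothetical one-sample z test used purely for illustration.

```python
import random
import statistics

random.seed(1)

def z_statistic(sample, mu0, sigma):
    """Standardized sample mean for a one-sample z test."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)

trials = 10000
false_positives = 0
for _ in range(trials):
    # Generate data with H0 true: the population mean really is 0
    sample = [random.gauss(0.0, 1.0) for _ in range(30)]
    # Reject H0 when the statistic lands in the two-sided rejection region
    if abs(z_statistic(sample, mu0=0.0, sigma=1.0)) > 1.96:
        false_positives += 1

type_i_rate = false_positives / trials  # should be near alpha = 0.05
```

Estimating β is analogous: generate data under a specific alternative, and count how often the test fails to reject.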

Statistical tests can also be implemented in terms of p values, defined as the probability, computed under the assumption that H0 is true, of observing a test statistic as extreme as or more extreme than the one actually observed—in other words, the observed significance level of the test. The decision to reject H0 is made if the p value is less than α, and the proximity of the p value to 0 is a measure of the strength of the evidence against H0.
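For a z statistic, the two-sided p value can be computed directly from the standard normal CDF. The sketch below uses the standard identity Φ(x) = ½(1 + erf(x/√2)); the z value plugged in is a hypothetical one chosen for illustration.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p value for a z statistic: the probability, under H0,
    of a statistic at least as extreme as |z| in either direction."""
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - cdf)

p = normal_two_sided_p(-2.37)   # hypothetical observed statistic
reject_h0 = p < 0.05            # compare p value to alpha
```

Reporting p ≈ 0.018 conveys more than the bare reject/accept decision: it shows how far inside the rejection region the observed statistic fell.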

...
