Beta (β) refers to the probability of Type II error in a statistical hypothesis test. Frequently, it is the power of a test, equal to 1 – β, rather than β itself, that is reported as a measure of the quality of a hypothesis test. This entry discusses the role of β in hypothesis testing and its relationship to the significance level (α).

Hypothesis Testing and Beta

Hypothesis testing is a very important part of statistical inference: the formal process of deciding whether a particular contention (called the null hypothesis) is supported by the data, or whether a second contention (called the alternative hypothesis) is preferred. In this context, one can represent the situation in a simple 2 × 2 decision table in which the columns reflect the true (unobservable) situation and the rows reflect the inference made based on a set of data:

                              True situation
    Decision             H0 true               H0 false
    Fail to reject H0    Correct decision      Type II error (β)
    Reject H0            Type I error (α)      Correct decision

The language used in the decision table is subtle but deliberate. Although people commonly speak of accepting hypotheses, under the maxim that scientific theories are not so much proven as supported by evidence, we might more properly speak of failing to reject a hypothesis rather than of accepting it. Note also that it may be the case that neither the null nor the alternative hypothesis is, in fact, true, but generally we might think of one as preferable over the other on the basis of evidence. Semantics notwithstanding, the decision table makes clear that there exist two distinct possible types of error: that in which the null hypothesis is rejected when it is, in fact, true; and that in which the null hypothesis is not rejected when it is, in fact, false.

A simple example that helps in thinking about the difference between these two types of error is a criminal trial in the U.S. judicial system. In that system, there is an initial presumption of innocence (null hypothesis), and evidence is presented in order to reach a decision to convict (reject the null hypothesis) or acquit (fail to reject the null). In this context, a Type I error is committed if an innocent person is convicted, while a Type II error is committed if a guilty person is acquitted. Clearly, both types of error cannot occur in a single trial; after all, a person cannot be both innocent and guilty of a particular crime. However, a priori we can conceive of the probability of each type of error, with the probability of a Type I error called the significance level of a test and denoted by α, and the probability of a Type II error denoted by β, with 1 – β, the probability of not committing a Type II error, called the power of the test.
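For a concrete case, β can be computed in closed form for a one-sided z-test of a mean with known standard deviation. The following sketch uses only Python's standard library; the particular numbers (null mean 0, true mean 0.5, σ = 1, n = 25) are illustrative assumptions, not values from this entry.

```python
from math import sqrt
from statistics import NormalDist

def beta_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Return beta = P(fail to reject H0 | true mean is mu1)
    for the one-sided z-test of H0: mu = mu0 vs H1: mu > mu0."""
    z = NormalDist()
    # Rejection threshold on the sample-mean scale: reject if x̄ exceeds this.
    crit = mu0 + z.inv_cdf(1 - alpha) * sigma / sqrt(n)
    # beta is the probability the sample mean falls below the threshold
    # when the true mean is mu1.
    return z.cdf((crit - mu1) / (sigma / sqrt(n)))

# Illustrative (assumed) values:
beta = beta_one_sided_z(mu0=0.0, mu1=0.5, sigma=1.0, n=25, alpha=0.05)
power = 1 - beta  # beta ≈ 0.196, so power ≈ 0.804
```

Here a test at the conventional α = .05 still fails to detect a true half-standard-deviation effect about 20% of the time, which is exactly the quantity β measures.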

Relationship with Significance

Just as it is impossible to realize both types of error in a single test, it is also not possible to minimize both α and β in a particular experiment with fixed sample size. In this sense, in a given experiment, there is a trade-off between α and β, meaning that both cannot be specified or guaranteed to be low. For example, a simple way to guarantee no chance of a Type I error would be to never reject the null hypothesis regardless of the data, but such a strategy would typically result in a very large β. Hence, it is common practice in statistical inference to fix the significance level at some nominal, low value (usually .05) and to compute and report β in communicating the result of the test. Note the implied asymmetry between the two types of error possible from a hypothesis test: α is held at some prespecified value, while β is not constrained. The preference for controlling α rather than β also has an analogue in the judicial example above, in which the concept of “beyond reasonable doubt” captures the idea of setting α at some low level, and where there is an oft-stated preference for setting a guilty person free over convicting an innocent person, thereby preferring to commit a Type II error over a Type I error. The common choice of .05 for α most likely stems from Sir Ronald Fisher's 1926 statement that he “prefers to set a low standard of significance at the 5% point, and ignore entirely all results that fail to reach that level.” He went on to say that “a scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance” (Fisher, 1926, p. 504).

...
