
In experiments, researchers must balance two competing considerations when choosing the sample size. On the one hand, the sample must be large enough to provide sufficient statistical power for accurate statistical inference. On the other hand, each additional observation comes at a cost and, especially when performing medical experiments or working with test animals, the researcher has an ethical obligation to avoid unnecessary oversampling. The field of optimal stopping, or sequential sampling, studies ways to strike this balance, and various techniques are available. This entry, based on the 2019 work of Casper J. Albers, explains some of these techniques.

Sometimes, researchers simply collect and analyze a sample, look at the resulting p value, and collect more data until either significance is reached or resources are exhausted. This approach is flawed because it amounts to multiple testing and inflates the Type I error rate (i.e., it produces too many false positives). It also biases both the p values and the estimated effect size. Several statistical corrections, such as the Bonferroni correction, exist to counter multiple testing in general, and similar solutions exist for sequential designs. Sequential methods have mostly been designed for classical (frequentist) approaches. For Bayesian analysis, statisticians still debate whether it is necessary to correct for multiple testing in a sequential design.
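
The inflation caused by repeated peeking can be illustrated with a small simulation. The sketch below is not taken from the entry: it assumes a two-sided z test with known standard deviation 1, data generated under the null hypothesis, and a look after every 10 observations up to 100. The function names and batch sizes are illustrative choices only.

```python
import math
import random
from statistics import NormalDist


def two_sided_p(values):
    """Two-sided z-test p value for H0: mean = 0, assuming known sd = 1."""
    n = len(values)
    z = sum(values) / math.sqrt(n)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def peeking_rejects(rng, start_n=10, step=10, max_n=100, alpha=0.05):
    """Sample under H0, test after every batch, stop at the first p < alpha."""
    data = [rng.gauss(0.0, 1.0) for _ in range(start_n)]
    while True:
        if two_sided_p(data) < alpha:
            return True   # a false positive, because H0 is true here
        if len(data) >= max_n:
            return False  # resources exhausted without significance
        data.extend(rng.gauss(0.0, 1.0) for _ in range(step))


def false_positive_rate(n_sim=4000, seed=1, **kwargs):
    """Fraction of simulated studies that wrongly reject H0."""
    rng = random.Random(seed)
    return sum(peeking_rejects(rng, **kwargs) for _ in range(n_sim)) / n_sim


if __name__ == "__main__":
    # Ten uncorrected looks versus a single test at n = 100.
    print("with peeking:", false_positive_rate())
    print("single test: ", false_positive_rate(start_n=100, max_n=100))
```

Under these assumptions, the estimated rate with peeking should come out well above the nominal 0.05, while the single test at n = 100 should stay close to it.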

Roughly speaking, there are two classes of such sequential approaches: interim analyses (also known as group sequential analyses) and full sequential analyses. In interim analyses, one prespecifies moments when one wants to inspect the data, whereas in full sequential analyses, the data are analyzed after each observation.

As an example of interim analysis, one could decide to analyze the data after n1 = 50 measurements and, if necessary, again after n2 = 100 measurements. The decision rule is to test at α = 0.029 after n1 measurements and stop if the result is significant; otherwise, one continues until n2 and tests again at the same α level. This rule yields the usual overall Type I error rate of 0.05. The major advantage over nonsequential testing is that, given sufficient evidence, one can stop data collection halfway through the process. One can specify a priori the maximum number of batches of measurements and their sizes, and then compute the α level at which to test in each interim analysis. Such computations can be based on Pocock bounds, Haybittle–Peto bounds, or O’Brien–Fleming bounds.
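
The two-look rule from the example can be checked by simulation. The following is a minimal sketch under assumptions not stated in the entry: a two-sided z test with known standard deviation 1, so that under the null hypothesis the sum of n observations is normally distributed with variance n. The function name and defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist


def overall_alpha_two_looks(alpha_per_look=0.029, n1=50, n2=100,
                            n_sim=200_000, seed=7):
    """Monte Carlo estimate of the overall Type I error of a two-look design."""
    rng = random.Random(seed)
    # Two-sided critical value used at both looks.
    crit = NormalDist().inv_cdf(1.0 - alpha_per_look / 2.0)
    rejections = 0
    for _ in range(n_sim):
        # Sum of the first n1 observations under H0 (sd = 1).
        s1 = rng.gauss(0.0, math.sqrt(n1))
        if abs(s1) / math.sqrt(n1) > crit:
            rejections += 1  # stop early at the interim look
            continue
        # Add the sum of the remaining n2 - n1 observations.
        s2 = s1 + rng.gauss(0.0, math.sqrt(n2 - n1))
        if abs(s2) / math.sqrt(n2) > crit:
            rejections += 1  # reject at the final look
    return rejections / n_sim


if __name__ == "__main__":
    print("alpha = 0.029 per look:", overall_alpha_two_looks())
    print("alpha = 0.05 per look: ", overall_alpha_two_looks(alpha_per_look=0.05))
```

With α = 0.029 at each look, the estimate should land near 0.05. Testing both looks at an uncorrected 0.05 should give a clearly larger overall rate.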

In full sequential approaches, one checks the data not at a few prespecified points but after every observation. Statistically, this is the optimal approach to deciding on the sample size: one could, for example, stop data collection at n = 62 where the interim design in the example above would need n = 100.

Theories about sequential sampling by statisticians Abraham Wald and Alan Turing date back to the 1940s. These full sequential approaches are quite technical. Wald’s procedure, for instance, involves updating the cumulative log-likelihood ratio after each observation and stopping as soon as this running sum leaves a prespecified interval. In general, the computation of this log-likelihood ratio is far from straightforward. Under mild statistical assumptions, it can be proven that Wald’s sequential probability ratio test is the optimal mathematical procedure for these types of problems. A similar approach is the cumulative sum (CUSUM) control chart. CUSUM and related methods are the gold standard in statistical quality control.
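
The stopping rule of the sequential probability ratio test can be sketched for one of the simplest textbook cases, where the log-likelihood ratio has a closed form: normally distributed data with known standard deviation and two simple hypotheses about the mean. The hypothesized means, error rates, and function name below are illustrative assumptions, and the thresholds are Wald’s usual approximate boundaries, not exact ones.

```python
import math


def sprt_normal(observations, mu0=0.0, mu1=0.5, sigma=1.0,
                alpha=0.05, beta=0.20):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1 (known sigma).

    Returns a (decision, number_of_observations_used) tuple.
    """
    # Wald's approximate stopping boundaries on the log scale.
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)

    llr = 0.0  # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(observations, start=1):
        # Log-likelihood ratio contribution of a single normal observation.
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n  # data ran out before a boundary was crossed


if __name__ == "__main__":
    print(sprt_normal([1.0] * 20))   # steady evidence for H1
    print(sprt_normal([-0.5] * 20))  # steady evidence for H0
```

In this sketch, each observation moves the running sum up or down, and sampling stops the moment the sum crosses either boundary, so the sample size is not fixed in advance.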

...
