
“Probable Error of a Mean, The”

Initially appreciated by only a handful of brewers and statisticians, “The Probable Error of a Mean” is now, 100 years later, universally acclaimed as a classic by statisticians and behavioral scientists alike. Written by William Sealy Gosset under the pseudonym “Student,” its publication paved the way for the statistical era that continues today, one focused on how best to draw inferences about large populations from small samples of data.

Gosset and “Student”

Schooled in mathematics and chemistry, Gosset was hired by Arthur Guinness, Son, & Co., Ltd. to apply recent innovations in the field of statistics to the business of brewing beer. As a brewer, Gosset analyzed how agricultural and brewing parameters (e.g., the type of barley used) affected crop yields and, in his words, the “behavior of beer.” Because of the cost and time associated with growing crops and brewing beer, Gosset and his fellow “experimental” brewers could not afford to gather the large amounts of data typically gathered by statisticians of their era. Statisticians, however, had not yet developed accurate inferential methods for working with small samples of data, requiring Gosset to develop methods of his own. With the approval of his employer, Gosset spent a year (1906-1907) in Karl Pearson's biometric laboratory, developing “The Probable Error of a Mean” as well as “Probable Error of a Correlation Coefficient.”

The most immediately striking aspect of “The Probable Error of a Mean” is its pseudonymous author: “Student.” Why would a statistician require anonymity? The answer to this question came publicly in 1930, when fellow statistician Harold Hotelling revealed that “Student” was Gosset, and that his anonymity came at the request of his employer, a “large Dublin Brewery.” At the time, Guinness considered its use of statistics a trade secret and forbade its employees from publishing their work. Only after negotiations with his supervisors was Gosset able to publish his work, agreeing to neither use his real name nor publish proprietary data.

The Problem: Estimating Sampling Error

As its title implies, “The Probable Error of a Mean” focuses primarily on determining the likelihood that a sample mean approximates the mean of the population from which it was drawn. The “probable error” of a mean, like its standard error, is a specific estimate of the dispersion of its sampling distribution and was used commonly at the start of the 20th century. Estimating this dispersion was then, and remains today, a foundational step of statistical inference: To draw an inference about a population parameter from a sampled mean (or, in the case of null hypothesis significance testing, infer the probability that a certain population would yield a sampled mean as extreme as the obtained value), one must first specify the sampling distribution of the mean. The Central Limit Theorem provides the basis for parametrically specifying this sampling distribution, but does so in terms of population variance. In nearly all research, however, both population mean and variance are unknown. To specify the sampling distribution of the mean, therefore, researchers must use the sample variance.
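As a rough illustration of the two dispersion estimates mentioned above, the following sketch computes the standard error and the probable error of a sample mean from the sample itself. It assumes the usual normal-theory convention, which is not spelled out in this entry, that the probable error is the half-width of the interval expected to contain the mean half of the time (about 0.6745 standard errors). The function names and the example data are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean from the sample alone.

    The sample standard deviation (n - 1 denominator) stands in for the
    unknown population standard deviation.
    """
    return statistics.stdev(sample) / math.sqrt(len(sample))


def probable_error(sample):
    """Estimate the probable error of the mean.

    Assumption: normal-theory convention, where the probable error is the
    75th-percentile point of the standard normal (about 0.6745) times the
    standard error.
    """
    quartile_point = statistics.NormalDist().inv_cdf(0.75)
    return quartile_point * standard_error(sample)


if __name__ == "__main__":
    yields = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up measurements
    print("standard error:", round(standard_error(yields), 4))
    print("probable error:", round(probable_error(yields), 4))
```

Both quantities here rest on the sample standard deviation, which is itself only an estimate of the unknown population value.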

Gosset confronted the central problem with using sample variance to estimate the sampling distribution of the mean: the sample variance is itself subject to error. And because the sampling distribution of the variance is positively skewed, this error is more likely to result in the underestimation than the overestimation of population variance (even when using an unbiased estimator of population variance). Furthermore, this error, like the error associated with sampled means, increases as sample size decreases, presenting a particular (and arguably exclusive) problem for small sample researchers such as Gosset. To draw inferences about population means from sampled data, Gosset could not, as large-sample researchers did, simply calculate a standard z statistic and rely on a unit normal table to find the corresponding p values. The unit normal table does not account for either the estimation of population variance or the fact that the error in this estimate depends on sample size. This limitation inspired Gosset to write “The Probable Error of a Mean” in a self-described effort to (a) determine at what point sample sizes become so small that the above method of normal approximation becomes invalid and (b) develop a set of valid probability tables for small sample sizes.
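The two claims in this paragraph can be checked with a small Monte Carlo sketch. It is an illustration only, not Gosset's own procedure: it assumes a standard normal population (true mean 0, true variance 1), and the function name, sample sizes, trial counts, and the 1.96 cutoff (the two-sided 5% point of the unit normal table) are choices made for this example.

```python
import random
import statistics


def simulate(n, trials=20000, seed=0):
    """Draw many samples of size n from N(0, 1) and return two proportions:

    1. how often the unbiased sample variance falls below the true variance;
    2. how often a z-style statistic built from the sample standard deviation
       exceeds 1.96 in absolute value, even though the true mean is 0.
    """
    rng = random.Random(seed)
    underestimates = 0
    false_alarms = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s2 = statistics.variance(sample)  # unbiased, n - 1 denominator
        if s2 < 1.0:
            underestimates += 1
        # Treat the estimated standard error as if it were known exactly.
        z_like = statistics.fmean(sample) / (s2 ** 0.5 / n ** 0.5)
        if abs(z_like) > 1.96:
            false_alarms += 1
    return underestimates / trials, false_alarms / trials


if __name__ == "__main__":
    for n in (4, 10, 100):
        under, alarms = simulate(n)
        print(f"n={n:3d}  variance underestimated: {under:.3f}  "
              f"|z| > 1.96: {alarms:.3f}")
```

With very small samples the variance should be underestimated well over half the time, and the unit normal cutoff should be exceeded far more often than the nominal 5%; both proportions should drift toward their nominal values as n grows.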

...
