
Regression to the mean (RTM) is a widespread statistical phenomenon that occurs when a nonrandom sample is selected from a population and the two variables of interest are imperfectly correlated. The smaller the correlation between the two variables, and the more extreme the obtained value is relative to the population mean, the larger the effect of RTM (that is, the more room there is for RTM).

If variables X and Y have standard deviations SDx and SDy and correlation r, the slope of the familiar least squares regression line of Y on X can be written r × SDy/SDx. Thus a change of one standard deviation in X is associated with a change of r standard deviations in Y. Unless X and Y are perfectly linearly related, so that all the points lie along a straight line, the absolute value of r is less than 1. For any given value of X other than its mean, the predicted value of Y is therefore always fewer standard deviations from its mean than X is from its mean. Because RTM is in effect to some extent unless r = 1 or r = -1, it almost always occurs in practice.
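The slope relation above can be checked with a short calculation. The following is a minimal sketch, not part of the original entry: the function name predict_y and all of the numbers (means of 100, standard deviations of 15, r = 0.5) are hypothetical values chosen only for illustration.

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
    """Least squares prediction of Y from X, using slope r * SDy / SDx."""
    slope = r * sd_y / sd_x
    return mean_y + slope * (x - mean_x)


# Hypothetical population values, assumed for illustration only.
mean_x, sd_x = 100.0, 15.0
mean_y, sd_y = 100.0, 15.0
r = 0.5

x = 130.0  # an observation 2 standard deviations above the mean of X
y_hat = predict_y(x, mean_x, sd_x, mean_y, sd_y, r)

z_x = (x - mean_x) / sd_x      # distance of X from its mean, in SD units
z_y = (y_hat - mean_y) / sd_y  # distance of predicted Y from its mean, in SD units

print(f"X is {z_x} SDs from its mean; predicted Y is {z_y} SDs from its mean")
```

With these assumed values, the predicted Y sits r times as many standard deviations from its mean as X does from its own, which is the shrinkage toward the mean that the slope formula implies.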

As discussed by Donald Campbell and David Kenny, RTM does not depend on the assumption of linearity, the level of measurement of the variable (for example, the variable can be dichotomous), or measurement error. Given a less than perfect correlation between X and Y, RTM is a mathematical necessity. Although it is not inherent in either biological data or psychological data, RTM has important predictive implications for both. In situations in which one has little information to make a judgment, often the best advice is to use the mean value as the prediction.

History

An early example of RTM may be found in the work of Sir Francis Galton on heritability of height. He observed that tall parents tended to have somewhat shorter children than would be expected given their parents’ extreme height. Seeking an empirical answer, Galton measured the height of 930 adult children and their parents and calculated the average height of the parents. He noted that when the average height of the parents was greater than the mean of the population, the children tended to be shorter than their parents. Likewise, when the average height of the parents was less than the population mean, the children tended to be taller than their parents. Galton called this phenomenon regression toward mediocrity; we now call it RTM. This is a statistical, not a genetic, phenomenon.

Examples

Treatment versus Nontreatment

In general, ill individuals have certain characteristics, whether physical or mental, such as high blood pressure or depressed mood, that deviate from the population mean. A treatment would therefore be deemed effective when those treated show improvement on such indicators of illness at posttreatment (e.g., a lowering of high blood pressure, or remission or reduced severity of depressed mood). However, because such characteristics deviate more from the population mean in ill individuals than in well individuals, the observed improvement could be attributable in part to RTM. Indeed, untreated individuals with high blood pressure or depressed mood are also likely to show some improvement on a second observation owing to RTM. Likewise, individuals within the normal range of blood pressure or mood at first observation will probably be somewhat less normal at a second observation, also due in part to RTM. To identify true treatment effects, it is therefore important to assess an untreated group of similar individuals, or a group of similar individuals receiving an alternative treatment, so that the effect of RTM can be adjusted for.
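A small simulation may make the role of the untreated comparison group concrete. This is a rough sketch under stated assumptions, not a method taken from the entry: the function mean_change, the blood pressure distribution (usual level with mean 120 and SD 10, plus occasion-to-occasion variation with SD 10), the selection cutoff of 140, and the treatment effect of 5 units are all hypothetical.

```python
import random
import statistics


def mean_change(n, cutoff, effect, seed):
    """Mean change (second reading minus first) among people whose
    first reading is at or above the cutoff."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        usual = rng.gauss(120.0, 10.0)        # assumed stable underlying level
        first = usual + rng.gauss(0.0, 10.0)  # first observation
        if first < cutoff:
            continue                          # keep only the "ill" (high) group
        # Second observation; `effect` is the assumed true treatment benefit.
        second = usual + rng.gauss(0.0, 10.0) - effect
        changes.append(second - first)
    return statistics.mean(changes)


# Two separately sampled groups selected the same way (hypothetical settings).
untreated = mean_change(n=20000, cutoff=140.0, effect=0.0, seed=1)
treated = mean_change(n=20000, cutoff=140.0, effect=5.0, seed=2)

print(f"untreated mean change: {untreated:.1f}")
print(f"treated mean change:   {treated:.1f}")
print(f"difference (treated - untreated): {treated - untreated:.1f}")
```

Under these assumptions the untreated group improves on average even though nothing was done to it, so the raw improvement in the treated group overstates the benefit. The difference between the two groups' changes is the quantity that approximates the assumed treatment effect.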

...
