The coefficient of determination, R², is a useful measure of the overall value of the predictor variable(s) in predicting the outcome variable in the linear regression setting. R² indicates the proportion of the overall sample variance of the outcome that is predicted or explained by the variation of the predictor variable, or in the case of multiple linear regression, by the set of predictors.

For example, an R² of 0.35 indicates that 35% of the variation in the outcome has been explained just by predicting the outcome using the covariates included in the model. Thirty-five percent might be a very high portion of variation to predict in a field such as the social sciences; in other fields such as rocket science, one would expect the R² to be much closer to 100%. The theoretical minimum R² is 0; however, since linear regression is based on the best possible fit to the sample at hand, R² will virtually always be greater than zero, even when the predictor and outcome variables bear no relationship to one another.
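The last point can be checked with a small simulation. The sketch below is illustrative only: the sample size, the number of simulated data sets, and the helper name `r_squared` are assumptions, not part of the entry. It generates a predictor and an outcome independently of one another and computes R² for a simple linear regression, using the fact that with a single predictor R² equals the squared Pearson correlation.

```python
import random
from statistics import mean

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # hypothetical sample size
r2_values = []
for _ in range(200):    # 200 hypothetical data sets
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
    r2_values.append(r_squared(x, y))

# Smallest and average R^2 across the simulated, unrelated data sets
print(min(r2_values), mean(r2_values))
```

Every simulated R² should be small but positive. For independent normal data such as these, the expected value of R² is 1/(n − 1), so the excess over zero shrinks as the sample size grows.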

Just as R² will virtually always be greater than zero, R² will also virtually always increase (and can never decrease) when a new predictor variable is added to the model, even if the new predictor is not associated with the outcome. To combat this effect, the adjusted R² incorporates the same information as the usual R² but also penalizes for the number of predictor variables included in the model. As a result, as new predictors are added to a multiple linear regression model, R² will not decrease, but the adjusted R² will increase only if the increase in R² is greater than one would expect from chance alone. In such a model, the adjusted R² is the most realistic estimate of the proportion of the variation in Y that is predicted by the covariates included in the model.
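The entry does not state the penalty explicitly. A commonly used form, assumed here, is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p is the number of predictors, not counting the intercept. The sketch below applies that formula to hypothetical numbers in which a fourth predictor raises R² only slightly.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (excluding the intercept)
    fitted to n observations. Assumes the standard formula
    1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical example: 50 observations, three predictors, R^2 = 0.35
before = adjusted_r2(0.35, n=50, p=3)

# A fourth, weak predictor nudges R^2 up to 0.352
after = adjusted_r2(0.352, n=50, p=4)

print(before, after)
```

In this illustration R² rises from 0.35 to 0.352, yet the adjusted R² falls, because the small gain does not offset the penalty for the extra predictor.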

When only one predictor is included in the model, the coefficient of determination is mathematically related to Pearson's correlation coefficient, r. Just as one would expect, squaring the correlation coefficient results in the value of the coefficient of determination. The coefficient of determination can also be found with the following formula: R² = MSS/TSS = (TSS − RSS)/TSS, where MSS is the model sum of squares, TSS is the total sum of squares associated with the outcome variable, and RSS is the residual sum of squares. Note that R² is the fraction of the total variability in Y that is explained by the model.
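As a rough check of these identities, the sketch below fits a least-squares line to a small made-up data set and computes R² three ways: as MSS/TSS, as (TSS − RSS)/TSS, and as the square of Pearson's r. The data values and the function name `simple_linear_r2` are illustrative assumptions.

```python
from statistics import mean

def simple_linear_r2(x, y):
    """Fit y = a + b*x by least squares and return R^2 computed three ways."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    tss = syy                                               # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    mss = sum((fi - y_bar) ** 2 for fi in fitted)           # model sum of squares

    r = sxy / (sxx * syy) ** 0.5                            # Pearson's correlation
    return {
        "mss_over_tss": mss / tss,
        "tss_minus_rss_over_tss": (tss - rss) / tss,
        "r_squared": r ** 2,
    }

# Made-up data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_linear_r2(x, y)
print(result)
```

The three values should agree up to floating-point rounding, since MSS + RSS = TSS for a least-squares fit that includes an intercept.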

The coefficient of determination shows only association. As with linear regression, it is impossible to use R² to determine whether one variable causes the other. In addition, the coefficient of determination shows only the magnitude of the association, not whether that association is statistically significant.

In summary, the coefficient of determination provides an excellent one-number summary of how clinically relevant the predictor variable is in a given linear regression model; the adjusted R² provides the same one-number summary when more than one predictor is included in the model.

Felicity Boyd Enders