Residuals play an important role in statistical modeling. The most common definition of a residual is the difference between the observed response and the response expected under the estimated model, so that each observation has a corresponding residual. This is the definition used in most regression and analysis of variance models. Loosely speaking, a residual is the portion of the observed response that the model, with its set of predictor variables, does not explain.

Use of Residuals

A key step in modeling is examining the residuals to check whether the model fits the data well. In particular, residuals are used to check the validity of the model's assumptions and to identify observations that are outliers (unusually large or small observations) or that are influential in estimating the model. These checks may be carried out with graphical techniques or with formal statistical tests applied to the residuals. One may work with raw residuals or standardized residuals. Standardized residuals are obtained by dividing the raw residuals by their estimated standard error.
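The standardization described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes a straight-line least-squares fit, takes the estimated standard error of the i-th residual to be s·sqrt(1 − h_i), where s² is the residual mean square and h_i is the leverage of observation i, and uses the job satisfaction data tabulated later in this entry as read from the flattened table (an assumed reconstruction). The function name `standardized_residuals` is hypothetical. Some texts divide by s alone, which gives slightly different values.

```python
import math

# Job satisfaction data as read from the entry's table (assumed reconstruction).
years = [7, 4, 13, 10, 15, 12, 9, 14, 20]
satisfaction = [5.2, 6.0, 7.0, 5.9, 7.0, 7.3, 6.5, 8.1, 8.0]


def standardized_residuals(x, y):
    """Return (raw, standardized) residuals from a least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Raw residual: observed response minus fitted response.
    raw = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in raw) / (n - 2))

    # Leverage of each observation in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Each raw residual divided by its estimated standard error.
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(raw, leverage)]
    return raw, standardized


raw, standardized = standardized_residuals(years, satisfaction)
for x, e, r in zip(years, raw, standardized):
    print(f"X = {x:2d}  raw = {e:+.3f}  standardized = {r:+.3f}")
```

A common rule of thumb treats standardized residuals beyond roughly 2 or 3 in absolute value as worth a closer look; the cutoff is a convention, not something this entry specifies.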

An Illustration

In simple linear regression, a response variable (denoted by Y) is modeled as a linear function of a single predictor variable (denoted by X). Mathematically, this model is represented by the following equation:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where β0 and β1 are the intercept and slope of the line, respectively, and ∊ is the random error term. As an example, suppose it is of interest to model job satisfaction (Y) based on the number of years on the job (X). The data are given in Table 1. A scatterplot of the data is displayed in Figure 1, represented by dots. The estimated model is given by

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

which is represented by the solid line in Figure 1. Graphically, a residual is the vertical difference between an observation and the fitted line, as illustrated by the vertical line in Figure 1. For this particular observation, Y = 5.2 and X = 7, so that the estimated response is given by

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 (7)$$
Table 1 Data on Job Satisfaction and Number of Years on the Job

Number of Years    Job Satisfaction
7                  5.2
4                  6
13                 7
10                 5.9
15                 7
12                 7.3
9                  6.5
14                 8.1
20                 8

Figure 1 Scatterplot of Job Satisfaction Data With Best Fit Line


Consequently, the residual for this observation is

$$e = Y - \hat{Y} = 5.2 - \hat{Y}$$
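The calculation for this observation can be reproduced with a short Python sketch. It assumes the data pairs as read from the flattened table (7 and 5.2, 4 and 6, and so on) and an ordinary least-squares fit; the helper name `fit_line` is hypothetical. The coefficients it prints are recomputed from that reconstruction and may differ in rounding from the values originally published with this entry.

```python
# Data pairs as read from the entry's table (assumed reconstruction).
years = [7, 4, 13, 10, 15, 12, 9, 14, 20]
satisfaction = [5.2, 6.0, 7.0, 5.9, 7.0, 7.3, 6.5, 8.1, 8.0]


def fit_line(x, y):
    """Return the least-squares (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(years, satisfaction)

# Observation highlighted in the text: X = 7, Y = 5.2.
fitted_at_7 = b0 + b1 * 7
residual_at_7 = 5.2 - fitted_at_7  # observed minus fitted

print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"fitted value at X = 7: {fitted_at_7:.3f}")
print(f"residual at X = 7:     {residual_at_7:.3f}")
```

Under these assumptions the fitted line is roughly 4.83 + 0.17X, the fitted value at X = 7 is about 6.01, and the residual is about −0.81. The negative sign means the observation lies below the fitted line.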

A plot of the residuals versus the predictor variable is used to check the assumption that the relationship between X and Y is linear. This residual plot for the current example is displayed in Figure 2. If the assumption of linearity is appropriate, the plot should show no obvious systematic pattern, and none appears in this case. Hence, there is no evidence to reject the linearity assumption of the model.
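As a rough text stand-in for such a plot, the sketch below lists the residuals in order of the predictor, with a bar whose length reflects each residual's size. It is not the entry's Figure 2. It assumes the data as read from the flattened table and a least-squares line, and all variable names are illustrative.

```python
# Data pairs as read from the entry's table (assumed reconstruction).
years = [7, 4, 13, 10, 15, 12, 9, 14, 20]
satisfaction = [5.2, 6.0, 7.0, 5.9, 7.0, 7.3, 6.5, 8.1, 8.0]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(satisfaction) / n
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(years, satisfaction))
    / sum((x - x_bar) ** 2 for x in years)
)
intercept = y_bar - slope * x_bar

# Residual for each observation: observed minus fitted.
residuals = [y - (intercept + slope * x) for x, y in zip(years, satisfaction)]

# List residuals from smallest to largest X; a run of same-signed residuals
# at one end or in the middle would hint at curvature.
for x, e in sorted(zip(years, residuals)):
    bar = "#" * round(abs(e) * 10)
    sign = "-" if e < 0 else "+"
    print(f"X = {x:2d}  residual = {e:+.2f}  {sign}{bar}")
```

Least-squares residuals always sum to zero and are uncorrelated with X by construction, so a correlation coefficient cannot reveal nonlinearity. The check is visual: one looks for curvature or a funnel shape rather than a linear trend.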

Other Types of Residuals

In more complex statistical models, residuals are defined differently, and, in some cases, there may be more than one type of residual. For instance, in logistic models, there are deviance residuals in addition to the commonly defined residuals. In survival models, where the response variable of interest is time until an event occurs, there are also deviance, Cox-Snell, martingale, and score residuals, to name a few. The definitions of these residuals are no longer as simple as observed minus expected. However, no matter how these residuals are defined, their main purpose is still the same, that is, checking the appropriateness of the model.
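To illustrate how such a residual departs from observed minus expected, the sketch below computes deviance residuals for a binary logistic model using the standard formula sign(y − p)·sqrt(−2[y log p + (1 − y) log(1 − p)]), where p is the fitted probability. The outcomes and fitted probabilities are hypothetical values chosen for illustration, not data from this entry, and the function name `deviance_residual` is likewise an assumption.

```python
import math


def deviance_residual(y, p):
    """Deviance residual for a binary outcome y (0 or 1) with fitted probability p."""
    # Log-likelihood contribution of this single observation.
    log_lik = y * math.log(p) + (1 - y) * math.log(1 - p)
    # The sign matches the raw residual y - p.
    sign = 1.0 if y - p > 0 else -1.0
    return sign * math.sqrt(-2.0 * log_lik)


# Hypothetical observed outcomes and fitted probabilities.
observed = [1, 0, 1, 0]
fitted_prob = [0.9, 0.2, 0.3, 0.6]

for y, p in zip(observed, fitted_prob):
    raw = y - p
    dev = deviance_residual(y, p)
    print(f"y = {y}  p = {p:.1f}  raw = {raw:+.2f}  deviance = {dev:+.3f}")
```

The deviance residual grows faster than the raw residual as the fitted probability moves away from the observed outcome, so poorly predicted observations stand out more clearly.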
