
Residual Plot

Residual plots play an important role in regression analysis when the goal is to confirm or negate the individual regression assumptions, identify outliers, and/or assess the adequacy of the fitted model. Residual plots are graphical representations of the residuals, usually in the form of two-dimensional graphs. In other words, residual plots attempt to show relationships between the residuals and either the explanatory variables (X1, X2, …, Xp), the fitted values (ŷi), index numbers (1, 2, …, n), or the normal scores (values from a random sample from the standard normal distribution), among others, often using scatterplots. Before discussing the individual types of residual plots in greater detail, this entry reviews the concept of residuals in regression analysis, including the different types of residuals, and emphasizes the most important standard regression assumptions.

Concept

Defining a linear regression model as

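$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i, \quad i = 1, 2, \ldots, n, \tag{1}$$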

the ith fitted value, the value lying on the hyper-plane, can be calculated as

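$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_p x_{ip}, \quad i = 1, 2, \ldots, n, \tag{2}$$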

where p refers to the number of explanatory variables and n to the number of observations. For p = 1, the regression model represents a line; for p = 2, a plane; and for p = 3 or more, a hyperplane. From here on, the term hyperplane is used, but the same concepts apply to a line (that is, the simple regression model) or a plane.

The ith residual (ei) is then defined as

$$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n, \tag{3}$$

and it represents the vertical distance between the observed value (yi) and the corresponding fitted value (ŷi) for the ith observation. In this sense, the ith residual (ei) is an estimate of the unobservable error term (∊i) and, thus, represents the part of the dependent variable (Y) that is not linearly related to the explanatory variables (X1, X2, …, Xp). One can also say that the ith residual (ei) is the part of Y that cannot be explained by the estimated regression model.
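The definitions of the fitted values and residuals can be illustrated with a minimal sketch for the simple regression case (p = 1). It assumes the coefficients are estimated by ordinary least squares; the function names and the data are made up for illustration and do not come from this entry.

```python
# Minimal sketch: fitted values and residuals for simple linear regression (p = 1).
# Assumption: coefficients are estimated by ordinary least squares.
# The function names and the data are hypothetical, chosen only for illustration.

def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def fitted_and_residuals(x, y):
    """Return the fitted values (y-hat) and the residuals e_i = y_i - y-hat_i."""
    b0, b1 = fit_simple_regression(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted, resid = fitted_and_residuals(x, y)

for xi, yi, fi, ei in zip(x, y, fitted, resid):
    print(f"x={xi:.1f}  y={yi:.2f}  fitted={fi:.2f}  residual={ei:+.2f}")
```

Each residual is the signed vertical distance from the observed point to the fitted line, so a positive value means the observation lies above the line.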

In regression analysis, the error terms incorporate a set of important and far-reaching assumptions. These are that the error terms are independently and identically distributed normal random variables, each with a mean of zero and a common variance. This assumption can be expressed as

$$\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, 2, \ldots, n. \tag{4}$$

The residuals, however, do not necessarily exhibit these error term properties. For instance, when the model includes an intercept and is fitted by least squares, the sum of all residuals must equal zero, or

$$\sum_{i=1}^{n} e_i = 0. \tag{5}$$
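A short sketch of why the residuals sum to zero, assuming the model contains an intercept and is fitted by ordinary least squares (the symbol S for the sum of squared deviations is introduced here for illustration): setting the derivative of S with respect to the intercept to zero at the estimates gives the property directly.

```latex
\begin{aligned}
S(\beta_0, \ldots, \beta_p)
  &= \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2, \\
\frac{\partial S}{\partial \beta_0} \Big|_{\hat{\beta}}
  &= -2 \sum_{i=1}^{n} \Bigl( y_i - \hat{\beta}_0 - \sum_{j=1}^{p} \hat{\beta}_j x_{ij} \Bigr)
   = -2 \sum_{i=1}^{n} e_i = 0 .
\end{aligned}
```

Because the n residuals must satisfy this linear constraint, knowing n − 1 of them determines the last one.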

The direct implication of this property is that although the error terms are independent, the residuals are not. In addition, the residuals are only estimates of the true and unobservable error terms and as such do not share the common variance σ² implied in Equation 4. The variance of the ith residual is

$$\operatorname{Var}(e_i) = \sigma^2 (1 - p_{ii}), \quad i = 1, 2, \ldots, n, \tag{6}$$

where pii is the ith leverage value, that is, the ith diagonal element of the projection matrix, also known as the hat matrix. This unequal variance can be addressed by standardizing the residuals.
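To make the role of the leverage values concrete, the following minimal sketch computes them for the simple regression case (p = 1), where the diagonal of the hat matrix has the closed form 1/n + (xi − x̄)² / Σ(xj − x̄)². The function names, the data, and the value of σ² are assumptions made for illustration only.

```python
# Minimal sketch: leverage values and residual variances for simple regression (p = 1).
# For one explanatory variable plus an intercept, the ith diagonal element of the
# hat matrix is 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2.
# The function names, data, and sigma2 are hypothetical, chosen only for illustration.

def leverages(x):
    """Return the leverage value p_ii for each observation."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def residual_variances(x, sigma2):
    """Return Var(e_i) = sigma^2 * (1 - p_ii) for each observation."""
    return [sigma2 * (1.0 - p) for p in leverages(x)]


x = [1.0, 2.0, 3.0, 4.0, 10.0]  # the last point lies far from the others
sigma2 = 4.0                    # assumed error variance; unknown in practice

lev = leverages(x)
var_e = residual_variances(x, sigma2)

for xi, p, v in zip(x, lev, var_e):
    print(f"x={xi:4.1f}  leverage={p:.2f}  Var(e_i)={v:.2f}")
```

The observation farthest from the mean of x has the highest leverage and therefore the smallest residual variance, which is why raw residuals are not directly comparable across observations. In this setting the leverage values sum to p + 1 = 2.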

Standardization of the residuals is achieved by dividing the ith residual by its standard deviation,

$$z_i = \frac{e_i}{\sigma \sqrt{1 - p_{ii}}}, \quad i = 1, 2, \ldots, n, \tag{7}$$

where zi denotes the ith standardized residual, which now by definition has a mean of zero and a standard deviation of one. For practical purposes, however, σ is not known and must therefore be replaced by the unbiased standard error of the estimate (σ̂), where σ̂ is defined

...
