
Multicollinearity is a phenomenon that may occur in multiple regression analysis when two or more of the independent variables are related to each other. Such relationships can be expressed as near-linear dependencies among the independent variables.

A simple form of multicollinearity arises from high correlation between some pair(s) of independent variables. Suppose that five independent variables, X1, X2, X3, X4, and X5, are being used to develop a model for the dependent variable, Y, and that the correlations between X2 and X3, and between X4 and X5, are high (≥ 0.90). In such a case, only one of the two variables X2 or X3, and one of the two variables X4 or X5, might be used in the model. The rationale is that the variable left out contributes very little, in the presence of the one that is retained, to explaining the variation in the dependent variable.
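The pairwise screening described above can be sketched in a few lines of NumPy. The five-variable layout and the 0.90 cutoff come from the text; the data itself is synthetic, constructed so that X3 nearly duplicates X2 and X5 nearly duplicates X4.

```python
import numpy as np

# Synthetic data for illustration: X3 nearly duplicates X2, and X5
# nearly duplicates X4, mimicking the scenario in the text.
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = X2 + rng.normal(scale=0.1, size=n)   # highly correlated with X2
X4 = rng.normal(size=n)
X5 = X4 + rng.normal(scale=0.1, size=n)   # highly correlated with X4
X = np.column_stack([X1, X2, X3, X4, X5])

# Correlation matrix of the five independent variables.
R = np.corrcoef(X, rowvar=False)

# Flag pairs with |r| >= 0.90: for each flagged pair, one member could
# be dropped from the model.
flagged = [(i + 1, j + 1, round(R[i, j], 3))
           for i in range(5) for j in range(i + 1, 5)
           if abs(R[i, j]) >= 0.90]
print(flagged)
```

Here the flagged pairs are (X2, X3) and (X4, X5), matching the construction of the data.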

More complex forms of multicollinearity may exist when three or more independent variables are nearly linearly related, even though there might not be a high pairwise correlation among them. This could be represented by the approximate relation X1 + 2X2 − X3 ≈ 0.
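A near-dependence of this form can be detected from the eigenvalues of the correlation matrix even when no single pairwise correlation reaches the 0.90 screening level. The sketch below uses synthetic data built to satisfy X1 + 2X2 − X3 ≈ 0; the condition-index threshold of roughly 30 is a common rule of thumb, not from the text.

```python
import numpy as np

# Synthetic data satisfying the near-dependence X1 + 2*X2 - X3 ≈ 0.
rng = np.random.default_rng(1)
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = X1 + 2 * X2 + rng.normal(scale=0.05, size=n)
X = np.column_stack([X1, X2, X3])

R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)          # ascending order

# Condition index: sqrt(largest / smallest eigenvalue). A near-zero
# smallest eigenvalue (hence a large index) reveals the dependence.
cond_index = float(np.sqrt(eigvals[-1] / eigvals[0]))
print(round(cond_index, 1))
```

The eigenvector belonging to the near-zero eigenvalue recovers (up to scaling) the coefficients of the approximate linear relation.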

Effects of Multicollinearity

The impact of multicollinearity on the development of a multiple regression model and drawing inferences from it is multifaceted. First, the variance of the estimated model coefficients is large, which leads to their instability. Small perturbations of the observations and/or omission of an independent variable from a large set of independent variables may cause large fluctuations in the estimated regression coefficients.
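This instability can be demonstrated directly. In the sketch below (synthetic data; the model y = 3·x1 + 2·x2 + noise and all numbers are invented), x1 and x2 are nearly collinear, so refitting after a fresh noise draw typically swings the individual coefficients sharply, while their sum, the only direction the data pins down well, stays near 3 + 2 = 5.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])

def ols(X, y):
    # Ordinary least squares fit; no intercept, for brevity.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Two responses from the same model, differing only in the noise draw.
b_a = ols(X, 3 * x1 + 2 * x2 + rng.normal(size=n))
b_b = ols(X, 3 * x1 + 2 * x2 + rng.normal(size=n))

print(b_a, b_b)                 # individual coefficients typically swing
print(b_a.sum(), b_b.sum())     # ...but each sum stays close to 5
```

The swings can be large enough to flip a coefficient's sign, which connects this effect to the next one discussed.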

Second, the signs associated with the regression coefficients may be somewhat contrary to what is expected, given the setting of the problem. For example, in predicting resource requirements (Y), using production quantity (X1), direct labor (X2), and raw material (X3), it is normally expected that the coefficients associated with X1, X2, and X3 will be positive. However, given that X1 and X2 are highly correlated, and so are X1 and X3, it is possible for one of the estimated coefficients to turn out negative.

Third, it is possible for the full model to be statistically significant even though none of the individual coefficients is. All of the independent variables, taken collectively, may provide a good fit to the response variable, leading to a small residual sum of squares, yet the individual coefficients are estimated poorly. In the presence of multicollinearity, a given regression coefficient may not reflect the inherent effect of the particular regressor; it is influenced by the other variables in the model.
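This effect can also be reproduced with synthetic data (all numbers invented for illustration): with two nearly collinear regressors, the overall F statistic is clearly significant while each individual t statistic is typically small.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)           # both true effects positive

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = n - 2
s2 = resid @ resid / df                    # residual variance
cov = s2 * np.linalg.inv(X.T @ X)          # coefficient covariance
t_stats = beta / np.sqrt(np.diag(cov))

r2 = 1 - (resid @ resid) / (y @ y)         # R^2 (no-intercept form)
f_stat = (r2 / 2) / ((1 - r2) / df)        # overall F statistic

print(np.round(t_stats, 2))   # individual t statistics: typically small
print(round(f_stat, 1))       # overall F: far beyond the ~3 cutoff
```

Collectively the two regressors explain most of the variation in y, but neither coefficient can be attributed reliably to its own variable.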

Detection of Multicollinearity

The possible effects discussed previously suggest several symptoms that may be examined closely for the presence of multicollinearity:

  • high pairwise correlation among some of the independent variables;
  • wide confidence intervals associated with the regression coefficients;
  • estimated regression coefficients whose signs are opposite to those expected from theoretical knowledge of the problem;
  • nonsignificant tests on individual coefficients while the model as a whole is significant; and
  • large changes in the estimated coefficients when an observation is slightly changed or deleted, or when an independent variable is added or deleted.
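A standard numerical diagnostic that condenses several of these symptoms is the variance inflation factor (VIF); it is not named in the text above but quantifies the coefficient-variance inflation described there. A minimal sketch with plain NumPy, on synthetic data with one near linear dependence (the warning threshold of roughly 10 is a common rule of thumb):

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing
    column j of X on the remaining columns (with an intercept)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

# Synthetic data: x3 is nearly a linear combination of x1 and x2.
rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.1, size=n)
print([round(v, 1) for v in vif(np.column_stack([x1, x2, x3]))])
```

All three VIFs come out far above 10 here, because each variable is nearly a linear combination of the other two.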

...
