Collinearity is a concern when two regression covariates are associated with one another. In essence, because the covariates are correlated, their predictions of the outcome are no longer independent. As a result, when both covariates are included in the same regression model, each one tends to appear less statistically significant because they explain some of the same variance in the dependent variable. Cues that collinearity may be a concern are (1) high correlation or association between two potential covariates, (2) a dramatic increase in the p value (i.e., a loss of statistical significance) of one covariate when another covariate is included in the regression model, or (3) high variance inflation factors. The variance inflation factor for each covariate is $1/(1 - R_*^2)$, where $R_*^2$ is the coefficient of determination from regressing that covariate on all of the other covariates in the model. Variance inflation factors of 1 or 2 show essentially no collinearity; 5 represents moderate collinearity. Variance inflation factors greater than 10 suggest that collinearity is such a concern that one might consider removing one of the collinear covariates. Variance inflation factors of 20 and higher show extreme collinearity.
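
As an illustration of the variance inflation factor, the following minimal sketch computes it by regressing each covariate on the remaining covariates and applying 1/(1 − R²). The function name `variance_inflation_factors`, the simulated covariates, and the noise scale are assumptions made for this example only; they are not part of the entry.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress covariate j on an intercept plus all other covariates.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

# Simulated data (an assumption for illustration, not real measurements).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)              # nearly unassociated with x1
x3 = x1 + 0.2 * rng.normal(size=n)   # strongly associated with x1

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
for name, vif in zip(["x1", "x2", "x3"], vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

Under these assumptions, the unrelated covariate should have a variance inflation factor close to 1, while the two associated covariates should fall well above the threshold of 10 described above.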

In the schematic on the left in Figure 1, X1 and X2 are nearly unassociated with one another, so the overlapping gray area is quite small. In the schematic on the right, the overlapping area is larger; in this gray area, the two covariates are battling to predict the same variance in the outcome, so that each prediction has lower statistical significance. If the outcome were income level, one might expect covariates such as education and previously winning the lottery to be similar to the schematic on the left. Alternatively, covariates such as education and parental income level would likely be more similar to the schematic on the right.

Multicollinearity describes a situation in which more than two covariates are associated, so that when all are included in the model, one observes a decrease in statistical significance (increased p values). Like the diagnosis for collinearity, one can assess multicollinearity using variance inflation factors with the same guide that values greater than 10 suggest a high degree of multicollinearity. Unlike the diagnosis for collinearity, however, one may not be able to predict multicollinearity before observing its effects on the multiple regression model, because any two of the covariates may have only a low degree of correlation or association.
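
The point that pairwise correlations can miss multicollinearity can be sketched with simulated data. In the hypothetical setup below, ten independent covariates are joined by an eleventh that is roughly their sum, so each pairwise correlation stays modest while the variance inflation factors are large. The sketch relies on the standard identity that the variance inflation factors are the diagonal entries of the inverse of the covariates' correlation matrix; all variable names and the noise scale are assumptions for illustration.

```python
import numpy as np

# Simulated covariates (an assumption for illustration only).
rng = np.random.default_rng(1)
n, k = 2000, 10
base = rng.normal(size=(n, k))                        # ten independent covariates
combo = base.sum(axis=1) + 0.3 * rng.normal(size=n)   # roughly their sum
X = np.column_stack([base, combo])

corr = np.corrcoef(X, rowvar=False)

# Largest absolute correlation between any two distinct covariates.
upper = np.triu_indices_from(corr, k=1)
max_pairwise = np.abs(corr[upper]).max()

# VIFs are the diagonal of the inverse correlation matrix.
vifs = np.diag(np.linalg.inv(corr))

print(f"largest pairwise correlation: {max_pairwise:.2f}")
print(f"largest VIF: {vifs.max():.1f}")
```

In this construction no single pair of covariates looks alarming, yet the summed covariate is almost fully explained by the other ten taken together, which is the pattern the variance inflation factor is designed to expose.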

Figure 1 Collinearity Schematic

Note: Low collinearity is shown on the left, while moderate to high collinearity is represented on the right.

Sometimes the goal of a multiple regression model is to provide the best possible prediction of the outcome. In this situation, even with high variance inflation factors, one might choose to include several somewhat collinear covariates; the result will likely be that one or more fails to achieve statistical significance. However, care should be taken in this situation. A high degree of multicollinearity goes hand in hand with instability in the regression coefficients themselves. If the goal is simply to predict the outcome, then this instability need not be a major concern since the predicted value of Y would be unlikely to change much with a slightly different model. However, if one also cares about the regression equation used to make the prediction, multicollinearity can become a grave concern.
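
The contrast between unstable coefficients and stable predictions can be sketched by refitting the same model on bootstrap resamples of simulated data. The data-generating values here (two nearly identical covariates, true coefficients of 1, the prediction point x1 = x2 = 1) are assumptions chosen for illustration, and the refits stand in for the "slightly different model" mentioned above.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # almost a copy of x1
y = x1 + x2 + rng.normal(size=n)      # assumed true coefficients: 1 and 1

design = np.column_stack([np.ones(n), x1, x2])
x_new = np.array([1.0, 1.0, 1.0])     # intercept term, x1 = 1, x2 = 1

coefs, preds = [], []
for _ in range(200):
    idx = rng.integers(0, n, size=n)  # bootstrap resample of the rows
    beta, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
    coefs.append(beta[1])             # fitted coefficient on x1
    preds.append(x_new @ beta)        # predicted outcome at x_new

coef_sd = np.std(coefs)
pred_sd = np.std(preds)
print(f"SD of the x1 coefficient across refits: {coef_sd:.2f}")
print(f"SD of the prediction at x1 = x2 = 1:    {pred_sd:.2f}")
```

Under these assumptions the coefficient on x1 should vary far more from refit to refit than the prediction does. The prediction is stable here because the new point follows the same collinear pattern as the fitted data; a point that breaks that pattern (for example, x1 = 1 and x2 = −1) would not share this stability.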

...
