
Collinearity is a situation in which the predictor, or exogenous, variables in a linear regression model are linearly related among themselves or with the intercept term. This relation can adversely affect the estimated model parameters, particularly the regression coefficients and their standard errors. In practice, researchers often treat correlation between predictor variables as synonymous with collinearity, but strictly speaking the two are not the same: strong correlation implies collinearity, but the converse does not necessarily hold, because a near-linear dependence among three or more variables can exist even when all pairwise correlations are modest. When there is strong collinearity in a linear regression model, the estimation procedure cannot uniquely identify the regression coefficients of the highly correlated variables or terms and therefore cannot separate the covariate effects. This lack of identifiability undermines the interpretability of the regression coefficients and can lead to misleading conclusions about the relationships among the variables under study.
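The distinction between pairwise correlation and collinearity can be illustrated with a small numpy sketch (hypothetical data, constructed here for illustration): a third predictor is built as the exact sum of two independent predictors, so every pairwise correlation stays moderate, yet the three columns are perfectly collinear and the design matrix is rank deficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2  # exact linear dependence: perfect collinearity

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)
# Pairwise correlations of x3 with x1 and x2 are only about 0.7,
# yet the three columns span a 2-dimensional space.
print(np.round(corr, 2))
print(np.linalg.matrix_rank(X))  # 2, not 3
```

No single pairwise correlation here would raise an alarm on its own, which is why diagnostics that look only at correlation coefficients between pairs of predictors can miss collinearity involving three or more variables.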

Consequences of Collinearity

Strong collinearity has several negative effects on estimated regression model parameters that can interfere with inference about the relationships between the predictor variables and the response variable. First, the interpretation of regression coefficients as marginal effects breaks down, because it is not possible to hold highly correlated variables constant while increasing another correlated variable by one unit. In addition, collinearity can make regression coefficients unstable and very sensitive to changes in the model: a coefficient may change greatly in magnitude, or even reverse sign, after another predictor variable is added to the model or particular observations are excluded. Especially important for inference is that collinearity can produce a coefficient sign that is counterintuitive or contrary to previous research. The instability of the estimates is also reflected in very large, inflated standard errors of the regression coefficients. Because these inflated standard errors enter the significance tests of the coefficients, they can lead to conclusions of insignificance, at times even for important predictor variables. In contrast to its effects on inference about the coefficients, collinearity does not impair the overall fit of the model to the observed response variable data.
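The standard-error inflation described above can be reproduced in a short simulation, assuming nothing beyond numpy (the data and the `ols_se` helper are invented for this sketch): the same response is fit once alongside a nearly collinear second predictor and once alongside an independent one, and the slope standard error for the first predictor differs dramatically between the two fits.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

def ols_se(X, y):
    # OLS with an intercept; returns coefficients and their standard errors
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta, se

x1 = rng.normal(size=n)
x2_collinear = x1 + 0.05 * rng.normal(size=n)  # nearly a copy of x1
x2_indep = rng.normal(size=n)                  # orthogonal alternative
y = 1.0 + x1 + rng.normal(size=n)              # x2 plays no real role in y

_, se_c = ols_se(np.column_stack([x1, x2_collinear]), y)
_, se_i = ols_se(np.column_stack([x1, x2_indep]), y)
# The slope SE for x1 is many times larger in the collinear fit.
print(se_c[1], se_i[1])
```

Note that the overall fit (the residual sum of squares) is essentially unaffected; it is only the individual coefficients and their standard errors that become unreliable, consistent with the last sentence of the paragraph above.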

Diagnosing Collinearity

There are several commonly used exploratory tools for diagnosing potential collinearity in a regression model. Because collinearity among the model variables induces correlation between the estimated regression coefficients, some techniques assess the level of correlation in both the predictor variables and the coefficients. Coefficients of correlation between pairs of predictor variables measure the strength of association between those variables, and scatterplots of pairs of predictor variables provide a complementary visual description; both tools are used frequently.

There are, however, more direct ways to assess collinearity by inspecting the model output itself. One is through coefficients of correlation between pairs of estimated regression coefficients, which allow one to assess the correlation among different pairs of covariate effects as well as between covariate effects and the intercept. Another is the variance inflation factor, which measures how much the estimated variance of a regression coefficient increases relative to the case in which the predictor variables are uncorrelated. Drawbacks of the variance inflation factor as a diagnostic are that it does not reveal the nature of the collinearity, which is problematic when the collinearity involves more than two variables, and that it does not consider collinearity with the intercept. A diagnostic that addresses both issues combines the variance-decomposition proportions of the regression coefficient variance–covariance matrix with the condition indices of the matrix of predictor variables and the constant term.
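The two quantitative diagnostics mentioned above can be sketched directly in numpy; the helper functions `vif` and `condition_index` below are written for illustration, not taken from any library. The variance inflation factor for predictor j is 1 / (1 − R²_j), where R²_j comes from regressing column j on the remaining predictors, and the condition indices come from the singular values of the column-scaled design matrix, including the constant term.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (no intercept column).
    VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def condition_index(X):
    """Condition indices of the design matrix with a constant term,
    after scaling each column to unit length."""
    Xd = np.column_stack([np.ones(X.shape[0]), X])
    Xd = Xd / np.linalg.norm(Xd, axis=0)
    s = np.linalg.svd(Xd, compute_uv=False)
    return s.max() / s

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)  # nearly collinear with x1
X = np.column_stack([x1, x2])
print(vif(X))              # both VIFs far above the common rule of thumb of 10
print(condition_index(X))  # largest index above the conventional cutoff of 30
```

The rule-of-thumb thresholds in the comments (VIF above 10, condition index above 30) are conventional values from the applied literature, not prescriptions from this text.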
Some less formal, commonly used indications of collinearity are a counterintuitive sign on a regression coefficient, a relatively large change in a coefficient's value after another predictor variable is added to the model, and a relatively large standard error for a coefficient. Given that statistical inference on regression coefficients is typically a primary concern in regression analysis, it is important to apply diagnostic tools before interpreting the coefficients, as the effects of collinearity can otherwise go unnoticed.
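The sign-change symptom can also be reproduced in a few lines. In this hedged example with invented data, the response truly depends negatively on x1, but because x2 is nearly identical to x1 and enters with a larger positive weight, the simple regression of y on x1 alone shows a positive slope that reverses once x2 is added to the model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)            # nearly collinear with x1
y = -1.0 * x1 + 2.0 * x2 + 0.5 * rng.normal(size=n)

def fit(X, y):
    # OLS coefficients with an intercept
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

b_single = fit(x1.reshape(-1, 1), y)
b_joint = fit(np.column_stack([x1, x2]), y)
print(b_single[1])  # about +1: x1 appears positively related to y
print(b_joint[1])   # about -1: the sign reverses once x2 enters the model
```

Neither coefficient is "wrong" in a mechanical sense; the instability simply reflects that the data cannot separate the effect of x1 from that of x2, which is why such sign changes are treated as an informal diagnostic rather than proof of a substantive effect.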
