Skip to main content icon/video/no-internet

Multiple regression is the process of generating an equation using a combination of predictor variables to create an expected outcome. The equation appears as follows:

PredictedvalueofY=β1X1+β2X2+β3X3++βiXi.

The equation is true where the β represents the standardized coefficient for each variable X (this is in pairs so that each variable X has a separately estimated coefficient, β). The process involves the use of a combination of predictor variables (X) to estimate an outcome (the Y variable). This equation is considered a standard equation where the expression has been adjusted to remove the scale metric and does not use the raw weights (b). The prediction is based on the value using a z score estimate for each variable in the analysis (X and Y). Each coefficient used (βi) provides the unique contribution for each variable (removing the influence of the other variables in the equation). Unlike a raw equation, the standardized equation has no intercept (constant) because the constant is subtracted (and coefficients standardized) and the values go through the origin.

Defining Multiple R

Multiple R represents essentially the correlation between the predicted value of Y generated in the equation above and the actual value of Y for each unit. The assumption is that the combination of predictors will generate a multiple R or correlation that is larger than any single predictor. If a single predictor can predict or estimate the value of Y as efficiently as a combination of predictors then the justification to employ multiple regression does not exist.

Unlike the typical correlation coefficient that can range from −1.00 to 1.00, multiple R ranges from zero to 1.00. The possibility of negative effects or inverse relationships with predictor variables (X) is considered but introduced by having the regression equation standardized weight (β) represented as negative. So the negative weight has the influence of creating a subtraction for variables that are negatively correlated or inversely related to the outcome variable (Y). The effect of this is to make the multiple R theoretically have no negative values, one factor that distinguishes the multiple R from the normal bivariate correlation (r).

Assumptions of Multiple R

Multiple regression assumes that there exists no multicollinearity among the set of predictor variables (Xi) used in the equation. What happens if multicollinearity exists is that the value for the multiple R is still accurate; however, the contribution of the individual predictors (provided by the standardized regression coefficients, β) becomes inaccurate estimators of the relative contribution of any given predictor variable. The impact, however, on the estimation of multiple R remains unaffected. If the purpose of the use of the equation is the generation of an expected estimate to compare with the actual value, the equation remains useful. The impact of multicollinearity creates conditions where the determination of the relative contribution of any predictor variable (X) remains unclear and the relative size of the coefficients to determine degree of contribution remains unjustified.

A second assumption of multiple regression is that there exists no causal dependencies among the predictor variables (X). What this essentially means is that there exists no expectation that any predictor variable (X) is a cause of any other predictor variable (X). If any causal relationship is expected then the causal model assumptions for multiple regression are not met and the results become misleading. Multiple regression assumes that each predictor variable (X) is a separate cause or prediction of the dependent variable (Y). The requirement of each predictor variable (X) independently predicting the dependent variable (Y) is not met if causal dependencies exist among the predictor variables (X). The result generates an outcome that fails to represent or reflect accurately the underlying theoretical assumptions for the data. The assumption is not a mathematical assumption, the procedure will produce an outcome without any mathematical ability to recognize the problem. The challenge comes to the interpretation of the outcome because the interpretation of the equation and generated multiple R becomes inaccurate due to the inconsistency between the theoretical assumptions of the investigator and the actual mathematical model evaluated.

...

  • Loading...
locked icon

Sign in to access this content

Get a 30 day FREE TRIAL

  • Watch videos from a variety of sources bringing classroom topics to life
  • Read modern, diverse business cases
  • Explore hundreds of books and reference titles

Sage Recommends

We found other relevant content for you on other Sage platforms.

Loading