
Regression analysis is the blanket name for a family of data analysis techniques that examine relationships between variables. The techniques allow survey researchers to answer questions about associations between different variables of interest. For example, how much do political party identification and Internet usage affect the likelihood of voting for a particular candidate? Or how much do education-related variables (e.g., grade point average, intrinsic motivation, classes taken, and school quality) and demographic variables (e.g., age, gender, race, and family income) affect standardized test performance? Regression allows survey researchers to look at the influence of several independent variables on a dependent variable simultaneously. In other words, instead of having to calculate separate tables or tests to determine the effect of demographic and educational variables on test scores, researchers can examine all of their effects in one comprehensive analysis.

Regression also allows researchers to statistically “control” for the effects of other variables and eliminate spurious relationships. In a spurious relationship, a case of noncausal covariation, two variables may be highly related without having a direct causal relationship. For example, in cities in the United States, murder rates are highly correlated with ice cream sales. This does not mean, however, that curtailing ice cream sales would lower the murder rate. Both ice cream sales and murder rates are related to temperature. When it gets hot out, people buy more ice cream and commit more murders. In a regression equation, both ice cream sales and temperature can be included as predictors of murder rates, and the results would show that when temperature is controlled for, there is no relationship between ice cream sales and murder rates.
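The ice cream example can be sketched numerically. The snippet below is a minimal illustration in plain Python, not a real analysis: the eight data points are hypothetical and were constructed so that murders depend only on temperature, and the helper `ols` is an illustrative least-squares solver written for this sketch (it adds an intercept column, which is an assumption of the example).

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical city-level data, constructed for illustration only:
# ice cream sales and murders both rise with temperature, and murders
# have no dependence on ice cream sales beyond that.
temp = [17, 17, 19, 19, 21, 21, 23, 23]
ice = [75, 65, 95, 85, 115, 105, 135, 125]
murders = [7.5, 7.5, 8.5, 8.5, 10.5, 10.5, 13.5, 13.5]

# Murders predicted from ice cream sales alone (intercept + slope).
naive = ols([[1.0, s] for s in ice], murders)

# Murders predicted from ice cream sales and temperature together.
controlled = ols([[1.0, s, t] for s, t in zip(ice, temp)], murders)

print("ice cream slope, temperature ignored:   ", naive[1])
print("ice cream slope, temperature controlled:", controlled[1])
print("temperature slope:                      ", controlled[2])
```

In this constructed data the ice cream slope is about 0.095 when temperature is left out and zero (up to rounding error) once temperature is included, while the temperature slope is 1. Real data would rarely separate this cleanly.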

This ability to control for other variables makes arguments based on research results much stronger. For example, imagine that a test score regression showed that the more English classes a school required, the better its students did on standardized tests, controlling for median family income, school quality, and other important variables. Policy advocates can then propose increasing the required English courses without being as open to the criticism that the results were really due to other causes (such as socio-economic status).

The regression approach can also simultaneously look at the influence of different important variables. For example, imagine that the head reference librarian and the head of acquisitions for a library disagree about whether it is customer service or having the most up-to-date bestsellers that influences patron satisfaction. A regression predicting patron satisfaction from both customer service ratings and percentage of recent bestsellers can answer the question of which one (or both or neither) of these factors influences patron satisfaction. Researchers can even look at interactions between the variables. In other words, they can determine whether the effect of customer service on patron satisfaction is bigger or smaller at libraries with fewer bestsellers than at those with more bestsellers.
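One common way to test such an interaction is to add the product of the two predictors as an extra column. The sketch below assumes that approach and uses hypothetical, noise-free library data generated from made-up coefficients, so the fit simply recovers them; `ols` and `service_effect` are illustrative helpers, not part of any library.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical libraries: service rating (1-5) crossed with the
# percentage of recent bestsellers (10, 30, 50).
grid = [(s, p) for s in [1, 2, 3, 4, 5] for p in [10, 30, 50]]

# Made-up, noise-free satisfaction scores with a negative interaction.
satisfaction = [1 + 2 * s + 0.05 * p - 0.02 * s * p for s, p in grid]

# Columns: intercept, service, bestsellers, service x bestsellers.
X = [[1.0, s, p, s * p] for s, p in grid]
b = ols(X, satisfaction)

def service_effect(b, pct_bestsellers):
    """Slope of satisfaction on service at a given bestseller percentage."""
    return b[1] + b[3] * pct_bestsellers

print("effect of service at 10% bestsellers:", service_effect(b, 10))
print("effect of service at 50% bestsellers:", service_effect(b, 50))
```

With these made-up coefficients the effect of customer service is larger at libraries with fewer bestsellers (about 1.8 per rating point at 10%) than at those with more (about 1.0 at 50%), which is what a negative interaction coefficient means.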

At its base, the linear regression approach attempts to estimate the following equation:

$$y = b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + e$$

where $y$ is the dependent variable; $x_1, x_2, \dots, x_n$ are the independent variables; $e$ is the error in prediction; and $b_1, b_2, \dots, b_n$ are the regression coefficients. The regression coefficients are estimated by finding the values that jointly minimize the sum of the squared errors of prediction (i.e., the sum of squares). If an independent variable, controlling for the effects of the other independent variables, has a large enough relationship with the dependent variable, then its regression coefficient will be significantly different from zero. Regression coefficients can be interpreted as partial slopes; in other words, the regression coefficient indicates that for each one-unit increase in the independent variable (and controlling for the effects of the other independent variables), the dependent variable increases or decreases by the amount of the regression coefficient.
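The least-squares idea and the partial-slope reading can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than a reference implementation: the data are hypothetical, the helpers `ols`, `predict`, and `sse` are written for this sketch, and an intercept column of ones is added, a common convention that the description above does not mention.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(b, row):
    """Predicted y for one row of predictor values."""
    return sum(bj * xj for bj, xj in zip(b, row))

def sse(X, y, b):
    """Sum of squared errors of prediction for coefficients b."""
    return sum((yi - predict(b, row)) ** 2 for row, yi in zip(X, y))

# Hypothetical data: two predictors and one outcome.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1.1, 5.9, 1.2, 5.8, 1.1, 6.0]

X = [[1.0, a, c] for a, c in zip(x1, x2)]  # intercept, x1, x2
b = ols(X, y)
residuals = [yi - predict(b, row) for row, yi in zip(X, y)]

# Any other coefficient vector gives a larger sum of squared errors.
nudged = [b[0], b[1] + 0.1, b[2]]
print("SSE at the estimate:", sse(X, y, b))
print("SSE after nudging b1:", sse(X, y, nudged))

# Partial slope: raising x1 by one unit with x2 held fixed changes
# the prediction by exactly b1.
step = predict(b, [1.0, 4, 3]) - predict(b, [1.0, 3, 3])
print("b1:", b[1], "change in prediction:", step)
```

Significance testing of the coefficients (standard errors and t statistics) is left out of this sketch; a statistics package would normally report it alongside the estimates.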

...
