
Regression analysis is, by far, the most commonly used statistical technique in political science. Although regression analysis can be defined in many different ways, it is fundamentally a tool for describing the relationships among variables. Regression is generally used for two complementary purposes. First, it measures the effects of one or more independent variables on a single dependent variable. Second, it can be used to predict the values of a dependent variable that can be expected to occur at specified levels of the independent variables. In the former capacity, regression is a useful tool for theory testing. In the latter capacity, regression is a forecasting tool that is useful for decision making. These two uses are closely connected. Theories are evaluated by their ability to predict as yet unobserved phenomena, and forecasting is carried out most effectively when predictions are based on a substantive theory. The major features and applications of regression analysis are discussed below.

If there is one independent variable, the analysis is commonly called a simple or bivariate regression. If there is more than one independent variable, it is called multiple regression analysis. In traditional nomenclature, the dependent variable is generically represented as Y and the independent variable is designated as X. If there are multiple independent variables, the Xs are given subscripts, say X1, X2, …, Xj, …, Xk for k variables (note that the specific order of the Xjs generally does not matter). For the moment, we will assume that all variables are relatively continuous and measured at the interval level.

The immediate objective of a regression analysis is to show how the conditional distribution of Y varies across the values of X (or across the values of the Xjs in a multiple regression). Attention is usually focused on the conditional mean of Y rather than the entire conditional distribution. If the conditional Y distributions (and, specifically, the conditional Y means) do, in fact, differ systematically across the X values, then Y is said to be related to X. If the conditional Y distributions do not vary across the X values, then X and Y are unrelated to, or independent of, each other.
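As a minimal sketch of the idea, the conditional mean of Y can be computed separately at each distinct value of X and then compared across those values. The function name `conditional_means` and the data are hypothetical, made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def conditional_means(xs, ys):
    """Return {x value: mean of the Y observations recorded at that x value}."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: mean(vals) for x, vals in sorted(groups.items())}


# Hypothetical data: two Y observations at each of three X values.
xs = [1, 1, 2, 2, 3, 3]
ys = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]

cond = conditional_means(xs, ys)
for x, y_mean in cond.items():
    print(f"X = {x}: conditional mean of Y = {y_mean}")
```

In this made-up data set the conditional means rise steadily as X increases, so Y would be described as related to X. Had the means been the same at every X value, the data would give no sign of a relationship in the conditional means.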

Bivariate Regression

There are several ways in which a regression analysis can be carried out. For example, a graphical approach would simply plot Y against X and superimpose a curve that traces out the conditional Y means across the range of X. While this strategy is often useful for exploring bivariate data, it becomes much more difficult in the multiple regression case. There are also practical limitations to using a graph as the final output of an analysis in any context. Further, the sample estimates of the conditional Y means are likely to be unreliable and unstable because there are usually only a few observations at each distinct value of X.

An alternative approach would be to specify a function that describes the hypothesized relationship between Y and the Xjs. The empirical analysis then estimates the parameters of this function and determines how well the function actually represents the data at hand. Probably the simplest function for relating the variables is linear in form. In the bivariate case, a linear function would be shown as

...
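The two steps described above, estimating the parameters of a function and checking how well it represents the data, can be sketched as follows. This is only an illustration under stated assumptions: it takes the linear function to have the common intercept-and-slope form Y = a + bX, uses ordinary least squares as the estimation method, and uses R-squared as the measure of fit. The symbols a and b, the function names, and the data are all hypothetical and are not taken from the text.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for the assumed form Y = a + b*X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in Y around its mean that the fitted line accounts for."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
fit = r_squared(xs, ys, a, b)
print(f"intercept = {a:.3f}, slope = {b:.3f}, R-squared = {fit:.4f}")
```

Unlike the graphical approach, the fitted function summarizes the relationship in a small number of estimated parameters, and it pools information across all observations rather than relying on the few cases available at each distinct value of X.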
