The use of quantitative methods in political science generally means the application of a statistical model to data, and a statistical model is simply a set of compatible probabilistic assumptions. Fundamentally, assumptions are modeling choices made by a researcher concerning the distribution of the data to be modeled, how the parameters of that distribution change over observations or time, and the dependence of one observation on another. The assumptions serve the dual purpose of reducing the number of parameters in the model that must be estimated and imbuing potential estimators with certain properties. The goals of the modeling process are description and inference, and how well a model accomplishes these goals is a direct function of how appropriate its assumptions are for a particular data set.
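For concreteness, one familiar such set of assumptions, written here in standard notation as an illustration rather than as the entry's own formulation, is the classical linear regression model with a single regressor:

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i && \text{(functional form; parameters constant across observations)} \\
\varepsilon_i &\sim \mathcal{N}(0, \sigma^2) && \text{(distribution of the data; constant variance)} \\
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) &= 0, \quad i \neq j && \text{(no dependence between observations)}
\end{align*}
```

Each line corresponds to one of the three modeling choices just described, and relaxing any one of them yields a different model.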

The structure of the entry proceeds as follows. First, the relationship between assumptions, models, and estimators is discussed. Second, the assumptions of the linear regression model are discussed in detail, because more complex models are often defined as departures from this set of assumptions. Particular attention is paid to the assumptions necessary for obtaining correct coefficient estimates and standard errors. Threats to these core assumptions are assessed in terms of their effects, both in theory and in practice. A brief discussion of possible remedies follows. The final sections discuss the degree to which the assumptions of the linear regression model are modified for use in the generalized linear model (GLM) and how the assumption of a random sample is dealt with in a discipline where random samples are rare.

Assumptions, Models, and Estimators

A statistical model is a mathematical representation of the actual process in the world that generated the data (known as a data-generating process, or DGP). The point of creating a statistical model is both to describe the data produced by the DGP and to make inferences about features of the DGP that are unknown. As noted above, the model itself consists of a set of assumptions. This account makes it clear that assumptions are characteristics of models and not characteristics of data, and the question to ask of a model is not whether it is true or false but how descriptively useful it is. Models are more or less useful in describing a given data set, and when an assumption fails to be useful in the process of description, the error lies with the model, not with the data.
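The distinction between a DGP and a model of it is easy to see in simulation. The sketch below is a hypothetical illustration, not anything from the entry itself: it generates data from a known linear DGP and then fits a model whose assumptions happen to match that DGP. In real research the DGP is unknown, and only the model's assumptions are under the researcher's control.

```python
import numpy as np

# Hypothetical illustration: simulate a known data-generating process (DGP)
# and fit a model whose assumptions match it. All parameter values are
# arbitrary choices for the sketch.
rng = np.random.default_rng(42)

n = 500
beta0, beta1, sigma = 1.0, 2.0, 1.5     # the DGP's true parameters (unknown in practice)

x = rng.uniform(0, 10, size=n)
eps = rng.normal(0.0, sigma, size=n)    # matching assumptions: normal errors,
                                        # constant variance, independent observations
y = beta0 + beta1 * x + eps             # parameters constant across observations

# Description and inference: estimate the unknown parameters by least squares.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # close to (1.0, 2.0) because the model's assumptions fit the DGP
```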

Once a model has been selected, then an estimator (a function of the sample data that provides an estimated value for an unknown parameter) is chosen. Many common models can be estimated by any number of estimators, and the choice among estimators is driven by the model's assumptions, which imbue the estimators with various properties. An estimator with good (or better) properties under the model's assumptions is chosen over one with bad (or worse) properties. The reference to “good estimators” and not to “good estimates” is by design: in frequentist statistics, a good estimate, by definition, is one produced by a good estimator. Properties of estimators that are considered good include unbiasedness (the estimator is on average neither too high nor too low), efficiency (the estimator has a small variance around the true value), and consistency (the estimator is near the true value with high probability when the sample size is large). Often these properties are assessed in the aggregate, as a slightly biased estimator with a small variance may be preferred over an unbiased estimator with a larger variance.
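That closing point, that a slightly biased but low-variance estimator can be preferable, can be checked with a short Monte Carlo experiment. The following sketch is a hypothetical illustration (the true mean, shrinkage factor, and sample size are all arbitrary choices of ours): it compares the unbiased sample mean with a shrunken version and reports each estimator's bias, variance, and mean squared error.

```python
import numpy as np

# Hypothetical Monte Carlo sketch of the tradeoff described above: a slightly
# biased, low-variance estimator can beat an unbiased, higher-variance one on
# mean squared error. The true mean, shrinkage factor, and sample size are
# illustrative assumptions, not values from the entry.
rng = np.random.default_rng(0)

mu, sigma, n, reps = 5.0, 10.0, 20, 100_000
samples = rng.normal(mu, sigma, size=(reps, n))

sample_mean = samples.mean(axis=1)   # unbiased estimator of mu
shrunken = 0.9 * sample_mean         # biased toward zero, but lower variance

for name, est in (("sample mean", sample_mean), ("shrunken mean", shrunken)):
    bias = est.mean() - mu
    variance = est.var()
    mse = np.mean((est - mu) ** 2)   # MSE = variance + bias^2
    print(f"{name:>13}: bias={bias:+.3f}  variance={variance:.3f}  MSE={mse:.3f}")
```

Under these settings the shrunken mean's mean squared error (roughly 4.3) falls below the sample mean's (roughly 5.0) despite a bias of about −0.5, which is exactly the kind of aggregate assessment described above.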

...
