
Misspecification is a fundamental problem in empirical modeling. The origins of this problem can be found in the theoretical exercise of using a statistical model from a sample to make inferences about an unobservable population of interest. Any deviation from the true population model in the sample model means that the sample model is misspecified. This, in turn, means that the inferences from the sample model about the population are suspect.

The importance of problems of misspecification is underscored by the amount of attention paid to different types of misspecification in introductory texts on ordinary least squares (OLS) regression models. As an example, consider Damodar Gujarati's widely used textbook, Basic Econometrics. Gujarati's treatment of OLS is centered around 10 assumptions of the linear regression model. Six of these 10 assumptions are statements that the model does not contain one or more types of misspecification.

This entry discusses misspecification in OLS throughout. The logic and importance of misspecification in OLS extend directly to other, more complicated types of models. Almost all such models contain some equation that dictates the relationship between the independent variables and the dependent variable. Getting this equation wrong means that the model is misspecified.

As the Gujarati example shows, misspecification can take on many different forms. The sections that follow discuss several of the most common forms of misspecification, the detection of statistical problems caused by misspecification, and strategies for avoiding misspecification.

Types of Misspecification

The purpose of empirical model specification is to try to develop an accurate model of relationships in an unobservable population with observed sample data. In OLS, we can represent the population regression model as

$$Y_i = \alpha + \beta_1 X_i + \beta_2 Z_i + \varepsilon_i$$

and the sample regression model as

$$Y_i = \hat{\alpha} + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i + \hat{\varepsilon}_i$$

where $Y$ is the dependent variable; $X$ and $Z$ are the independent variables; $\hat{\alpha}$, $\hat{\beta}_1$, and $\hat{\beta}_2$ are sample estimates of the population parameters $\alpha$, $\beta_1$, and $\beta_2$; and $\hat{\varepsilon}_i$ is the sample estimate of the population stochastic term $\varepsilon_i$.

If any element of the population model is not appropriately represented in the sample regression model, then the inferences about the population model from the sample model may be problematic. The entry now turns to more in-depth discussions of four of the most common forms of misspecification.
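The distinction between the population model and the sample model can be made concrete with a small simulation. The sketch below is illustrative only: the parameter values, sample size, seed, and the normal distributions for X, Z, and the stochastic term are all assumptions and do not come from the entry. It generates data from a known "population" equation and then estimates the correctly specified sample model by least squares using NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility
n = 5_000                       # assumed sample size

# Hypothetical population parameters (chosen only for illustration).
alpha, beta1, beta2 = 1.0, 2.0, -0.5

# Assumed data-generating process: independent standard normal X, Z, and errors.
X = rng.normal(size=n)
Z = rng.normal(size=n)
eps = rng.normal(size=n)
Y = alpha + beta1 * X + beta2 * Z + eps

# Correctly specified sample model: constant, X, and Z on the right-hand side.
design = np.column_stack([np.ones(n), X, Z])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
alpha_hat, beta1_hat, beta2_hat = coef

# Sample residuals play the role of the estimated stochastic term.
resid = Y - design @ coef

print("estimates:", alpha_hat, beta1_hat, beta2_hat)
print("mean residual:", resid.mean())
```

Because the sample model here matches the assumed population model, the estimates should land close to the chosen parameter values, and the residuals average to zero up to floating-point error because the model includes a constant.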

Omitted Variable Bias

One of the most common critiques of empirical work is that the authors have left a relevant independent variable out of their model specification. The problem associated with this critique is known as “omitted variable bias.” For example, if the population regression model is

$$Y_i = \alpha + \beta_1 X_i + \beta_2 Z_i + \varepsilon_i$$

but the sample regression model is specified as

$$Y_i = \hat{\alpha} + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$$

then there is a strong possibility that the sample model is prone to omitted variable bias. To illustrate the nature of this problem, consider what happens to the parameter estimate for the effect of $X$ on $Y$, $\hat{\beta}_1$. In summation notation, the OLS formula for this parameter estimate is

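$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$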

and because one of the properties of OLS is that the resulting regression line (or plane in the case of a model with two independent variables) goes through the mean of each variable, we know that

None

we can see that the expected value of $\hat{\beta}_1$, $E(\hat{\beta}_1)$, under these circumstances

...
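Omitted variable bias can also be seen numerically. The following is a minimal simulation sketch, not the entry's own derivation: the parameter values, the 0.8 dependence of Z on X, the seed, and the helper name `ols` are all assumptions made for illustration. It fits both the full model and the model that omits Z, then checks a standard in-sample identity: the slope on X in the short regression equals the full-model slope on X plus the full-model slope on Z times the slope from regressing Z on X.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)  # arbitrary fixed seed
n = 5_000                       # assumed sample size

# Hypothetical population parameters (illustration only).
alpha, beta1, beta2 = 1.0, 2.0, 1.5

X = rng.normal(size=n)
Z = 0.8 * X + rng.normal(size=n)  # Z is correlated with X by construction
Y = alpha + beta1 * X + beta2 * Z + rng.normal(size=n)

ones = np.ones(n)
full = ols(np.column_stack([ones, X, Z]), Y)  # correctly specified model
short = ols(np.column_stack([ones, X]), Y)    # model that omits Z

# Slope from the auxiliary regression of the omitted Z on the included X.
delta = ols(np.column_stack([ones, X]), Z)[1]

print("full-model slope on X: ", full[1])
print("short-model slope on X:", short[1])
print("full slope on X + full slope on Z * delta:", full[1] + full[2] * delta)
```

Under these assumptions the short-model slope on X absorbs part of the effect of Z and moves well away from the chosen value of 2.0, while the full model stays close to it. If Z were generated independently of X, the auxiliary slope would be near zero and the two slopes on X would nearly coincide.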
