
Consider the LINEAR REGRESSION model y = Xβ + u, where y is an n-by-1 vector of observations on a dependent variable, X is an n-by-k matrix of independent variables of full column rank, β is a k-by-1 vector of parameters to be estimated, and u is an n-by-1 vector of disturbances. Via the GAUSS-MARKOV theorem, if

Assumption 1 (A1): E(u|X) = 0 (i.e., the disturbances have conditional mean zero), and

Assumption 2 (A2): E(uu′|X) = σ2Ω, where Ω = In, an n-by-n identity matrix (i.e., conditional on X, the disturbances are independent and identically distributed, or “iid,” with conditional variance σ2),

then the ORDINARY LEAST SQUARES (OLS) estimator β^OLS = (X′X)−1X′y, with variance-covariance matrix V(β^OLS) = σ2(X′X)−1, is (a) the BEST LINEAR UNBIASED ESTIMATOR (BLUE) of β, in the sense of having the smallest sampling variability in the class of linear unbiased estimators, and (b) a consistent estimator of β (i.e., as n → ∞, Pr[|β^OLS − β| < ∊] → 1 for any ∊ > 0, or plim β^OLS = β).
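As a concrete illustration, the OLS formulas above can be computed directly with NumPy on simulated data satisfying A1 and A2 (the design, coefficients, and sample size below are hypothetical choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = X beta + u with iid disturbances, so A1 and A2 hold.
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma2 = 4.0
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# OLS estimator: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Estimated variance-covariance matrix: sigma2_hat * (X'X)^{-1}
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)
V_hat = sigma2_hat * XtX_inv
se = np.sqrt(np.diag(V_hat))   # conventional OLS standard errors
```

With A1 and A2 satisfied, beta_hat should land close to the true β and the estimated standard errors are valid for inference.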

If A2 fails to hold (i.e., Ω is a positive definite matrix but not equal to In), then β^OLS remains unbiased and consistent but is no longer “best.” Relying on β^OLS when A2 does not hold risks faulty inferences: without A2, σ^2(X′X)−1 is a biased and inconsistent estimator of V(β^OLS), meaning that the estimated STANDARD ERRORS for β^OLS are wrong, invalidating hypothesis tests. A2 often fails to hold in practice; for example, (a) pooling across disparate units often generates disturbances with different conditional variances (HETEROSKEDASTICITY), and (b) analysis of TIME-SERIES data often generates disturbances that are not conditionally independent (serially correlated disturbances).
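The inferential danger can be seen in a short simulation: with heteroskedastic disturbances, the conventional estimator σ^2(X′X)−1 mis-states the true sampling variability of β^OLS. In the hypothetical design below, the disturbance variance grows with x2, so the conventional standard error understates the Monte Carlo sampling standard deviation of the OLS slope:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 3000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([0.0, 1.0])
sd = np.sqrt(1.0 + 3.0 * x**2)   # heteroskedastic: Var(u_i|X) grows with x_i^2

slopes = np.empty(reps)
naive_se = np.empty(reps)
XtX_inv = np.linalg.inv(X.T @ X)
for r in range(reps):
    y = X @ beta + sd * rng.normal(size=n)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)
    slopes[r] = b[1]
    naive_se[r] = np.sqrt(s2 * XtX_inv[1, 1])   # conventional OLS SE

true_sd = slopes.std()        # actual sampling SD of the OLS slope
avg_naive = naive_se.mean()   # average conventional standard error
```

In this design the true sampling standard deviation exceeds the average conventional standard error by a substantial margin, so nominal t-tests would reject too often.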

When A2 does not hold, it may be possible to implement a generalized least squares (GLS) estimator that is BLUE (at least asymptotically). For instance, if the researcher knows the exact form of the departure from A2 (i.e., the researcher knows Ω), then the GLS estimator β^GLS = (X′Ω−1X)−1X′Ω−1y is BLUE, with variance-covariance matrix σ2(X′Ω−1X)−1. Note that when A2 holds, Ω = In, and β^GLS = β^OLS (i.e., OLS is a special case of the more general estimator).
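When Ω is known, the GLS formula can be applied directly. In this sketch (a hypothetical heteroskedastic design in which the researcher is assumed to know the variance weights), Ω is diagonal and easy to invert; the code also verifies the remark that GLS collapses to OLS when Ω = In:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# Hypothetical setting: heteroskedastic disturbances with a *known* variance
# structure, Var(u_i | X) = sigma^2 * w_i, so Omega = diag(w) is known.
w = np.exp(X[:, 1])
y = X @ beta + np.sqrt(w) * rng.normal(size=n)

# GLS: beta_gls = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Omega_inv = np.diag(1.0 / w)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Sanity check from the text: with Omega = I_n, GLS reduces to OLS.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_gls_identity = np.linalg.solve(X.T @ np.eye(n) @ X, X.T @ np.eye(n) @ y)
```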

Typically, researchers do not possess exact knowledge of Ω, meaning that β^GLS is nonoperational, and an estimated or feasible generalized least squares (FGLS) estimator is used instead. FGLS estimators are often implemented in multiple steps: (a) an OLS analysis to yield residuals û; (b) an analysis of the û to form an estimate of Ω, denoted Ω^; and (c) computation of the FGLS estimator β^FGLS = (X′Ω^−1X)−1X′Ω^−1y. The third step is often performed by noting that Ω^ can be decomposed as Ω^ = P−1(P′)−1 (equivalently, Ω^−1 = P′P), so that β^FGLS is obtained by running the WEIGHTED LEAST SQUARES (WLS) regression of y* = Py on X* = PX; that is, β^FGLS = (X*′X*)−1X*′y*.
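The three steps can be sketched under an assumed multiplicative-heteroskedasticity model; the log-squared-residual regression in step (b) is one common way to estimate the skedastic function, and all specifics here are illustrative rather than canonical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# True (unknown to the researcher) heteroskedasticity: Var(u_i) = exp(g0 + g1*x_i)
w_true = np.exp(0.5 + 1.0 * X[:, 1])
y = X @ beta + np.sqrt(w_true) * rng.normal(size=n)

# Step (a): OLS to obtain residuals u_hat.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_ols

# Step (b): regress log(u_hat^2) on X to estimate the skedastic function,
# giving Omega_hat = diag(w_hat).
g = np.linalg.solve(X.T @ X, X.T @ np.log(u_hat**2 + 1e-12))
w_hat = np.exp(X @ g)

# Step (c): FGLS via WLS. Here P = diag(1/sqrt(w_hat)); regress Py on PX.
p = 1.0 / np.sqrt(w_hat)
X_star = X * p[:, None]
y_star = y * p
beta_fgls = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# Equivalent direct form: (X' Omega_hat^{-1} X)^{-1} X' Omega_hat^{-1} y
Oi = np.diag(1.0 / w_hat)
beta_direct = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)
```

The WLS computation and the direct matrix formula produce the same estimate, illustrating that the P-transformation is just a computational device.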

The properties of FGLS estimators vary depending on the form of Ω (i.e., the nature of the departure from the conditional iid assumption in A2) and the quality of Ω^, and so they cannot be neatly summarized. Finite sample properties of the FGLS estimator are often derived case by case via MONTE CARLO experiments; in fact, it is possible to find cases in which β^OLS is more efficient than β^FGLS, say, when the violation of A2 is mild (e.g., Chipman, 1979; Rao & Griliches, 1969). ASYMPTOTIC results are more plentiful and usually rely on showing that β^FGLS and β^GLS are asymptotically equivalent, so that β^FGLS is a consistent and asymptotically efficient estimator of β (e.g., Amemiya, 1985, pp. 186–222; Judge, Griffiths, Hill, & Lee, 1980, pp. 117–118).
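A minimal Monte Carlo experiment in this spirit (design entirely hypothetical) draws repeated samples under a mild violation of A2 and compares the empirical mean squared errors of the OLS and FGLS slope estimates; which estimator wins depends on the design, sample size, and severity of the violation:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.0, 1.0])
w = 1.0 + 0.2 * X[:, 1]**2   # mild heteroskedasticity

ols_est = np.empty(reps)
fgls_est = np.empty(reps)
for r in range(reps):
    y = X @ beta + np.sqrt(w) * rng.normal(size=n)
    # OLS
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    # FGLS with an estimated skedastic function (log-squared-residual step)
    u = y - X @ b_ols
    g = np.linalg.solve(X.T @ X, X.T @ np.log(u**2 + 1e-12))
    p = 1.0 / np.sqrt(np.exp(X @ g))
    Xs, ys = X * p[:, None], y * p
    b_fgls = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    ols_est[r] = b_ols[1]
    fgls_est[r] = b_fgls[1]

mse_ols = np.mean((ols_est - beta[1])**2)   # empirical MSE of OLS slope
mse_fgls = np.mean((fgls_est - beta[1])**2)  # empirical MSE of FGLS slope
```

Varying n, the weight function, or the skedastic-function estimator in this sketch reproduces the case-by-case character of the finite-sample comparisons cited above.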

...
