
Panel data consist of a cross-section of “individuals” for which there are repeated observations over time. Individuals can be any cross-sectional unit of analysis, such as states, dyads, or survey respondents. Panel data sets are typically dichotomized between long panels, which have many measurement occasions relative to the size of the cross section, and short panels, which have many individuals in the cross section relative to the number of repeated measurement occasions, or “waves.” In general, the methods associated with the term panel data analysis or longitudinal analysis focus on short panels, while methods under the time-series cross section umbrella focus more on analyzing long panels.

The key advantage of panel data is that such data offer the opportunity to better evaluate causal propositions than strictly cross-sectional data. Whereas cross-sectional data only allow the researcher to observe covariances, panel data further allow the researcher to observe whether a change in an input precedes a change in the outcome. In other words, since panel data consist of the same individuals over time, the analyst can observe a shift in responses as a reaction to an input. One example would be an evaluation of whether a state's present behavior responds to the prior behavior of its neighbors. Another example might be using a survey panel, such as those often incorporated into the American National Election Studies, to assess how partisan strength influences campaign interest over time.

Panel data have a number of features that can pose challenges to analysts. These issues include unit effects, serial correlation, heteroskedasticity, and contemporaneous correlation. Panels also have special problems of missingness. The remainder of this entry focuses on these issues and some remedies for each.

Unit Effects

Whenever individuals' mean responses differ, unit effects are present in the data. Unit effects can pose serious problems for inference, as failure to account for them in some way can produce bias in estimates akin to omitted variable bias. If the mean response varies cross-sectionally via unobserved unique means, but this difference is not modeled (and thereby left in the error term), then any cross-sectionally varying covariate will correlate with the error term. Such a situation produces endogeneity bias: because the independent variable is correlated with the error term, the model's coefficient estimates are biased.
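The bias described above can be illustrated with a small simulation. The following is a minimal sketch, not drawn from the entry itself: the data-generating process, the parameter values, and the helper names (`simulate_panel`, `ols_slope`, `demean_by_unit`) are all assumptions made for illustration. Each individual gets an unobserved mean that also shifts the covariate, and the pooled OLS slope is compared with the slope obtained after subtracting each individual's own means.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n_units=500, n_waves=4, beta=1.0, seed=0):
    """Hypothetical short panel: the unit effect alpha shifts both x and y."""
    rng = random.Random(seed)
    rows = []
    for unit in range(n_units):
        alpha = rng.gauss(0, 1)              # unobserved unit effect
        for _ in range(n_waves):
            x = alpha + rng.gauss(0, 1)      # covariate correlated with alpha
            y = beta * x + alpha + rng.gauss(0, 1)
            rows.append((unit, x, y))
    return rows


def demean_by_unit(rows):
    """Subtract each unit's own mean from x and y (the within transformation)."""
    totals = {}
    for unit, x, y in rows:
        t = totals.setdefault(unit, [0.0, 0.0, 0])
        t[0] += x
        t[1] += y
        t[2] += 1
    xs, ys = [], []
    for unit, x, y in rows:
        sum_x, sum_y, n = totals[unit]
        xs.append(x - sum_x / n)
        ys.append(y - sum_y / n)
    return xs, ys


rows = simulate_panel()
pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
within = ols_slope(*demean_by_unit(rows))
print(f"true slope: 1.0  pooled OLS: {pooled:.2f}  within-unit: {within:.2f}")
```

Under these assumed settings the pooled slope converges to 1 + cov(x, alpha) / var(x) = 1.5 rather than the true value of 1, because the unit effect is left in the error term, while the within-unit slope should land close to 1.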

In the econometric tradition, two approaches are widely used to handle unit effects. One is the fixed-effects model, typically estimated with least squares dummy variables (LSDV). This approach estimates the desired model using ordinary least squares (OLS), including a dummy variable for each individual except a reference individual. This approach has the advantage of being computationally simple and accounting for a known source of variance in the model specification. However, individual dummies are perfectly collinear with any variable that varies only cross-sectionally. Hence, LSDV precludes the inclusion of time-invariant variables in a model. An alternative that does allow time-invariant covariates is a generalized least squares (GLS) model with a compound symmetry covariance structure, known as a random-effects model. This model recognizes that repeated observations on the same individual will covary, so the estimator accounts for this structure by including a term that forces all of an individual's repeated observations to correlate at a constant level with each other.
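Two points from the paragraph above can be made concrete with a short sketch. The toy data and the helper names (`unit_demean`, `compound_symmetry`) are hypothetical, chosen only for illustration. The first part relies on the fact that LSDV slope estimates coincide with those from regressing unit-demeaned variables on each other, so demeaning shows what the dummies absorb: a time-invariant variable is reduced to zeros. The second part builds the compound symmetry covariance block for one individual's repeated observations, assuming the error splits into a unit-level component and an idiosyncratic component.

```python
def unit_demean(values, unit_ids):
    """Subtract each unit's mean from its own observations."""
    totals, counts = {}, {}
    for v, u in zip(values, unit_ids):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - totals[u] / counts[u] for v, u in zip(values, unit_ids)]


def compound_symmetry(n_waves, var_unit, var_idio):
    """Covariance block for one unit: var_unit + var_idio on the diagonal,
    var_unit everywhere else (a constant covariance between any two waves)."""
    return [
        [var_unit + (var_idio if s == t else 0.0) for t in range(n_waves)]
        for s in range(n_waves)
    ]


# Hypothetical toy panel: 3 units observed for 3 waves each.
unit_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
region = [1, 1, 1, 0, 0, 0, 1, 1, 1]            # time-invariant covariate
income = [2.0, 3.0, 4.0, 1.0, 1.5, 2.0, 5.0, 4.0, 6.0]  # time-varying covariate

region_within = unit_demean(region, unit_ids)
income_within = unit_demean(income, unit_ids)
print("demeaned region:", region_within)   # no variation left to estimate from
print("demeaned income:", income_within)   # within-unit variation survives

# Assumed variance components, for illustration only.
block = compound_symmetry(n_waves=3, var_unit=1.0, var_idio=3.0)
rho = block[0][1] / block[0][0]            # implied constant correlation
for row in block:
    print(row)
print("within-unit correlation:", rho)
```

With the assumed variance components of 1.0 and 3.0, every pair of waves for the same individual shares a covariance of 1.0 against a variance of 4.0, which is the constant correlation (here 0.25) that the random-effects model imposes.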

...
