
Missing data problems arise frequently in social science applications. Household survey respondents may fail to answer questions because of refusal, fatigue, or enumerator error; alternatively, a split sample design may be used, in which case the missingness is an intentional strategy to conserve resources. Personal income is often missing in surveys because respondents refuse to reveal what they earn. Cross-national data on macroeconomic conditions may lack entries for some countries in some years due to crises, lack of capacity to generate data regularly, or ex post realization that past reporting methods were flawed. Measures of national income inequality are often missing for some country-years in cross-national data because their calculation requires an income survey, and such surveys are too expensive for some countries to implement every year. In the household survey example, the analyst must decide whether and how to use data from respondents who refused to state their income. In the cross-national example, the same issue arises for country-years for which the inequality measure is missing. These are instances of what analysts label “item missingness,” which is the focus of this entry.

The “items” are the individual survey questions or pieces of information. The idea is that for each household or country, analysts have information on some items but not on others. This is to be distinguished from “unit missingness,” also called unit nonresponse, where “units” refers to units of observation (the households or country-years in the examples above). That is, unit nonresponse refers to the situation in which data collection failed to gather any data for some units: for example, some households were unreachable or refused to participate altogether, or some country-years contained no usable data on any variables. Fixes for unit nonresponse in surveys include ex post weighting (via post-stratification) to externally available population information, such as a census. For cross-national data, the analyst typically has no way to adjust for unit missingness and so must state that the results hold only for the subpopulation for which data are available.
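The post-stratification fix mentioned above can be sketched in a few lines. This is a minimal illustration, not a full survey-weighting procedure: the function name `poststratification_weights`, the strata labels, and the population shares are all hypothetical, and it assumes a single categorical stratifier whose population shares are known from an external source such as a census.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Return one weight per respondent: population share / sample share.

    sample_strata: list with the stratum label of each responding unit.
    population_shares: dict mapping stratum label -> known population share
    (hypothetical values here; in practice taken from a census).
    """
    n = len(sample_strata)
    counts = Counter(sample_strata)

    # A stratum with no respondents cannot be reweighted at all.
    empty = set(population_shares) - set(counts)
    if empty:
        raise ValueError(f"no respondents in strata: {sorted(empty)}")

    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


# Hypothetical example: rural households are under-represented among
# respondents (30% of the sample) relative to the population (50%).
sample = ["urban"] * 70 + ["rural"] * 30
weights = poststratification_weights(sample, {"urban": 0.5, "rural": 0.5})
```

After weighting, the weighted share of each stratum in the sample matches its population share. This corrects for unit nonresponse only to the extent that nonrespondents resemble respondents within each stratum.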

Listwise Deletion

By far the most common approach to missing data in political science has been to simply drop incomplete cases, a practice known as “listwise deletion” in the methodological literature. This is tantamount to treating cases of item missingness as instances of unit missingness. In special circumstances, ignoring cases exhibiting item or unit missingness is a perfectly reasonable thing to do. For a unit $i$, consider an outcome variable, $y_i$, and a set of predictor variables recorded for the unit, collectively $x_i$. Suppose that the distribution of $y_i$ depends on $x_i$ in a manner characterized by a probability density function $f(\cdot)$. In that case, the analyst can write the probability density of $y_i$ given $x_i$ as $f(y_i \mid x_i)$. An example would be a linear structural model of the form $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$, with $\varepsilon_i \sim N(0, \sigma^2)$. In this case, $x_i = (x_{i1}, x_{i2})$, and

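$$f(y_i \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})^2}{2\sigma^2}\right).$$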

The goal of data analysis in such a circumstance is to estimate the parameters that characterize $f(y_i \mid x_i)$ (here, the $\beta$ coefficients and the variance, $\sigma^2$, of the error term, $\varepsilon_i$). Listwise deletion allows the analyst to do so without bias as long as two conditions hold: (1) the analyst has correctly specified the functional form of $f(y_i \mid x_i)$, and (2) the probability that an item for $i$ is missing systematically depends only on the elements of $x_i$ that are always observed for all units.
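A small simulation can make the two conditions concrete. The sketch below is illustrative only: the coefficient values, the sample size, and the logistic missingness rule are assumptions chosen for the example, and the helper names (`solve_linear_system`, `ols`, `simulate_listwise_deletion`) are hypothetical. It generates data from the two-predictor linear model, makes the second predictor missing with a probability that depends only on the always-observed first predictor, drops incomplete cases, and fits ordinary least squares to what remains.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows):
    """Least-squares fit of y on (1, x1, x2); rows are (y, x1, x2) tuples."""
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for y, x1, x2 in rows:
        design = (1.0, x1, x2)
        for j in range(3):
            xty[j] += design[j] * y
            for k in range(3):
                xtx[j][k] += design[j] * design[k]
    return solve_linear_system(xtx, xty)  # normal equations


def simulate_listwise_deletion(n=20000, seed=0):
    rng = random.Random(seed)
    beta = (1.0, 2.0, -1.0)  # assumed true (beta0, beta1, beta2)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)               # always observed
        x2 = 0.5 * x1 + rng.gauss(0, 1)    # sometimes missing
        y = beta[0] + beta[1] * x1 + beta[2] * x2 + rng.gauss(0, 1)
        # Condition (2): missingness depends only on the observed x1.
        p_missing = 1.0 / (1.0 + math.exp(-x1))
        if rng.random() < p_missing:
            x2 = None
        data.append((y, x1, x2))
    # Listwise deletion: keep only units with every item observed.
    complete = [row for row in data if row[2] is not None]
    return beta, ols(complete), len(complete)


truth, estimates, n_complete = simulate_listwise_deletion()
```

Comparing `estimates` with `truth` shows the complete-case coefficients landing close to the assumed values even though a large share of units is discarded. Changing the rule so that `p_missing` depends on `y` violates condition (2), and the same comparison should then reveal the bias that listwise deletion can introduce.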

...
