Imputation

Paul J. Lavrakas

doi:10.4135/9781412963947

Edited by:
Paul J. Lavrakas
In: Encyclopedia of Survey Research Methods
Chapter DOI: https://doi.org/10.4135/9781412963947.n216
Subject: Survey Research
Keywords: missing data

Imputation, also called ascription, is a statistical process that statisticians, survey researchers, and other scientists use to replace data that are missing from a data set due to item nonresponse. Researchers impute missing values to improve the accuracy of their data sets.

Missing data are a common problem in most databases, and there are several approaches for handling this problem. Imputation fills in missing values, and the resulting completed data set is then analyzed as if it were complete. Multiple imputation is a method for reflecting the added uncertainty that arises because imputed values are not actual values, while still allowing complete-data methods to be used to analyze each data set completed by imputation. In general, multiple imputation can lead to valid inferences from imputed data. Valid inferences are those that satisfy three frequentist criteria:

1. Approximately unbiased estimates of population estimands (e.g., means, correlation coefficients)
2. Interval estimates with at least their nominal coverage (e.g., 95% intervals for a population mean should cover the true population mean at least 95% of the time)
3. Tests of significance that should reject at their nominal level or less frequently when the null hypothesis is true (e.g., a 5% test of a zero population correlation should reject at most 5% of the time when the population correlation is zero)

Among valid procedures, those that give the shortest intervals or most powerful tests are preferable.
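One standard way to turn m completed data sets into a single inference that reflects the added imputation uncertainty is Rubin's combining rules: analyze each completed data set separately, then pool the results. The sketch below assumes a scalar estimand and that each completed-data analysis reports a point estimate and its estimated sampling variance; the function name `combine_rubin` is hypothetical and not taken from any library.

```python
import math
import statistics
from typing import Sequence, Tuple


def combine_rubin(estimates: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool m completed-data analyses with Rubin's combining rules.

    estimates[k] is the point estimate from the k-th completed data set and
    variances[k] is its estimated sampling variance (squared standard error).
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two completed data sets, one variance each")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b               # total variance
    # Rubin's degrees-of-freedom approximation for a t reference distribution;
    # with no between-imputation spread it reduces to the complete-data case.
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2 if b > 0 else math.inf
    return q_bar, t, df


q_bar, total_var, df = combine_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

For three completed data sets with estimates 10, 12, and 11 and variances of 1 each, this gives a pooled estimate of 11, a total variance of about 2.33, and about 6.1 degrees of freedom. An approximate 95% interval is the pooled estimate plus or minus the 0.975 quantile of a t distribution with that many degrees of freedom (available, for example, as `scipy.stats.t.ppf`) times the square root of the total variance. The between-imputation term is what widens the interval relative to analyzing a single completed data set as if its imputed values were real, which helps the interval reach its nominal coverage.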

Missing-Data Mechanisms and Ignorability

Missing-data mechanisms were formalized by Donald B. Rubin in the mid-1970s, and subsequent statistical literature distinguishes three cases: (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) not missing at random (NMAR). This terminology is consistent with much older terminology in classical experimental design for completely randomized, randomized, and not randomized studies. Letting $Y$ be the $N$ (units) by $P$ (variables) matrix of complete data and $R$ be the $N$ by $P$ matrix of indicator variables for observed and missing values in $Y$, the missing-data mechanism gives the probability of $R$ given $Y$ and possible parameters governing this process, $\xi$: $p(R \mid Y, \xi)$.
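To make the notation concrete, the following sketch builds the indicator matrix R from a small hypothetical data matrix, using Python's `None` for a missing entry. It assumes the coding 1 = observed, 0 = missing; the entry does not fix the coding, and the reverse convention is also used.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]

# Hypothetical data as recorded: N = 3 units (rows) by P = 3 variables
# (columns); None marks a value that was not observed.
Y: Matrix = [
    [120.0, 80.0, None],
    [None, 85.0, 130.0],
    [118.0, None, 125.0],
]


def response_indicators(y: Matrix) -> List[List[int]]:
    """Return the N-by-P matrix R with R[i][j] = 1 if Y[i][j] is observed
    and 0 if it is missing (assumed coding)."""
    return [[0 if value is None else 1 for value in row] for row in y]


R = response_indicators(Y)
print(R)  # [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```

A missing-data mechanism is then a probability model for this 0/1 matrix, possibly depending on the values in Y.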

MCAR

Here, “missingness” does not depend on any data values, missing or observed: $p(R \mid Y, \xi) = p(R \mid \xi)$. MCAR can be unrealistically restrictive and can be contradicted by the observed data, for example, when men are observed to have a higher rate of missing data on post-operative blood pressure than are women.
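A small simulation shows what MCAR implies for the blood-pressure example. The data are made up (sex drawn at random, blood pressure normal with mean 130 and standard deviation 15), the 20% missingness rate is an arbitrary choice, and `impose_mcar` is a hypothetical helper: each value is dropped with the same probability regardless of any data value.

```python
import random


def impose_mcar(values, p_missing, rng):
    """MCAR: every value is set to None with the same probability p_missing,
    independently of all data values, observed or missing."""
    return [None if rng.random() < p_missing else v for v in values]


def missing_rate(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


rng = random.Random(2024)
n = 10_000
# Hypothetical data: sex and post-operative systolic blood pressure (mmHg).
sex = [rng.choice("MF") for _ in range(n)]
bp = [rng.gauss(130, 15) for _ in range(n)]
bp_obs = impose_mcar(bp, 0.2, rng)

# Under MCAR the missingness rate should agree across groups up to sampling noise.
rates = {
    g: missing_rate([b for s, b in zip(sex, bp_obs) if s == g])
    for g in "MF"
}
print(rates)
```

Both rates land near 0.2. In real data, a clear and persistent gap between the groups, like the higher missingness among men described above, is evidence against MCAR.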

MAR

Missingness, in this case, depends only on observed values, not on any missing values: $p(R \mid Y, \xi) = p(R \mid Y_{\mathrm{obs}}, \xi)$, where $Y_{\mathrm{obs}}$ are the observed values in $Y$, $Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}})$, and $Y_{\mathrm{mis}}$ are the missing values in $Y$. For example, suppose the value of blood pressure at the end of a clinical trial is more likely to be missing when some previously observed values of blood pressure are high, but, given these earlier values, the probability of missingness is independent of the missing end-of-trial value. Then the missingness mechanism is MAR.

NMAR

If, even given the observed values, missingness still depends on data values that are missing, the missing data are NMAR: $p(R \mid Y, \xi) \neq p(R \mid Y_{\mathrm{obs}}, \xi)$. This could be the case, for example, if people with higher final blood pressure are more likely to be missing this value than people with lower final blood pressure, even when they have exactly the same observed values of race, education, and all previous blood pressure measurements. The richer the data set is in terms of observed variables, the more plausible the MAR assumption becomes.

...