
Hot-Deck Imputation

Hot-deck imputation is a widely used method for handling missing data. It fills in missing values on variables of interest for nonrespondents (i.e., recipients) using observed values from respondents (i.e., donors) within the same survey data set. Hot-deck imputation can be applied to missing data caused by either failure to participate in a survey (i.e., unit nonresponse) or failure to respond to certain survey questions (i.e., item nonresponse). The term hot deck, in contrast with cold deck, dates back to the storage of data on punch cards. It indicates that the donors and the recipients are from the same data set; the stack of cards was “hot” because it was currently being processed (i.e., run through the card reader quickly, which heated the punch cards). Cold-deck imputation, by contrast, selects donors from external data sets.

This entry describes the various types of hot-deck imputation: sequential, hierarchical, and nearest neighbor. This entry then discusses the assumptions underlying these methods and reviews the advantages and disadvantages of hot-deck imputation.

Sequential Hot-Deck Imputation

The basic idea behind hot-deck imputation is to match a recipient to a donor with similar characteristics and then transfer the donor's value to the recipient. There are various methods to match a recipient to a donor. The traditional hot-deck procedures begin with the specification of imputation classes constructed with auxiliary variables that are observed or known for both respondents and nonrespondents. Within each imputation class, the first nonmissing value (or record) is assigned as the potential donor. Each subsequent record is then compared to that potential donor: if the record has a nonmissing value, that value replaces the potential donor; if the record has a missing value, the most recent donor value is filled in. This procedure is also called sequential hot-deck imputation.

A simple example explains this procedure. Given a sample of respondents and nonrespondents, the values on variable y are either observed or missing. If gender is known for all respondents and nonrespondents, two imputation classes can be constructed. The sequential hot-deck imputation procedure continually stores and replaces potential donor values from each nonmissing record. If a missing value on the y variable is found, the most recent donor value is then transferred to that nonrespondent.
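The gender example above can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name `sequential_hot_deck`, the field names, and the six sample records are all hypothetical, and missing values are represented as `None`. A recipient that appears before any donor in its class is left missing here, which is one simple way to handle the start-up case.

```python
def sequential_hot_deck(records, class_key="gender", target="y"):
    """One-pass sequential hot deck within imputation classes (illustrative)."""
    last_donor = {}  # most recent observed value, stored per imputation class
    imputed = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not modified
        cls = rec[class_key]
        if rec[target] is not None:
            # Nonmissing record: it becomes the new potential donor.
            last_donor[cls] = rec[target]
        elif cls in last_donor:
            # Missing record: fill in the most recent donor value.
            rec[target] = last_donor[cls]
        # Otherwise no donor has been seen yet in this class; leave it missing.
        imputed.append(rec)
    return imputed


# Hypothetical sample: gender is known for everyone, y is sometimes missing.
sample = [
    {"id": 1, "gender": "F", "y": 10},
    {"id": 2, "gender": "M", "y": 20},
    {"id": 3, "gender": "F", "y": None},
    {"id": 4, "gender": "M", "y": None},
    {"id": 5, "gender": "F", "y": 30},
    {"id": 6, "gender": "F", "y": None},
]

result = sequential_hot_deck(sample)
print([r["y"] for r in result])  # [10, 20, 10, 20, 30, 30]
```

Record 3 receives the value from record 1, while record 6 receives the value from record 5, because record 5 replaced record 1 as the potential donor for its class.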

Sequential hot-deck imputation is similar to the random imputation within-class method when donors are randomly selected with replacement. If the data set to be imputed has no inherent order (i.e., the records in the data file are in random order), the two procedures are essentially equivalent except for the start-up process. If the data set does have an inherent order, sequential hot-deck imputation benefits from the positive correlation between donors and recipients, because each recipient takes its value from a nearby record in the file. This benefit, however, is unlikely to be substantial.
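For comparison, random imputation within classes with replacement might look like the following sketch. The function name `random_within_class`, the `seed` parameter, and the sample records are assumptions made for illustration. Unlike the sequential procedure, it first collects every donor in a class and then draws from that pool, so file order plays no role.

```python
import random


def random_within_class(records, class_key="gender", target="y", seed=0):
    """Impute each missing value with a random donor from the same class."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible

    # First pass: collect the observed (donor) values in each class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[class_key], []).append(rec[target])

    # Second pass: draw a donor with replacement for every recipient.
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not modified
        pool = donors.get(rec[class_key], [])
        if rec[target] is None and pool:
            rec[target] = rng.choice(pool)
        out.append(rec)
    return out


# Hypothetical sample, same layout as a file with gender known for all.
sample = [
    {"id": 1, "gender": "F", "y": 10},
    {"id": 2, "gender": "M", "y": 20},
    {"id": 3, "gender": "F", "y": None},
    {"id": 4, "gender": "M", "y": None},
    {"id": 5, "gender": "F", "y": 30},
    {"id": 6, "gender": "F", "y": None},
]

completed = random_within_class(sample)
```

Because a donor is returned to the pool after each draw, the same donor can be selected for several recipients, just as a single donor can serve several consecutive recipients in the sequential procedure.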

The advantage of sequential hot-deck imputation is that all imputations are made in a single pass through the data. However, a problem occurs when an imputation class does not contain an adequate number of donors. An imputation class with too few donors will cause the same donor values to be used repeatedly, creating spikes in the univariate distributions of the variables of interest and resulting in a loss of precision in the survey estimates.
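The repeated-donor problem can be made visible by counting how often each donor is used. The sketch below is illustrative only: the helper `donor_usage` and the six values are hypothetical, and the values are assumed to belong to a single imputation class with `None` marking a missing value.

```python
from collections import Counter


def donor_usage(values):
    """Sequential hot deck over one class, tracking how often each donor is used."""
    usage = Counter()  # donor position -> number of recipients it served
    current = None     # position of the most recent donor
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            current = i          # this record becomes the potential donor
            filled.append(v)
        elif current is not None:
            usage[current] += 1  # the current donor is reused
            filled.append(values[current])
        else:
            filled.append(None)  # no donor available yet
    return filled, usage


# Hypothetical class with two donors and four recipients.
values = [5, None, None, None, 9, None]
filled, usage = donor_usage(values)

print(filled)       # [5, 5, 5, 5, 9, 9]
print(dict(usage))  # {0: 3, 4: 1}
```

The first donor supplies three of the four imputations, so the value 5 appears four times in the completed data, which is the kind of spike described above.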

...
