Both censored and truncated data involve a lack of information about a random variable. They arise in the quantitative analysis of data when that variable is used either to estimate a population mean (or another population parameter) or as the dependent variable in a regression analysis. The key distinction between them is whether one has any information about the missing values. With censored data, one observes some information about the missing observations, either in the form of a range of values into which they might fall or in the form of the knowledge that they are missing. With truncated data, one has no information about either the existence or the values of the missing observations. This difference in the structure of information determines how one approaches censored or truncated data, whether for a single random variable or for the dependent variable in a regression analysis. This entry discusses the consequences of this kind of missing data for regression analysis and sample selection.
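The distinction can be made concrete with a short Python sketch. It draws values from a standard normal distribution (an assumption made only for illustration) and applies a hypothetical upper limit b = 1.0 in two ways: the censored version keeps every observation, recording values above b as b together with a flag, while the truncated version silently drops them. The function name `censor_and_truncate` is illustrative, not taken from any library.

```python
import random

def censor_and_truncate(draws, b):
    """Return (censored, truncated) versions of draws with an upper limit b.

    censored:  every observation is kept; values above b are recorded as b,
               and a flag marks which observations were censored.
    truncated: observations above b are dropped entirely, so the sample
               carries no trace that they ever existed.
    """
    censored = [(min(x, b), x > b) for x in draws]
    truncated = [x for x in draws if x <= b]
    return censored, truncated

random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(1000)]
censored, truncated = censor_and_truncate(draws, b=1.0)

# Number of observations known to lie above b (available only when censored).
n_flagged = sum(flag for _, flag in censored)
print(len(censored), n_flagged, len(truncated))
```

Counting the flags recovers exactly the number of observations the truncated sample lost; the truncated sample on its own offers no way to learn that number.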

Concerns about censoring and truncation abound in empirical analysis. Censoring and truncation can arise either from the way data are gathered or from legal requirements. Historically, researchers relied on assumptions about the distribution of the variable to adjust their estimates for truncation or censoring. These concerns are just as important when censored or truncated variables serve as the dependent variables in regression analyses. Scholars have long worried about how processes such as self-selection affect the validity of their regression results, since ignoring such processes often yields biased and inconsistent coefficient estimates. Although estimators that correct for censoring and truncation have been available for more than 3 decades, researchers have worried about their sensitivity to distributional assumptions and model specification. Recent work attempts to relax some of these critical assumptions and to diagnose sensitivity problems, and researchers have also extended earlier work by designing estimators for a wider variety of data.
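A minimal simulation illustrates the regression concern. It assumes a linear model with intercept 1, slope 0.5, and normal errors with standard deviation 2, plus a hypothetical selection rule under which a unit is observed only if its outcome exceeds 3. The helper `ols_slope` is written here for illustration rather than taken from any library.

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(2)
TRUE_SLOPE = 0.5  # assumed value for the simulation
xs = [random.uniform(0.0, 10.0) for _ in range(5000)]
ys = [1.0 + TRUE_SLOPE * x + random.gauss(0.0, 2.0) for x in xs]

# Hypothetical selection rule: a unit enters the sample only if y > 3,
# so the sample is truncated on the dependent variable.
kept = [(x, y) for x, y in zip(xs, ys) if y > 3.0]

full_slope = ols_slope(xs, ys)
trunc_slope = ols_slope([x for x, _ in kept], [y for _, y in kept])
print(f"full sample: {full_slope:.3f}, truncated sample: {trunc_slope:.3f}")
```

In this particular setup the slope estimated from the selected sample is pulled toward zero, because units with low x enter the sample only when they draw large errors. The direction and size of the bias depend on the selection rule and the model, so these numbers should not be read as general.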

Types of Missing Data

Censored and truncated data are both forms of missing data, which have been categorized in three ways: (1) missing at random (MAR), (2) missing completely at random (MCAR), and (3) nonignorable (NI). A variable that is MCAR has missing values that are determined randomly, so that every observation is equally likely to be missing and missingness does not depend on any information in the data set. A variable that is MAR exhibits a pattern of missingness in which the probability of a missing value depends on other observed variables for that same observation. A variable exhibits NI missingness when its own unobserved value helps explain whether it is missing. In this case, the pattern of missingness depends on information beyond that contained in the observed variables.
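The three mechanisms can be compared in a small simulation. The sketch assumes an always-observed variable x, an outcome y equal to x plus standard normal noise, and hypothetical observation probabilities of 0.2 and 0.8. It then compares the mean of the observed y values under each mechanism with the mean of the full sample.

```python
import random

random.seed(3)
N = 10_000
pairs = []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    y = x + random.gauss(0.0, 1.0)  # y is correlated with x; E[y] = 0
    pairs.append((x, y))

def keep_mcar(x, y):
    # MCAR: whether y is observed ignores the data entirely.
    return random.random() < 0.5

def keep_mar(x, y):
    # MAR: whether y is observed depends only on the observed variable x.
    return random.random() < (0.2 if x > 0 else 0.8)

def keep_ni(x, y):
    # NI: whether y is observed depends on the value of y itself.
    return random.random() < (0.2 if y > 0 else 0.8)

observed_means = {}
for name, keep in (("MCAR", keep_mcar), ("MAR", keep_mar), ("NI", keep_ni)):
    kept = [y for x, y in pairs if keep(x, y)]
    observed_means[name] = sum(kept) / len(kept)

full_mean = sum(y for _, y in pairs) / N
print(f"full sample: {full_mean:.3f}")
for name, m in observed_means.items():
    print(f"{name}: {m:.3f}")
```

Under these assumptions the MCAR mean stays close to the full-sample mean, while both the MAR and NI means fall well below it.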

Censored and truncated data can arise through any of these three forms of missing data. Making valid inferences requires making valid assumptions about the structure of the missingness. Except in the case of MCAR, one will generally reach inaccurate conclusions, whether about population characteristics or regression parameters, unless one properly models the pattern of missingness.
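What "properly modeling the pattern of missingness" can mean is easiest to see in the MAR case. The sketch below is an assumed setup, not a method described in this entry: y goes missing with a probability that depends only on whether the observed x is positive, and the correction reweights the observed mean within each x stratum by that stratum's share of the full sample. This is equivalent to inverse-probability weighting with the observation rates estimated from the data.

```python
import random

random.seed(4)
N = 10_000

# Assumed data-generating process: x is always observed, y = x + noise.
data = []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    y = x + random.gauss(0.0, 1.0)
    # MAR: whether y is observed depends only on the observed x.
    observed = random.random() < (0.2 if x > 0 else 0.8)
    data.append((x, y if observed else None))

def mean(values):
    return sum(values) / len(values)

# Naive estimate: average whatever y values happen to be observed.
naive = mean([y for _, y in data if y is not None])

# Model the missingness: within each x stratum, y is missing at a constant
# rate, so each stratum's observed mean is weighted by the stratum's share
# of the full sample.
corrected = 0.0
for positive in (True, False):
    stratum = [y for x, y in data if (x > 0) == positive]
    observed_y = [y for y in stratum if y is not None]
    corrected += (len(stratum) / N) * mean(observed_y)

print(f"naive: {naive:.3f}, corrected: {corrected:.3f}")
```

The same reweighting cannot repair NI missingness, because there the missingness depends on y itself, which the data do not reveal for the missing cases.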

Censored and Truncated Random Variables

A random variable is censored if one does not observe its true value but instead observes a bound on the range of values into which it falls. If X is a random variable, X is censored from above at b if, whenever X > b, the researcher observes only b. Similarly, X may be censored from below at a value a, so that whenever X < a the researcher observes only a. The distribution of the observed values of X therefore has point masses at a and b. This form of censoring occurs commonly in political science in the study of the duration of political events. For example, if a researcher observes a set of states that might adopt a particular policy, then states that have not adopted the policy by the end of the study period are said to be right censored. For those states, the researcher does not know exactly how many years will elapse before they adopt the policy, only that the number exceeds the length of the study period so far.
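The point masses, and one distribution-based way of handling them, can be sketched as follows. The example assumes a normal latent variable with mean 5 and a known standard deviation of 2, censored from below at a = 2 and from above at b = 6 (all hypothetical values). It compares the share of observations piled up at each bound with the corresponding normal probabilities, then estimates the mean by maximizing a censored-normal log-likelihood. The helper `censored_loglik` and the ternary search are illustrative choices; the search works here because this log-likelihood is concave in the mean.

```python
import math
import random
from statistics import NormalDist

random.seed(5)
MU, SIGMA = 5.0, 2.0  # assumed latent normal distribution (SIGMA treated as known)
A, B = 2.0, 6.0       # lower and upper censoring points

latent = [random.gauss(MU, SIGMA) for _ in range(5000)]
observed = [min(max(x, A), B) for x in latent]  # censor from below and above

n_low = sum(1 for v in observed if v == A)
n_high = sum(1 for v in observed if v == B)
interior = [v for v in observed if A < v < B]

# Point masses: compare the observed shares at each bound with theory.
latent_dist = NormalDist(MU, SIGMA)
share_low, share_high = n_low / len(observed), n_high / len(observed)
expected_low, expected_high = latent_dist.cdf(A), 1.0 - latent_dist.cdf(B)

def censored_loglik(mu, sigma=SIGMA):
    """Normal log-likelihood for mu when values are censored at A and B."""
    dist = NormalDist(mu, sigma)
    ll = sum(-0.5 * ((v - mu) / sigma) ** 2 for v in interior)
    ll -= len(interior) * math.log(sigma * math.sqrt(2 * math.pi))
    ll += n_low * math.log(dist.cdf(A))         # probability of falling below A
    ll += n_high * math.log(1.0 - dist.cdf(B))  # probability of exceeding B
    return ll

# Ternary search for the maximizing mu on a bracketing interval.
lo, hi = 0.0, 10.0
for _ in range(60):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if censored_loglik(m1) < censored_loglik(m2):
        lo = m1
    else:
        hi = m2
mle = (lo + hi) / 2

naive = sum(observed) / len(observed)  # treats censored values as exact
print(f"share at a: {share_low:.3f} (theory {expected_low:.3f})")
print(f"share at b: {share_high:.3f} (theory {expected_high:.3f})")
print(f"naive mean: {naive:.3f}, censored-normal MLE: {mle:.3f}")
```

The likelihood lets each observation at a bound contribute the probability of falling beyond that bound rather than a density value, which is why it can recover the latent mean when the naive average cannot. Its validity rests entirely on the normality assumption.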

...
