Both censoring and truncation refer to situations in which empirical data on random variables are incomplete or partial. These terms have not been consistently used throughout the existing literature, and what one author calls censoring, another may term truncation. Consequently, an author's definition of one of these terms always requires careful attention.

Usually, censoring refers to a situation in which some values of a random variable are recorded as lying within a certain range and are not measured exactly, for at least some members of a sample. In contrast, truncation customarily refers to a situation in which information is not recorded for some members of the population when a random variable's values are within a certain range. Thus, censoring is associated with having incomplete or inexact measurements, whereas truncation is associated with having an incomplete sample, that is, a sample chosen conditional on values of the random variable. Hald (1952) is one of the first authors to distinguish between censoring and truncation. Since then, statistical methods for analyzing censored variables and truncated samples have been developed extensively (Schneider, 1986).
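The contrast between the two ideas can be illustrated with a small sketch. The function names `censor_above` and `truncate_above`, the sample values, and the limit of 10.0 are hypothetical and chosen only for illustration. Censoring keeps every sample member but records some values inexactly; truncation drops those members from the sample altogether.

```python
def censor_above(values, limit):
    """Keep every sample member; values above `limit` are recorded as `limit`.

    Returns (recorded_value, is_censored) pairs, so the sample size is unchanged.
    """
    return [(min(v, limit), v > limit) for v in values]


def truncate_above(values, limit):
    """Drop every sample member whose value exceeds `limit`.

    The remaining values are measured exactly, but the sample is incomplete.
    """
    return [v for v in values if v <= limit]


# Hypothetical measurements and limit, for illustration only.
sample = [3.2, 7.5, 12.1, 4.8, 15.0, 9.9]
limit = 10.0

censored = censor_above(sample, limit)
truncated = truncate_above(sample, limit)

print("censored: ", censored)   # same number of cases, some inexact
print("truncated:", truncated)  # fewer cases, all exact
```

In the censored result the analyst still knows how many cases lie above the limit; in the truncated result that information is lost.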

A censored variable is one in which some observations in a sample are measured inexactly because some values of the variable lie in a certain range or ranges. A censored observation or case refers to a sample member for which a particular variable is censored (i.e., known only to lie in a certain range).

For example, in the publicly available data from the 1990 U.S. Census of Population, every person whose reported age was 90 years or older has age censored at 90. The U.S. Census Bureau censored age at 90 to ensure confidentiality and to avoid inaccurate information (because very old people sometimes exaggerate their age). Any person whose age was reported as 90 to 120 on the household census form appears in the public micro-level data as a case with age censored at 90.
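A minimal sketch of this kind of censoring at an upper limit follows. The ages below are made up for illustration and are not census data, and the helper name `top_code` is an assumption. The sketch records every reported age of 90 or older as 90, flags those cases as censored, and compares the mean of the recorded ages with the mean of the reported ages.

```python
def top_code(ages, limit=90):
    """Record each age at or above `limit` as `limit`, and flag it as censored."""
    recorded = [min(age, limit) for age in ages]
    is_censored = [age >= limit for age in ages]
    return recorded, is_censored


# Hypothetical reported ages, for illustration only.
reported_ages = [34, 67, 90, 95, 102, 88]

recorded_ages, censored_flags = top_code(reported_ages)

mean_reported = sum(reported_ages) / len(reported_ages)
mean_recorded = sum(recorded_ages) / len(recorded_ages)

print("recorded ages: ", recorded_ages)
print("censored cases:", sum(censored_flags))
print("mean of reported ages:", round(mean_reported, 2))
print("mean of recorded ages:", round(mean_recorded, 2))
```

Because every censored case is recorded at the limit, a mean computed naively from the recorded values cannot exceed the mean of the reported values, which is one reason censored variables call for methods designed for them.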

A truncated variable is one that is measured only for sample members whose value lies in a certain range or ranges; equivalently, it is a variable for which there are no measurements at all if the variable's value does not fall within selected ranges. Conceptually, a truncated sample is closely related to a sample that has been trimmed to eliminate extreme values that are suspected of being outliers. One difference between truncation and trimming is that samples are customarily truncated on the basis of a variable's numerical values but trimmed on the basis of a variable's order statistics (e.g., the top and bottom five percentiles are excluded, even though they exist in the data). For example, very short people are typically excluded from military service. As a result, the height of a sample of military personnel is a truncated variable because members of the population are systematically excluded from the military if their height is below a certain level.
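The difference between truncating on numerical values and trimming on order statistics can be sketched as follows. The heights, the 155 cm cutoff, and the function names `truncate_below` and `trim` are hypothetical; no actual military height standard is implied.

```python
def truncate_below(values, cutoff):
    """Truncation: exclude cases by numerical value (here, anything below `cutoff`)."""
    return [v for v in values if v >= cutoff]


def trim(values, percent):
    """Trimming: exclude the lowest and highest `percent` of cases by rank."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # number of cases dropped from each tail
    return ordered[k:len(ordered) - k]


# Hypothetical heights in centimetres, for illustration only.
heights_cm = [148, 152, 160, 165, 168, 170, 172, 175, 180, 183,
              188, 191, 195, 199, 150, 158, 162, 177, 185, 201]

truncated = truncate_below(heights_cm, cutoff=155)  # rule stated in units of height
trimmed = trim(heights_cm, percent=5)               # rule stated in ranks

print("truncated sample size:", len(truncated), "minimum:", min(truncated))
print("trimmed sample size:  ", len(trimmed), "minimum:", min(trimmed))
```

The number of cases removed by truncation depends on how many values happen to fall below the cutoff, whereas trimming always removes a fixed share of the sample from each tail regardless of the values involved.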

Types of Censoring

There are different ways of classifying censoring and censored variables. One basic distinction is between random censoring and censoring on the basis of some predetermined criteria. The two common forms of censoring based on predetermined criteria are called Type I and Type II censoring. (A few authors also refer to Types III and IV censoring, but the definitions of these other types are less consistent from author to author.)

...
