Survival analysis is a well-developed branch of statistics concerned with methods for analyzing time-to-event data. Such data arise in many scientific fields, including medicine, biology, public health, epidemiology, engineering, economics, and demography. Time-to-event data, sometimes also referred to as failure time data (in which case the event is regarded as a “failure”), have two distinctive features. First, the response of interest, time, is always nonnegative. Second, and more important, these data are often censored or truncated or both, which sets survival analysis apart from most other statistical methods.

Survival analysis often models the survival function and the hazard function rather than the probability density function or the cumulative distribution function. Throughout this discussion, we will assume that the time to event, denoted by T, is a nonnegative continuous random variable with probability density function f(t) and that $F(t) = P(T \le t) = \int_0^t f(u)\,du$ is the corresponding cumulative distribution function. The survival function and the hazard function of T are then defined as $S(t) = P(T > t) = 1 - F(t)$ and $\lambda(t) = \lim_{\Delta t \to 0} P(t < T \le t + \Delta t \mid T > t)/\Delta t$, respectively. It can be easily shown that $\lambda(t) = f(t)/S(t)$ and $S(t) = \exp\{-\Lambda(t)\}$, where $\Lambda(t) = \int_0^t \lambda(u)\,du$ is termed the cumulative hazard function. We see that the survival function gives the probability that an individual “survives” beyond time t, while the hazard function represents the “instantaneous” rate of failure: for small Δt, λ(t)Δt approximates the probability that the “failure” will occur in the interval (t, t + Δt], given that the individual has survived up to time t. Both the survival function and the hazard function can be used to characterize the stochastic behavior of the random variable T.
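The relationships among f(t), S(t), λ(t), and Λ(t) can be checked numerically. The sketch below is only an illustration: it assumes a Weibull distribution for T (a choice made here, not taken from the text), and the function names, the shape and scale values, and the trapezoidal integration are all hypothetical conveniences.

```python
import math

# Assumption for illustration only: T follows a Weibull distribution.
# The shape and scale values below are arbitrary.
SHAPE = 1.5
SCALE = 20.0


def density(t):
    """Weibull probability density function f(t)."""
    z = t / SCALE
    return (SHAPE / SCALE) * z ** (SHAPE - 1) * math.exp(-(z ** SHAPE))


def survival(t):
    """Weibull survival function S(t) = P(T > t) = 1 - F(t)."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard(t):
    """Hazard via the identity lambda(t) = f(t) / S(t)."""
    return density(t) / survival(t)


def hazard_from_limit(t, dt=1e-6):
    """Approximate the defining limit P(t < T <= t + dt | T > t) / dt."""
    conditional_prob = (survival(t) - survival(t + dt)) / survival(t)
    return conditional_prob / dt


def cumulative_hazard(t, steps=10_000):
    """Approximate Lambda(t), the integral of lambda(u) over [0, t],
    with the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * h)
    return total * h


t = 10.0
print("hazard via f/S:    ", hazard(t))
print("hazard via limit:  ", hazard_from_limit(t))
print("S(t):              ", survival(t))
print("exp(-Lambda(t)):   ", math.exp(-cumulative_hazard(t)))
```

Under these assumptions, the two hazard values should agree closely, as should S(t) and exp{–Λ(t)}; any small gap comes from the finite step sizes used in the approximations.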

Censoring and Truncation

Time-to-event data present themselves in ways that create special obstacles for analysis. The most important of these is known as censoring. Loosely speaking, censoring means that the data are incomplete: for some individuals, the event time is not observed exactly and is known only to lie within some range. Although there exist many censoring mechanisms, we shall mainly focus on the two most frequently encountered types of censoring: right censoring and interval censoring.

Table 1 presents a typical right-censored data set resulting from a clinical trial. The purpose of this clinical trial was to evaluate the efficacy of maintenance chemotherapy for acute myelogenous leukemia (AML). In all, 23 patients were treated by chemotherapy, which led to remission, and then randomly assigned to two groups. The first group contained 11 patients and received maintenance chemotherapy. The second group contained 12 patients and did not receive maintenance chemotherapy. Time until relapse, the response variable denoted by T hereafter, was recorded in weeks. The objective of the survival analysis was to see if the maintenance chemotherapy prolonged time until relapse. In later sections, we will answer this question by applying appropriate survival analysis methods.

In Group 1, the first and second patients had relapse times of exactly 9 weeks and 13 weeks, respectively. That is, T1 = 9 and T2 = 13. The third patient, however, did not provide an exact relapse time, perhaps because the patient dropped out of the study. All we know is that, at Week 13, this patient was still relapse free. Thus this patient's relapse time must be greater than 13 weeks, which is denoted by 13+ in the table. For this patient, we have only “incomplete” information: instead of knowing the exact value of T3, we know only that T3 > 13. Such censoring is called right censoring.
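In software, a right-censored observation is commonly stored as a pair: the observed time and an indicator of whether the event was actually observed. The sketch below is one possible encoding of the “13+” notation, using only the three Group 1 patients described above; the `Observation` class and the `parse_observation` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One patient's record (hypothetical structure for illustration)."""
    time: float   # observed time in weeks
    event: bool   # True if relapse was observed, False if right-censored


def parse_observation(text):
    """Parse an entry such as "13" (exact time) or "13+" (right-censored)."""
    text = text.strip()
    censored = text.endswith("+")
    weeks = float(text.rstrip("+"))
    return Observation(time=weeks, event=not censored)


# The first three Group 1 patients as described in the text.
first_three = [parse_observation(entry) for entry in ["9", "13", "13+"]]

for index, obs in enumerate(first_three, start=1):
    if obs.event:
        print(f"T{index} = {obs.time:g} (relapse observed)")
    else:
        print(f"T{index} > {obs.time:g} (right-censored)")
```

Keeping the censoring indicator separate from the time value means the second and third patients, who share the same recorded time of 13 weeks, remain distinguishable in any later analysis.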

...
