
The Kaplan-Meier (or product limit) estimator ŝ(t) is a nonparametric (or distribution-free) estimator of a survival distribution S(t). It was derived by Kaplan and Meier in 1958 as a direct generalization of the sample survivor function in the presence of censored data.

In clinical applications, the Kaplan-Meier method is very often used to estimate the probability of dying from specific causes or the probability of occurrence or recurrence of a disease. In general, the Kaplan-Meier method can be used to estimate the probability of occurrence of any event.

The Kaplan-Meier method is generally used to summarize the survival experience of groups of individuals in terms of the empirical survivor function.

Typically, not all individuals under study fail during the observation period. Some individuals may leave the study early while still alive, and some other individuals may finish the study alive. These individuals are called censored.

The Kaplan-Meier estimator does not require any assumptions about the functional form of the distribution of failures and accounts for censored observations. For small data sets, the Kaplan-Meier curve can be easily calculated by hand. Most statistical software contains routines for the calculation of the Kaplan-Meier estimator.

Consider a sample of N individuals who are followed up in time prospectively. During the observation period, suppose that K of these individuals die. We also assume that N − K individuals are censored.

Let t1 ≤ t2 ≤ … ≤ tK be the ordered failure times for the K individuals who die during the observation period.

To construct the Kaplan-Meier estimator of the survival distribution, we start by dividing the observation period into small intervals

I1, I2, …, IK,

each one corresponding to the survival time of the noncensored individuals. For each interval Ij (j = 1, …, K), let

  • dj = the number of individuals who die in the interval Ij;
  • cj = the number of individuals censored in the interval Ij;
  • rj = the number of individuals who are alive and at risk at the beginning of the interval; and
  • hj = the hazard of failure, or the conditional probability that an individual fails in Ij, given that the individual was alive at the beginning of Ij; this quantity can be well approximated by ĥj = dj/rj, the ratio of the number of failures to the number of individuals at risk during the interval Ij.

The observed proportion of failures dj/rj represents an estimate of the hazard of failure (or instantaneous failure rate) h(t).

At the beginning of the observation period, t0, all individuals are alive, so that d0 = 0 and r0 = N. At each step, we calculate rj = rj−1 − dj−1 − cj−1 (each subscript j − 1 denoting the previous interval) to update the number of individuals at risk.
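
The bookkeeping above can be sketched in a few lines of Python. The function name `risk_sets` and the example counts are hypothetical, chosen only for illustration; index 0 stands for the start of observation, where d0 = 0 and r0 = N.

```python
def risk_sets(n, deaths, censored):
    """Return the number at risk r_j for j = 0, ..., K.

    deaths[j] and censored[j] are the counts d_j and c_j for interval j;
    index 0 is the start of observation, so deaths[0] is 0.
    """
    at_risk = [n]  # r_0 = N
    for j in range(1, len(deaths)):
        # r_j = r_{j-1} - d_{j-1} - c_{j-1}
        at_risk.append(at_risk[j - 1] - deaths[j - 1] - censored[j - 1])
    return at_risk


# Hypothetical study: N = 10 individuals, K = 3 failure times.
deaths = [0, 1, 2, 1]      # d_0, d_1, d_2, d_3
censored = [1, 0, 2, 3]    # c_0, c_1, c_2, c_3
at_risk = risk_sets(10, deaths, censored)
hazards = [d / r for d, r in zip(deaths, at_risk)]  # estimated hazards d_j / r_j
```

With these made-up counts, every individual is accounted for as either a failure or a censored observation by the end of the last interval.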

The Kaplan-Meier estimate of the survival distribution S(t) is obtained as the product of all the 1 − ĥj:

ŝ(t) = ∏ (1 − ĥj) = ∏ (1 − dj/rj), with the product taken over the intervals Ij up to time t.
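
As a rough illustration, the product can be accumulated one failure time at a time. The helper `product_limit` and the input counts are hypothetical, and the sketch reports only the value the estimate takes once the j-th failure time has been passed. Exact fractions are used so that the arithmetic can be checked by hand.

```python
from fractions import Fraction


def product_limit(deaths, at_risk):
    """Cumulative product of (1 - d_j / r_j) over successive failure times."""
    survival = []
    current = Fraction(1)  # everyone is alive at the start of observation
    for d, r in zip(deaths, at_risk):
        current *= 1 - Fraction(d, r)  # multiply by 1 - estimated hazard
        survival.append(current)
    return survival


# Hypothetical counts at K = 3 failure times.
deaths = [1, 2, 1]
at_risk = [9, 8, 4]
estimate = product_limit(deaths, at_risk)
print([float(s) for s in estimate])
```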

The Kaplan-Meier estimate ŝ(t) is a left-continuous, nonincreasing step function that is discontinuous at the observed failure times tj. The intervals Ij may vary in length and depend on the observed data. Observations that are censored at tj are assumed to occur after tj. Censored observations contribute to the risk set until the time they are last seen alive. If a failure and a censoring time are tied (i.e., occur at the same point in time), we assume that the failure occurs just before the censoring.
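
The conventions for censoring and ties can be made concrete with a minimal sketch that works from raw follow-up data. The function `kaplan_meier`, its `(time, died)` input format, and the sample values are assumptions made for illustration. Individuals censored exactly at a failure time are still counted in the risk set at that time, which is one way of encoding the rule that a tied failure precedes the censoring.

```python
from fractions import Fraction


def kaplan_meier(observations):
    """Return [(t_j, r_j, d_j, estimate after t_j)] for each failure time.

    observations: iterable of (time, died) pairs; died is False when
    the individual was censored at that time.
    """
    observations = list(observations)
    failure_times = sorted({t for t, died in observations if died})
    rows = []
    survival = Fraction(1)
    for t_j in failure_times:
        # At risk: everyone last seen at or after t_j,
        # including those censored exactly at t_j.
        r_j = sum(1 for t, _ in observations if t >= t_j)
        d_j = sum(1 for t, died in observations if died and t == t_j)
        survival *= 1 - Fraction(d_j, r_j)
        rows.append((t_j, r_j, d_j, survival))
    return rows


# Hypothetical follow-up times in months; False marks a censored individual.
data = [(2, True), (3, False), (5, True), (5, False), (7, True), (9, False)]
for t_j, r_j, d_j, s in kaplan_meier(data):
    print(t_j, r_j, d_j, float(s))
```

In this made-up sample, the individual censored at month 5 is counted among the four at risk at month 5 and leaves the risk set only afterward.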

...
