
The median (θ), the point on a scale below which 50% of the observations fall, is an ancient but commonly used measure of central tendency, or location parameter, of a population. The sample median can be written as

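$$\hat{\theta} = X_{(i)} + k\left(X_{(i+1)} - X_{(i)}\right), \tag{1}$$

where $X_{(i)}$ denotes the ith order statistic of the sample, and $i = \lfloor (n+1)/2 \rfloor$ and $k = (n+1)/2 - i$ are the whole and decimal portions of $(n+1)/2$, respectively.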
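A short Python sketch makes Equation 1 concrete. The function name `sample_median` is illustrative only. When n is odd, the decimal portion k is 0 and the estimate is the middle order statistic; when n is even, k = 0.5 and the estimate is the average of the two middle order statistics.

```python
import math


def sample_median(sample):
    """Sample median via Equation 1: X_(i) + k * (X_(i+1) - X_(i))."""
    x = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if n == 0:
        raise ValueError("sample must be non-empty")
    pos = (n + 1) / 2   # 1-based position of the median
    i = math.floor(pos)  # whole portion of (n + 1)/2
    k = pos - i          # decimal portion: 0 for odd n, 0.5 for even n
    lower = x[i - 1]     # X_(i), shifted to Python's 0-based indexing
    if k == 0:
        return lower     # odd n: X_(i+1) is not needed (and may not exist)
    return lower + k * (x[i] - lower)  # x[i] is X_(i+1)
```

For samples like these, the result agrees with Python's `statistics.median`, which uses the same odd/even rule.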

The sample median, however, suffers from several limitations. First, its sampling distribution is intractable, which precludes straightforward development of an inferential statistic based on a sample median.

Second, the sample median lacks one of the fundamental niceties of any sample statistic: it is not the best unbiased estimator of the population median. Indeed, a potentially infinite number of sample statistics may estimate the population median more closely.

One of the most commonly used competitors of the sample median is the Harrell-Davis estimator, introduced in 1982 and based on the work of Maritz and Jarrett from 1978. Let $X = (X_1, \ldots, X_n)$ be a random sample of size n and $\tilde{X} = (X_{(1)}, \ldots, X_{(n)})$ be its order statistics, with $X_{(1)} \le \cdots \le X_{(n)}$. The estimator of the pth population quantile takes the form of a weighted sum of the order statistics, with weights based on the incomplete beta function:

$$\hat{\theta}^{HD}_{p} = \sum_{i=1}^{n} W^{HD}_{n,i}\, X_{(i)}, \tag{2}$$

where the weights $W^{HD}_{n,i}$ can be expressed as

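$$W^{HD}_{n,i} = I_{i/n}\big(p(n+1),\,(1-p)(n+1)\big) - I_{(i-1)/n}\big(p(n+1),\,(1-p)(n+1)\big), \tag{3}$$

where i = 1,…, n, and $I_x(a, b)$ denotes the regularized incomplete beta function.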
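The following is a minimal, standard-library-only Python sketch of Equations 2 and 3. Because it avoids third-party packages, it evaluates the regularized incomplete beta function with the classic continued-fraction (modified Lentz) method; in practice, `scipy.special.betainc(a, b, x)` computes the same quantity. The helper names `reg_inc_beta`, `hd_weights`, and `harrell_davis` are illustrative and not taken from any library.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fastest; otherwise
    # apply the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def hd_weights(n, p=0.5):
    """Equation 3: W_{n,i} = I_{i/n}(a, b) - I_{(i-1)/n}(a, b), i = 1..n."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    a, b = p * (n + 1), (1.0 - p) * (n + 1)
    return [reg_inc_beta(i / n, a, b) - reg_inc_beta((i - 1) / n, a, b)
            for i in range(1, n + 1)]


def harrell_davis(sample, p=0.5):
    """Equation 2: weighted sum of the order statistics X_(1) <= ... <= X_(n)."""
    x = sorted(sample)
    return sum(w * xi for w, xi in zip(hd_weights(len(x), p), x))
```

The weights telescope to I_1 − I_0 = 1, and for p = 0.5 they are symmetric, so `harrell_davis([1, 2, 3, 4, 5])` returns 3 up to floating-point rounding.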

An interesting property of Equation 3 is that the resulting beta probabilities approximate the probability that the ith order statistic is the value of the population median. However, that observation is irrelevant to the task of finding the best estimate of the population median (or of any specific quantile). In other words, it neither proves that the Harrell-Davis estimator is the best estimator nor precludes the possibility that other weights, substituted for Equation 3 in Equation 2, produce a closer estimate of the population median.

A newer competitor, proposed by Shulkin and Sawilowsky in 2006, is based on a modified double-exponential (Laplace) distribution. Its weights, $W^{AltExp}_{n,i}$, are calculated in the following form:

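$$W^{AltExp}_{n,i} = F\left(-\frac{n}{3} + \frac{2i}{3}\right) - F\left(-\frac{n}{3} + \frac{2(i-1)}{3}\right), \quad i = 1, \ldots, n, \tag{4}$$

where F denotes the cumulative distribution function of the modified double-exponential distribution.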

The weights in Equation 4 can be interpreted as the probability that a random variable following the modified distribution falls between −n/3 + 2(i − 1)/3 and −n/3 + 2i/3. This modified form of the Laplace distribution was obtained through a series of Monte Carlo minimization studies. The estimate is then calculated as a weighted sum of the order statistics:

$$\hat{\theta}^{AltExp} = \sum_{i=1}^{n} W^{AltExp}_{n,i}\, X_{(i)}. \tag{5}$$
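The interval-probability mechanics of Equations 4 and 5 can be sketched in Python, but only under a loudly stated assumption: the parameters of the modified Laplace distribution are not given here, so the sketch substitutes the standard Laplace CDF as a hypothetical placeholder. It also rescales the weights to sum to 1, another assumption, because the n intervals cover only [−n/3, n/3]. Its numbers will therefore not match the Shulkin-Sawilowsky estimator; the names `laplace_cdf`, `interval_weights`, and `weighted_order_statistic_estimate` are illustrative.

```python
import math


def laplace_cdf(x):
    """Standard Laplace CDF: a HYPOTHETICAL stand-in for the modified distribution."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)


def interval_weights(n, cdf=laplace_cdf, normalize=True):
    """Probability mass on (-n/3 + 2(i-1)/3, -n/3 + 2i/3] for i = 1..n."""
    w = [cdf(-n / 3 + 2 * i / 3) - cdf(-n / 3 + 2 * (i - 1) / 3)
         for i in range(1, n + 1)]
    if normalize:
        # ASSUMPTION: rescale so the weights sum to 1, since the intervals
        # span only [-n/3, n/3] and miss the tails of the stand-in density.
        total = sum(w)
        w = [wi / total for wi in w]
    return w


def weighted_order_statistic_estimate(sample, cdf=laplace_cdf):
    """Weighted sum of W_{n,i} * X_(i) over the sorted sample (Equation 5 form)."""
    x = sorted(sample)
    return sum(wi * xi for wi, xi in zip(interval_weights(len(x), cdf), x))
```

Because the intervals are symmetric about 0 and the stand-in CDF is symmetric, the weights are symmetric and peak at the middle order statistic; passing a different `cdf` is how the actual modified distribution would be plugged in.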

Two benchmarks can be used to judge which competitor is superior in estimating the population median regardless of distribution or sample size. One is the smallest root mean square error with respect to the population median; the other is closeness to the population median, assessed by ranking the competing estimates in each Monte Carlo repetition.

Let $M_P$ be the population median, let $N_R$ be the number of Monte Carlo repetitions, and let $M_{ji}$ be the median estimate produced by the jth method in the ith repetition, where j = 1,…, $N_M$ and $N_M$ is the number of methods. The mean square error (MSE) of the jth method can then be defined as follows:

$$\mathrm{MSE}_j = \frac{1}{N_R} \sum_{i=1}^{N_R} \left(M_{ji} - M_P\right)^2. \tag{6}$$
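A Monte Carlo comparison based on the MSE can be sketched in Python as follows. The matrix `estimates[j][i]` plays the role of $M_{ji}$, and every method is applied to the same sample within a repetition so that the methods are compared on equal footing. The function names and the use of `statistics.median` and `statistics.median_low` as example "methods" are illustrative assumptions, not part of the original study.

```python
import math
import random
import statistics


def simulate_estimates(methods, draw_sample, n_reps, seed=0):
    """Return estimates with estimates[j][i] = M_ji (method j, repetition i)."""
    rng = random.Random(seed)
    estimates = [[] for _ in methods]
    for _ in range(n_reps):
        sample = draw_sample(rng)  # one shared sample per repetition
        for j, method in enumerate(methods):
            estimates[j].append(method(sample))
    return estimates


def mse_per_method(estimates, pop_median):
    """MSE_j = (1 / N_R) * sum over i of (M_ji - M_P) ** 2."""
    return [sum((m - pop_median) ** 2 for m in row) / len(row)
            for row in estimates]


def rmse_per_method(estimates, pop_median):
    """Root mean square error, the first benchmark named in the text."""
    return [math.sqrt(v) for v in mse_per_method(estimates, pop_median)]


if __name__ == "__main__":
    # Illustration: standard normal samples of size 15, so M_P = 0.
    draw = lambda rng: [rng.gauss(0.0, 1.0) for _ in range(15)]
    M = simulate_estimates([statistics.median, statistics.median_low], draw, 1000)
    print(rmse_per_method(M, 0.0))
```

The smallest entry of `rmse_per_method` identifies the best method under this benchmark for the chosen distribution and sample size.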

Next, calculate the deviation of each estimate from the population median:

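$$D_{ji} = \left|M_{ji} - M_P\right|. \tag{7}$$

For each i = 1,…, $N_R$, find a set of indexes I(j), j = 1,…, $N_M$, such that

$$D_{I(1)i} \le D_{I(2)i} \le \cdots \le D_{I(N_M)i}. \tag{8}$$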
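The deviation and ordering steps can be sketched in Python as below, assuming the deviations are absolute differences from the population median. Method indexes are zero-based here, unlike the one-based j in the text, and the function names are illustrative. The sketch stops at the ordering step because the RBE formula itself is not reproduced in this section.

```python
def deviations(estimates, pop_median):
    """dev[j][i] = |M_ji - M_P| for method j in repetition i."""
    return [[abs(m - pop_median) for m in row] for row in estimates]


def rank_indexes(dev, i):
    """Zero-based method indexes I(1), ..., I(N_M) for repetition i, ordered so
    that dev[I(1)][i] <= dev[I(2)][i] <= ... <= dev[I(N_M)][i].

    Ties keep the lower method index first because sorted() is stable.
    """
    return sorted(range(len(dev)), key=lambda j: dev[j][i])
```

For three methods with estimates `[[0.5, -0.1], [0.2, 0.4], [-0.3, 0.0]]` and a population median of 0, the first repetition orders the methods as `[1, 2, 0]` and the second as `[2, 0, 1]`.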

The rank-based error (RBE) can now be defined as

...
