
Kolmogorov-Smirnov Test for Two Samples

The two-sample Kolmogorov-Smirnov test is designed to test the hypothesis that two independent groups have identical distributions. A possible appeal of the method is that it can be sensitive to differences between groups that might be routinely missed when using means, medians, or any single measure of location. For example, it might detect differences in the variances or the amount of skewness. More generally, it can detect differences between percentiles that might be missed with many alternative methods for comparing groups. Another positive feature is that it forms the basis of a graphical method for characterizing how groups differ over all the percentiles. That is, it provides an approach to assessing effect size that reveals details missed by other commonly used techniques. Moreover, the test is distribution-free, meaning that, assuming random sampling only, the probability of a Type I error can be determined exactly based on the sample sizes used. Historically, the test has been described as assuming that distributions are continuous. More precisely, assuming that tied values occur with probability zero, a recursive method for determining the exact probability of a Type I error is available. But more recently, a method that allows tied values was derived by Schröer and Trenkler.
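In practice, the test is readily available in SciPy as `scipy.stats.ks_2samp`; passing `method="exact"` requests the exact null distribution, which is feasible for small samples and reflects the distribution-free property described above. A quick illustration (the data here are made up, not from the original):

```python
# Illustrative use of SciPy's two-sample KS test; method="exact"
# computes the exact null distribution of the test statistic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=25)  # first group
y = rng.normal(loc=0.0, scale=3.0, size=25)  # same mean, larger variance
res = ks_2samp(x, y, method="exact")
print(res.statistic, res.pvalue)
```

Because the two groups have equal means but unequal variances, a comparison of means could easily miss the difference that this test is able to detect.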

The details are as follows. Let X1,…, Xn be a random sample from the first group and Y1,…, Ym be a random sample from the second, and let F1 and F2 denote the corresponding distribution functions. Let I(Xi ≤ x) = 1 if Xi ≤ x; otherwise I(Xi ≤ x) = 0. F1 is estimated with

F̂1(x) = (1/n) Σ I(Xi ≤ x), the sum being over i = 1,…, n,
the proportion of observations less than or equal to x, and F2 is estimated in a similar manner. The null hypothesis is

H0: F1(x) = F2(x) for all x,
versus

H1: F1(x) ≠ F2(x) for at least one x.

The test statistic is based on what is sometimes called the Kolmogorov distance, which is just the maximum absolute difference between the two distributions under consideration. For convenience, let Z1,…, ZN be the pooled observations, where N = m + n, so the first n of the Z values correspond to X1,…, Xn and the remaining m correspond to Y1,…, Ym. The test statistic is

D = max |F̂1(Zi) − F̂2(Zi)|,

the maximum being taken over all i = 1,…, N.
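A minimal Python sketch of this computation (not part of the original text; the sample data are hypothetical) builds the two empirical distribution functions and takes their largest absolute difference over the pooled observations:

```python
# Empirical CDF: proportion of observations in `sample` that are <= x.
def ecdf(sample, x):
    return sum(v <= x for v in sample) / len(sample)

# D = max over the pooled values Z1,...,ZN of |F1hat(Zi) - F2hat(Zi)|.
def ks_statistic(x, y):
    pooled = list(x) + list(y)
    return max(abs(ecdf(x, z) - ecdf(y, z)) for z in pooled)

x = [1.1, 2.3, 2.9, 4.0, 5.2]  # hypothetical first sample
y = [1.8, 3.5, 4.1, 6.0, 7.7]  # hypothetical second sample
print(ks_statistic(x, y))  # largest gap between the two empirical CDFs
```

Evaluating at the pooled observations suffices because both empirical distribution functions are step functions that change only at observed values.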

A variation of the Kolmogorov-Smirnov test is sometimes suggested when there is interest in detecting differences in the tails of the distributions. Let M = nm/N, λ = n/N, and

Ĥ(x) = λF̂1(x) + (1 − λ)F̂2(x).

Now, the difference between any two distributions, at the value x, is estimated with

F̂1(x) − F̂2(x).

Then the hypothesis of identical distributions can be tested with an estimate of the largest weighted difference over all possible values of x. The test statistic is

D = max √M |F̂1(Zi) − F̂2(Zi)| / √(Ĥ(Zi)[1 − Ĥ(Zi)]),

where again the maximum is taken over all i, i = 1,…, N, subject to Ĥ(Zi)[1 − Ĥ(Zi)] > 0.
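The weighted version can be sketched in the same way (again illustrative Python, not from the original); pooled values where Ĥ(Zi)[1 − Ĥ(Zi)] = 0 are skipped, as the side condition requires:

```python
import math

# Empirical CDF: proportion of observations in `sample` that are <= x.
def ecdf(sample, x):
    return sum(v <= x for v in sample) / len(sample)

def weighted_ks_statistic(x, y):
    n, m = len(x), len(y)
    N = n + m
    M = n * m / N                # M = nm/N, as defined in the text
    lam = n / N                  # lambda = n/N
    best = 0.0
    for z in list(x) + list(y):  # pooled observations Z1,...,ZN
        f1, f2 = ecdf(x, z), ecdf(y, z)
        h = lam * f1 + (1 - lam) * f2   # pooled estimate H-hat(z)
        denom = h * (1 - h)
        if denom > 0:                   # enforce H-hat(1 - H-hat) > 0
            best = max(best, math.sqrt(M) * abs(f1 - f2) / math.sqrt(denom))
    return best

x = [1.1, 2.3, 2.9, 4.0, 5.2]  # hypothetical samples
y = [1.8, 3.5, 4.1, 6.0, 7.7]
print(weighted_ks_statistic(x, y))
```

The denominator √(Ĥ(1 − Ĥ)) is smallest in the tails, which is why this variant gives relatively more weight to differences in the tails of the distributions.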

Simply rejecting the hypothesis of equal distributions is not very informative. A more interesting issue is where distributions differ and by how much. A useful advance is an extension of the Kolmogorov-Smirnov test that addresses this issue. In particular, it is possible to compute confidence intervals for the difference between all of the quantiles in a manner where the probability of at least one Type I error can be determined exactly.

Suppose c is chosen so that P(D ≤ c) = 1 − α. Denote the order statistics by X(1) ≤ … ≤ X(n) and Y(1) ≤ … ≤ Y(m). For convenience, let X(0) = −∞ and X(n+1) = ∞. For any x satisfying X(i) ≤ x < X(i+1),

...
