
Kolmogorov-Smirnov Test for One Sample

Generally, Kolmogorov-Smirnov tests are aimed at testing the hypothesis that two or more distributions are identical. The one-sample version tests the hypothesis that observations were sampled from a specified distribution. For example, one could test the hypothesis that observations arise from a normal distribution having mean 3 and standard deviation 6. Or one could test the hypothesis that sampling is from a chi-squared distribution with 6 degrees of freedom. So the one-sample version does not test the hypothesis that observations follow some normal distribution having some unknown mean and variance; rather, it can be used to test the hypothesis that observations follow a precisely specified distribution. However, a simple extension of the method can be used to test the hypothesis that observations follow a normal distribution with unknown mean and variance. The two-sample version tests the hypothesis that two unknown distributions are identical. Certain advances make it a potentially useful method for getting a detailed description of how distributions differ that goes beyond any technique based on a single measure of location.

For the one-sample version considered here, let F0(x) = P(X ≤ x) be the known (specified) distribution, and let X1,…, Xn be a random sample of size n from the unknown distribution F1(x). Letting I(Xi ≤ x) = 1 if Xi ≤ x, and I(Xi ≤ x) = 0 otherwise, F1 is estimated with

F̂1(x) = (1/n) Σ I(Xi ≤ x), the sum being taken over i = 1,…, n,
the proportion of observations less than or equal to x.
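As a concrete sketch of this estimate (Python with NumPy; the function name and data are illustrative, not from the article), the empirical distribution function is just the proportion of observations at or below x:

```python
import numpy as np

def ecdf(sample, x):
    """Empirical CDF: the proportion of observations <= x,
    i.e., (1/n) * sum over i of the indicators I(X_i <= x)."""
    sample = np.asarray(sample, dtype=float)
    return np.mean(sample <= x)

# Illustrative data: three of the five values are <= 2.8
data = [2.1, 3.5, 1.0, 4.2, 2.8]
print(ecdf(data, 2.8))  # 0.6
```

The estimate is a step function that jumps by 1/n at each observation, which is why the test statistic below only needs to be evaluated at the data points.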

The two-sided version is designed to test

H0: F1(x) = F0(x) for all x
versus

H1: F1(x) ≠ F0(x) for at least one x.

The test statistic is based on what is sometimes called the Kolmogorov distance, which is just the maximum absolute difference between the two distributions under consideration. More formally, the test statistic is

D = max |F̂1(Xi) − F0(Xi)|,
the maximum being taken over all i = 1,…, n.
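A minimal sketch of this computation in Python (assuming NumPy and SciPy; the normal distribution with mean 3 and standard deviation 6 is the example from the opening paragraph, and the simulated sample is illustrative):

```python
import numpy as np
from scipy.stats import norm

def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov distance: max over i of |F1_hat(X_i) - F0(X_i)|."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f1_hat = np.arange(1, n + 1) / n   # F1_hat evaluated at the order statistics
    return np.max(np.abs(f1_hat - cdf(x)))

# Simulate data consistent with the hypothesized F0: normal(3, 6)
rng = np.random.default_rng(0)
sample = rng.normal(loc=3, scale=6, size=25)
D = ks_statistic(sample, lambda x: norm.cdf(x, loc=3, scale=6))
print(D)
```

If the hypothesized distribution is badly wrong (say, a normal with mean 100), the statistic moves toward its maximum possible value of 1, which is what makes rejecting for large D sensible.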

There are one-sided versions of the test as well. The first tests

H0: F1(x) ≥ F0(x) for all x
versus

H1: F1(x) < F0(x) for at least one x.
The test statistic is

D− = max (F0(Xi) − F̂1(Xi)), the maximum being taken over all i = 1,…, n.
The other one-sided version tests

H0: F1(x) ≤ F0(x) for all x
versus

H1: F1(x) > F0(x) for at least one x.
The test statistic is

D+ = max (F̂1(Xi) − F0(Xi)), the maximum being taken over all i = 1,…, n.

For all three versions, the null hypothesis is rejected if the test statistic is sufficiently large.
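The three statistics differ only in whether the absolute difference, the positive part, or the negative part of F̂1 − F0 is maximized. A minimal sketch in Python (names Dplus and Dminus are illustrative; the one-sided statistics are floored at zero since each is a supremum of a difference that vanishes in the tails):

```python
import numpy as np

def ks_statistics(sample, cdf):
    """Return (D, Dplus, Dminus): the two-sided Kolmogorov distance and
    the two one-sided statistics, maximized over the order statistics."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f1_hat = np.arange(1, n + 1) / n        # F1_hat at each order statistic
    diff = f1_hat - cdf(x)                  # F1_hat(X_i) - F0(X_i)
    d_plus = max(diff.max(), 0.0)           # large when F1_hat tends to exceed F0
    d_minus = max((-diff).max(), 0.0)       # large when F0 tends to exceed F1_hat
    return max(d_plus, d_minus), d_plus, d_minus

# Hypothetical data tested against a uniform(0, 1) choice of F0
d, d_plus, d_minus = ks_statistics([0.1, 0.4, 0.5, 0.9], lambda x: x)
print(d, d_plus, d_minus)
```

The two-sided statistic is the larger of the two one-sided statistics, so all three versions reject when their statistic is sufficiently large.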

Using a recursive algorithm described by Conover, the probability of a Type I error can be determined exactly, assuming only that sampling is random. For n > 40, an approximate critical value can be used; Conover tables these for testing at the α level with α = .2, .1, .05, .02, and .01.
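In practice, exact small-sample p values for this test are also available in standard software. For example, SciPy's kstest offers an exact method (a sketch; SciPy's exact computation is its own implementation and is not necessarily Conover's recursion, and the simulated data here are illustrative):

```python
import numpy as np
from scipy import stats

# Simulate data from the hypothesized distribution: normal(3, 6)
rng = np.random.default_rng(1)
sample = rng.normal(loc=3, scale=6, size=20)

# Test H0: the observations come from a normal distribution
# with mean 3 and standard deviation 6, using the exact null distribution
res = stats.kstest(sample, 'norm', args=(3, 6), method='exact')
print(res.statistic, res.pvalue)
```

Because the null distribution of the statistic does not depend on F0 (for continuous F0), one table or algorithm serves for any fully specified distribution.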

The following example illustrates the calculations and a situation where the test might have practical value. Imagine that 10 independent studies are performed. To be concrete, suppose these 10 tests are based on Student's T. Further imagine that the p values from these studies are available, but the data used to compute them are not. For illustrative purposes, imagine the p values are .621, .503, .203, .477, .710, .581, .329, .480, .554, and .382. So, in particular, none of the tests is significant at the .05 level. The issue here is whether, assuming the groups being compared in each study do not differ, Student's T provides adequate control over the probability of a Type I error. If it does, and the groups do not differ, the p values follow a uniform distribution. The ability of Student's T to control the probability of a Type I error is a serious concern because many recent papers have demonstrated that practical problems can arise, even with fairly large sample sizes. Moreover, problems with controlling the probability of a Type I error can translate into poor power when using Student's T.
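Using the 10 p values above, the test of uniformity can be carried out with SciPy (a sketch; here F0 is the uniform distribution on (0, 1), the distribution the p values should follow under the null hypothesis of no group differences and exact Type I error control):

```python
from scipy import stats

pvals = [.621, .503, .203, .477, .710, .581, .329, .480, .554, .382]

# One-sample KS test of H0: the p values follow a uniform(0, 1) distribution
res = stats.kstest(pvals, 'uniform')
print(res.statistic, res.pvalue)  # the statistic is 0.29 for these data
```

The largest discrepancy between the empirical and uniform distribution functions occurs at the biggest p value, .710, where the empirical CDF is 1 and the uniform CDF is .710, giving D = .29.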

...
