
Welch's t Test

Mean comparison is the central theme of many classical statistical procedures. The well-known independent-sample t test is often used to test the equality of two means from independent populations with equal variances, whereas Welch's t test is generally preferred when the variances are not equal.

Let $y_{11}, y_{21}, \ldots, y_{n_1 1}$ and $y_{12}, y_{22}, \ldots, y_{n_2 2}$ be two independent random samples from two populations with means (or expected values) $\mu_j = E(y_{ij})$ and variances $\sigma_j^2 = \mathrm{Var}(y_{ij})$, $j = 1, 2$. The sample counterparts of $\mu_j$ and $\sigma_j^2$ are

$$\bar{y}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij} \quad\text{and}\quad s_j^2 = \frac{1}{n_j - 1}\sum_{i=1}^{n_j}\left(y_{ij} - \bar{y}_j\right)^2,$$

respectively. Because $\bar{y}_j$ is an unbiased estimator of $\mu_j$, any difference between $\mu_1$ and $\mu_2$ should be reflected by $\bar{y}_1 - \bar{y}_2$. Of course, even when the null hypothesis

$$H_0: \mu_1 = \mu_2 \tag{1}$$

holds, $\bar{y}_1 = \bar{y}_2$ does not hold in general because of sampling error, which is characterized by the variance

$$v^2 = \mathrm{Var}(\bar{y}_1 - \bar{y}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \tag{2}$$
Notice that, when $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $v^2$ simplifies to

$$v_0^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$

which is best estimated by

$$\hat{v}_0^2 = s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \tag{3}$$

where the pooled variance estimator $s_p^2$ is given by

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
The commonly used independent-sample t statistic,

$$t_s = \frac{\bar{y}_1 - \bar{y}_2}{\hat{v}_0} = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \tag{4}$$

is just the standardized mean difference using $\hat{v}_0$. As obtained by William S. Gosset in 1908, the $t_s$ in Equation 4 follows $t(n_1 + n_2 - 2)$, the Student's t distribution with $n_1 + n_2 - 2$ degrees of freedom. It is commonly denoted by

$$t_s \sim t(n_1 + n_2 - 2), \tag{5}$$

where "$\sim$" stands for "following the distribution of."
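The pooled-variance t statistic above is straightforward to compute directly. The following is a minimal sketch in plain Python; the function name `pooled_t` is chosen here for illustration and is not part of any standard library.

```python
import math

def pooled_t(y1, y2):
    """Independent-sample t statistic with pooled variance (Equation 4)
    and its degrees of freedom n1 + n2 - 2."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    # Unbiased sample variances s1^2 and s2^2
    s1_sq = sum((y - m1) ** 2 for y in y1) / (n1 - 1)
    s2_sq = sum((y - m2) ** 2 for y in y2) / (n2 - 1)
    # Pooled variance estimator s_p^2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t_s = (m1 - m2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t_s, n1 + n2 - 2
```

For significance testing, $t_s$ would then be compared against $t(n_1 + n_2 - 2)$, e.g. via `scipy.stats.t.sf` if SciPy is available.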

When $\sigma_1^2 \neq \sigma_2^2$, $\hat{v}_0^2$ is no longer an unbiased estimator of $v^2$, but

$$\hat{v}^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$

remains unbiased. A natural choice is to standardize $\bar{y}_1 - \bar{y}_2$ using $\hat{v}$, which leads to the so-called Welch's t statistic

$$t_w = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}. \tag{6}$$
The exact distribution of $t_w$ can be characterized by a series or an integral, and it is far more complicated than Student's t distribution. Using complicated mathematics, Bernard L. Welch came up with

$$t_w \mathrel{\dot\sim} t(df_w), \tag{7}$$

where "$\dot\sim$" stands for "approximately following the distribution of" and $df_w$ is the degrees of freedom given by

$$df_w = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}. \tag{8}$$

In other words, Equation 7 means approximating the distribution of $t_w$ by the t distribution with degrees of freedom $df_w$. The test that compares $t_w$ against the distribution $t(df_w)$ for significance is called Welch's t test. Notice that the $df_w$ in Equation 8 is not necessarily an integer.
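Welch's statistic and its degrees of freedom can likewise be computed in a few lines. This is a sketch of Equations 6 and 8 in plain Python; the name `welch_t` is illustrative, not a library function.

```python
import math

def welch_t(y1, y2):
    """Welch's t statistic (Equation 6) and the Welch-Satterthwaite
    degrees of freedom df_w (Equation 8)."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    # v1 = s1^2 / n1 and v2 = s2^2 / n2, the two variance components
    v1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1) / n1
    v2 = sum((y - m2) ** 2 for y in y2) / (n2 - 1) / n2
    t_w = (m1 - m2) / math.sqrt(v1 + v2)
    df_w = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_w, df_w
```

Note that $df_w$ returned here is generally fractional; the reference distribution $t(df_w)$ handles non-integer degrees of freedom without difficulty.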

It is worth mentioning that the approximation in Equation 7 was also obtained by Franklin E. Satterthwaite in 1946 in a different context. Therefore, Equation 7 is also called the Welch-Satterthwaite approximation.

Welch's t test is not only simple to use but also provides accurate control of the Type I error. Let αw be the actual Type I error rate of Welch's t test. At nominal levels α = .01, .05, and .1, the maximum difference between α and αw is approximately .001 when both n1 and n2 are greater than 5, and αw moves closer to α as n1 and n2 increase.
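This kind of Type I error claim can be checked by simulation. The sketch below, assuming SciPy is available, draws repeated samples under a true null hypothesis with unequal variances and records how often Welch's test (SciPy's `ttest_ind` with `equal_var=False`) rejects at α = .05; the sample sizes, standard deviations, and replication count are illustrative choices, not values from the text.

```python
import random
from scipy import stats

random.seed(0)
n1, n2, alpha, reps = 8, 12, 0.05, 5000
rejections = 0
for _ in range(reps):
    # H0 is true (both means 0), but the variances differ (sd 1 vs. sd 3)
    y1 = [random.gauss(0.0, 1.0) for _ in range(n1)]
    y2 = [random.gauss(0.0, 3.0) for _ in range(n2)]
    # equal_var=False selects Welch's t test in SciPy
    p = stats.ttest_ind(y1, y2, equal_var=False).pvalue
    if p < alpha:
        rejections += 1

alpha_w = rejections / reps  # empirical Type I error rate
```

With 5,000 replications the Monte Carlo standard error is about .003, so `alpha_w` should land close to the nominal .05, consistent with the accuracy described above.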

Welch's t Test with Distribution Violations

Welch obtained the distribution $t(df_w)$ under the normality assumption. The approximation in Equation 7 is affected by non-normal distributions, especially by the third central moments when n1 and n2 are small.

The actual Type I error rate αw depends on E(tw), the expected value of the tw in Equation 6.

When the two populations are normally distributed, E(tw) = 0 under the H0 in Equation 1. When the two populations are not normally distributed, under H0, E(tw) can be approximated

...
