
Variance refers to the degree of variability (dispersion) among a collection of observations. Although estimating the variance of a distribution of numbers is often a complex process, it is an extremely important endeavor for survey researchers, because valid inferences about population parameters depend on it.

Standard variance estimation formulas for simple random sampling, stratified sampling, and cluster sampling can be found in essentially any survey sampling textbook, such as those by William G. Cochran or Leslie Kish. However, most large survey samples use combinations of unequal probabilities, stratification, and clustering. Estimation often uses survey weights that account for unequal selection probabilities, nonresponse, and post-stratification. The resulting estimates are usually ratio estimates: a weighted mean, for example, has the weighted sum of the observations as its numerator and the sum of the weights as its denominator, and both the numerator and the denominator are random variables. Textbook variance formulas are not sufficient for these survey samples and estimation problems. Until recently, specialized variance estimation software packages were required, but general-purpose statistical analysis programs have now started to include the special variance estimation techniques needed to calculate variances correctly for complex sample surveys.
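As a minimal sketch of why such ratio estimates need special treatment, the Python function below computes a weighted mean as the ratio of the weighted sum to the sum of the weights and estimates its variance by Taylor series linearization. For illustration it assumes that each record carries hypothetical stratum and PSU labels, that PSUs are treated as sampled with replacement within strata, and that every stratum contains at least two PSUs. Production software also handles finite population corrections, single-PSU strata, and domain estimation, none of which this sketch attempts.

```python
from collections import defaultdict

def weighted_mean_linearized(y, w, stratum, psu):
    """Weighted mean R = sum(w*y) / sum(w) and its linearized variance.

    Sketch assumptions: PSUs are treated as sampled with replacement within
    strata, and every stratum has at least two PSUs.
    """
    total_w = sum(w)
    r = sum(wi * yi for wi, yi in zip(w, y)) / total_w  # ratio estimate

    # Linearized contribution of each record, w_k * (y_k - R) / sum(w),
    # accumulated into PSU totals z_hi.
    z = defaultdict(float)
    for yi, wi, h, i in zip(y, w, stratum, psu):
        z[(h, i)] += wi * (yi - r) / total_w

    # Collect the PSU totals within each stratum.
    by_stratum = defaultdict(list)
    for (h, _), z_hi in z.items():
        by_stratum[h].append(z_hi)

    # Between-PSU variability within strata:
    # sum over h of n_h / (n_h - 1) * sum over i of (z_hi - zbar_h)^2.
    var = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        z_bar = sum(totals) / n_h
        var += n_h / (n_h - 1) * sum((t - z_bar) ** 2 for t in totals)
    return r, var

# Made-up data: 2 strata, 2 PSUs per stratum, 2 records per PSU.
r, v = weighted_mean_linearized(
    y=[3, 5, 4, 8, 6, 7, 2, 4],
    w=[1, 1, 2, 2, 1, 3, 1, 1],
    stratum=[1, 1, 1, 1, 2, 2, 2, 2],
    psu=[1, 1, 2, 2, 1, 1, 2, 2],
)
```

The key design choice is that the variance is driven by differences between PSU totals within strata, not by differences between individual records, so clustering and stratification are reflected automatically.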

Without correct variance estimates, users cannot make valid inferences about population parameters. Most complex sample surveys have larger standard errors than simple random samples of the same size. If inference is based on standard errors computed as though the sample were a simple random sample, the standard errors will be too small and any statistical procedures will be too liberal (e.g., p-values will be too small, confidence intervals too narrow, and test statistics too large).
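The following Python sketch illustrates this point with made-up data by comparing the naive simple random sampling (SRS) standard error of a mean with a cluster-based standard error. It assumes a hypothetical design in which equal-size clusters are drawn with replacement and every element in a sampled cluster is observed, so the cluster means can be treated as the independent units; the function name is illustrative only.

```python
import math
from statistics import mean, variance

def se_srs_and_cluster(clusters):
    """Return (mean, naive SRS standard error, cluster-based standard error).

    clusters: list of equal-size lists of observations, one per sampled cluster.
    """
    values = [y for c in clusters for y in c]
    n = len(values)
    m = len(clusters)
    # Naive SRS formula: treats all n observations as independent.
    se_srs = math.sqrt(variance(values) / n)
    # Cluster formula: the m cluster means are the independent units.
    cluster_means = [mean(c) for c in clusters]
    se_cluster = math.sqrt(variance(cluster_means) / m)
    return mean(values), se_srs, se_cluster

est, se_srs, se_cluster = se_srs_and_cluster(
    [[10, 11, 12], [20, 21, 19], [15, 14, 16], [25, 24, 26]]
)
deff = (se_cluster / se_srs) ** 2  # design effect: ratio of the two variances
```

In these illustrative data, observations within a cluster are much more alike than observations in different clusters, so the design effect is about 3.6, and a test statistic computed with the SRS standard error would be roughly 1.9 times too large.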

Calculation Methods

The two most popular classes of methods for calculating correct variance estimates for complex sample surveys are replicate methods and Taylor series linearization methods. A third class of techniques that is sometimes used is generalized variance functions.

Replicate methods compute multiple estimates in a systematic way and use the variability among these estimates to estimate the variance of the full-sample estimator. The simplest replicate method is the method of random groups, which was originally designed for interpenetrating samples, that is, samples made up of multiple repetitions (e.g., 10) of the same sampling strategy. Each repetition is a random group from which an estimate is derived. The overall estimator is the average of the group estimates, and its variance is estimated by the sample variance of the group estimates divided by the number of groups. This technique can be used for any complex sample survey by separating the sample into subsamples that are as equivalent as possible (e.g., by sorting by design variables and using systematic sampling to divide the sample into random groups). The method of random groups is simple enough to carry out by hand, but it is not as robust as the more modern replication-based methods that computers can now calculate easily.
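A minimal Python sketch of the random group method follows. The helper assign_random_groups is a hypothetical name: it sorts records by design variables and deals them out systematically into k groups, which is one simple way to form groups as described above. random_groups_variance then averages the group estimates and divides their sample variance by k. The records and the estimator (a simple mean) are made up for illustration.

```python
from statistics import mean, variance

def assign_random_groups(records, k, sort_key):
    """Hypothetical helper: sort by design variables, then deal records into k groups."""
    ordered = sorted(records, key=sort_key)
    groups = [[] for _ in range(k)]
    for i, record in enumerate(ordered):
        groups[i % k].append(record)  # every k-th record goes to the same group
    return groups

def random_groups_variance(group_estimates):
    """Return the random-group estimate and its estimated variance.

    The overall estimate is the mean of the k group estimates; its variance is
    the sample variance of those estimates divided by k, i.e.
    sum((t - t_bar)**2) / (k * (k - 1)).
    """
    k = len(group_estimates)
    if k < 2:
        raise ValueError("need at least two random groups")
    return mean(group_estimates), variance(group_estimates) / k

# Made-up data: 20 records (stratum, y), split into 5 groups.
records = [(s, float(y)) for s in (1, 2) for y in range(10)]
groups = assign_random_groups(records, k=5, sort_key=lambda rec: (rec[0], rec[1]))
estimates = [mean(y for _, y in g) for g in groups]
theta, v = random_groups_variance(estimates)
```

Because only k estimates feed the variance, the result has few degrees of freedom when k is small, which is one reason the method is less stable than the replication methods that followed it.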

Balanced repeated replication (BRR), also known as balanced half-sampling, was originally conceived for designs in which two primary sampling units (PSUs) are selected from each stratum. A half-sample then consists of all cases from exactly one of the two PSUs in each stratum, with each weight doubled. Balanced half-sampling uses an orthogonal set of half-samples as specified by a Hadamard matrix. The variability of the half-sample estimates is taken as an estimate of the variance of the full-sample estimator.
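The Python sketch below shows BRR for a design with two PSUs per stratum. It builds a Hadamard matrix by Sylvester's construction, which yields only orders that are powers of 2 (Hadamard matrices of other orders that are multiples of 4 exist but are not constructed here). It skips the all-ones first column and uses column h + 1 to decide which PSU of stratum h enters each half-sample. The data layout (a list of (y, weight) tuples per PSU) and the function names are assumptions for illustration. The variance is computed as the average squared deviation of the half-sample estimates from the full-sample estimate.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of order 2**p built by Sylvester's construction."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2 in this sketch")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def brr_variance(strata, estimator):
    """Return (full-sample estimate, BRR variance) for two PSUs per stratum.

    strata: list of (psu_a, psu_b); each PSU is a list of (y, weight) tuples.
    estimator: function mapping a list of (y, weight) tuples to an estimate.
    """
    n_strata = len(strata)
    # Smallest power of 2 greater than the number of strata, so the
    # all-ones first column can be skipped.
    order = 1
    while order <= n_strata:
        order *= 2
    hadamard = sylvester_hadamard(order)

    full_sample = [unit for pair in strata for psu in pair for unit in psu]
    theta_full = estimator(full_sample)

    sq_devs = []
    for row in hadamard:  # one half-sample per row
        half_sample = []
        for h, (psu_a, psu_b) in enumerate(strata):
            chosen = psu_a if row[h + 1] == 1 else psu_b
            # Keep one PSU per stratum and double its weights.
            half_sample.extend((y, 2 * w) for y, w in chosen)
        sq_devs.append((estimator(half_sample) - theta_full) ** 2)
    return theta_full, sum(sq_devs) / len(sq_devs)

def weighted_total(units):
    return sum(w * y for y, w in units)

# Made-up data: 3 strata, 2 PSUs each.
strata = [
    ([(1, 1), (2, 1)], [(3, 1)]),
    ([(5, 2)], [(4, 2)]),
    ([(0, 1)], [(6, 1)]),
]
total, v_total = brr_variance(strata, weighted_total)
```

For a linear estimator such as a weighted total, a fully balanced set of half-samples reproduces the usual two-PSUs-per-stratum variance, the sum over strata of the squared differences between the two PSU totals; for nonlinear estimators such as ratios, BRR gives an approximation.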

...
