Equal Probability of Selection

Survey samples can be chosen in many ways, and one common approach is to use a technique that provides an equal chance of selection to all elements in the sampling frame. One type of equal probability sample is a simple random sample, but there are many others.

Morris H. Hansen, William N. Hurwitz, and William G. Madow appear to have been the first to call equal probability samples EPSEM samples (for “equal probability selection method”), but Leslie Kish used the term so often that some have misattributed the coinage to him. Others use the phrase self-weighting sample, although some avoid it because weighting typically involves nonresponse adjustment and some form of calibration, such as ratio adjustment or raking, and these lead to unequal weights even when all elements of the sample have been selected with equal probability. The word equal in the name typically refers only to marginal inclusion probabilities, that is, each element's own chance of being selected. Joint probabilities of selection, the chance that a given pair of units both fall in the sample, vary across pairs of units for designs other than simple random sampling.
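The distinction between marginal and joint probabilities can be checked by brute force on a toy population. The sketch below is illustrative only: the function names are hypothetical, the population of 6 units and sample size of 2 are arbitrary, and systematic selection is assumed to use a random start with a fixed interval, where the population size is a multiple of the sample size.

```python
from fractions import Fraction
from itertools import combinations


def srs_samples(N, n):
    """All equally likely simple random samples of size n from units 0..N-1."""
    return [frozenset(c) for c in combinations(range(N), n)]


def systematic_samples(N, n):
    """All equally likely systematic samples (assumes N is a multiple of n)."""
    k = N // n  # sampling interval
    return [frozenset(range(start, N, k)) for start in range(k)]


def inclusion_probs(samples, N):
    """Marginal and joint inclusion probabilities when every sample is equally likely."""
    p = Fraction(1, len(samples))
    marginal = {i: sum(p for s in samples if i in s) for i in range(N)}
    joint = {
        (i, j): sum(p for s in samples if i in s and j in s)
        for i, j in combinations(range(N), 2)
    }
    return marginal, joint


if __name__ == "__main__":
    for name, samples in [("SRS", srs_samples(6, 2)),
                          ("systematic", systematic_samples(6, 2))]:
        marginal, joint = inclusion_probs(samples, 6)
        print(name, "marginal:", sorted(set(marginal.values())),
              "joint:", sorted(set(joint.values())))
```

Both designs give every unit the same marginal probability, so both are EPSEM. Under simple random sampling every pair also has the same joint probability, whereas under systematic selection only pairs one interval apart can appear together.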

The variation across pairs of units is caused most often by systematic selection, stratification, clustering, or some combination of these, although it can also be caused by other sampling systems, such as controlled selection and maximization (or minimization) of overlap with other samples. The purpose of varying the joint probabilities of selection is to improve efficiency by exploiting auxiliary information. The reasons to keep the marginal inclusion probabilities constant are less compelling and largely involve tradition.

One innovation introduced in the 1940s at the U.S. Census Bureau is a scheme for multi-stage sampling that preserves equal probabilities and is very efficient. In this design, clusters are grouped into strata that, in addition to being internally homogeneous, are nearly equal in population. Two clusters are then selected with probability proportional to population from each stratum. Within sample clusters, second-stage probabilities of selection are calculated so as to achieve an EPSEM sample. Given reasonably accurate population measures, this procedure will result in nearly equal-sized cluster workloads, convenient for a local interviewer to handle. Attendant reductions in the variation in cluster sample size and in sampling weights also improve efficiency.
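The arithmetic behind this design can be sketched as follows. The sketch assumes that selecting two clusters with probability proportional to population gives cluster i a first-stage inclusion probability of 2 × M_i / M_h, where M_i is the cluster population and M_h the stratum population, and that f is the desired overall sampling fraction. The function name `epsem_two_stage` and the population figures are hypothetical.

```python
from fractions import Fraction


def epsem_two_stage(cluster_pops, f, clusters_per_stratum=2):
    """Return (first-stage prob, second-stage prob, overall prob, expected workload)
    for each cluster in one stratum, with the overall probability fixed at f."""
    stratum_pop = sum(cluster_pops)
    rows = []
    for m in cluster_pops:
        # Assumed first-stage inclusion probability under PPS selection.
        first = clusters_per_stratum * Fraction(m, stratum_pop)
        if first > 1:
            raise ValueError("cluster too large for PPS selection in this stratum")
        # Second-stage probability chosen so that first * second == f.
        second = Fraction(f) / first
        if second > 1:
            raise ValueError("f is too large for this cluster")
        rows.append((first, second, first * second, m * second))
    return rows


if __name__ == "__main__":
    # Hypothetical stratum of four clusters, overall sampling fraction 1 in 20.
    for row in epsem_two_stage([300, 500, 700, 500], Fraction(1, 20)):
        print(*row)
```

Under these assumptions the expected workload in every sampled cluster is f × M_h / 2, whatever the cluster's own size, so strata of nearly equal population yield nearly equal workloads across the whole sample.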

In the 1940s, it was also much harder to deal with unequal weights at the analysis phase. Now that software designed to cope with unequal weights, such as SUDAAN, WesVar, and various SAS procedures, is readily available, there is less reason to design EPSEM samples. There are, however, still some reasons to consider them. Some are articulated by advocates of inverse sampling, a procedure whereby an EPSEM sample is extracted from a larger sample. If one is interested in multi-level modeling, then an EPSEM sample can still be advantageous because there is considerable debate about how to use sampling weights in fitting such models. Another advantage arises in the context of hot-deck item imputation. If probabilities of selection are equal, then the contentious question of whether to use the weights in donor selection is avoided.
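One simple way to extract an equal probability subsample from an unequal probability sample is sketched below. It is a simplification and not necessarily the procedure proposed in the inverse sampling literature: each sampled unit is retained independently with probability equal to the smallest original selection probability divided by its own, so that the product of the two stages is constant. The function names and the probabilities used are hypothetical.

```python
import random


def retention_probs(selection_probs):
    """Retention probability for each unit: smallest selection probability / own."""
    floor = min(selection_probs)
    return [floor / p for p in selection_probs]


def epsem_subsample(units, selection_probs, rng):
    """Keep each unit independently with its retention probability."""
    keep = retention_probs(selection_probs)
    return [u for u, r in zip(units, keep) if rng.random() < r]


if __name__ == "__main__":
    probs = [0.1, 0.05, 0.2]   # hypothetical original selection probabilities
    keep = retention_probs(probs)
    # Overall probability (original * retention) is the same for every unit.
    print([p * r for p, r in zip(probs, keep)])
    print(epsem_subsample(["a", "b", "c"], probs, random.Random(0)))
```

The price of equal probabilities here is a smaller sample: units that were originally oversampled are the ones most likely to be dropped.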

Despite these analytic and workload advantages, samplers should feel free to vary probabilities of selection using optimal allocation when advance knowledge of strata characteristics is available. This is particularly important for oversampling of minority populations in the United States.
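As an illustration of how optimal allocation departs from equal probabilities, the sketch below uses Neyman allocation, one common form of optimal allocation that assumes equal per-unit costs across strata: the sample size in stratum h is proportional to N_h × S_h, the stratum population times the stratum standard deviation. The function name and the figures are hypothetical, and the allocations are left unrounded.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Allocate a total sample of n across strata in proportion to N_h * S_h."""
    products = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]


if __name__ == "__main__":
    sizes = [8000, 2000]   # hypothetical stratum populations
    sds = [1.0, 4.0]       # hypothetical stratum standard deviations
    alloc = neyman_allocation(sizes, sds, 400)
    rates = [n_h / N_h for n_h, N_h in zip(alloc, sizes)]
    print(alloc, rates)
```

With these figures the smaller, more variable stratum is sampled at four times the rate of the larger one, whereas an EPSEM design would apply the same rate of 400 / 10,000 to both.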

...
