
Design-based estimation methods use the sampling distribution that results when the values for the finite population units are considered to be fixed, and the variation of the estimates arises from the fact that statistics are based on a random sample drawn from the population rather than a census of the entire population.

Survey data are collected to estimate population quantities, such as totals, means, or ratios of certain characteristics. Other uses include comparing sub-populations—for example, estimating the average difference between males and females for certain characteristics. In addition to these descriptive quantities, for many surveys the data are used to fit statistical models, such as linear regression models, to explain relationships among variables of interest for the particular population. In any case, statistics derived from the sample are used to estimate these population quantities, or parameters. The basis for assessing the statistical properties of such estimates is the sampling distribution (the probability distribution) of the estimates—the distribution of the estimates that would arise under hypothetical repetitions using the same randomization assumptions and the same form of the estimate.
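The idea of hypothetical repetitions can be illustrated with a small simulation. The following is a minimal sketch and not part of the original entry: it assumes a made-up finite population of 100 fixed values and simple random sampling without replacement, and the function name `simulate_sampling_distribution` is hypothetical.

```python
import random
import statistics


def simulate_sampling_distribution(population, n, repetitions, seed=0):
    """Draw repeated simple random samples (without replacement) from a
    fixed finite population and return the sample mean from each draw."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    return [
        statistics.mean(rng.sample(population, n))
        for _ in range(repetitions)
    ]


# Assumed illustrative population: the values 1..100, treated as fixed.
population = [float(v) for v in range(1, 101)]

estimates = simulate_sampling_distribution(population, n=10, repetitions=5000)

true_mean = statistics.mean(population)   # the finite population parameter
avg_estimate = statistics.mean(estimates) # center of the simulated distribution
spread = statistics.stdev(estimates)      # simulated standard error of the mean
```

Because the population values never change, all of the variation in `estimates` comes from which units happen to be selected in each repetition.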

In design-based estimation, the probabilities used to select the sample are also used as the basis for statistical inference, and such inference refers back to the finite population from which the random sample was selected. These selection probabilities are derived from the particular survey sampling design (e.g., multistage, clustered, or stratified). In design-based estimation methods, sampling weights are used to account for the possibly unequal probabilities of selection used to draw the sample.
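One widely used way to apply sampling weights is the Horvitz-Thompson estimator of a population total, which weights each sampled value by the inverse of its inclusion probability. The entry does not name a specific estimator, so the sketch below is an illustration under assumptions: the function names, the data, and the two-stratum design (2 of 100 units sampled in one stratum, 2 of 10 in the other) are all hypothetical.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over the sample."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi  # sampling weight is 1 / pi
    return total


def weighted_mean(values, inclusion_probs):
    """Estimate a population mean as a ratio of weighted sums
    (often called the Hajek estimator)."""
    weights = [1.0 / pi for pi in inclusion_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical stratified sample: the first two units were selected with
# probability 2/100, the last two with probability 2/10.
sample_values = [10.0, 12.0, 50.0, 54.0]
inclusion_probs = [0.02, 0.02, 0.2, 0.2]

estimated_total = horvitz_thompson_total(sample_values, inclusion_probs)
estimated_mean = weighted_mean(sample_values, inclusion_probs)
unweighted_mean = sum(sample_values) / len(sample_values)
```

In this made-up example the unweighted mean is pulled toward the oversampled second stratum, while the weighted mean gives each sampled unit influence in proportion to the number of population units it represents.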

Survey practitioners can also use alternative estimation methods, including model-based approaches. Pure model-based estimation methods assume that the values for the finite population, Y1, Y2, …, YN, are the realization of a random variable from a statistical model, and that the observed outcomes, y1, y2, …, yn, can be thought of as having been generated either from that same statistical model or from a statistical model that has been modified to take into account how the sample design has affected the sampling distribution of the sample data. The observations from the sample are used to predict the values of the unobserved units in the population. In contrast, in design-based estimation methods, the values for the finite population units, Y1, Y2, …, YN, are treated as fixed but unknown quantities, and the sampling distribution of the observed outcomes, y1, y2, …, yn, arises from the probabilities used to select the units for inclusion in the sample.
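The prediction idea behind model-based estimation can be sketched with the simplest possible model. The example below assumes, purely for illustration, that every population unit shares a common mean, so each unobserved value is predicted by the sample mean; the function name and the data are hypothetical and do not come from the entry.

```python
def predict_total_common_mean(sample_values, population_size):
    """Model-based estimate of a population total under an assumed
    common-mean model: observed values are used as they are, and each
    unobserved unit is predicted by the sample mean."""
    n = len(sample_values)
    if n == 0 or population_size < n:
        raise ValueError("need 0 < n <= population_size")
    sample_mean = sum(sample_values) / n
    observed_part = sum(sample_values)
    predicted_part = (population_size - n) * sample_mean
    return observed_part + predicted_part


# Hypothetical sample of 3 units from a population of 10 units.
sample_values = [4.0, 6.0, 8.0]
predicted_total = predict_total_common_mean(sample_values, population_size=10)
```

Under this particular model the result coincides with the population size times the sample mean, which is also the design-based estimate under simple random sampling; the two approaches generally differ once the model or the design becomes more complex.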

Another framework combines the model-based and design-based estimation methods and is referred to as a “model-design-based framework” or a “combined distribution.” Within this framework, the values for the finite population, Y1, Y2, …, YN, are considered to be the realization of a random variable from a statistical model, and the probability distribution for the outcomes, y1, y2, …, yn, is determined by both the statistical model for the population values and the probabilities used to select the units in the sample. Under the model-design-based framework, fitting statistical models to data obtained through a complex survey design, using design-based estimation methods, will often give protection against violations of the model assumptions and against any misspecification of the sampling distribution of the observed data, especially for large sample sizes and small sampling fractions.
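One common way to use design-based methods when fitting a model is to weight each sampled unit by its sampling weight in the estimating equations. The sketch below assumes a simple linear regression with a single covariate; the function name `weighted_linear_fit`, the data, and the weights are hypothetical, and the code computes point estimates only, not design-based standard errors.

```python
def weighted_linear_fit(x, y, weights):
    """Fit y = intercept + slope * x by weighted least squares, with each
    sampled unit weighted by its sampling weight."""
    if not (len(x) == len(y) == len(weights)) or len(x) < 2:
        raise ValueError("x, y, weights must share a length of at least 2")
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    if sxx == 0:
        raise ValueError("x has no weighted variation")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical sample: the first two units carry large weights (each stands
# for many population units), the last two carry small weights.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
weights = [50.0, 50.0, 5.0, 5.0]

intercept, slope = weighted_linear_fit(x, y, weights)
```

When the linear model holds exactly, the weights do not change the fitted line. They matter when the model is misspecified, in which case the weighted fit targets the line that would be obtained from a census of the finite population.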

...
