
Optimal Allocation

Optimal allocation is a procedure for dividing the sample among the strata in a stratified sample survey. The allocation procedure is called “optimal” because in a particular survey sampling design (stratified simple random sampling) it produces the smallest variance for estimating a population mean and total (using the standard stratified estimator) given a fixed budget or sample size.

A sample survey collects data from a population in order to estimate population characteristics. A stratified sample selects separate samples from subgroups (called “strata”) of the population and can often increase the accuracy of survey results. To implement stratified sampling, it must be possible to divide the population into strata, at least implicitly, before sampling. Given a budget that allows data to be gathered on $n$ subjects, or a monetary budget $B$, one must decide how to allocate the data-collection resources among the strata. Three factors typically affect this allocation: (1) the population size, (2) the variability of values, and (3) the per-unit cost of data collection in each stratum. Special interest in the characteristics of particular strata can also affect allocations.

In a stratified simple random sample, a sample of size $n_h$ is selected from stratum (subpopulation) $h$, which has a population size of $N_h$ ($h = 1, 2, \ldots, H$). The standard estimator of the population total is

$$\hat{t} = \sum_{h=1}^{H} N_h \bar{y}_h,$$

where $\bar{y}_h$ is the mean (arithmetic average) of the sample values in stratum $h$ and $\sum$ denotes summation across strata $h = 1, 2, \ldots, H$. The variance of the estimator is

$$V(\hat{t}) = \sum_{h=1}^{H} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h},$$

where $S_h^2$ is the variance of the values in stratum $h$. If the sampling rate is small in all strata, then, ignoring the finite population correction terms $(1 - n_h/N_h)$, the variance is approximately

$$V(\hat{t}) \approx \sum_{h=1}^{H} \frac{N_h^2 S_h^2}{n_h}.$$

Suppose the cost to collect data from one element (person, unit, etc.) in stratum $h$ is $C_h$. With a budget of $B$, the entire budget is spent when

$$B = \sum_{h=1}^{H} C_h n_h.$$

Then the variance (ignoring the finite population correction terms) of the estimated population total is minimized when the sample size in stratum $h$ is

$$n_h = n \, \frac{N_h S_h / \sqrt{C_h}}{\sum_{k=1}^{H} N_k S_k / \sqrt{C_k}},$$

where the summation in the denominator is over all strata, $S_h$ is the standard deviation (square root of the variance) of the values in stratum $h$, and $n$ is the total sample size. This formula implies that one should sample more in large subpopulations (strata), more in strata with large variances, and more in strata with low per-unit cost. If per-unit data collection costs are the same in all strata, then the optimal allocation in stratum $h$ is

$$n_h = n \, \frac{N_h S_h}{\sum_{k=1}^{H} N_k S_k}.$$

If in addition the variances (and standard deviations) are constant across strata, then

$$n_h = n \, \frac{N_h}{N}, \qquad N = \sum_{h=1}^{H} N_h,$$

which is the allocation known as proportional allocation to strata. If the $n_h$ values are not integers, they must be rounded to integers for sample selection. Rounding each $n_h$ to its closest integer does not necessarily work for every stratum, because the rounded stratum sizes must still add up to the total sample size $n$.
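The allocation formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a standard implementation: the function names `optimal_allocation`, `sample_size_for_budget`, and `round_preserving_total` are hypothetical, and the largest-remainder rounding rule is one common choice assumed here, since the text does not prescribe a rounding method. `sample_size_for_budget` gives the total $n$ obtained by substituting the optimal $n_h$ into $B = \sum_h C_h n_h$. Leaving the costs out gives the equal-cost case (commonly called Neyman allocation).

```python
import math


def optimal_allocation(n, N, S, C=None):
    """Unrounded optimal stratum sample sizes n_h (hypothetical helper).

    n: total sample size; N: stratum sizes N_h; S: stratum std. devs S_h;
    C: per-unit costs C_h. C=None assumes equal costs (Neyman allocation).
    """
    if C is None:
        C = [1.0] * len(N)
    # n_h is proportional to N_h * S_h / sqrt(C_h)
    weights = [Nh * Sh / math.sqrt(Ch) for Nh, Sh, Ch in zip(N, S, C)]
    total = sum(weights)
    return [n * w / total for w in weights]


def sample_size_for_budget(B, N, S, C):
    """Total n at which the optimal allocation spends exactly B (hypothetical):
    n = B * sum(N_h S_h / sqrt(C_h)) / sum(N_h S_h sqrt(C_h))."""
    num = sum(Nh * Sh / math.sqrt(Ch) for Nh, Sh, Ch in zip(N, S, C))
    den = sum(Nh * Sh * math.sqrt(Ch) for Nh, Sh, Ch in zip(N, S, C))
    return B * num / den


def round_preserving_total(x, n):
    """Assumed rounding rule (largest remainder): floor every n_h, then give
    the leftover units to the strata with the largest fractional parts,
    so the integer sizes still sum to n."""
    floors = [math.floor(v) for v in x]
    leftover = n - sum(floors)
    order = sorted(range(len(x)), key=lambda i: x[i] - floors[i], reverse=True)
    for i in order[:leftover]:
        floors[i] += 1
    return floors
```

For example, with equal costs and equal standard deviations the sketch reduces to proportional allocation: `optimal_allocation(60, [100, 200, 300], [1, 1, 1])` returns `[10.0, 20.0, 30.0]`. The sketch does not cap $n_h$ at $N_h$; if the formula asks for more units than a stratum contains, that stratum would in practice be taken in full and the remaining sample reallocated among the other strata.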

Suppose one wanted to collect data on students at a large public university. Questions of interest could be hours worked per week; amount of money spent per semester on textbooks; amount of time spent eating at restaurants in a week; number of trips to the airport in a semester; and whether or not friends smoke cigarettes. The students selected for the survey could be contacted via their university email addresses and asked to complete an online Web survey. A sample survey can be preferable to contacting every student, because with a sample, greater effort can often be made to encourage response and check data quality. Administrative records contain college year designations (first, second, third, fourth) for each student in the target population; college years can be used as strata. Suppose the total sample size is allowed to be 1,600 students. Equal allocation to strata would sample 400 students from each year. Table 1 presents allocations of students to the four strata based on total enrollments by college year; these numbers are similar to 2006 enrollment at Iowa State University. The hypothetical variable being considered is hours worked per week. It is assumed that students in higher years have more variable employment situations than students in earlier years, hence the standard deviation increases by year. It also is assumed that more attempts are needed to contact students in later years than in earlier years, that is, per-unit data collection costs are higher in later years. As can be seen in the table, the stratum of fourth-year students receives the largest sample ($n_4 = 731$), whereas the stratum of first-year students receives the smallest ($n_1 = 224$).
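To illustrate why the choice of allocation matters, the sketch below compares the approximate standard error of the estimated total (ignoring finite population corrections) under equal allocation (400 per year, as in the text), proportional allocation, and the equal-cost optimal (Neyman) allocation for a total sample of 1,600. The stratum sizes and standard deviations are made-up values chosen only for illustration; they are not the Table 1 figures, and the unequal contact costs assumed in the text are ignored here.

```python
# Hypothetical inputs for four college-year strata (NOT the Table 1 values).
N = [5000, 5000, 4500, 5500]   # stratum population sizes N_h (assumed)
S = [4.0, 6.0, 8.0, 10.0]      # std. dev. of hours worked per week (assumed)
n = 1600                       # total sample size from the example


def approx_variance(N, S, alloc):
    """Variance of the stratified total, ignoring finite population corrections:
    sum over strata of N_h^2 * S_h^2 / n_h."""
    return sum(Nh**2 * Sh**2 / nh for Nh, Sh, nh in zip(N, S, alloc))


H = len(N)
equal = [n / H] * H                                  # 400 per stratum
proportional = [n * Nh / sum(N) for Nh in N]         # n_h proportional to N_h
total_NS = sum(Nh * Sh for Nh, Sh in zip(N, S))
neyman = [n * Nh * Sh / total_NS for Nh, Sh in zip(N, S)]  # n_h prop. to N_h S_h

for name, alloc in [("equal", equal), ("proportional", proportional),
                    ("Neyman", neyman)]:
    se = approx_variance(N, S, alloc) ** 0.5
    print(f"{name:>12}: n_h = {[round(a) for a in alloc]}, approx. SE = {se:,.0f}")
```

Because the Neyman allocation minimizes $\sum_h N_h^2 S_h^2 / n_h$ for a fixed total $n$, its standard error can be no larger than that of the other two allocations, whatever inputs are used; how large the gain is depends on how much the $S_h$ differ across strata.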

...
