
In surveys, weights are numerical values by which response values are multiplied to account for missing observations. Data may be missing as the result of a prearranged sample design or as the result of nonresponse. With a sample design, weights are used to estimate totals or means for data of interest, such as acres of corn grown or household income, from a selected subset of the entire population. The population could be, say, all farms or all households; the subset is known as a sample. With nonresponse, the weights are inflated further to account for the missing observations. Another way to account for nonresponse is to replace the missing values with data derived from other information, a process known as imputation. Whether nonresponse is compensated for by imputation, by weighting the results of a census, or by adjusting the weights of a sample, one must consider whether the missing data differ in some way from the data actually observed. Here, however, we examine the more straightforward use of weights in surveys: sampling weights.
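To make "the weights are inflated further" concrete, the sketch below uses one common technique, a weighting-class nonresponse adjustment within a single class. It is an illustration under stated assumptions, not the method of any particular survey. The function name `adjust_for_nonresponse` and the data (20 sampled units, each with base weight 5, of which 16 respond) are hypothetical. Each respondent's weight is multiplied by the ratio of the total base weight of all sampled units to the total base weight of respondents. This is reasonable only if respondents and nonrespondents in the class are similar, which is exactly the concern raised above.

```python
def adjust_for_nonresponse(base_weights, responded):
    """Hypothetical weighting-class adjustment for a single class.

    Respondents' base weights are inflated by
    (sum of all sampled base weights) / (sum of respondents' base weights),
    so respondents also stand in for nonrespondents. Nonrespondents get 0.
    """
    total = sum(base_weights)
    resp_total = sum(w for w, r in zip(base_weights, responded) if r)
    if resp_total == 0:
        raise ValueError("no respondents in this weighting class")
    factor = total / resp_total
    return [w * factor if r else 0.0 for w, r in zip(base_weights, responded)]

# Hypothetical data: 20 sampled units with base weight 5; the last 4 do not respond.
base = [5.0] * 20
responded = [True] * 16 + [False] * 4
adjusted = adjust_for_nonresponse(base, responded)
print(adjusted[0])    # 6.25
print(sum(adjusted))  # 100.0
```

The adjusted weights still sum to the same total as the base weights. The weight lost to the four nonrespondents is spread over the 16 respondents.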

Data collected on one or more characteristics of a population, such as acres of corn planted on farms in Minnesota, constitute a survey. Thus a data element could be the number of acres of corn, and the population could be all farms in Minnesota. Often we want to estimate totals or means of data elements, such as the total acres of corn in a population (here, Minnesota farms), by selecting a sample of members of the population and inferring a population estimate from the sample data. The rules for selecting the sample vary with the type of sample. Samples may be model based or design based; often they are design based with model-assisted inference. A design-based sample is selected according to the randomization principle, and this practice leads to the sample weights discussed here.
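One way to see why random selection leads to weights is to enumerate every possible sample from a tiny population. The sketch below assumes simple random sampling of n = 2 units from a hypothetical population of N = 4 farms with made-up acreages. Under that design each unit's weight is N/n. Averaging the weighted sample totals over all equally likely samples recovers the true population total exactly. This property, unbiasedness over repeated sampling, is what design-based inference relies on.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: acres of corn on N = 4 farms.
population = [120, 80, 200, 40]
N = len(population)
n = 2  # sample size

true_total = sum(population)

# Under simple random sampling every subset of size n is equally likely,
# so each unit's selection probability is n/N and its weight is N/n.
weight = Fraction(N, n)

# Weighted sample total for every possible sample of size n.
estimates = [weight * sum(s) for s in combinations(population, n)]

# The average over all equally likely samples equals the population total.
expected_estimate = sum(estimates) / len(estimates)
print(true_total, expected_estimate)  # 440 440
```

Any single sample may miss the true total (for example, {80, 40} gives 240), but no systematic bias is introduced by the design.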

The simplest design-based sample is the simple random sample (SRS). If a sample of 20 is drawn at random from a population of 100 members, with each member having an equal chance of selection, that sample is an SRS. Here the sample size is n = 20 and the population size is N = 100. The probability that any given member of the population is selected into the sample is 20/100 = 1/5. In general, the probability of selection for an SRS is n/N. The values collected for a characteristic (here, acres of corn) on each member of the sample can be added together to give a sample total. The sample weight by which this sample total must be multiplied to obtain an estimated population total is the inverse of the probability of selection, w = N/n, where w is the weight. In the above example, w = 100/20 = 5.
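The SRS calculation above can be sketched in a few lines. The population of acreages is synthetic, generated with a fixed seed purely for illustration, and the helper name `srs_weight` is hypothetical. `random.Random.sample` draws without replacement, so every member has the same selection probability n/N.

```python
import random

def srs_weight(N, n):
    """Sampling weight under SRS: the inverse of the selection probability n/N."""
    return N / n

# Synthetic population: acres of corn on N = 100 farms (illustrative values only).
rng = random.Random(42)
population = [rng.randint(0, 500) for _ in range(100)]

N, n = len(population), 20
sample = rng.sample(population, n)     # SRS without replacement

w = srs_weight(N, n)                   # 100 / 20 = 5.0
estimated_total = w * sum(sample)      # weight times the sample total
estimated_mean = estimated_total / N   # same as the plain sample mean under SRS

print(w)  # 5.0
print(estimated_total, sum(population))
```

Because every unit in an SRS carries the same weight, the weighted mean reduces to the ordinary sample mean. Weights start to matter more when selection probabilities differ.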

These weights are always the inverses of the corresponding probabilities of selection. The probability of selection depends on the structure of the design-based sample and can be complex to compute. Selection may involve several stages and adjustments, but the basic fact to remember is that the sample weight for any given observation is the inverse of its probability of selection.
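As an illustration of a design with more than one stage, the sketch below assumes a hypothetical two-stage sample: an SRS of m counties out of M, followed by an SRS of k farms within each selected county. The overall selection probability of a farm is the product of the two stage probabilities, and its weight is the inverse of that product. The function name and all numbers are invented for the example. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def two_stage_weight(M, m, N_i, k):
    """Hypothetical two-stage design weight for a farm in county i."""
    # Stage 1: SRS of m of the M counties -> probability m/M.
    # Stage 2: SRS of k of the N_i farms in the selected county -> k/N_i.
    p = Fraction(m, M) * Fraction(k, N_i)
    # The sample weight is the inverse of the overall selection probability.
    return 1 / p

# Hypothetical design: 10 of 50 counties, then 8 farms per selected county.
w_large = two_stage_weight(M=50, m=10, N_i=200, k=8)  # county with 200 farms
w_small = two_stage_weight(M=50, m=10, N_i=40, k=8)   # county with 40 farms
print(w_large, w_small)  # 125 25
```

Farms in the larger county have a lower chance of selection and therefore a larger weight. Each sampled farm represents more farms in the population.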

...
