Probability of Selection

In survey sampling, the term probability of selection refers to the chance (i.e., the probability from 0 to 1) that a member (element) of a population is chosen for a given survey. When a researcher is using a probability sample, the term also means that every member of the sampling frame that is used to represent the population has a known, nonzero chance of being selected. That chance can be calculated as a member's probability of being chosen out of all the members in the population. For example, a chance of 1 out of 1,000 is a probability of 0.001 (1/1,000 = 0.001). Since every member in a probability sample has some chance of being selected, the calculated probability is always greater than zero. Because every member has a known chance of being selected, it is possible to compute representative, unbiased estimates of whatever a researcher is measuring with the sample. Researchers are able to assume with some degree of confidence that whatever they are estimating represents that same parameter in the larger population from which they drew the sample. For nonprobability samples (such as quota samples, intercept samples, snowball samples, or convenience samples), it is not feasible to confidently assess the reliability of survey estimates, since the selection probabilities of the sample members are unknown.
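The 1-in-1,000 arithmetic can be sketched in a few lines of Python. This is only an illustration under a stated assumption: an equal-probability draw from a frame of known size. The function name srs_selection_probability is hypothetical and does not come from the entry.

```python
from fractions import Fraction


def srs_selection_probability(sample_size: int, frame_size: int) -> Fraction:
    """Chance that any one frame member is chosen, assuming an
    equal-probability (simple random) draw without replacement."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    if not 0 < sample_size <= frame_size:
        raise ValueError("sample_size must be between 1 and frame_size")
    return Fraction(sample_size, frame_size)


# The entry's example: a chance of 1 out of 1,000.
p = srs_selection_probability(1, 1000)
print(p, float(p))  # 1/1000 0.001
```

Exact fractions are used so that the result can be compared against the stated probability without floating-point rounding.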

To select a sample, researchers generally start with a list of elements, such as addresses or telephone numbers. This defined list is called the “sampling frame.” It is created in advance as a means to select the sample to be used in the survey. The goal in building the sampling frame is to make it as inclusive as possible of the larger (target) population that it is meant to cover. As a practical reality, sampling frames can suffer from some degree of undercoverage and may be plagued with duplication. Undercoverage leads to possible coverage error, whereas duplication leads to unequal probabilities of selection because some elements have more than one chance of being selected. Minimizing and even eliminating duplication may be possible, but undercoverage may not be a solvable problem, in part because of the cost of the potential solution(s).
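The effect of duplication on selection probabilities can be shown with a small sketch. It assumes a single draw in which every listing on the frame is equally likely to be picked. The frame contents and the function name single_draw_probabilities are made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def single_draw_probabilities(frame):
    """Probability that each distinct element is picked in one draw,
    assuming every listing on the frame is equally likely."""
    counts = Counter(frame)
    total = len(frame)
    return {element: Fraction(k, total) for element, k in counts.items()}


# Hypothetical frame in which element "B" is listed twice.
frame = ["A", "B", "B", "C"]
probs = single_draw_probabilities(frame)
for element, p in sorted(probs.items()):
    print(element, p)
```

Here "B" has twice the chance of "A" or "C" purely because it appears twice on the list, which is the unequal probability that duplication creates.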

In designing a method for sampling, the selection probability does not necessarily have to be the same (i.e., equal) for each element of the sampling frame, as it would be in a simple random sample. Some survey designs purposely oversample members from certain subclasses of the population to have enough cases to compute more reliable estimates for those subclasses. In this case, the subclass members have higher selection probabilities by design; however, what is necessary in a probability sample is that the selection probability is knowable.
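A minimal sketch of oversampling follows, assuming a stratified design in which each stratum is sampled with equal probability within itself. The stratum sizes and sample sizes are invented for illustration. The design weight, taken here as the inverse of the selection probability, is a standard convention rather than something the entry spells out.

```python
from fractions import Fraction

# Hypothetical design: (population size, sample size) per stratum.
# The small subclass is deliberately oversampled.
design = {
    "subclass": (1_000, 200),
    "remainder": (9_000, 300),
}


def stratum_probabilities(design):
    """Selection probability n/N for members of each stratum."""
    return {name: Fraction(n, N) for name, (N, n) in design.items()}


probs = stratum_probabilities(design)
# Assumed convention: design weight = 1 / selection probability.
weights = {name: 1 / p for name, p in probs.items()}

for name in design:
    print(name, probs[name], weights[name])
```

The selection probabilities differ across strata, but each one is known, so the unequal chances can be offset with weights at the estimation stage.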

Depending on the method of data collection, the final selection probability may not be known at the outset of data collection. For example, in household surveys, such as those selected via random-digit dialing (RDD), additional information such as the number of eligible household members needs to be collected at the time of contact in order to accurately compute the final selection probability. The more eligible members in the household, the lower the selection probability of any one member; for example, in a household with a wife, husband, and two adult children, each has a probability of selection within their household of 1/4. Furthermore, in RDD landline telephone surveys of the general public, it is common to ask how many working telephone numbers are associated with a household. If there are two working landline telephone numbers, then the household has twice the chance of being selected compared to households with only one working landline number, and thus a weighting adjustment can be made for households with two or more numbers. Similarly, in mail surveys that do not sample specifically named people, a question about the number of eligible household members is generally asked. In a systematic sample (e.g., exit polls), the probability of selection is the inverse of the sampling interval.
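The adjustments described above can be sketched as follows. The sketch assumes that one eligible member is chosen at random per contacted household and that a household's chance of selection is proportional to its number of working landlines. All function names are hypothetical, and the relative weight is only defined up to a constant factor.

```python
from fractions import Fraction


def within_household_probability(eligible_members: int) -> Fraction:
    """Chance of one eligible member being picked, assuming a single
    random selection within the household."""
    return Fraction(1, eligible_members)


def relative_weight(eligible_members: int, landlines: int) -> Fraction:
    """Relative weight proportional to the inverse of the person's overall
    selection chance (landlines / eligible_members), up to a constant."""
    return Fraction(eligible_members, landlines)


def systematic_probability(sampling_interval: int) -> Fraction:
    """Selection probability in a systematic sample: 1 / interval."""
    return Fraction(1, sampling_interval)


print(within_household_probability(4))  # 1/4, the four-adult household
print(relative_weight(4, 2))            # 2
print(systematic_probability(10))       # 1/10, e.g. every 10th voter
```

In practice these factors would be combined with the base probability of the telephone number or address being drawn, which is outside the scope of this sketch.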

...
