
Respondent-Driven Sampling (RDS)

Respondent-driven sampling (RDS) is a method for drawing probability samples of “hidden,” or hard-to-reach, populations. Such populations are difficult to sample using standard survey research methods for two reasons. First, they lack a sampling frame, that is, an exhaustive list of population members from which the sample can be drawn. Second, constructing a sampling frame is not feasible because one or more of the following are true: (a) the population is such a small part of the general population that locating its members through a general population survey would be prohibitively costly; (b) the population has social networks that are difficult for outsiders to penetrate, so access requires personal contacts; and (c) membership in the population is stigmatized, so gaining access requires establishing trust. Populations with these characteristics are important to many research areas, including arts and culture (e.g., jazz musicians and aging artists), public policy (e.g., immigrants and the homeless), and public health (e.g., drug users and commercial sex workers).

These populations have sometimes been studied using institutional or location-based sampling, but such studies are limited by their incomplete sampling frames. For example, in New York City only 22% of jazz musicians are musician union members, and these members are on average 10 years older, with nearly double the income, than nonmembers, who are not on any public list.

This entry examines the sampling method that RDS employs, provides insights gained from the mathematical model on which it is based, and describes the types of analyses in which RDS can be used.

Sampling Method

RDS accesses members of hidden populations through their social networks, employing a variant of snowball (i.e., chain-referral) sampling. As in all such samples, the study begins with a set of initial respondents who serve as “seeds.” These then recruit their acquaintances, friends, or relatives who qualify for inclusion in the study to form the first “wave.” The first-wave respondents then recruit the second wave, who in turn recruit the third wave, and so forth. The sample thus expands wave by wave, like a snowball increasing in size as it rolls down a hill.
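The wave-by-wave expansion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network, the names in it, the function `recruit_in_waves`, and the per-person `quota` are all hypothetical, and real recruitment is not deterministic in the way this toy version is.

```python
def recruit_in_waves(network, seeds, quota, max_waves):
    """Return a list of waves; wave 0 holds the seeds.

    network   -- dict mapping each person to a list of contacts (hypothetical data)
    quota     -- assumed maximum number of recruits per respondent
    max_waves -- assumed upper limit on the number of recruitment waves
    """
    recruited = set(seeds)
    waves = [list(seeds)]
    for _ in range(max_waves):
        next_wave = []
        for recruiter in waves[-1]:
            # Only contacts who are not yet in the sample can be recruited.
            eligible = [c for c in network[recruiter] if c not in recruited]
            for contact in eligible[:quota]:
                recruited.add(contact)
                next_wave.append(contact)
        if not next_wave:  # recruitment has died out
            break
        waves.append(next_wave)
    return waves


# Hypothetical seven-person network of mutual acquaintances.
network = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "F"],
    "D": ["B"],
    "E": ["B", "G"],
    "F": ["C"],
    "G": ["E"],
}

waves = recruit_in_waves(network, seeds=["A"], quota=2, max_waves=5)
for number, members in enumerate(waves):
    print(f"wave {number}: {members}")
```

Starting from the single seed "A", the sample reaches every member of this small network within three waves, although none of the later respondents was known to the researcher at the outset.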

RDS combines snowball sampling, a non-probability sampling technique, with a mathematical model that weights the sample to compensate for the fact that it was not obtained in a simple random way. This procedure controls for four biases that are inherent in any snowball sample:

  • The seeds cannot be recruited randomly, because if that were possible, the population would not qualify as hidden in the first place. Generally, the seeds are respondents to whom researchers have easy access, a group that may not be representative of the full target population. Consequently, the seeds introduce an initial bias.
  • Respondents recruit their acquaintances, friends, and family members, whom they tend to resemble in income, education, race/ethnicity, religion, and other factors. This homophily principle was recognized by Francis Galton more than a century ago. Its implication is that by recruiting those whom they know, respondents do not recruit randomly. Instead, recruitments are shaped by the social network connecting the target population. Consequently, successive waves of recruitment introduce further bias into the sample.
  • Respondents who are well connected tend to be oversampled, because more recruitment paths lead to them. Therefore, higher-status respondents, that is, those who have larger social networks, are overrepresented in the sample.
  • Population subgroups vary in how effectively they can recruit, so the sample reflects, disproportionately, the recruitment patterns of the most effective recruiters. For example, in AIDS prevention research, HIV positives generally recruit more effectively and also tend to recruit other positives, so positives tend to be oversampled.
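The third bias in the list above, the oversampling of well-connected respondents, can be illustrated with a small worked example. The sketch below rests on two assumptions that are not stated in this entry: that a person's chance of being recruited is proportional to the size of his or her network, and that weighting each respondent by the inverse of network size is one way to offset that bias. The population data are hypothetical, and this is not presented as the RDS estimator itself.

```python
from fractions import Fraction

# Hypothetical population: (network size, has the trait of interest).
population = [(8, True), (6, True), (2, False), (2, False), (1, False), (1, False)]

total_degree = sum(size for size, _ in population)

# Assumption: the chance of being recruited is proportional to network size.
inclusion = [Fraction(size, total_degree) for size, _ in population]

# True share of the population that has the trait.
true_share = Fraction(sum(trait for _, trait in population), len(population))

# Expected share of the trait in the unweighted sample.
naive_share = sum(p for p, (_, trait) in zip(inclusion, population) if trait)

# Assumed correction: weight each person by 1 / network size.
weights = [p / size for p, (size, _) in zip(inclusion, population)]
weighted_share = (
    sum(w for w, (_, trait) in zip(weights, population) if trait) / sum(weights)
)

print("true share:    ", true_share)
print("naive share:   ", naive_share)
print("weighted share:", weighted_share)
```

In this toy population the two well-connected members carry the trait, so the unweighted sample is expected to show it in 7/10 of respondents even though only 1/3 of the population has it. Under the stated assumptions, the inverse weighting recovers the true share of 1/3.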

Mathematical Model

RDS is based on a mathematical model of the network-recruitment process, which functions somewhat like a corrective lens, controlling for the distorting effects of network structure on the sampling process to produce an unbiased estimate of population characteristics. Space does not permit a full presentation of the model here, but two insights underlying it give a sense of how it operates. First, modeling the recruitment process as a regular Markov chain reveals that if referral chains are sufficiently long, that is, if the chain-referral process consists of enough waves, the composition of the final sample becomes independent of the seeds from which it began. The point at which the sample composition becomes stable is termed the equilibrium. Therefore, an important design element in RDS involves measures for increasing the length of referral chains. Means for creating long chains include having respondents recruited by their peers rather than by researchers, providing rewards for peer recruiters, and setting recruitment quotas so that a few respondents cannot do all the recruiting. Through these means, a major concern regarding bias in chain-referral samples is resolved: the population estimate becomes independent of the seeds (initial subjects) with which the sampling began.
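The first insight, that sample composition converges to an equilibrium regardless of the seeds, can be checked numerically. The following is a minimal sketch, assuming a population divided into two groups and a made-up recruitment matrix in which `transition[i][j]` is the probability that a recruiter in group i recruits someone in group j. The figures and function names are hypothetical and are not taken from any study.

```python
def next_wave(composition, transition):
    """Expected group composition of the wave recruited by the current wave."""
    n = len(composition)
    return [
        sum(composition[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def composition_after(seed_composition, transition, waves):
    """Expected composition after the given number of recruitment waves."""
    composition = list(seed_composition)
    for _ in range(waves):
        composition = next_wave(composition, transition)
    return composition


# Hypothetical recruitment probabilities between groups A and B.
# Every entry is positive, so the chain is regular.
transition = [
    [0.7, 0.3],  # group A recruits A with probability 0.7, B with 0.3
    [0.4, 0.6],  # group B recruits A with probability 0.4, B with 0.6
]

from_a = composition_after([1.0, 0.0], transition, waves=20)  # all seeds in A
from_b = composition_after([0.0, 1.0], transition, waves=20)  # all seeds in B

print("seeds all in A:", [round(x, 4) for x in from_a])
print("seeds all in B:", [round(x, 4) for x in from_b])
```

For this matrix the equilibrium composition is 4/7 in group A and 3/7 in group B, and both starting points arrive at it to within rounding error well before the twentieth wave. With only one or two waves, by contrast, the two samples would still differ markedly, which is why longer referral chains matter.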

...
