
Duplication refers to the appearance of an element more than once on a sampling frame when that element appears only once in the target population. As straightforward as this problem and its solution may appear to be, detecting and correcting duplication can be complicated, time-consuming, and/or costly.

For example, a sampling frame made up of the names of members of a professional organization may list the same person more than once if the organization has not cleaned its list well enough to purge all but one variant of each member's name. Consider trying to narrow the following names down to a single listing: “Joan F. Smithers,” “Joan Smathers,” “J. F. Smithers,” “J. Smythers,” and so on. It is not certain that all of these names refer to the same person, but the example demonstrates the challenges that duplication raises.
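One common way to surface candidate duplicates like these is to normalize each name and score every pair with a string-similarity measure, flagging pairs above a threshold for manual review. The sketch below assumes Python's standard-library `difflib.SequenceMatcher` as the similarity measure and an arbitrary threshold of 0.8. The helper names `normalize` and `candidate_duplicates`, the threshold, and the extra entry "Robert Lee" are all hypothetical and chosen only for illustration. Flagged pairs still need human judgment, or additional fields such as an address or member ID, to decide whether they really are the same person.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace (hypothetical helper)."""
    name = re.sub(r"[^a-z ]", " ", name.lower())
    return " ".join(name.split())


def candidate_duplicates(names, threshold=0.8):
    """Return (name_a, name_b, score) for every pair whose normalized names look similar.

    The 0.8 threshold is an illustrative assumption, not a standard value.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Illustrative frame entries; "Robert Lee" is a made-up non-duplicate.
frame = ["Joan F. Smithers", "Joan Smathers", "J. F. Smithers", "J. Smythers", "Robert Lee"]
for a, b, score in candidate_duplicates(frame):
    print(f"{a!r} ~ {b!r} (similarity {score})")
```

On these five entries the sketch flags three pairs for review, for instance “Joan F. Smithers” and “J. F. Smithers,” but not “Joan Smathers” and “J. Smythers,” whose similarity falls just under the threshold. No fixed cutoff removes the need for judgment; it only trades missed duplicates against false alarms.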

Other times, when no actual list serves as the sampling frame, as in random-digit dialing (RDD) telephone sampling, the concept of duplication is somewhat more abstract: the initial sampling unit in such a survey is a household, and many households can be reached by more than one telephone number. Thus, an RDD frame contains considerable duplication, in the sense that several telephone numbers in it may reach the same household or business. In telephone surveying, this is further complicated by the growth of cell phone ownership, which adds still more telephone numbers that can reach members of the same household.
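The link between multiple numbers and multiple chances of selection can be made concrete under a simplifying assumption: suppose n numbers are drawn by simple random sampling without replacement from an RDD frame of N numbers, and a given household can be reached through k of them. The household is missed only if none of its k numbers is drawn, so

```latex
\Pr(\text{household selected})
  = 1 - \frac{\binom{N-k}{n}}{\binom{N}{n}}
  \approx 1 - \left(1 - \frac{n}{N}\right)^{k}
  \approx k \, \frac{n}{N},
```

where the approximations hold when the sampling fraction n/N is small and k is small relative to N. Under these assumptions, a household reachable through two numbers has roughly twice the selection probability of a household reachable through only one.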

The major problem that duplication creates is that it leads to unequal probabilities of selection. Probability samples require that elements have a known, but not necessarily an equal, probability of selection. Thus, researchers who want to maintain a probability sample must gather information about how many “chances” each selected respondent had to be sampled. When a sampling frame can be cleaned of duplication, it is incumbent upon the researchers to clean it as thoroughly as possible before the sample is drawn; if a simple random sample is then drawn, all elements have equal chances of being selected.

With other sampling frames, in particular RDD telephone frames, measures must be taken upon reaching a household or business to determine how many other telephone numbers in the frame could also have reached it. This information can then be used to adjust (weight) the data before conducting analyses, in order to “correct” for duplication and reduce the potential bias it may create.
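A minimal sketch of such an adjustment, assuming the common approach of giving each responding household a base weight inversely proportional to the number of frame telephone numbers that could have reached it, then rescaling the weights to average 1. The record layout (`Household`, with a hypothetical respondent-reported `lines` count), the helper names, and the four-household sample are invented for illustration. A production weighting scheme would typically add further steps (for example, handling landline and cell frames separately) that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Household:
    """A responding household (hypothetical record layout)."""
    hh_id: str
    lines: int      # reported number of frame telephone numbers reaching this household
    answer: float   # some survey measure, e.g. 1.0 = yes, 0.0 = no


def duplication_weights(households):
    """Base weights proportional to 1 / lines, rescaled so they average 1."""
    raw = [1.0 / max(h.lines, 1) for h in households]  # guard against a reported 0
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up sample: multi-line households are over-represented among the "yes" answers.
sample = [
    Household("A", lines=1, answer=0.0),
    Household("B", lines=1, answer=0.0),
    Household("C", lines=2, answer=1.0),
    Household("D", lines=2, answer=1.0),
]
w = duplication_weights(sample)
answers = [h.answer for h in sample]
print("unweighted:", sum(answers) / len(answers))
print("weighted:  ", round(weighted_mean(answers, w), 3))
```

Here the two-line households receive half the weight of the one-line households, so the estimated proportion answering “yes” drops from 0.5 unweighted to about 0.333 after adjustment, offsetting their doubled chance of selection.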

Paul J. Lavrakas
