Duplication

Paul J. Lavrakas

doi:10.4135/9781412963947

Edited by:
Paul J. Lavrakas
In: Encyclopedia of Survey Research Methods
Chapter DOI: https://doi.org/10.4135/9781412963947.n149
Subject: Survey Research

Duplication refers to the presence of an element more than once on a sampling frame when that element appears only once in the target population. As straightforward as this problem and its solution may appear, detecting and correcting it can be complicated, time-consuming, and/or costly.

For example, a sampling frame made up of names of members of a professional organization may list the same person more than once if the organization has not cleaned its list well enough to purge all but one variant of each name. Cleaning such a list could mean trying to narrow the following names down to a single listing: “Joan F. Smithers,” “Joan Smathers,” “J. F. Smithers,” “J. Smythers,” and so on. Whether all the names in this example belong to the same person is not certain, but the example demonstrates the challenges that duplication raises.
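
The name-matching problem above can be sketched in Python. This is only an illustration, not a method described in the entry: the helper names `name_key` and `candidate_duplicates`, the 0.8 similarity threshold, and the extra listing “Robert Jones” are all assumptions made for the example. The sketch flags pairs of listings that share a first initial and have similar surnames. Because it cannot establish that two listings are the same person, its output is a list of candidates for review rather than a list of confirmed duplicates.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_key(listing):
    """Reduce a listing to (first initial, surname), lowercased, periods dropped."""
    tokens = listing.replace(".", " ").lower().split()
    return tokens[0][0], tokens[-1]


def candidate_duplicates(listings, threshold=0.8):
    """Return pairs of listings that MAY be the same person.

    The threshold is an assumed, untuned value; a real cleaning job
    would calibrate it and review every flagged pair by hand.
    """
    pairs = []
    for a, b in combinations(listings, 2):
        (initial_a, surname_a), (initial_b, surname_b) = name_key(a), name_key(b)
        similarity = SequenceMatcher(None, surname_a, surname_b).ratio()
        if initial_a == initial_b and similarity >= threshold:
            pairs.append((a, b))
    return pairs


# The four variants come from the entry; "Robert Jones" is a hypothetical
# extra listing added to show a name that should not be flagged.
frame = [
    "Joan F. Smithers",
    "Joan Smathers",
    "J. F. Smithers",
    "J. Smythers",
    "Robert Jones",
]

for a, b in candidate_duplicates(frame):
    print(f"possible duplicate: {a!r} / {b!r}")
```

Matching on first initial plus surname is deliberately loose: it treats “J. Smythers” and “Joan Smathers” as a possible pair, which is the kind of uncertain case that makes cleaning a frame time-consuming.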

In other cases, when no real list serves as a sampling frame, such as in random-digit dialing (RDD) telephone sampling, the concept of duplication is somewhat more abstract, since the initial sampling unit in such a survey is a household, and many households can be reached by more than one telephone number. Thus, an RDD frame contains considerable duplication in the form of multiple telephone numbers that reach the same household or business. In telephone surveying, this is further complicated by the growth of cell phone ownership, which leads to even more telephone numbers that can reach members of the same household.

The major problem that duplication creates is that it leads to unequal probabilities of selection. Probability samples require that elements have a known, but not necessarily an equal, probability of selection. Thus researchers who want to maintain their probability samples must gather information on how many “chances” a selected respondent had to be sampled. With a sampling frame that can be cleaned of duplication, it is incumbent upon the researchers to do this as well as possible before the sample is drawn. Then, assuming a simple random sample is drawn, all elements have the same chance of being selected. But with other sampling frames, in particular with RDD telephone frames, measures must be taken upon reaching a household or business to determine how many other telephone numbers in the frame could also have reached that household or business. This information can then be used to adjust (weight) the data prior to conducting analyses in order to “correct” for duplication and reduce the potential bias it may create.
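
The weighting adjustment described above can be sketched in Python. The entry does not give a formula, so this sketch assumes a common approach: each responding household receives a weight inversely proportional to the number of telephone numbers in the frame that could have reached it. The function name `duplication_weights` and the rescaling of weights to sum to the sample size are assumptions for illustration.

```python
def duplication_weights(numbers_reaching_household, normalize=True):
    """Weight each household by 1 / (number of frame entries that reach it).

    A household reachable through k telephone numbers had roughly k chances
    of selection, so it is weighted down by a factor of k. This is an
    assumed, simplified adjustment, not a formula taken from the entry.
    """
    if not numbers_reaching_household:
        return []
    if any(k < 1 for k in numbers_reaching_household):
        raise ValueError("each household must be reachable by at least one number")

    raw = [1.0 / k for k in numbers_reaching_household]
    if not normalize:
        return raw

    # Rescale so the weights sum to the number of responding households.
    scale = len(raw) / sum(raw)
    return [weight * scale for weight in raw]


# Hypothetical survey answers: how many telephone numbers reach each household.
reported_lines = [1, 2, 4]
print(duplication_weights(reported_lines, normalize=False))
print(duplication_weights(reported_lines))
```

In practice the count of telephone numbers is self-reported by the respondent, so the resulting weights are only as accurate as those answers.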

Paul J. Lavrakas
