Researchers in epidemiology and public health commonly distinguish between primary data, which are collected by the researcher for the specific analysis in question, and secondary data, which are collected by someone else for some other purpose. Many cases fall between these two categories, but it is useful to conceptualize primary and secondary data by considering two extreme cases. In the first case, an example of primary data, a research team collects new data and performs its own analyses, so the people analyzing the data have some involvement in, or at least familiarity with, the research design and data collection process. In the second case, an example of secondary data, a researcher obtains and analyzes data from the Behavioral Risk Factor Surveillance System (BRFSS), a large, publicly available data set collected annually in the United States. Here the analyst did not participate in either the research design or the data collection process, and his or her knowledge of those processes comes only from the information available on the BRFSS Web site and from queries to BRFSS staff.

Secondary data are used frequently in epidemiology and public health because those fields focus on monitoring health at the level of the community or nation rather than at the level of the individual, as is typical in medical research. In many cases, using secondary data is the only practical way to address a question. For instance, few if any individual researchers have the means to collect data on the scale required to estimate the prevalence of multiple health risks in each of the 50 states of the United States. However, data addressing those questions have been collected annually since the 1980s by the Centers for Disease Control and Prevention, in conjunction with state health departments, and they are available for download from the Internet. Federal and state agencies commonly use secondary data to evaluate public health needs and to plan campaigns and interventions, and such data are also widely used in classroom instruction and scholarly research.

There are both advantages and disadvantages to using secondary data. The advantages relate primarily to the fact that an individual analyst does not have to collect the data himself or herself and, through a secondary data set, can gain access to information much more wide-ranging than he or she could collect alone. Specific advantages of using secondary data include the following:

  • Economy, because the analyst does not have to pay the cost of data collection
  • Speed and convenience, because the data are already available before the analyst begins to work
  • Availability of data from large geographic regions, for instance, data collected on the national or international level
  • Availability of historical data and comparable data collected over multiple years; for instance, the BRFSS data are available dating back to the 1980s, and certain topics have been included every year
  • Potentially higher quality of data; for instance, the large surveys conducted by federal agencies such as the Centers for Disease Control and Prevention commonly use standardized sampling procedures and professional interviewers, in contrast to many locally collected data sets that represent a convenience sample collected by research assistants

The disadvantages of using secondary data relate primarily to two things: the potential disconnect between the analyst's interests and the purposes for which the data were originally collected, and the analyst's lack of familiarity with the original research design and the data collection and cleaning processes. These disadvantages include the

...
