
Secondary Data Source

The analysis of secondary data plays a vital role in many fields of study, including the social sciences. Whether data count as primary or secondary does not depend on qualities of the data themselves but on their history and their relationship to a specific analysis. A simple definition is that primary data are collected by a research group for the specific analysis in question, whereas secondary data are collected by someone else for some other purpose. So if a researcher conducts a survey and analyzes the results for his or her own analysis, the data from the survey are primary data. If the researcher deposits the data in an archive and someone unrelated to the original research team analyzes them 20 years later, then for that later analysis the data are secondary data.

One reason the analysis of secondary data is becoming more popular in the social sciences is the availability of large data sets collected and processed by the government and made available for researchers to analyze. Examples of such data sets in the United States include those from the decennial U.S. Census, which aims to collect information from every person living in the United States in the year the census is conducted, and the annual Behavioral Risk Factor Surveillance System (BRFSS), which collects data on health behaviors from a representative sample of U.S. adults and is weighted to reflect the adult population. Collecting data on this scale would be beyond the capability of most, if not all, research teams, yet the data from these projects are available for anyone with an Internet connection to download for free.

The distinction between primary and secondary data should not be overemphasized. Many researchers work with both primary and secondary data during the course of their careers, depending on the specific research questions they are studying at the time, and often both primary and secondary data are analyzed within one research project. The same statistical techniques might be used on either primary or secondary data, and both types of data have advantages and disadvantages. The goal of the researcher, therefore, should be to select appropriate data for a specific research question.

The primary advantage of using secondary data is that the data are already collected and processed, which represents a substantial savings of time and money and allows researchers to focus their efforts on framing questions and conducting analyses. Another major advantage is the scope of secondary data available: Few researchers would be able to conduct even one survey comparable with the BRFSS, for instance, which has been conducted annually since 1984. A third advantage is that the quality of secondary data is often very high: For instance, federal agencies have large staffs trained to plan large-scale surveys, write data collection instruments, conduct surveys, and clean the data. Such projects often use scientific sampling plans that allow the data to be weighted to represent larger populations, such as the entire U.S. population. The methodologies applied to the data are often well documented as well: For instance, the Centers for Disease Control and Prevention, an agency of the U.S. Department of Health and Human Services, issues many technical reports (freely available on the Internet) describing its data sets and the methods by which the data were collected and processed.
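To illustrate what weighting means in practice, the following minimal Python sketch computes a prevalence estimate from a handful of hypothetical survey records. Each record carries a sampling weight, roughly the number of people in the population that the respondent stands for. The records, the weights, and the function names are invented for illustration only; the actual weighting procedures for surveys such as the BRFSS are described in the agencies' own technical documentation.

```python
# Hypothetical respondent records: (reports_smoking, survey_weight).
# These values are invented for illustration; real survey weights come from
# the agency's documented sampling and weighting procedure.
respondents = [
    (True, 1200.0),
    (False, 800.0),
    (False, 1500.0),
    (True, 500.0),
]


def unweighted_proportion(records):
    """Share of respondents reporting the behavior, ignoring weights."""
    return sum(1 for flag, _ in records if flag) / len(records)


def weighted_proportion(records):
    """Share of the represented population reporting the behavior."""
    total_weight = sum(weight for _, weight in records)
    return sum(weight for flag, weight in records if flag) / total_weight


print(unweighted_proportion(respondents))
print(weighted_proportion(respondents))
```

In this made-up sample, half of the respondents report smoking, but the weighted estimate is 0.425 (1,700 of 4,000 represented people), because the two nonsmoking respondents stand for more of the population. This is why secondary analyses of weighted survey data generally need to apply the weights supplied with the data set.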

...
