
Anonymizing data is a process, occurring throughout the data collection and analysis phases of research, in which identifying information is removed from the data in order to protect the privacy of research participants and of the groups and/or communities being examined. Anonymization also helps to prepare the data for secondary use, in which it is made accessible to other researchers. Secondary use refers to using data to examine a question that was not the purpose of the original data collection.

Overview and Discussion

Data anonymization is an important stage in the research process, especially when preparing the data for secondary use. Anonymizing data may involve several levels and there are many ways it can be achieved.

The first level often involves removing or renaming direct identifiers. Anonymizing data involves more than simply removing the names of the participants under examination; it also involves removing or substituting all of the elements (e.g., names, places, and addresses) that might lead to the identification of an individual or group. This can be done by giving all participants or cases a pseudonym or a code number. Some argue that numbers are preferable to pseudonyms because they avoid the possibility of confusing one person's name with that of another participant within the same case study. Numbers may protect against revealing participants' identities, but they can seem somewhat sterile. When using pseudonyms, it may also be important to give participants names that are appropriate to their generation. A good strategy is to consult lists of the most popular names from different years to find suitable ones. If the researcher is confident that it does not compromise anonymity, it might be useful to choose a pseudonym that starts with the same letter as the original name (Jack to Joe, for instance). This can help the researcher keep the person in mind, rather than just his or her words, when analyzing the data.

Establishing pseudonyms early in the research process is helpful. Changing the names when first proofreading the transcripts provides space to become familiar with the pseudonym. The point is to keep the data alive and, at the very least, associated with that person but not identifiable. It is also important for the original researchers to keep a cross-referencing system, which is needed to link original names to the data.
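The code-number approach and the cross-referencing system described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_key` and `apply_key`, the `P001`-style code format, and the sample transcript are all hypothetical, and a real project would also need to store the key securely and separately from the anonymized data.

```python
import re

def build_key(names, prefix="P"):
    """Assign each distinct original name a code such as P001,
    in order of first appearance. The returned dict is the
    cross-referencing key linking original names to codes."""
    key = {}
    for name in names:
        if name not in key:
            key[name] = f"{prefix}{len(key) + 1:03d}"
    return key

def apply_key(text, key):
    """Replace every whole-word occurrence of an original name with its code."""
    # Longest names first, so a longer name is replaced before any
    # shorter name it happens to contain.
    for name in sorted(key, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", key[name], text)
    return text

# Hypothetical participants and transcript, for illustration only.
key = build_key(["Jack", "Maria", "Jack"])
transcript = "Jack said that Maria moved away. Jackson disagreed."
anonymized = apply_key(transcript, key)
print(anonymized)
```

Matching on whole words means that "Jackson" is left untouched when "Jack" is replaced. Names that are not in the key are never changed, so the list of names has to be complete for the substitution to protect anyone.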

The second level involves removing or renaming indirect identifiers. A common technique involves restricting the more extreme or deviant cases, particularly within qualitative data. Another approach, referred to as bracketing, combines the categories of a variable such as age or income into broader groups. For example, birth dates should be converted into age categories. Other indirect but specific personal information that could identify participants can be anonymized in the same way. A further strategy is to collapse or combine variables by creating a summary variable. Within qualitative research, anonymizing is often more difficult because many identifiers will need to be removed. Depending on how small the case study is, certain places may also need to be renamed in order to protect the anonymity of the community under investigation.
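As a rough illustration of bracketing, the following Python sketch converts birth dates into age categories. The ten-year bracket width, the open-ended "70+" top category, the reference date, and the function names are assumptions chosen for the example, not values prescribed by the text. The open-ended top category is one possible way of restricting the more extreme cases.

```python
from datetime import date

def age_on(birth_date, reference_date):
    """Age in completed years on the reference date."""
    years = reference_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def age_bracket(age, width=10, top=70):
    """Map an exact age to a bracket label such as '30-39'.
    Ages at or above `top` share one open-ended category."""
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical birth dates and reference date, for illustration only.
reference = date(2020, 1, 1)
birth_dates = [date(1985, 6, 15), date(1990, 1, 1), date(1940, 3, 2)]
brackets = [age_bracket(age_on(b, reference)) for b in birth_dates]
print(brackets)
```

Only the bracket label would be kept in the shared data set; the exact birth date stays with the original researchers. Wider brackets give more protection at the cost of analytic detail, so the width is a judgment call for each study.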

...
