Assessing Data Quality

Overview

We live in a time of unprecedented availability of huge volumes of data that continue to grow at a rapid pace. Many government and university websites allow the public to download data sets in a variety of formats on an array of topics, from unemployment to climate data to disease incidence rates. Even if you are not a data scientist, you will encounter flashy dashboards displaying animated bar charts, line graphs, and maps of political polls, crime statistics, the most common web searches by state over time, and more. While free and sometimes paid access to data is a boon for anyone needing information to inform critical decisions or to complete a school project, it has created a new problem: how do you know whether the data you are using are accurate and mean what you think they mean?

Our current data cornucopia can be attributed in part to the rise of the Internet and the World Wide Web. Originally intended to connect researchers so they could share data, methods, and findings, the web is now populated by a broad spectrum of people, from Nobel Prize-winning scientists to conspiracy theorists, from parents helping kids with homework to propagandists, scammers, and hackers. In short, the web is a reflection of humanity, with its virtues and faults. Anyone can post a data set. Anyone can create a dashboard and populate it with convincing-looking graphs. Anyone can also declare, without evidence, that another's data are flawed or even fake, and so undermine confidence in that information. Troublingly, some companies use high-quality data from social media, combined with sophisticated data science, to target users with misleading or false data. Further, some previously trusted producers of data and information can become compromised. While you always need to be vigilant, rest assured that there are still significant amounts of legitimate data available that can and should be used for business and professional purposes.

More than ever, the ability to assess data quality is a fundamental skill for the average person, not one limited to the work of researchers and scientists. From voting decisions to financial planning and public health, we all benefit from the ability to determine what information can be trusted, to what extent, and for how long that information is valid. This Skill will introduce best practices for maintaining objectivity, interrogating data, and addressing bias. Upon completion of this Skill, you should be able to:

  • Describe a variety of data lifespan characteristics
  • Identify data biases
  • Apply various data interrogation strategies

Further Reading

Huff, D. (1993). How to lie with statistics. W.W. Norton & Co.
Keller, D. K. (2015). The tao of statistics: A path to understanding (with no math) (2nd ed.). SAGE Publications.
Ellenberg, J. (2014). How not to be wrong: The power of mathematical thinking. Penguin.
Gonick, L., & Smith, W. (1993). The cartoon guide to statistics. HarperPerennial.