
Scientists from numerous disciplines frequently make sense of the world by using yardsticks that they hope will show how their study participants are performing, what they are thinking, and how they interact with others. Numbers are faithfully recorded, spun through various forms of software, and prepared for publication. All this is fine if the yardsticks themselves are true—all the time, in every single place they are used, regardless of who is doing the actual recording of the numbers, and regardless of the circumstances in which the numbers are obtained. But what if the yardsticks themselves are shaky?

In epidemiologic analyses based strictly on counting, a few disputed units here or there may seem unlikely to change the overall interpretation of a data set. However, the reclassification of even a single case from one cell of a table to another can force a confidence interval to bracket 1.0 where it otherwise would not, or cause a statistical test to just miss its significance threshold. Although measurement quality may seem a minor concern when cases are easy to distinguish, the need for quality measurement is fundamentally inescapable. Equally important, whole sections of the field of epidemiology long ago moved beyond the simple exercise of counting, working instead in arenas in which measurements take the form of scores, scales, and other assessments. In such settings, the challenges of designing and conducting strong, reproducible studies are magnified.
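The sensitivity to a single reclassified case can be made concrete with a minimal sketch. The numbers below are hypothetical, chosen only for illustration: a 2 × 2 table whose odds ratio confidence interval (computed with the standard Woolf logit method) just excludes 1.0, until one case is moved from the exposed to the unexposed cell.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Woolf logit 95% confidence interval for an odds ratio.
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical table: reclassifying one case from exposed to unexposed
# moves the lower confidence limit across 1.0.
print(odds_ratio_ci(30, 70, 18, 82))  # lower limit just above 1.0
print(odds_ratio_ci(29, 70, 19, 82))  # lower limit now below 1.0
```

A shift of one case out of 200 observations is enough to change whether the interval brackets the null value, which is exactly why the reliability of the underlying classification matters.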

The domain of psychometrics provides criteria for judging the quality of a measurement. Although "psychometrics" has often been applied narrowly to a specialized branch of mathematical and statistical thinking within educational research, this entry uses the term in a broader sense. It explores a handful of concepts crucial to all measurement and considers recent examples from the epidemiologic literature that show the importance of such considerations.

Reliability

Psychometricians have been preaching for decades that the core properties of good measurement must not simply be assumed whenever a set of assessments is made. Principal among these properties is that the measurements be reliable and valid. Reliability is defined as the consistency of measurements made by a specific group of persons using the measurement instrument under the same conditions. In the most elementary sense, high reliability means that the data would be consistent if the identical study were run again. Even under closely monitored laboratory conditions, however, numerous sources of error can interfere with obtaining reliable data. To reduce error and improve reliability, laboratories routinely standardize their measurement devices and correct their baseline values. Measurements made in the field need comparable standardization: One often-used method is to verify that different field workers show high levels of agreement when facing the same data-collection situations. A simple but informative analysis is to evaluate overall agreement between workers at varying tolerances: A tolerance of zero (exact agreement) yields an overall percentage between 0% and 100%; then, if agreement is below 100%, the tolerance is widened step by step (i.e., liberalizing the definition of agreement) until 100% agreement is reached. (Software to accomplish this task is available as an R package.∗) The climb toward full agreement as tolerances become less restrictive is a direct reflection of the reliability of the data sources.
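The tolerance-widening procedure described above can be sketched in a few lines. This is a minimal illustration, not the R package the entry cites; the two raters' scores are hypothetical numbers invented for the example.

```python
def agreement_by_tolerance(rater1, rater2):
    """Percent agreement between two raters' paired scores as the
    tolerance for 'agreement' is widened from 0 (exact match) in
    unit steps until 100% agreement is reached.
    Returns a list of (tolerance, percent_agreement) pairs."""
    diffs = [abs(a - b) for a, b in zip(rater1, rater2)]
    n = len(diffs)
    results = []
    tol = 0
    while True:
        pct = 100.0 * sum(d <= tol for d in diffs) / n
        results.append((tol, pct))
        if pct == 100.0:
            return results
        tol += 1

# Hypothetical field ratings on an ordinal scale:
print(agreement_by_tolerance([3, 5, 4, 2, 5, 1, 4, 3],
                             [3, 4, 4, 4, 5, 1, 2, 3]))
```

A slow climb toward 100% (full agreement only at a wide tolerance) signals unreliable raters; exact or near-exact agreement at tolerance zero signals a reliable data source.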

...
