
Intercoder Reliability Standards: Stability

Reliability is the extent to which an experiment, test, or other systematic procedure yields the same results across replicated trials under identical conditions. Mathematically, reliability is defined as the ratio of true-score variance to observed-score variance (this definition applies only to measurement variables, not to nonmeasurement variables). In many cases a true score cannot be directly observed or measured. To obtain the true score for many variables important to social scientists, one would have to remeasure the variable an infinite number of times: no single measurement can determine the true score exactly, but the average of an infinite number of repeated measurements would yield it. Measurement is, by definition, not error free, so any two or more measurements of the same “true” variable will never entirely duplicate each other; they can, however, be highly consistent. It is also important to note that reliability does not assume validity and that measures of reliability are sample dependent. This entry examines stability, which is one attribute of reliability. It further examines conceptual issues related to intercoder reliability, along with common measures of intercoder reliability and stability coefficients.
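The variance-ratio definition can be written compactly. The notation below is the conventional classical test theory notation and is an assumption here, since the entry itself introduces no symbols: X is an observed score, T the true score, E the measurement error, and X_i the i-th repeated measurement. The second equality also assumes that error is uncorrelated with the true score.

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
           = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}, \qquad
T = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i
```

Read this way, reliability approaches 1 as error variance shrinks relative to true-score variance, and the limit expresses the idea that only an infinite series of remeasurements would recover the true score.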

Stability

Stability is the attribute of reliability that concerns how consistently the same phenomenon is measured across repeated measurements, that is, its consistency over time. For example, an experiment is reliable if it yields consistent results across repeated measures and unreliable if repeated measures yield different results. The more consistent the results of repeated measurements, the greater the reliability of the measuring procedure. In attitudinal surveys and medical diagnoses, stability often refers to the consistency of subject feedback and outcomes over time for individual subjects. Stability is expressed as the correlation between measurement results from different points in time when the subjects being measured and the measuring instrument remain the same. Higher stability suggests both measurement reliability and response continuity; lower stability suggests unreliability. Most discussion of stability focuses on the stability of measurement instruments rather than the stability of agreement between coders over time. Although rarely reported, the stability of intercoder reliability is important for assessing the meaning and generalizability of studies that employ coders.
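As a rough illustration of stability as a correlation between two measurement occasions, the sketch below computes a Pearson correlation between scores from the same subjects at two time points. The choice of Pearson's r, the function name `pearson_r`, and the scores themselves are assumptions made for illustration; the entry does not prescribe a particular coefficient or data set.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products and sums of squared deviations
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores: the same five subjects, the same instrument,
# measured at two points in time.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]

stability = pearson_r(time1, time2)
print(f"test-retest stability: {stability:.3f}")
```

A value near 1 would indicate that subjects keep roughly the same relative standing across the two occasions; a value near 0 would indicate little stability.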

Conceptual Issues Surrounding Intercoder Reliability

Coding

When research relies on observation, reliability in coding means that the biases and idiosyncrasies of the observers are substantially smaller than the “true variation” in the behavior being coded. A common definition of reliability is consistency in measures over time. For intercoder reliability, the definition has a further layer: it includes consistency in the observations of two or more coders (their intercoder reproducibility) in addition to their consistency in coding over time (their intercoder stability). Together, these inform accuracy and data quality.

Such a definition presumes that there is an outcome or behavior that exists independently of the observer. While this assumption is generally accepted in some research traditions, such as the experimental worldview, it is not universally accepted. For example, postmodern and standpoint theorists might not accept this assumption and could point to many instances in which characteristics of the observer greatly affected what was observed. As an example, in her research on primates, Donna Haraway details how the very same primate group was observed by an American research group and a Japanese research group. The nature of the observations was quite different, with the Japanese group observing much more communal activity than the American group. Another example, from adolescent health, is diagnosis. Giving a patient the same diagnosis at time 1, time 2, and so on would suggest greater stability in diagnosis and contribute to greater confidence in overall reliability. Moreover, a common diagnosis from time 1 to time 2 that is also shared by multiple researchers or doctors would further enhance overall reliability, because it would suggest stability, intercoder stability, and intercoder reliability.
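The distinction between agreement across coders and agreement of one coder with himself or herself over time can be made concrete with a small sketch. It uses Cohen's kappa, a widely used chance-corrected agreement statistic; that choice is an assumption, since this part of the entry does not name a specific coefficient. The function name `cohen_kappa` and the coded data are likewise hypothetical.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty, equal-length lists of codes")
    n = len(codes_a)
    # Observed proportion of units given the same code in both lists
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance from each list's marginal frequencies
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to the same ten units.
coder_a_time1 = ["pos", "neg", "pos", "neu", "neg", "pos", "neu", "neg", "pos", "neg"]
coder_b_time1 = ["pos", "neg", "pos", "neg", "neg", "pos", "neu", "neg", "neu", "neg"]
coder_a_time2 = ["pos", "neg", "pos", "neu", "neg", "pos", "neu", "pos", "pos", "neg"]

# Two coders, same occasion: agreement between coders.
reproducibility = cohen_kappa(coder_a_time1, coder_b_time1)
# One coder, two occasions: consistency of coding over time.
stability = cohen_kappa(coder_a_time1, coder_a_time2)

print(f"between coders: {reproducibility:.3f}, over time: {stability:.3f}")
```

Reporting both values for the same material shows whether disagreement stems mainly from differences between coders or from drift in a single coder's judgments over time.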

...
