
Anyone who works in any field of science is well acquainted with the notion that every measure of every phenomenon is inherently affected by random error. Reliability is therefore recognized as being of capital importance in every scientific endeavour, psychology included. Over the years, classical test theory has been the foundation of almost all psychological testing. The aim of this entry is to describe generalizability theory (Brennan, 2001; Cronbach, Gleser, Nanda & Rajaratnam, 1972), which offers a more precise and complete model of the composition of an observed measure, and to show some of its advantages over classical test theory.

According to classical test theory, an observed score is the sum of two components: the unknown true score and the random error. The central assumption of classical test theory is that error is randomly and independently distributed and is uncorrelated with the true score, as well as with the true scores and errors of subsequent measurements.
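The decomposition described above can be written compactly. The symbols below (X for the observed score, T for the true score, E for the error, and primes for a subsequent measurement) are notation assumed here for illustration and are not taken from the entry itself; the last line is the usual definition of reliability that follows from these assumptions.

```latex
\begin{align}
  X &= T + E, \\
  \operatorname{E}[E] &= 0, \qquad
  \operatorname{Cov}(T, E) = 0, \qquad
  \operatorname{Cov}(E, E') = 0, \qquad
  \operatorname{Cov}(T', E) = 0, \\
  \sigma^2_X &= \sigma^2_T + \sigma^2_E, \\
  \rho_{XX'} &= \frac{\sigma^2_T}{\sigma^2_X}
             = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}.
\end{align}
```

Note that the whole of the error enters through the single term \sigma^2_E, whatever its origin.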

Classical test theory takes only a single, undifferentiated error term into account, even though errors actually come from multiple sources. This means that reliability assessment must rest on multiple procedures and indicators (for example, test-retest, split-half, Cronbach's alpha), each one accounting for a different error source. Thus, a high test-retest reliability means that we can trust a measure independently of the occasion on which it is taken, but it tells us nothing about whether we can trust that measure independently of the system (human being or instrument) that actually makes the measurement. Consequently, multiple reliabilities exist within classical test theory, for instance across occasions, across raters, across items, and so forth. This is a major limitation of the classical approach to reliability, as it cannot account for multiple error sources at once. More importantly, the classical theory of reliability cannot account for interactions among different sources of error. For instance, neither Cronbach's alpha nor test-retest reliability is informative when consistency across items changes across occasions.

Generalizability theory represents a more general approach to assessing the reliability of a score. It defines a score as a sample from the universe of all admissible observations, characterized by one or more conditions of measurement. Here, the true score is defined as the universe score, that is, the average of all the observations in the universe of admissible observations, and errors are defined by the conditions of measurement. Items, raters, occasions, tests, and so forth are examples of conditions of measurement, and each one accounts for part of the variability of the observed scores. Generalizability theory is designed to estimate the multiple components of the observed score variability, and to use them to explore the effects of different sources of measurement error. Consequently, it allows several sources of variation to be investigated simultaneously, and the error involved in generalizing an observed result to the universe defined by each of them to be estimated. Generalizability theory was developed in the context of the dependability of behavioural measurements. Nevertheless, the model is rather general and may equally apply to other reliability issues.
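As a rough illustration of what estimating variance components can look like in the simplest case, the sketch below treats a fully crossed persons-by-items design with one score per cell. It uses the ANOVA expected-mean-squares estimators, which are one common choice and are an assumption here rather than something the entry prescribes. The function names `g_study_p_by_i` and `g_coefficient`, and the small score table, are hypothetical.

```python
from statistics import mean


def g_study_p_by_i(scores):
    """Estimate variance components for a crossed persons x items design.

    `scores` is a list of rows: one row per person, one column per item,
    with no missing cells. Negative estimates are truncated to zero,
    which is a common convention and an assumption of this sketch.
    """
    n_p = len(scores)
    n_i = len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    item_means = [mean(row[j] for row in scores) for j in range(n_i)]

    # Sums of squares for persons, items, and the residual
    # (person-by-item interaction confounded with random error).
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for the components.
    return {
        "person": max((ms_p - ms_res) / n_i, 0.0),
        "item": max((ms_i - ms_res) / n_p, 0.0),
        "residual": ms_res,
    }


def g_coefficient(components, n_items):
    """Generalizability coefficient for relative decisions when the
    score is averaged over `n_items` items."""
    relative_error = components["residual"] / n_items
    return components["person"] / (components["person"] + relative_error)


# Hypothetical data: three persons answering two items.
scores = [[2, 4], [4, 4], [6, 10]]
components = g_study_p_by_i(scores)
print(components)
print(g_coefficient(components, n_items=2))
print(g_coefficient(components, n_items=4))
```

The person component plays the role of universe-score variance, while the item and residual components are sources of error. Changing `n_items` in `g_coefficient` shows how the expected dependability of the score would change with a longer or shorter instrument. Designs with more conditions of measurement (for example raters and occasions) add further components and interactions along the same lines.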

...
