Generalizability Theory

Neil J. Salkind

doi:10.4135/9781412961288

Edited by: Neil J. Salkind
In: Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781412961288.n165
Subject: Research Design
Keywords: generalizability theory; testing; theories

Generalizability theory (G theory), originally developed by Lee J. Cronbach and his associates, is a measurement theory that provides both a conceptual framework and a set of statistical procedures for a comprehensive analysis of test reliability. Building on and extending classical test theory (CTT) and analysis of variance (ANOVA), G theory provides a flexible approach to modeling measurement error for different measurement conditions and types of decisions made based on test results. This entry introduces the reader to the basics of G theory, starting with the advantages of G theory, followed by key concepts and terms and some illustrative examples representing different G-theory analysis designs.

Advantages

There are a few approaches to the investigation of test reliability, that is, the consistency of measurement obtained in testing. For example, for norm-referenced testing (NRT), CTT reliability indexes show the extent to which candidates are rank-ordered consistently across test tasks, test forms, occasions, and so on (e.g., Cronbach's alpha and parallel-form and test-retest reliability estimates). In contrast, in criterion-referenced testing (CRT), various statistics are used to examine the extent to which candidates are consistently classified into different categories (score or ability levels) across test forms, occasions, test tasks, and so on. Threshold-loss agreement indexes such as the agreement coefficient and the kappa coefficient are some examples.

Why might one turn to G theory despite the availability of these different approaches to reliability investigation? G theory is a broadly defined analytic framework that addresses some limitations of the traditional approaches. First, each of the approaches above addresses either NRT or CRT but not both, whereas G theory accommodates both (called relative decisions and absolute decisions, respectively), yielding measurement error and reliability estimates tailored to the specific type of decision under consideration. Second, CTT reliability estimates take account of only one source of measurement error at a time. Thus, for example, when one is concerned about the consistency of examinee rank-ordering across two testing occasions and across different raters, one needs to calculate two separate CTT reliability indexes (i.e., test-retest and interrater reliability estimates). In contrast, G theory provides reliability estimates that account for both sources of error simultaneously. This capability to analyze multiple sources of error within a single analysis is particularly useful for optimizing the measurement design to achieve an acceptable level of measurement reliability.
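The distinction between relative and absolute decisions can be made concrete with a small numerical sketch. The code below is not taken from the entry; it assumes the simplest case, a crossed persons-by-items design with one score per cell, and uses the usual ANOVA expected-mean-square estimators of the variance components. The function names, the score matrix, and the labels "g_coefficient" (for relative decisions) and "phi" (for absolute decisions) are illustrative assumptions.

```python
def g_study_p_by_i(scores):
    """Estimate variance components for a crossed persons x items design.

    scores: list of rows, one row per person, one column per item
    (hypothetical data; one observation per cell).
    """
    n_p = len(scores)
    n_i = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    # Mean squares from the two-way ANOVA without replication.
    ms_p = n_i * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_i = n_p * sum((m - grand) ** 2 for m in item_means) / (n_i - 1)
    ss_res = sum(
        (scores[p][j] - person_means[p] - item_means[j] + grand) ** 2
        for p in range(n_p)
        for j in range(n_i)
    )
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations; negative estimates are
    # truncated at zero, which is one common convention (an assumption here).
    return {
        "p": max((ms_p - ms_res) / n_i, 0.0),   # persons (universe-score variance)
        "i": max((ms_i - ms_res) / n_p, 0.0),   # items
        "pi,e": ms_res,                         # interaction confounded with error
    }


def d_study(components, n_items):
    """Project reliability for a test of n_items under both decision types."""
    var_p = components["p"]
    # Relative decisions (rank ordering): only error that changes ranks counts.
    rel_error = components["pi,e"] / n_items
    # Absolute decisions (score levels): item difficulty also counts as error.
    abs_error = (components["i"] + components["pi,e"]) / n_items
    return {
        "g_coefficient": var_p / (var_p + rel_error),
        "phi": var_p / (var_p + abs_error),
    }


# Hypothetical scores: 5 persons (rows) by 4 items (columns).
scores = [
    [4, 5, 3, 4],
    [2, 3, 1, 2],
    [5, 5, 4, 4],
    [3, 4, 2, 3],
    [1, 2, 1, 2],
]

components = g_study_p_by_i(scores)
coefs_4 = d_study(components, n_items=4)
coefs_8 = d_study(components, n_items=8)
print(components)
print(coefs_4)
print(coefs_8)
```

In this sketch the absolute-decision index can never exceed the relative-decision coefficient, because its error term additionally includes item variance. Re-running d_study with a different n_items shows how lengthening the test would change both values, which is the kind of design optimization the paragraph above refers to.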

Key Concepts and Terms

A fundamental concept in G theory is dependability. Dependability is defined as the extent to which the generalization one makes about a given candidate's universe score based on an observed test score is accurate. The universe score is a G-theory analogue of the true score in CTT and is defined as the average score a candidate would have obtained across an infinite number of test administrations under measurement conditions that the investigator is willing to accept as exchangeable with one another (called randomly parallel measures). Suppose, for example, that an investigator has a large number of vocabulary test items. The investigator might feel comfortable treating these items as randomly parallel measures because trained item writers have carefully developed these items to target a specific content domain, following test specifications. The employment of randomly parallel measures is a key assumption of G theory. Note how this assumption differs from the CTT assumption, under which the sets of scores involved in a reliability calculation must be statistically parallel measures (i.e., two sets of scores must share the same mean, the same standard deviation, and the same correlation to a third measure).
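The idea of a universe score as an average over exchangeable measures can be illustrated with a short simulation. This is only a sketch under stated assumptions: a finite pool of 10,000 hypothetical vocabulary items stands in for the (conceptually infinite) universe, each item is given an invented probability that one candidate answers it correctly, and test forms are built by random sampling from the pool. None of these names or numbers come from the entry.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical universe: for one candidate, the probability of answering
# each of 10,000 vocabulary items correctly (a finite stand-in for the universe).
universe = [random.uniform(0.3, 0.9) for _ in range(10_000)]

# The universe score is the candidate's expected score over the whole universe.
universe_score = statistics.fmean(universe)


def observed_score(n_items):
    """Proportion correct on one randomly sampled form of n_items items."""
    form = random.sample(universe, n_items)
    responses = [1 if random.random() < p else 0 for p in form]
    return statistics.fmean(responses)


# Many randomly parallel forms of two different lengths.
short_forms = [observed_score(10) for _ in range(2000)]
long_forms = [observed_score(40) for _ in range(2000)]

mean_long = statistics.fmean(long_forms)
spread_short = statistics.stdev(short_forms)
spread_long = statistics.stdev(long_forms)

print("universe score:", universe_score)
print("mean observed score, 40-item forms:", mean_long)
print("spread of observed scores, 10 vs 40 items:", spread_short, spread_long)
```

Observed scores on individual forms scatter around the universe score, and the scatter shrinks as forms get longer. No two sampled forms need to have equal means or standard deviations, which is the sense in which randomly parallel measures are a weaker requirement than statistically parallel ones.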

...
