Classical Test Theory

Measurement is the area of quantitative social science that is concerned with ascribing numbers to individuals in a meaningful way. Measurement is distinct from statistics, though measurement theories are grounded in applications of statistics. Within measurement there are several theories that allow us to talk about the quality of measurements taken. Classical test theory (CTT) can arguably be described as the first formalized theory of measurement and is still the most commonly used method of describing the characteristics of assessments. With test theories, the term test or assessment is applied widely. Surveys, achievement tests, intelligence tests, psychological assessments, writing samples graded with rubrics, and innumerable other situations in which numbers are assigned to individuals can all be considered tests. The terms test, assessment, instrument, and measure are used interchangeably in this discussion of CTT. After a brief discussion of the early history of CTT, this entry provides a formal definition and discusses CTT's role in reliability and validity.

Early History

Most of the central concepts and techniques associated with CTT (though it was not called that at the time) were presented in papers by Charles Spearman in the early 1900s. One of the first texts to codify the emerging discipline of measurement and CTT was Harold Gulliksen's Theory of Mental Tests. Much of what Gulliksen presented in that text is used unchanged today. Other theories of measurement (generalizability theory, item response theory) have emerged that address some known weaknesses of CTT (e.g., homoscedasticity of error along the test score distribution). However, the comparative simplicity of CTT and its continued utility in the development and description of assessments have resulted in CTT's continued use. Even when other test theories are used, CTT often remains an essential part of the development process.

Formal Definition

CTT relies on a small set of assumptions. Taken together, the implications of these assumptions form the CTT paradigm. The fundamental assumption of CTT is found in the equation

$$X = T + E$$

where X represents an observed score, T represents true score, and E represents error of measurement.

The concept of the true score, T, is often misunderstood. A true score, as defined in CTT, does not have any direct connection to the construct that the test is intended to measure. Instead, the true score represents the number that is the expected value for an individual based on this specific test. Imagine that a test taker took a 100-item test on world history. This test taker would get a score, perhaps an 85. If the test were multiple choice, some of those 85 points were probably obtained through guessing. If given the test again, the test taker might guess better (perhaps obtaining an 87) or worse (perhaps obtaining an 83). The causes of differences in observed scores are not limited to guessing. They can include anything that might affect performance: a test taker's state of being (e.g., being sick), a distraction in the testing environment (e.g., a humming air conditioner), or careless mistakes (e.g., misreading an essay prompt).

Note that the true score is theoretical. It can never be observed directly. Formally, the true score is assumed to be the expected value (i.e., average) of X, the observed score, over an infinite number of independent administrations. Even if an examinee could be given the same test numerous times, the administrations would not be independent. There would be practice effects, or the test taker might learn more between administrations.
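The idea of the true score as an expected value can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name simulate_observed_scores, the true score of 85, and the error standard deviation of 2 are hypothetical, and drawing the error from a normal distribution with mean zero is an illustrative choice rather than a requirement of CTT. It also assumes the independent administrations that, as noted above, cannot be achieved with real test takers.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Simulate X = T + E over independent administrations.

    Assumption for illustration only: E is drawn from a normal
    distribution with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        true_score + rng.gauss(0.0, error_sd)  # observed = true + error
        for _ in range(n_administrations)
    ]


# Hypothetical test taker whose true score on this test is 85.
scores = simulate_observed_scores(
    true_score=85.0, error_sd=2.0, n_administrations=100_000
)

# The average observed score approaches T as administrations accumulate.
mean_observed = statistics.fmean(scores)
print(f"first three observed scores: {[round(s, 1) for s in scores[:3]]}")
print(f"mean over {len(scores)} administrations: {mean_observed:.3f}")
```

Any single simulated score may land above or below 85, much like the 87 and 83 in the example, while the long-run average settles near the true score.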

...
