
Item Response Theory

Item response theory (IRT) is a theory of mental measurement based on the postulate that an individual's response to a test item is a probabilistic function of characteristics of the person and characteristics of the item. The person characteristics are the individual's levels of the traits being measured, and the item characteristics are features such as difficulty and discriminating power. Item response theory has several advantages over classical test theory and has the potential to solve several difficult measurement problems. The foundations of item response theory were laid in the early 20th century; however, it was Frederic Lord, beginning in the 1950s, who organized and developed the theory into a framework that could be applied to practical testing problems. Advances in computing were necessary to make the theory accessible to researchers and practitioners. Item response theory is now widely used in educational contexts by testing companies, public school systems, the military, and certification and licensure boards, and it is becoming more widely used in other contexts such as psychological measurement and medicine. This entry discusses item response models and their characteristics, estimation of parameters and goodness of fit of the models, and testing applications.

Item Response Models

Item response theory encompasses a wide range of models that differ in the nature of the item score, the number of dimensions assumed to underlie performance, the number of item characteristics assumed to influence responses, and the mathematical form of the model relating the person and item characteristics to the observed response. The item score might be dichotomous (correct/incorrect), polytomous as in multiple-choice response or graded performance scoring, or continuous as in a measured response. Dichotomous models have been the most widely used in educational contexts because of their suitability for multiple-choice tests. Polytomous models are becoming more established as performance assessment becomes more common in education. Polytomous and continuous response models are appropriate for personality or affective measurement. Continuous response models are not well known and are not discussed here.

The models that are currently used most widely assume that a single trait or dimension underlies performance; these are referred to as unidimensional models. Multidimensional models, although well developed theoretically, have not been widely applied. Although the underlying dimension is often referred to as “ability,” there is no assumption that the characteristic is inherent or unchangeable.

Models for dichotomous responses incorporate one, two, or three parameters related to item characteristics. The simplest, the one-parameter model, is based on the assumption that the only item characteristic influencing an individual's response is the difficulty of the item. The Rasch model has the same form as the one-parameter model but is based on different measurement principles. The Rasch theory of measurement was popularized in the United States by Benjamin Wright. The two-parameter model adds a parameter for item discrimination, reflecting the extent to which the item discriminates among individuals with differing levels of the trait. The three-parameter model adds a lower asymptote, or pseudo-guessing parameter, which gives the probability of a correct response for an individual with an infinitely low level of the trait.
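
The relationship among the one-, two-, and three-parameter models can be illustrated with a short sketch. The text above does not state the mathematical form of the models, so the logistic form used here is an assumption (it is a common choice, sometimes written with an additional scaling constant that is omitted here). The function name irt_probability and the symbols theta (trait level), b (difficulty), a (discrimination), and c (lower asymptote) are illustrative labels, not notation taken from this entry.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under an assumed logistic IRT model.

    theta: person's trait level (hypothetical symbol)
    b:     item difficulty
    a:     item discrimination (fixed at 1.0 for a one-parameter model)
    c:     lower asymptote, or pseudo-guessing parameter (0.0 unless
           a three-parameter model is intended)
    """
    # The logistic curve rises from c toward 1 as theta increases past b.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# One-parameter model: only difficulty varies across items.
p_one = irt_probability(theta=0.5, b=0.0)

# Two-parameter model: difficulty and discrimination vary.
p_two = irt_probability(theta=0.5, b=0.0, a=1.8)

# Three-parameter model: adds a nonzero lower asymptote.
p_three = irt_probability(theta=0.5, b=0.0, a=1.8, c=0.2)
```

Under this assumed form, setting c to 0 reduces the three-parameter model to the two-parameter model, and additionally holding a constant across items gives the one-parameter form. When theta equals b, the probability is halfway between c and 1.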

...
