Item Response Theory

Mack C. Shelley

doi:10.4135/9781412939584

In: Encyclopedia of Educational Leadership and Administration
Chapter DOI: https://doi.org/10.4135/9781412939584.n302
Subject: Leadership, Educational Administration & Leadership (general)

Modern advanced testing and measurement applications in psychometrics and educational measurement are informed by the basic ideas of item response theory (IRT), which remains a dynamic source of new strategies for improving the measurement of psychological constructs and behavior. IRT is the study of overall test results and of the scores on individual test items, based on assumptions about the mathematical relationship between traits such as student ability and students' responses to individual test items. Student performance and ability are treated as latent, or unobserved, traits to be estimated from empirical data. IRT is contrasted with classical test theory, which developed in the 1920s and the following decades in pursuit of ways to measure and test levels of intelligence. Classical measurement emphasizes the overall score on a test instrument designed to measure student knowledge and achievement. IRT, by contrast, bases its interpretations of student performance on the characteristics of the items that make up the test rather than on aggregate test scores, and it relies on model estimates produced by maximum likelihood rather than on the more familiar least squares methods and related statistics such as the Pearson product-moment correlation coefficient.

IRT also relies on the concept of item discrimination, that is, the extent to which an item yields a higher proportion of correct responses as the age, ability, and knowledge base of the test taker increase. An item with greater discriminating power, whose proportion of correct responses therefore rises more rapidly as age or ability increases, has a steeper item characteristic curve. The item characteristic curve is a graphical summary of the relationship between a criterion variable and correct responses to a test item; criterion variables are treated as latent traits of ability, such as intelligence, mathematical ability, or reading fluency. Ability scores are conventionally scaled to have a midpoint (mean) of 0 and a unit of measurement (standard deviation) of 1.
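
Because ability is latent, it has to be estimated from the pattern of item responses. The sketch below illustrates maximum likelihood ability estimation by Newton-Raphson, assuming a two-parameter logistic (2PL) model whose item discrimination `a` and difficulty `b` are already known. The entry does not commit to a particular model, so the 2PL form, the function names `p_correct` and `ml_ability`, and the item parameters in the usage example are all hypothetical illustrations rather than a prescribed method.

```python
import math


def p_correct(theta, a, b):
    """Assumed 2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ml_ability(responses, items, max_iter=50, tol=1e-8):
    """Newton-Raphson maximum likelihood estimate of one examinee's ability.

    responses: list of 0/1 item scores.
    items: list of (a, b) = (discrimination, difficulty), assumed known.
    Returns (theta_hat, standard_error).
    """
    if sum(responses) in (0, len(responses)):
        # All-correct or all-incorrect patterns have no finite ML estimate.
        raise ValueError("ML ability estimate is infinite for this pattern")
    theta = 0.0  # start at the midpoint of the ability scale
    for _ in range(max_iter):
        grad = 0.0  # first derivative of the log-likelihood
        info = 0.0  # Fisher information (minus the second derivative)
        for x, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            grad += a * (x - p)
            info += a * a * p * (1.0 - p)
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    # Standard error from the information at the final estimate
    info = sum(a * a * p_correct(theta, a, b) * (1.0 - p_correct(theta, a, b))
               for a, b in items)
    return theta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]  # hypothetical
    theta_hat, se = ml_ability([1, 1, 0, 1], items)
    print(f"theta = {theta_hat:.3f}, SE = {se:.3f}")
```

Newton-Raphson is a natural choice here because the 2PL log-likelihood is concave in ability, and the information it accumulates along the way also yields a standard error for the estimate.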

The development of IRT research dates back at least to the 1940s, to the work of D. N. Lawley (University of Edinburgh), who showed that many elements of classical test theory could be expressed mathematically in terms of the parameters of the item characteristic curve. F. M. Lord, of the Educational Testing Service, and Melvin Novick systematically defined, expanded, and explored the theory and developed the computer programs needed to put it into practice. Measurement models originally developed by the Danish mathematician Georg Rasch were generalized by B. D. Wright at the University of Chicago.

The relationship between student ability and the probability of a correct response to a given test item frequently follows an S-shaped curve that shows the probability of a correct response for students at different ability levels. Students with low ability are very unlikely to answer the item correctly; for students of moderate ability, the probability of a correct answer rises sharply as ability increases; and for the most capable students, additional ability adds little, because the probability of a correct answer has already approached an upper asymptote close to the maximum of one.
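
The S-shaped curve and the link between discrimination and steepness can be made concrete with a logistic item characteristic curve. This is a minimal sketch assuming the 2PL form, in which `a` is the discrimination and `b` the difficulty (the ability at which the probability of success is 0.5); the entry does not specify this model, and the names `icc` and `slope_at` are hypothetical.

```python
import math


def icc(theta, a, b):
    """Assumed 2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def slope_at(theta, a, b):
    """Derivative dP/dtheta = a * P * (1 - P); it peaks at a / 4 when theta = b."""
    p = icc(theta, a, b)
    return a * p * (1.0 - p)


if __name__ == "__main__":
    thetas = [-3, -2, -1, 0, 1, 2, 3]  # ability on the mean-0, SD-1 scale
    for label, a in [("low discrimination", 0.5), ("high discrimination", 2.0)]:
        probs = [round(icc(t, a, 0.0), 3) for t in thetas]
        print(label, probs, "slope at b:", slope_at(0.0, a, 0.0))
```

Both curves pass through 0.5 at the item difficulty, but the high-discrimination item's slope there is four times larger, so its probabilities move from near zero to near one over a much narrower band of ability.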

Contemporary extensions of IRT include several uses of marginal maximum likelihood estimation. Marginal maximum likelihood is particularly appropriate for a newer research agenda that includes (a) comparing multiple, identifiably different groups who have taken an examination with the same instrument on a common measurement scale, (b) determining whether items in an exam or a survey questionnaire are processed differently by different groups of examinees or respondents, (c) detecting when the parameters of an item drift over time and under changing circumstances, and (d) estimating item parameters for measurement instruments whose items follow different response models, such as a mixture of dichotomous and polytomous questions.
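
The defining step of marginal maximum likelihood is that each examinee's ability is integrated out over an assumed population distribution, so only item parameters are estimated. The sketch below applies this idea with an EM algorithm to Rasch item difficulties, assuming a standard normal ability distribution approximated on a fixed rectangular grid (software commonly uses Gauss-Hermite or adaptive quadrature instead). The function `mml_rasch`, the grid, and the simulated difficulties are hypothetical illustrations; none of them comes from the entry.

```python
import math
import random
from collections import Counter

# Rectangular quadrature over ability, weighted by the N(0, 1) density
# (the mean-0, SD-1 ability scale). A simplification of real quadrature.
NODES = [-4.0 + 0.2 * k for k in range(41)]
_DENS = [math.exp(-0.5 * t * t) for t in NODES]
WEIGHTS = [d / sum(_DENS) for d in _DENS]


def rasch_p(theta, b):
    """Rasch (one-parameter logistic) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mml_rasch(data, max_iter=500, tol=1e-6):
    """Estimate Rasch item difficulties by marginal maximum likelihood (EM).

    data: list of 0/1 response tuples, one per examinee. Ability is
    integrated out over the quadrature grid, not estimated per person.
    """
    patterns = Counter(tuple(x) for x in data)  # identical patterns share work
    n_items = len(data[0])
    b = [0.0] * n_items
    for _ in range(max_iter):
        p = [[rasch_p(t, bj) for t in NODES] for bj in b]  # p[j][k]
        # E-step: expected examinees (n) and expected correct answers
        # to each item (r) at every quadrature node
        n = [0.0] * len(NODES)
        r = [[0.0] * len(NODES) for _ in range(n_items)]
        for x, count in patterns.items():
            post = []
            for k, w in enumerate(WEIGHTS):
                lik = w
                for j, xj in enumerate(x):
                    lik *= p[j][k] if xj else 1.0 - p[j][k]
                post.append(lik)
            marginal = sum(post)  # marginal likelihood of this pattern
            for k in range(len(NODES)):
                share = count * post[k] / marginal
                n[k] += share
                for j, xj in enumerate(x):
                    if xj:
                        r[j][k] += share
        # M-step: one Newton step on each item's expected log-likelihood
        change = 0.0
        for j in range(n_items):
            resid = sum(r[j][k] - n[k] * p[j][k] for k in range(len(NODES)))
            info = sum(n[k] * p[j][k] * (1.0 - p[j][k]) for k in range(len(NODES)))
            step = resid / info
            b[j] -= step
            change = max(change, abs(step))
        if change < tol:
            break
    return b


if __name__ == "__main__":
    random.seed(1)
    true_b = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties
    data = []
    for _ in range(2000):
        theta = random.gauss(0.0, 1.0)
        data.append(tuple(int(random.random() < rasch_p(theta, bj)) for bj in true_b))
    print([round(v, 2) for v in mml_rasch(data)])
```

Fixing the ability distribution at mean 0 and standard deviation 1 is what anchors the difficulty scale; comparing several groups on a common scale, as in use (a) above, would extend this by giving each group its own distribution.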

...
