
Definition and Application Areas

Tests and questionnaires consist of a number of items, denoted J, that each measure an aspect of the same underlying psychological ability, personality trait, or attitude. Item response theory (IRT) models use the data collected on the J items in a sample of N respondents to construct scales for the measurement of the ability or trait.

The score on an item indexed j (j = 1,…, J) is represented by a random variable Xj that has realizations xj. Item scores may be

  • Dichotomous, indicating whether an answer to an item was correct (score xj = 1) or incorrect (score xj = 0)
  • Ordinal polytomous, indicating the degree to which a respondent agreed with a particular statement (ordered integer scores, xj = 0,…, m)
  • Nominal, indicating a particular answer category chosen by the respondent, as with multiple-choice items, where one option is correct and several others are incorrect and thus have nominal measurement level
  • Continuous, as with response times indicating the time it took to solve a problem

Properties of the J items, such as their difficulties, are estimated from the data. They are used for deciding which items to select for a paper-and-pencil test or questionnaire, and in more advanced computerized measurement procedures. Item properties thus have a technical role in instrument construction and help to produce high-quality scales for the measurement of individuals.
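As a rough illustration of how item properties can be summarized from data, the sketch below stores dichotomous scores for N respondents on J items and computes the proportion of correct answers per item. The score values are hypothetical, and the proportion correct is only a crude descriptive index of item difficulty, not the difficulty parameter of any particular IRT model.

```python
# Hypothetical dichotomous scores: each row is a respondent (N = 4),
# each column is an item (J = 3); 1 = correct, 0 = incorrect.
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

N = len(scores)      # number of respondents
J = len(scores[0])   # number of items

# Proportion of respondents who answered each item correctly.
# A lower proportion suggests a more difficult item.
proportion_correct = [sum(row[j] for row in scores) / N for j in range(J)]

print(proportion_correct)  # [0.75, 0.25, 0.75]
```

In this made-up sample, the second item is answered correctly least often and would be regarded as the hardest of the three.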

In an IRT context, abilities, personality traits, and attitudes underlying performance on items are called latent traits. A latent trait is represented by the random variable θ, and each person i (i = 1,…, N) who takes the test measuring θ has a scale value θi. The main purpose of IRT is to estimate θi for each person from his or her J observed item scores. These estimated measurement values can be used to compare people with one another or with an external behavior criterion. Such comparisons form the basis for decision making about individuals.
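To make the estimation of a person's scale value concrete, the following is a minimal sketch that assumes one specific IRT model, the Rasch model for dichotomous items, in which the probability of a correct answer is exp(θ − b) / (1 + exp(θ − b)) with b the item difficulty. The choice of model, the difficulty values, the score pattern, and the function names `rasch_prob` and `estimate_theta` are all assumptions made for illustration; the text above does not prescribe them. The item difficulties are treated as already known, and θ is found by Newton-Raphson maximization of the likelihood.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct answer under the (assumed) Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(scores, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood estimate of theta from dichotomous item scores.

    Item difficulties are assumed known; Newton-Raphson is used to
    maximize the log-likelihood in theta.
    """
    total = sum(scores)
    if total == 0 or total == len(scores):
        # With all items wrong or all items right, the likelihood has
        # no finite maximum in theta.
        raise ValueError("no finite ML estimate for all-0 or all-1 patterns")

    theta = 0.0  # starting value
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        # First derivative of the log-likelihood with respect to theta.
        gradient = sum(x - p for x, p in zip(scores, probs))
        # Negative second derivative (test information at theta).
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical difficulties for J = 5 items and one person's item scores.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
scores = [1, 1, 1, 0, 0]

theta_hat = estimate_theta(scores, difficulties)
print(round(theta_hat, 3))
```

At the estimate, the expected number of correct answers under the model equals the observed number correct, which is why a person with more correct answers on the same items receives a higher estimated θ.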

IRT originated in the 1950s and 1960s (e.g., Birnbaum, 1968) and came into full bloom afterward. Important fields of application are the following:

  • Educational measurement, where tests are used for grading examinees on school subjects, selection of students for remedial teaching and follow-up education (e.g., the entrance to university), and certification of professional workers with respect to skills and abilities
  • Psychology, where intelligence tests are used, for example, to diagnose children's cognitive abilities in order to explain learning and concentration problems in school, personality inventories are used to select patients for clinical treatment, and aptitude tests are used for job selection and placement in industry and government
  • Sociology, where attitudes toward abortion or rearing children in single-parent families are measured, and also latent traits such as alienation, Machiavellianism, and religiosity
  • Political science, where questionnaires are used to measure the preference of voters for particular politicians and parties, and also political efficacy and opinions about the government's environmental policy
  • Medical research, where health-related quality of life is measured in patients recovering from accidents that caused enduring physical damage, radical forms of surgery, long-standing treatment using experimental medicine, or other forms of therapy that seriously affect patients' experience of everyday life
  • Marketing research, where consumers' preferences for products and brands are measured

Each of these applications requires a quantitative scale for measuring people's proficiency, and this is what IRT provides.

...
