
Introduction

Up until 25 years or so ago, item analysis was straightforward: multiple-choice test items were field-tested on reasonably sized samples of examinees to determine their levels of difficulty and discrimination, and distractors were evaluated for their effectiveness in attracting examinees who lacked the knowledge required to answer the items correctly (see Crocker & Algina, 1986; Gulliksen, 1950; Lord & Novick, 1968). Items that were too easy or too hard, or less discriminating than other items available to the test developer, were less likely to be selected for the final version of a test. In the 1970s, criterion-referenced tests were introduced into the testing field, and item analysis for these tests became less focused on levels of item difficulty and discrimination because these statistics were relatively unimportant in the criterion-referenced test development process. Instead, item congruence with the objectives the items were designed to measure became one of the determining factors in item selection. The difficulties of items measuring the same objective were used to identify potentially flawed items rather than to assess difficulty per se: outliers among those item difficulties were helpful in flagging potentially flawed test items. Item discrimination indices mattered mainly for identifying items with negative or very low values, but beyond that they played little role in constructing criterion-referenced tests. Clearly, the use of item statistics in criterion-referenced test development differed from that in norm-referenced test development.
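The classical item statistics described above can be illustrated with a short sketch. The data below are simulated for illustration only (the generating model and parameter values are assumptions, not taken from the text); the two statistics computed are the standard ones: the p-value (proportion correct) for difficulty and the corrected item-total point-biserial correlation for discrimination.

```python
import numpy as np

rng = np.random.default_rng(42)
n_examinees, n_items = 500, 4

# Simulate 0/1 responses: higher-ability examinees answer correctly more
# often (a simple logistic model, used here only to produce plausible data).
ability = rng.normal(size=(n_examinees, 1))
item_locations = np.array([-1.0, 0.0, 0.5, 1.5])  # hypothetical values
prob = 1.0 / (1.0 + np.exp(-(ability - item_locations)))
responses = (rng.random((n_examinees, n_items)) < prob).astype(int)

# Classical item difficulty (p-value): proportion answering each item correctly.
p_values = responses.mean(axis=0)

# Corrected item-total (point-biserial) discrimination: correlation of each
# item score with the total score on the remaining items.
totals = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
    for j in range(n_items)
])

# Norm-referenced screening as described above: flag items that are very
# easy, very hard, or weakly discriminating (cut-offs are illustrative).
flagged = (p_values < 0.2) | (p_values > 0.9) | (discrimination < 0.2)
```

In a criterion-referenced setting, by contrast, the same `p_values` would be compared only among items measuring a common objective, with outliers flagged for review.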

In the 1970s, modern test theory, perhaps better known as ‘item response theory (IRT)’, was introduced into the testing field. The item statistics of interest differed from the classical item statistics and also depended on the choice of test model (Hambleton, Swaminathan & Rogers, 1991; Lord, 1980; Wright & Stone, 1979); even the number of item statistics available to the test developer depended on the choice of IRT model. Modern test theory focused on the item level as a strategy for gaining more flexibility in the test development process. At the same time, it involves stronger modelling of the item response data. Advantages, in principle, accrue from such an approach, but they come only when the models being applied (e.g. the one-, two-, and three-parameter logistic test models) fit the data; model-data fit is therefore a critical element of modern test theory. IRT item statistics have the attractive feature of being invariant across samples of examinees from the population for whom the test under construction is intended, and this item invariance property is a major advantage to test developers: after statistically adjusting item statistics for differences in examinee samples, the statistics can be compared and contrasted even though the samples on which they were based may be quite different.
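The one-, two-, and three-parameter logistic models mentioned above form a nested family, which is why the number of item statistics depends on the model chosen. A minimal sketch of the three-parameter logistic (3PL) item response function, with the conventional scaling constant D = 1.7, might look as follows; the 2PL is the special case c = 0, and the 1PL additionally fixes a to a common value:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty (location)
    c: pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

# At theta = b, the probability is halfway between c and 1:
p_3pl(0.0, 1.0, 0.0, 0.2)  # → 0.6
```

Under the 3PL each item carries three statistics (a, b, c), the 2PL two, and the 1PL one, in contrast with the sample-dependent p-value and point-biserial of classical item analysis.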

One other major change in assessment has taken place that strongly impacts item analysis practices today: it is now common to use performance test items that are scored polytomously. These items have no multiple-choice distractors to evaluate, but item statistics for assessing difficulty and discrimination that can be applied to polytomous response data have become important.
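For polytomous items, natural analogues of the classical statistics are the item mean rescaled by the maximum possible score (difficulty) and the corrected item-total Pearson correlation (discrimination). A brief sketch, using a small hypothetical data set of performance items scored 0-4:

```python
import numpy as np

# Hypothetical scores on three performance items, each scored 0-4:
# rows = examinees, columns = items.
scores = np.array([
    [4, 3, 2],
    [3, 3, 1],
    [2, 1, 0],
    [4, 4, 3],
    [1, 0, 0],
    [3, 2, 2],
])
max_score = 4

# Difficulty analogue: mean item score divided by the maximum possible
# score, placing it on the same 0-1 scale as a dichotomous p-value.
difficulty = scores.mean(axis=0) / max_score

# Discrimination analogue: corrected item-total correlation of each item
# with the sum of the remaining items.
totals = scores.sum(axis=1)
discrimination = np.array([
    np.corrcoef(scores[:, j], totals - scores[:, j])[0, 1]
    for j in range(scores.shape[1])
])
```

Both quantities reduce to their dichotomous counterparts when the maximum score is 1, so the same screening logic carries over to performance items.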

...
