
Introduction

Item banks are used in contexts ranging from individual classrooms, schools, districts, and state or other governmental units to large-scale computer-based testing programs. Typically, the purpose of developing an item bank is to assist, improve, and automate the test assembly process. In developing an item bank, decisions about two factors need to be made: the design of the bank and the methods for maintaining and refreshing it once it has been created.

Designing an item bank is analogous to creating a database. At the simplest level, the designer must decide what data elements to store and then how to structure the data in order to facilitate data extraction and reporting, test assembly, and possibly even test administration functions. For item banks, these functions are realized through item selection and test assembly processes, which can be placed into two broad categories: one that relies heavily on human intervention and another that relies heavily on automation (e.g., through the use of computerized algorithms). Each of these approaches places differing requirements on an item bank. Ultimately, if a bank is to be used to assist humans in the assembly process, the challenges of building the bank are less difficult to meet. This is in contrast to the context in which tests must be administered directly from a bank without human intervention, requiring full automation of the test assembly process.

A typical item bank contains four classes of information about each item: (a) the actual item text and associated graphical or stimulus material, (b) classification information characterizing the item's non-statistical properties, such as its content, its relevance to educational standards, and the cognitive processes required to produce a successful solution, (c) some form of statistical and performance data about the item, and (d) some representation of the history of the item's use.

Bank Design

Most automated test assembly algorithms rely on item statistics that have been placed on a common scale. Although transformations of the proportion correct (Gulliksen, 1950) and biserial correlations can be used, the most popular of these are based on Item Response Theory (IRT; Lord, 1980). In fact, the majority of the literature on item banking has focused on methods and procedures for developing and maintaining an IRT scale. The interested reader might find the December 1996 volume of Applied Psychological Measurement, a special issue dedicated to item banking, helpful, as well as papers by Rudner (1998) and Flaugher (2000).
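To make the idea of a common IRT scale concrete, the sketch below evaluates the standard three-parameter logistic (3PL) item response function from Lord (1980). The parameter values are illustrative only.

```python
import math

def p_3pl(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """Probability of a correct response under the 3PL IRT model:
    P(theta) = c + (1 - c) / (1 + exp(-1.7 * a * (theta - b))),
    where a is discrimination, b difficulty, c pseudo-guessing,
    and 1.7 the conventional scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# When ability equals difficulty (theta == b), the probability is
# exactly halfway between the guessing floor c and 1.0.
print(round(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2), 3))  # 0.6
```

Once every item's a, b, and c estimates are expressed on the same theta scale, an assembly algorithm can compare and combine items from different calibration samples directly.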

For traditional paper-and-pencil tests, assembled in advance of test administration, the amount of item classification data stored is relatively small. Although by no means incomplete, the stored data elements tend to be the minimum set required to guide human assembly of a test, with the added assumption that the test will be reviewed and revised before use. For a Quantitative measure, this might include:

  • Math Content – Arithmetic, Algebra, Geometry, or Calculus
  • Level of Context – Pure Math or Word Problem
  • Response Format – Multiple Choice, Short Answer, or Numeric Entry
  • Correct Answer
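A sketch of how such a minimal attribute set supports human-guided assembly: the hypothetical filter below pulls candidate items matching any combination of the stored attributes, leaving final selection to a reviewer.

```python
# A toy pool using the four attributes listed above; the records
# and attribute names are illustrative assumptions.
pool = [
    {"id": 1, "content": "Algebra",  "context": "Pure Math",
     "format": "Multiple Choice", "key": "B"},
    {"id": 2, "content": "Geometry", "context": "Word Problem",
     "format": "Numeric Entry", "key": "12"},
    {"id": 3, "content": "Algebra",  "context": "Word Problem",
     "format": "Short Answer", "key": "x = 4"},
]

def select(items, **criteria):
    """Return items whose stored attributes match every criterion."""
    return [it for it in items
            if all(it.get(k) == v for k, v in criteria.items())]

print([it["id"] for it in select(pool, content="Algebra")])  # [1, 3]
```

With only a handful of attributes, queries like this can narrow the pool, but the nuanced judgments described next still fall to human reviewers.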

A number of features tend not to be stored. These include many aspects of an item's content that only become an issue in relation to other items assembled into the same test. For example, in a Verbal measure, the fact that a reading passage is about the works of Charles Dickens is typically not a feature that is stored or even explicitly considered. If two passages about Dickens are selected in a draft assembly, a human reviewer would note this and replace one of the passages. As a second example, two Analogy items might rely on the test taker knowing the definition of ‘inflammable’. It is generally considered unacceptable practice to include multiple items in the same test that rely on specific vocabulary. Here again, key vocabulary is not typically an attribute that is stored for each item. Other types of interactions between items rely on global human impressions rather than extensive item classifications. For example, a test might be well within statistical specifications, yet a reviewer might have the impression that this particular collection of items is unusually time consuming to complete.
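If attributes such as key vocabulary or passage topic were stored, the kind of overlap a reviewer catches by eye could be flagged automatically. The sketch below is a hypothetical illustration of that idea, assuming each item carries a keyword set; real banks would need far richer tagging for this to work.

```python
from itertools import combinations

def overlap_pairs(keywords: dict[int, set[str]]) -> list[tuple[int, int]]:
    """Return pairs of item ids whose keyword sets intersect,
    i.e., candidate 'enemy' pairs a reviewer should inspect."""
    return [(i, j)
            for (i, ki), (j, kj) in combinations(keywords.items(), 2)
            if ki & kj]

# Items 101 and 102 both hinge on knowing 'inflammable'.
keywords = {101: {"inflammable", "volatile"},
            102: {"inflammable", "tepid"},
            103: {"gregarious"}}
print(overlap_pairs(keywords))  # [(101, 102)]
```

Checks like this move a piece of the reviewer's judgment into the bank itself, which is exactly the trade-off between human-assisted and fully automated assembly discussed earlier.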

...
