A standardized test is one that is developed to maximize the comparability of scores by providing all examinees with the same (or parallel) content that is administered and scored in a consistent manner. The term standardized test is frequently used incorrectly as a synonym for a multiple-choice test, norm-referenced test, or commercially developed test. Standardized tests can be norm-referenced, criterion-referenced, or standards-based. They can contain one or more of a variety of different item types.

Standardization is not a binary concept. Tests can standardize fewer or more of the conditions of development, administration, and scoring and provide more or less specificity for each condition.

History

In approximately 2200 B.C.E., a formal system of civil service examinations was begun in China. There is no record of the content or methods used. In 1115 B.C.E., content domains were standardized to include music, archery, horsemanship, writing, and arithmetic. Over the years, the content domains changed somewhat, and in 606 C.E., the content and administration methods were further standardized into a system called Keju. The Keju system had three levels of competition: “Budding Geniuses,” “Promoted Scholars,” and “Ready for Office.” An example of the degree of standardization in this system is that for the “Promoted Scholars” competition, candidate essays were rewritten by a scribe and marked with a code so that examiners could neither recognize the author nor have their judgments affected by penmanship (although penmanship was judged explicitly for the “Budding Geniuses” competition).

More recent roots of standardized testing in the United States stem from the work of Horace Mann, secretary of the Massachusetts Board of Education, who around 1845 pushed Boston's schools to replace the tradition of oral examinations with essay testing. Mann compared the earlier, less standardized approaches to a cross-country race in which each runner was timed over a different mile of the course, one starting where the previous runner finished. Each runner would be subject to different conditions, some running on level ground, others running uphill, and others slogging through the mud.

Standardized essay testing became increasingly popular at the beginning of the 20th century; however, there was growing concern over variations in scoring among different graders. By then, E. L. Thorndike and his students had invented a number of objectively scored item types, including a precursor of the multiple-choice item, for use in psychophysical research.

In 1915, Frederick Kelly pursued the development of objectively scored tests that reduced the time and effort needed to administer and score reading tests. His criteria for such items were that they should be subject to only one interpretation; call for only one thing; and be wholly right or wholly wrong, not partly right and partly wrong. Following is the first known published selected-response item, which appeared as a practice item on the Kansas Silent Reading Test. Other item types, particularly short answers, were also used on this test.

Below are given the names of four animals. Draw a line around the name of each animal that is useful on the farm:

  • Cow
  • Tiger
  • Rat
  • Wolf

Kelly's invention influenced Arthur Otis, who used it for about half the questions in his “Scale for the Group Measurement of Intelligence.” In 1917, Otis was part of the group of psychologists called together to help the army deal with the problem of quickly and cost-effectively classifying (for training purposes) 1.7 million draftees. Their solution was the first all-selected-response test, the Army Alpha.

...
