
High-stakes testing refers to the practice of using educational or psychological tests to make decisions that have important consequences for the test takers. Common examples of high-stakes testing include the use of competency tests in particular subjects to determine whether students should be advanced from one grade to another; the use of aptitude tests to place students in particular educational programs, including those for gifted and talented students; the use of entrance examinations for educational institutions; and the use of tests of various sorts to select candidates for job training programs and to screen applicants for employment. Though high-stakes testing has come to refer almost exclusively to minimum competency testing in the schools—for grade advancement and high school graduation, as well as for governmental evaluation of schools—the broader sense of the topic is treated in this entry. Emphasis, however, is on high-stakes testing in education rather than in employment.

Standards and Standardization

The high stakes of testing guarantee that testing procedures and the tests themselves are subject to intense public and scholarly scrutiny, and justifiable concern arises about whether the tests are sufficiently reliable (that is, consistent in measurement), valid for the specific purposes to which they are put, and free of bias against particular populations of test takers, such as members of ethnic minority groups. The American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education have jointly published Standards for Educational and Psychological Testing, revised most recently in 1999, as guidelines for test development, standardization, and use. These guidelines go a long way toward ensuring fair and effective testing, but controversy regarding particular tests and testing practices nevertheless remains.

Most high-stakes tests measure cognitive constructs, such as knowledge of subject matter, verbal skills, and quantitative skills. Personality tests are sometimes used as well—for example, to screen out serious psychopathology or to determine whether job applicants have particular qualities relevant to job performance—but personality tests are unlikely to be used in isolation, unaccompanied by other, again typically cognitive, assessments. Tests of cognitive constructs are also most likely to be used with gifted and talented individuals, and then only in talent domains that lend themselves to cognitive assessment. Other talent domains, such as art, athletics, creative writing, dance, invention, and music, are not particularly amenable to testing. Product and performance samples, such as auditions, tryouts, portfolios, and competitions, are used instead to make high-stakes decisions.

Tests used to make high-stakes decisions, like most educational and psychological tests available for public use, are standardized for a particular population of potential test takers. Standardization means that scores on the test (or, more typically, scores on each of the various scales of the test) have been converted to a common metric in a common distribution of scores, usually the normal curve. This facilitates interpretation of scores on a given test and makes it possible to compare scores on different tests in a meaningful way. An easy way to understand standardization is in terms of percentiles, which are one sort of interpretation that may be made from standardized scores. If a score lies at the 50th percentile, this means that 50 percent of the population of potential test takers perform at or below that level; in other words, the test taker with a score at the 50th percentile performs as well as or better than 50 percent of the population.

The standardization of a test is not conducted with an entire population of potential test takers. That population is necessarily hypothetical. Rather, standardization is conducted with a sample of that population, a sample that is large enough, statistically speaking, to allow inferential interpretation of the scores of anyone who takes the test. This sample is the norm group, and the distribution of scores for this sample produces the norms for the test. Separate norms are produced for men and women in virtually all educational and psychological tests. Some tests also have norms for groups distinguished by other variables, such as age or ethnicity.
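The arithmetic behind standardization can be illustrated with a minimal sketch in Python. It is not taken from any published test manual. The norm-group scores, the function name `standardize`, and the scaled metric with a mean of 100 and a standard deviation of 15 are all assumptions made for illustration, and a real norm group would be far larger than ten people. The sketch converts a raw score to a z-score against the norm group, then reads a percentile from the normal curve.

```python
from statistics import NormalDist, mean, stdev


def standardize(raw_score, norm_scores, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a z-score, a scaled score, and a percentile.

    norm_scores is a hypothetical norm-group sample. The scaled metric
    (mean 100, SD 15) is an illustrative assumption; tests differ in the
    metric they report.
    """
    mu = mean(norm_scores)
    sigma = stdev(norm_scores)  # sample standard deviation of the norm group
    z = (raw_score - mu) / sigma
    scaled = scale_mean + scale_sd * z
    # Percentile read from the standard normal curve, as a percentage.
    percentile = NormalDist().cdf(z) * 100
    return z, scaled, percentile


# Hypothetical raw scores for a very small norm group.
norm_group = [39, 42, 45, 47, 50, 50, 53, 55, 58, 61]

for raw in (42, 50, 58):
    z, scaled, pct = standardize(raw, norm_group)
    print(f"raw={raw}  z={z:+.2f}  scaled={scaled:.1f}  percentile={pct:.1f}")
```

A raw score equal to the norm-group mean maps to a z-score of 0 and to the 50th percentile, which matches the interpretation given above. Reading percentiles from the normal curve assumes the scores are approximately normally distributed. When they are not, percentile ranks can be computed directly from the ranked norm-group scores instead.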

...
