
Test Scores, Validity and Validation of

Validity of test scores, in the broadest and most commonly accepted sense, refers to the degree to which the interpretation of a test score's meaning, within the context in which the score is used, is appropriate and justified. Different perspectives on validity and approaches to validation can be found in various validity theories and models. The dominant approach to validating test scores today rests on the concepts and methods set forth in the 1999 Standards for Educational and Psychological Testing (Standards). The focus of this approach is the accumulation of sufficient evidence, whether statistical, judgmental, or otherwise, to support a particular score interpretation and use. Recently, Michael Kane's argument-based conceptual framework and approach to validation, which supplements the 1999 Standards, has gained popularity. Important emerging alternative validity theories include Denny Borsboom's ontological view, which challenges the basic tenets of the dominant approach, and Pamela Moss's hermeneutic and sociocultural approach, which advocates expanding the validation process to allow for situated validity inquiries.

Standards for Validation

The gold standard for evaluating the validity of test scores has been the 1999 Standards, set forth by the Joint Committee representing the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, with support from about 40 other professional organizations. The 1999 Standards define validity as the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. The validation process begins with a statement of the proposed interpretation of results from a test. Implicit in this step is the assumption that there is some rationale for the proposed interpretation, and that rationale should include a description of the type of material to be covered on the test. From the rationale, a framework should emerge that illustrates the content the test should cover and, consequently, the types of items or tasks it should use. Because tests and testing circumstances vary widely, the type of evidence that is most compelling and illuminating will differ from situation to situation. The 1999 Standards state that a series of propositions must be used to support the proposed interpretation of a test's results. It follows that supporting these propositions will require evidence from a variety of sources.

The central tenet of the 1999 Standards is that validity is established through the accumulation of evidence supporting score interpretation. Evidence can be gathered from many different sources; however, five sources have historically been especially prominent.

  • Evidence based on test content refers to how well the items or tasks in a test represent, and are relevant to, the domain of the construct or trait being tested. For instance, the items in an employment test might be evaluated to determine whether they are representative of and relevant to job performance. Evidence from this source is generally gathered via systematic expert judgments.
  • Evidence based on response processes refers to evidence that test takers arrive at their responses through physical or mental processes consistent with the nature of the intended construct, rather than through irrelevant processes. For example, for a math reasoning test, we would gather evidence that students arrive at their answers through mathematical reasoning rather than through rote memorization of algorithms. Evidence from this source is generally gathered via observation and analysis of individual responses, using methods such as think-aloud protocols and cognitive interviews.
  • Evidence based on internal structure focuses on whether test items, tasks, and components relate to each other in a manner consistent with the theoretical internal structure of the construct being measured. Comparing the dimensionality of the test with the proposed dimensionality of the construct via factor analysis is one common way of gathering evidence of this type.
  • Evidence based on relations to other variables refers to evidence that the relations between test scores and external variables are consistent with the theoretically expected relations between the intended construct and those variables. Relations to other variables can be measured or evaluated by many statistical as well as nonstatistical methods. Common methods include establishing evidence of convergent and discriminant validity, test–criterion relationships, predictive validity, and sensitivity and specificity.
  • Finally, evidence based on consequences refers to the consequences of the interpretation and use of test scores and score-based decisions. We seek evidence that the test has indeed performed its intended socioeducational function, and that interpreting and using the test scores in a particular manner has not led to unintended adverse effects. Evidence of this type is obtained by observing the effects of test use, as well as through social dialogue and debate that examine intended and unintended, direct and indirect, social or educational outcomes.
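To make the internal-structure comparison above more concrete, here is a minimal Python sketch, assuming examinee-by-item scores are available as a NumPy array. It swaps in a simple heuristic for factor analysis: it counts the eigenvalues of the inter-item correlation matrix that exceed 1.0 (the Kaiser–Guttman rule) and checks whether that count matches the number of dimensions the construct theory proposes. The function names and toy data are hypothetical, and an actual validation study would rely on a proper exploratory or confirmatory factor analysis rather than this rule of thumb.

```python
import numpy as np


def kaiser_dimension_count(scores: np.ndarray) -> int:
    """Rough dimensionality estimate for an (examinees x items) score matrix.

    Counts eigenvalues of the inter-item correlation matrix greater than 1.0
    (Kaiser-Guttman rule). A quick heuristic, not a full factor analysis.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least two item columns")
    if np.any(scores.std(axis=0) == 0):
        # A constant item has undefined correlations with the others.
        raise ValueError("every item needs nonzero variance")
    # rowvar=False: each column is an item, each row is an examinee.
    corr = np.corrcoef(scores, rowvar=False)
    # eigvalsh suits symmetric matrices such as a correlation matrix.
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))


def supports_proposed_dimensionality(scores: np.ndarray, proposed: int) -> bool:
    """Compare the heuristic count with the theorized number of dimensions."""
    return kaiser_dimension_count(scores) == proposed


if __name__ == "__main__":
    # Hypothetical toy data: items 1-3 follow one pattern, items 4-6 follow
    # a second pattern that is uncorrelated with the first.
    trait_a = np.array([1.0, -1.0, 1.0, -1.0])
    trait_b = np.array([1.0, 1.0, -1.0, -1.0])
    scores = np.column_stack([trait_a] * 3 + [trait_b] * 3)
    print(kaiser_dimension_count(scores))  # 2
    print(supports_proposed_dimensionality(scores, proposed=2))  # True
```

The eigenvalue rule is used here only because it is short and transparent; it is known to misjudge the number of dimensions in many realistic data sets, which is why factor-analytic methods are the usual choice.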
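Of the relations-to-other-variables methods listed above, sensitivity and specificity are the simplest to illustrate. The Python sketch below assumes a hypothetical cut score that classifies an examinee as positive when the score meets or exceeds it, along with an external criterion classification (for example, an independent diagnosis) for each examinee; all names and data are illustrative assumptions. Sensitivity is the proportion of criterion-positive examinees the test flags, and specificity is the proportion of criterion-negative examinees the test does not flag.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ClassificationAccuracy:
    sensitivity: float  # share of criterion-positive cases the test flags
    specificity: float  # share of criterion-negative cases the test clears


def classification_accuracy(
    scores: Sequence[float],
    criterion_positive: Sequence[bool],
    cut_score: float,
) -> ClassificationAccuracy:
    """Sensitivity and specificity of the rule 'score >= cut_score => positive'.

    criterion_positive holds each examinee's status on an external criterion.
    Both criterion classes must be present.
    """
    if len(scores) != len(criterion_positive):
        raise ValueError("scores and criterion_positive must have equal length")
    tp = fn = fp = tn = 0
    for score, positive in zip(scores, criterion_positive):
        flagged = score >= cut_score
        if positive and flagged:
            tp += 1  # true positive
        elif positive:
            fn += 1  # false negative
        elif flagged:
            fp += 1  # false positive
        else:
            tn += 1  # true negative
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need criterion-positive and criterion-negative cases")
    return ClassificationAccuracy(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


if __name__ == "__main__":
    # Hypothetical scores and criterion statuses for eight examinees.
    scores = [12, 18, 25, 30, 9, 22, 15, 28]
    criterion = [False, True, True, True, False, False, False, True]
    print(classification_accuracy(scores, criterion, cut_score=20))
    # ClassificationAccuracy(sensitivity=0.75, specificity=0.75)
```

Raising the hypothetical cut score generally trades sensitivity for specificity, so in practice the two are examined together across a range of cut scores rather than at a single point.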

The exact mix of sources and the amount of evidence needed to support a particular score interpretation are not prescribed by the 1999 Standards. Rather, they are matters of professional judgment in each given situation.

...
