
Reliability refers to the consistency and repeatability of a measurement when the testing procedure is repeated on a population of individuals or groups. Knowing the reliability of particular assessments is especially important for instructors who use standardized measures to assess the curriculum. Curricularists should ask about the reliability of the measurement tools they are expected to use in the classroom or within the school in order to judge those tools' applicability. The usefulness of a score presupposes that individuals or groups exhibit some degree of stability in their behaviors, yet even the same person's behaviors are rarely identical from one occasion to the next. Scores from an instrument should nevertheless be stable; a higher degree of stability indicates higher reliability because the results are repeatable. The American Psychological Association has defined reliability as the degree to which observed scores are “free from errors of measurement.” The measurement error that remains limits the extent to which results can be generalized. Different types of reliability estimates can be calculated through specific methods.

Reliability is an estimate rather than an exact calculation. Reliability estimates fall along a continuum from zero to one: an estimate of zero indicates that the measure is completely unreliable, whereas an estimate of one indicates that it is completely reliable. The reliability estimate represents the proportion of variability in a measure that is attributable to the true score. For example, a reliability estimate of .7 means that about 70% of the observed-score variance reflects the true score and about 30% reflects random error.
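The variance decomposition described above can be stated directly in code. This is a minimal illustrative sketch (the function name and inputs are my own, not from the source): reliability is the true-score variance divided by the total observed variance.

```python
def reliability(true_var, error_var):
    """Reliability as the proportion of observed-score variance
    attributable to the true score: var_T / (var_T + var_E)."""
    return true_var / (true_var + error_var)

# A measure whose true-score variance is 70 and error variance is 30
# yields the .7 estimate described above: 70 / (70 + 30) = 0.7
print(reliability(70, 30))  # 0.7
```

In practice the true and error variances are not observed separately; the estimation methods discussed below (test-retest, parallel forms, internal consistency) each approximate this ratio from observable data.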

The critical information that should be reported on reliability includes the identification of major sources of error, the size of those errors, and the degree of generalizability of scores across alternate forms, administrations, or other relevant dimensions. Variances or standard deviations of measurement errors should also be reported, whether in terms of one or more reliability coefficients or in terms of item response theory-based test information functions. Generally, three types of reliability estimates are reported: test-retest, parallel forms, and internal consistency. Test-retest reliability is used to assess the consistency of a measure when it is administered at different times. Parallel forms, or alternative forms, are used to assess the consistency of tests that are designed in the same way from the same content domain and are administered during independent testing sessions. Internal consistency is used to assess the relationships across items or subsets of items within a test during a single test administration. A widely used reliability estimate is Cronbach's alpha, which provides an index of internal consistency.
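Cronbach's alpha, mentioned above as the standard internal consistency index, can be computed from item-level variances and the variance of respondents' total scores. The sketch below uses the well-known formula α = k/(k−1) · (1 − Σ var(item) / var(total)); the function and the sample data are illustrative, not from the source.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of k item-score columns,
    each holding one item's scores across the same respondents."""
    k = len(items)
    sum_item_vars = sum(pvariance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return (k / (k - 1)) * (1 - sum_item_vars / pvariance(totals))

# Three items rated by five respondents on a 1-5 scale (hypothetical data)
items = [[3, 4, 3, 5, 4],
         [3, 5, 3, 4, 4],
         [2, 4, 3, 5, 5]]
print(cronbach_alpha(items))  # ≈ 0.86
```

Because all three items rise and fall together across respondents, most of the total-score variance is shared rather than item-specific, which is what pushes alpha toward 1.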

Each type of reliability estimate has its own strengths and weaknesses, and those factors need to be considered when designing a study because of their potential impact on the reliability estimates chosen. Test-retest reliability is often used in studies with a pretest-posttest design and no control group. One disadvantage of this design, however, is that reliability cannot be estimated until after the posttest has been conducted; if the reliability turns out to be too low, the meaningfulness and usability of the scale are compromised. Parallel forms are used when a researcher administers two similar instruments, but for more complex or subjective constructs, administering two similar instruments can complicate interpretation. Coefficients based on the relationships between test items and subsets of items are not without limitations either. Reliability coefficients are typically useful for comparing measurement procedures, but such comparisons are rarely straightforward. Although a coefficient may capture error caused by scorer inconsistencies, it may not reflect variation in examinee performance or products across occasions. Likewise, a coefficient may demonstrate the internal consistency of an instrument yet fail to reflect measurement errors associated with the examinee's motivation, efficiency, or health. Thus, when assessing constructs using multiple measures that yield reliability estimates, testing should be conducted within a short period in which individuals' attributes are likely to remain stable.

...
