
Internal Consistency Reliability

Internal consistency reliability estimates how much total test scores would vary if slightly different items were used. Researchers usually want to measure constructs rather than particular items. Therefore, they need to know whether the items have a large influence on test scores and research conclusions.

This entry begins with a discussion of classical reliability theory. Next, formulas for estimating internal consistency are presented, along with a discussion of the importance of internal consistency. Last, common misinterpretations and the interaction of all types of reliability are examined.

Classical Reliability Theory

To examine reliability, classical test score theory divides observed scores on a test into two components, true score and error:

X = T + E

where X = observed score, T = true score, and E = error score.

If Steve's true score on a math test is 73 but he gets 71 on Tuesday because he is tired, then his observed score is 71, his true score is 73, and his error score is −2. On another day, his error score might be positive, so that he scores better than he usually would.

Each type of reliability defines true score and error differently. In test-retest reliability, true score is defined as whatever is consistent from one testing time to the next, and error is whatever varies from one testing time to the next. In interrater reliability, true score is defined as whatever is consistent from one rater to the next, and error is defined as whatever varies from one rater to the next. Similarly, in internal consistency reliability, true score is defined as whatever is consistent from one item to the next (or one set of items to the next set of items), and error is defined as whatever varies from one item to the next (or from one set of items to the next set of items that were designed to measure the same construct). To state this another way, true score is defined as the expected value (or long-term average) of the observed scores—the expected value over many times (for test-retest reliability), many raters (for interrater reliability), or many items (for internal consistency). The true score is the average, not the truth. The error score is defined as the amount by which a particular observed score differs from the average score for that person.
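The idea that the true score is a long-run average, not the truth, can be illustrated with a small simulation. This is a hypothetical sketch (the entry itself contains no code): Steve's true score and the error distribution below are invented for illustration, following the numbers in the example above.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 73.0        # Steve's (unobservable) true score from the example
n_occasions = 100_000    # many hypothetical testing occasions

# Each observed score X = true score T + mean-zero error E
errors = rng.normal(loc=0.0, scale=2.0, size=n_occasions)
observed = true_score + errors

# The long-run average of observed scores recovers the true score;
# each error score is the deviation of an observed score from that average.
print(round(observed.mean(), 1))   # ≈ 73.0
```

On any single occasion the error may be negative (as on Steve's tired Tuesday) or positive, but the errors average out to zero over many occasions, items, or raters.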

Researchers assess all types of reliability using the reliability coefficient. The reliability coefficient is defined as the ratio of true score variance to observed score variance:

ρxx′ = σ²T / σ²X

where ρxx′ = the reliability coefficient, σ²T = the variance of true scores across participants, and σ²X = the variance of observed scores across participants.

Classical test score theory assumes that true scores and errors are uncorrelated. Therefore, observed variance on the test can be decomposed into true score variance and error variance:

σ²X = σ²T + σ²E

where σ²E = the variance of error scores across participants.

The reliability coefficient can now be rewritten as follows:

ρxx′ = σ²T / (σ²T + σ²E)

Reliability coefficients vary from 0 to 1, with higher coefficients indicating higher reliability.
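The decomposition ρxx′ = σ²T / (σ²T + σ²E) can be checked numerically. The sketch below simulates participants whose observed scores are true scores plus uncorrelated errors, with variances chosen for illustration (σ²T = 64, σ²E = 16, so the reliability should be close to 64/80 = 0.80); the specific numbers are assumptions, not from the entry.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people = 50_000

# True scores and uncorrelated error scores across participants
true_scores = rng.normal(loc=70.0, scale=8.0, size=n_people)   # σ²T = 64
error_scores = rng.normal(loc=0.0, scale=4.0, size=n_people)   # σ²E = 16
observed = true_scores + error_scores                          # X = T + E

# Reliability coefficient: true-score variance / observed-score variance
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))   # ≈ 0.80
```

Because true scores and errors are uncorrelated, the observed variance is approximately σ²T + σ²E, and the ratio lands near the theoretical value of 0.80.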

This formula can be applied to each type of reliability. Thus, internal consistency reliability is the proportion of observed score variance that is caused by true differences between participants, where true differences are defined as differences that are consistent across the set of items. If the reliability coefficient is close to 1, then researchers would have obtained similar total scores if they had used different items to measure the same construct.
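In practice, true-score variance is not observable, so internal consistency is estimated from the items themselves. One widely used estimator is Cronbach's alpha, α = k/(k − 1) × (1 − Σσ²item / σ²total); the implementation and the simulated six-item test below are an illustrative sketch, not the specific formulas presented later in this entry.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (participants x items) score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)        # per-item variances
    total_var = item_scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated 6-item test: each item = shared true score + item-specific error
rng = np.random.default_rng(2)
true_scores = rng.normal(0.0, 1.0, size=(1000, 1))
items = true_scores + rng.normal(0.0, 1.0, size=(1000, 6))
print(round(cronbach_alpha(items), 2))
```

With these assumed variances each item shares half its variance with the construct, so alpha comes out around 0.85: most of the total-score variance is consistent across items, and a different set of similar items would have yielded similar totals.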
