Test–Retest Reliability

Test–retest reliability is one way to assess the consistency of a measure. The reliability of a set of scores is the degree to which the scores result from systematic rather than chance or random factors; it measures the proportion of the variance among scores that results from true differences. True differences are actual differences, not measured differences. That is, when measuring a construct such as depression, some differences in scores are caused by true differences and some are caused by error. For example, if 90% of the variance results from systematic factors, then the reliability is .90, which indicates that 10% of the variance is based on chance or random factors. Examples of chance or random errors include scoring errors, carelessness on the part of the respondent (e.g., not clearly marking an answer), and outside distractions on the day the test is administered (e.g., someone talking loudly near the testing room). Determining the exact true score for each subject is not possible; however, reliability can be estimated in several ways, and each method has advantages and disadvantages. This entry describes the ways to measure reliability and discusses the assumptions and considerations relevant to the application of test–retest reliability.
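The decomposition of observed-score variance can be illustrated with a short simulation. The sketch below, written in Python with NumPy, assumes a simple model in which each observed score is a true score plus random error; all numbers are hypothetical and chosen so that roughly 90% of the observed variance is systematic.

```python
# Minimal simulation sketch: observed score = true score + random error.
# The means and standard deviations are hypothetical illustration values.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

true_scores = rng.normal(50, 9, n)   # true differences among people (var = 81)
error = rng.normal(0, 3, n)          # chance/random factors (var = 9)
observed = true_scores + error

# Reliability = proportion of observed variance due to true differences.
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))  # approximately .90
```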

Methods

One way to measure reliability is to determine the internal consistency of a measure. If the various components or items of an instrument measure the same construct, then the scores on those components or items will tend to covary. That is, if all the items are keyed in the same direction, then people who are high on the construct (e.g., extroversion) will tend to answer all the items in one direction, and people who are low on the construct (e.g., those who are not extroverted) will tend to answer them in the opposite direction. Cronbach's alpha coefficient and split-half reliability are estimates of internal consistency reliability.
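As a brief sketch, Cronbach's alpha can be computed directly from an item-response matrix using its standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The responses below are hypothetical, with all items keyed in the same direction.

```python
# Sketch of Cronbach's alpha; the response matrix is hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Rows are respondents; columns are items keyed in the same direction."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

responses = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 1],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
])
print(round(cronbach_alpha(responses), 2))  # high alpha: the items covary
```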

A second common estimate of reliability examines the stability of the scores. One way to assess stability is to compare alternate or parallel forms of the instrument; high alternate-form coefficients indicate that the forms are comparable and reliable. Test–retest reliability is the most common way to measure the stability of a measure over time. It is conceptually and intuitively the simplest approach and the one that most closely corresponds to the view of reliability as the consistency or repeatability of a measure. That is, if a researcher has the same people take the same test on more than one occasion (i.e., a first test occasion and a retest occasion), the correlation between the scores from the two administrations is the test–retest reliability. The test is treated as parallel with itself. Some authors refer to the resulting test–retest correlation as the coefficient of stability.
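In computational terms, the coefficient of stability is simply the Pearson correlation between the two sets of scores. The sketch below uses hypothetical scores for eight people tested on two occasions.

```python
# Sketch of a test–retest (stability) coefficient; scores are hypothetical.
import numpy as np

test = np.array([12, 18, 25, 9, 30, 22, 15, 27])     # first administration
retest = np.array([14, 17, 24, 11, 29, 21, 16, 25])  # second administration

r = np.corrcoef(test, retest)[0, 1]  # coefficient of stability
print(round(r, 2))
```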

With tests of achievement (e.g., a math test), aptitude (e.g., intelligence tests), and personality (e.g., a test of extroversion/introversion), the measure is typically administered only twice, so only one estimate of the reliability coefficient is obtained. If the measure is administered more than twice, the usual practice is to take the mean of the intercorrelations among the scores from the various occasions as the estimate of the test–retest reliability coefficient.
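A sketch of this practice for more than two administrations, again with hypothetical data, is to compute the correlation matrix across occasions and average its unique off-diagonal entries.

```python
# Sketch: average the pairwise correlations among several administrations.
# Rows are people; columns are testing occasions (hypothetical scores).
import numpy as np

scores = np.array([
    [12, 14, 13],
    [18, 17, 19],
    [25, 24, 26],
    [ 9, 11, 10],
    [30, 29, 28],
])

corr = np.corrcoef(scores, rowvar=False)       # occasion-by-occasion correlations
pairs = corr[np.triu_indices_from(corr, k=1)]  # unique pairwise correlations
print(round(pairs.mean(), 2))                  # estimated reliability coefficient
```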

...
