Reliability

Sarah Boslaugh

doi:10.4135/9781412953948

Edited by:
Sarah Boslaugh
In: Encyclopedia of Epidemiology
Chapter DOI: https://doi.org/10.4135/9781412953948.n398
Subject: Epidemiology & Biostatistics, Public Health (general), Public Health Research Methods
Keywords: correlation; instruments

The issue of reliability (i.e., repeatability or reproducibility) is crucial in selecting and developing the most appropriate item, scale, or instrument. Reliability refers to the extent to which an instrument or test will consistently produce the same result, measure, or score when applied two or more times under identical conditions. A technique is reliable, or has achieved a high level of agreement, if it yields consistent results on repetition. If repeated measurements produce different results, and the entity being measured is assumed not to have changed, the instrument is considered unreliable.

Methods of Assessing or Estimating Reliability

There are a variety of methods for estimating instrument reliability. DeVellis classifies these methods along two dimensions: (1) the type of instrument (observer or external source vs. self-report) and (2) the timing or method of administration (single administration vs. multiple administrations). Reliability is estimated in one of four ways:

1. Internal Consistency. This estimation is based on the correlation among the variables comprising the set or the homogeneity of the items comprising a scale (usually estimated with Cronbach's alpha).
2. Split-Half Reliability. This estimation is based on the correlation of two equivalent forms (halves) of the scale (usually estimated with the Spearman-Brown coefficient).
3. Test-Retest Reliability. This estimation is based on the correlation between scores from two (or more) administrations of the same item, scale, or instrument for different times, locations, or populations, when the two administrations do not differ in other relevant variables (usually estimated with a correlation coefficient such as Pearson's r or an intraclass correlation).
4. Interrater Reliability. This estimation is based on the correlation of scores between/among two or more raters who rate the same item, scale, or instrument (usually estimated with intraclass correlation, of which there are six types discussed below).

These four reliability estimation methods are sensitive to different sources of error and are not necessarily mutually exclusive. Consequently, the estimates they produce should not be expected to be equal, nor need they lead to the same conclusions. All reliability coefficients are forms of correlation coefficients and are thus sample dependent: another sample may well yield a different estimate.
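As a rough illustration of two of these estimates, the sketch below computes a split-half coefficient (an odd/even split stepped up with the Spearman-Brown formula) and a test-retest coefficient (a plain Pearson correlation between two administrations). The function names, the odd/even split, and all data are hypothetical choices made for the example, not part of the entry.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half_reliability(scores):
    """Correlate odd-item and even-item half scores, then apply
    the Spearman-Brown correction to estimate full-length reliability."""
    odd_half = [sum(row[0::2]) for row in scores]   # items 1, 3, ...
    even_half = [sum(row[1::2]) for row in scores]  # items 2, 4, ...
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)

# Hypothetical data: 5 respondents answering a 4-item scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]
split_half = split_half_reliability(responses)

# Hypothetical total scores for the same 5 people at two administrations.
time1 = [10, 12, 15, 9, 14]
time2 = [11, 12, 14, 10, 15]
test_retest = pearson_r(time1, time2)
```

Because the two coefficients respond to different error sources (item sampling versus change over time), there is no reason to expect them to agree on the same instrument.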

Internal Consistency Reliability

Cronbach's coefficient alpha is the classic form of internal consistency reliability and is widely used as a measure of reliability. Cronbach's alpha can be interpreted as a measure of mean intercorrelation among item responses obtained at the same time. Cronbach's alpha is influenced by the number of items in a scale: alpha will increase as the number of items increases, provided the new items have the same average intercorrelation as the existing items. There are no absolute standards for knowing when reliability is adequate, but an often-used rule of thumb is that alpha should be at least .70 for a scale to be considered adequate, and many researchers require a cutoff of .80 for a ‘good scale.’
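The dependence on scale length can be made concrete with the standardized form of alpha, a common textbook expression that is not taken from this entry and that assumes items with equal variances. Here N is the number of items and $\bar{r}$ is the mean inter-item correlation:

$$\alpha_{\text{std}} = \frac{N\,\bar{r}}{1 + (N-1)\,\bar{r}}$$

Holding $\bar{r} = 0.30$ fixed (an illustrative value), a 5-item scale gives $1.5 / 2.2 \approx 0.68$, while a 10-item scale gives $3.0 / 3.7 \approx 0.81$. Doubling the number of items moves the scale from just under the .70 rule of thumb to just over the .80 cutoff without any change in how strongly the items are related.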

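Cronbach's alpha is defined as

$$\alpha = \frac{N}{N-1}\left(1 - \frac{\sum_{i=1}^{N}\sigma^2_{Y_i}}{\sigma^2_X}\right)$$

where N is the number of items, $\sigma^2_X$ is the variance of the observed total score, and $\sigma^2_{Y_i}$ is the variance of item i.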
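A minimal sketch of this calculation follows, using the usual computational form of alpha: N/(N - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The function name and the response data are hypothetical, and sample variances are used throughout.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per respondent, each row holding N item scores."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of each respondent's total (summed) score.
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents answering a 4-item scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(responses)
```

With only five respondents the estimate is highly sample dependent; the data serve only to show the arithmetic.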

Cronbach's alpha is closely related to the correlation among items, and when evaluating whether an individual item should be retained in a scale, it is good to look at the squared multiple correlation, $R^2$, for an item when it is predicted from all other items in the scale. The larger this $R^2$, the more the item is contributing to internal consistency. The lower the $R^2$, the more the researcher should consider dropping the item. Note that a scale with an acceptable overall Cronbach's alpha may have some items with a low $R^2$. The Kuder-Richardson (KR-20) coefficient is a special version of Cronbach's alpha for items that are dichotomous.
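One way to obtain each item's squared multiple correlation without fitting a separate regression per item is the identity R²_i = 1 - 1/(R⁻¹)_ii, where R is the inter-item correlation matrix. The sketch below applies that identity in plain Python; the helper names and the data are hypothetical, and the approach assumes more respondents than items and no perfectly collinear items.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (items are collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def squared_multiple_correlations(scores):
    """R^2 of each item regressed on all other items in the scale."""
    items = list(zip(*scores))  # one tuple of scores per item
    corr = [[pearson_r(a, b) for b in items] for a in items]
    inv = invert(corr)
    return [1 - 1 / inv[i][i] for i in range(len(items))]

# Hypothetical data: 6 respondents answering a 3-item scale.
responses = [
    [4, 5, 2],
    [3, 3, 4],
    [5, 4, 1],
    [2, 2, 5],
    [4, 4, 3],
    [1, 2, 2],
]
smc = squared_multiple_correlations(responses)
```

Items whose value in `smc` is low relative to the others are the candidates for removal described above. With only two items, each R² reduces to the squared correlation between them.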

...
