Interrater Reliability

The concept of interrater reliability refers to the relative consistency of the judgments made of the same stimulus by two or more raters. In survey research, interrater reliability relates to the observations that in-person interviewers may make when they gather observational data about a respondent, a household, or a neighborhood in order to supplement the data gathered via a questionnaire. It also applies to judgments an interviewer may make about the respondent after the interview is completed, such as recording on a 0 to 10 scale how interested the respondent appeared to be in the survey. Another example occurs whenever a researcher has interviewers complete a refusal report form immediately after a refusal takes place; here the issue is how reliable the data are that the interviewer records on that form. The concept likewise applies to the reliability of the coding decisions that coders make when they turn open-ended responses into quantitative scores.

Interrater reliability is rarely quantified in these survey examples because of the time and cost it would take to generate the necessary data, but if it were measured, it would require that a group of interviewers or coders all rate the same stimulus or set of stimuli. Instead, interrater reliability in applied survey research is more like an ideal that prudent researchers strive to achieve whenever data are being generated by interviewers or coders.

An important factor that affects the reliability of ratings made by a group of raters is the quantity and the quality of the training they receive. Their reliability can also be impacted by the extent to which they are monitored by supervisory personnel and the quality of such monitoring.

A common method for statistically quantifying the extent of agreement between raters is the intraclass correlation coefficient, often denoted by the Greek letter rho (ρ). In all of the examples mentioned above, if rating data are not reliable, that is, if the raters are not consistent in the ratings they assign, then the value of the data to researchers may well be nil.
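
To make the intraclass correlation concrete, the following is a minimal Python sketch of ICC(2,1), the two-way random-effects, single-rater, absolute-agreement form described by Shrout and Fleiss (1979) in the Further Readings. The function name icc_2_1 and the sample ratings are illustrative assumptions for demonstration, not material from this entry, and the sketch assumes a fully crossed design in which every rater scores every subject.

    import numpy as np

    def icc_2_1(ratings):
        # ratings: n subjects (rows) by k raters (columns);
        # every rater must score every subject (fully crossed design)
        ratings = np.asarray(ratings, dtype=float)
        n, k = ratings.shape
        grand = ratings.mean()
        ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)  # between subjects
        ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)  # between raters
        ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols  # residual
        ms_rows = ss_rows / (n - 1)
        ms_cols = ss_cols / (k - 1)
        ms_err = ss_err / ((n - 1) * (k - 1))
        # ICC(2,1) per Shrout & Fleiss (1979)
        return (ms_rows - ms_err) / (
            ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

    # Hypothetical example: 4 interviewers each rate the same 6
    # respondents' apparent interest in the survey on a 0-10 scale.
    ratings = [[9, 8, 9, 8],
               [6, 5, 6, 7],
               [8, 8, 7, 8],
               [3, 2, 4, 3],
               [7, 6, 7, 6],
               [5, 5, 4, 5]]
    print(round(icc_2_1(ratings), 3))  # values near 1 indicate high agreement

In this illustrative data set the raters assign very similar scores to each respondent, so the coefficient comes out close to 1; widely inconsistent ratings would drive it toward 0, signaling data of little value to the researcher.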

Paul J. Lavrakas

Further Readings

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428. http://dx.doi.org/10.1037/0033-2909.86.2.420