
Cohen's Kappa

Cohen's Kappa coefficient (κ) is a statistical measure of the degree of agreement or concordance between two independent raters that takes into account the possibility that agreement could occur by chance alone.

Like other measures of interrater agreement, κ is used to assess the reliability of different raters or measurement methods by quantifying their consistency in placing individuals or items in two or more mutually exclusive categories. For instance, in a study of developmental delay, two pediatricians may independently assess a group of toddlers and classify them with respect to their language development into either “delayed for age” or “not delayed.” One important aspect of the utility of this classification is the presence of good agreement between the two raters. Agreement between two raters could be simply estimated as the percentage of cases in which both raters agreed. However, a certain degree of agreement is expected by chance alone. In other words, two raters could still agree on some occasions even if they were randomly assigning individuals into either category.

In situations in which there are two raters and the categories used in the classification system have no natural order (e.g., delayed vs. not delayed; present vs. absent), Cohen's κ can be used to quantify the degree of agreement in the assignment of these categories beyond what would be expected by random guessing or chance alone.

Calculation

Specifically, κ can be calculated using the following equation:

κ = (Po − Pe) / (1 − Pe)

where Po is the proportion of the observed agreement between the two raters, and Pe is the proportion of rater agreement expected by chance alone. A κ of +1 indicates complete agreement, whereas a κ of 0 indicates that there is no agreement between the raters beyond that expected by random guessing or chance alone. A negative κ indicates that the agreement was less than expected by chance, with κ of −1.0 indicating perfect disagreement beyond what would be expected by chance.
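
As a minimal sketch, the formula translates directly into code; the function and argument names here are illustrative.

    def cohens_kappa(p_o: float, p_e: float) -> float:
        """Cohen's kappa from the observed agreement (p_o) and the
        agreement expected by chance alone (p_e)."""
        return (p_o - p_e) / (1 - p_e)

    cohens_kappa(0.90, 0.624)   # 0.73 (the example worked below)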

Table 1. Classification of the 100 toddlers by the two pediatricians

                                 Pediatrician B:   Pediatrician B:
                                 Delayed           Not delayed      Total
Pediatrician A: Delayed               20                 3            23
Pediatrician A: Not delayed            7                70            77
Total                                 27                73           100

Table 2. Interpretation of κ values proposed by Landis and Koch (not reproduced here).

To illustrate the use of the equation, let us assume that the results of the assessments made by the two pediatricians in the above-mentioned example are as shown in Table 1. The two raters agreed on the classification of 90 toddlers (i.e., Po is 0.90). To calculate the probability of the expected agreement (Pe), we first calculate the probability that both raters would have classified a toddler as delayed if they were merely randomly classifying toddlers to this category. This could be obtained by multiplying the marginal probabilities of the delayed category, that is, (23÷100) × (27÷100) = 0.062. Similarly, the probability that both raters would have randomly classified a toddler as not delayed is (77÷100) × (73÷100) = 0.562. Therefore, the total agreement expected by chance alone (Pe) is 0.562 + 0.062 = 0.624. Using the equation, κ is equal to (0.90 − 0.624) / (1 − 0.624) = 0.73.
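
The arithmetic in this example can be verified with a short script. This is a sketch assuming the cell counts implied by the text (20, 3, 7, and 70 in Table 1); the variable names are illustrative.

    # Cell counts from Table 1: rows are Pediatrician A, columns are Pediatrician B,
    # in the order (delayed, not delayed).
    table = [[20, 3],
             [7, 70]]

    n = sum(sum(row) for row in table)            # 100 toddlers in total
    p_o = (table[0][0] + table[1][1]) / n         # observed agreement: 0.90

    a_delayed = (table[0][0] + table[0][1]) / n   # 23/100 rated delayed by A
    b_delayed = (table[0][0] + table[1][0]) / n   # 27/100 rated delayed by B

    # Chance agreement: both rate "delayed" plus both rate "not delayed".
    p_e = a_delayed * b_delayed + (1 - a_delayed) * (1 - b_delayed)

    kappa = (p_o - p_e) / (1 - p_e)
    print(round(p_e, 3), round(kappa, 2))         # 0.624 0.73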

Richard Landis and Gary Koch have proposed the following interpretation for estimates of κ (Table 2). Although arbitrary, this classification is widely used in the medical literature. According to this classification, the agreement between the two pediatricians in the above example is “good.”
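
Table 2 itself is not reproduced above. As a hedged sketch, the thresholds below are the commonly cited Landis and Koch bands; the exact descriptors in the source's Table 2 may differ slightly (the 0.61 to 0.80 band is labelled "good" in some adaptations, which matches the wording used for the example).

    def interpret_kappa(kappa: float) -> str:
        """Commonly cited Landis and Koch (1977) descriptors for kappa.
        The labels are assumptions; the source's Table 2 may word them differently."""
        if kappa < 0.00:
            return "poor"
        if kappa <= 0.20:
            return "slight"
        if kappa <= 0.40:
            return "fair"
        if kappa <= 0.60:
            return "moderate"
        if kappa <= 0.80:
            return "substantial"
        return "almost perfect"

    print(interpret_kappa(0.73))   # "substantial" ("good" in some adaptations)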

It should be noted that κ is a summary measure of the agreement between two raters and cannot therefore be used to answer all the possible questions that may arise in a reliability study. For instance, it might be of interest to determine whether disagreement between the two pediatricians in the above example was more likely to occur when diagnosing developmental delay than when diagnosing normal development or vice versa. However, κ cannot be used to address this question, and alternative measures of agreement are needed for that purpose.

...
