
Pearson Product-Moment Correlation Coefficient

In the late 19th century, Sir Francis Galton was measuring many different biological and sociological variables and describing their distributions using the extant methods for single variables. However, there was no way to quantify the degree of relationship between two variables. The Pearson product-moment correlation coefficient (hereafter referred to as the “coefficient”) was created by Karl Pearson in 1896 to address this need. The coefficient is one of the most frequently employed statistical methods in the social and behavioral sciences; it is used in theory testing, instrument validation, reliability estimation, and many other descriptive and inferential procedures.

This entry begins by defining the coefficient and describing its statistical properties. Next, it discusses descriptive and inferential procedures. It closes with a discussion of the coefficient's limitations and precautions for interpreting it.

Definition

The population coefficient, typically denoted by the Greek letter ρ (rho), is an index of the degree and direction of linear association between two continuous variables. These variables are usually denoted as X (commonly labeled as a predictor variable) and Y (commonly labeled as an outcome variable). Note, however, that the letters are arbitrary; the roles of the variables implied by these labels are irrelevant because the coefficient is a symmetric measure and takes on the same value no matter which variable is designated X and which is designated Y.

The population value is estimated by calculating a sample coefficient, which is denoted by ρ̂ (“rho hat”) or r. The sample coefficient is a descriptive statistic; however, inferential methods can be applied to estimate the population value of the coefficient.
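This excerpt does not reproduce the computing formula, so the following is only a minimal sketch based on the standard textbook definition of r: the sum of cross products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the paired scores are hypothetical and were made up for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Deviations of each score from its own mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    sum_xy = sum(a * b for a, b in zip(dx, dy))  # cross products
    sum_xx = sum(a * a for a in dx)              # squared deviations of X
    sum_yy = sum(b * b for b in dy)              # squared deviations of Y

    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when either variable is constant")

    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical paired scores, invented for illustration only.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
print(f"r = {r:.4f}")
```

Swapping the arguments, as in pearson_r(score, hours), returns the same value, which is the symmetry described in the definition above.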

Statistical Properties of the Coefficient

The coefficient can take on values between −1 and 1 (i.e., −1 ≤ ρ ≤ 1). The absolute value of the coefficient denotes the strength of the linear association, with |ρ| = 1 indicating a perfect linear association and ρ = 0 indicating no linear association. The sign of the coefficient indicates the direction of the linear association. When high scores on X correspond to high scores on Y and low scores on X correspond to low scores on Y, the association is positive. When high scores on X correspond to low scores on Y and vice versa, the association is negative.

Bivariate scatterplots are often used to visually inspect the degree of linear association between two variables. Figure 1 demonstrates how an (X, Y) scatterplot might look when the population correlation between X and Y is negligible (e.g., ρ = .02). Note that there is no observable pattern in the dots; they appear to be randomly spaced on the plot.

Figure 2 demonstrates how an (X, Y) scatterplot might look when ρ = .90. There is an obvious pattern in the paired score points. That is, increases in X are associated with linear increases in Y, at least on average.

Figure 3 demonstrates how an (X, Y) scatterplot might look when ρ = −.30. In this plot, increases in X are associated with linear decreases in Y, at least on average. Because the absolute value of the coefficient is lower in this case, the pattern is not as easily identifiable.

...
