
The sample Pearson product-moment correlation coefficient (r) is a measure of the linear association between two continuous variables, X and Y, measured on the same individuals or units. The magnitude of the Pearson correlation coefficient measures the strength of the linear relationship between X and Y, while its sign indicates the direction of that relationship.

Given two continuous variables, X and Y, the Pearson correlation coefficient, rXY, is obtained as the ratio of the covariance between the two variables to the product of their respective standard deviations.

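$$r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where x̄ and ȳ are the sample means for the variables X and Y, respectively, and n is the number of paired observations.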

The correlation coefficient is defined only if both standard deviations are finite and nonzero.
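The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the sample values are assumptions made for the example. The factors of 1/(n − 1) in the covariance and the standard deviations cancel, so only sums of deviations are needed.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation: covariance over the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the sample means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Matches the condition in the text: r is undefined for a zero standard deviation.
        raise ValueError("correlation is undefined when a standard deviation is zero")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements on five units.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x_values, y_values)
print(round(r, 4))
```

For these values the sum of cross-products is 6 and the sums of squared deviations are 10 and 6, so r = 6/√60, a moderately strong positive correlation.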

Assumptions

To be able to correctly interpret and make valid inferences about the Pearson correlation coefficient, the following assumptions must hold:

  • The observations x1, x2, …, xn and y1, y2, …, yn of X and Y are independent and identically distributed.
  • The variables X and Y are jointly normally distributed with means μX and μY, variances σ²X and σ²Y, and correlation ρXY.

Under these assumptions, the sample Pearson correlation coefficient rXY represents a valid estimate of the population correlation ρXY.

Properties

The Pearson correlation coefficient takes values in the closed interval [−1, 1]. A correlation coefficient equal to −1 indicates a perfect negative linear relationship between two variables (see Figure 1a), while a correlation coefficient of 1 indicates a perfect positive linear relationship between two variables (Figure 1b). The correlation coefficient is equal to zero when the two variables are independent (Figure 1c), and it can also be zero when they are associated through a nonlinear relationship that has no linear component (Figure 1d).
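The boundary cases described in this paragraph can be checked numerically. The sketch below uses made-up data and an illustrative helper named `pearson_r` (an assumption for this example): two exact lines, and a quadratic relationship on values symmetric about zero, where Y is fully determined by X yet the linear correlation vanishes.

```python
import math

def pearson_r(xs, ys):
    # Sample Pearson correlation from sums of deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]  # symmetric about zero

r_pos = pearson_r(xs, [2 * x + 1 for x in xs])    # perfect positive line
r_neg = pearson_r(xs, [-2 * x + 1 for x in xs])   # perfect negative line
r_quad = pearson_r(xs, [x ** 2 for x in xs])      # exact dependence, but not linear

print(r_pos, r_neg, r_quad)
```

The quadratic case shows why a zero coefficient should not be read as evidence of independence.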

Values strictly between −1 and 1 indicate the degree of linear dependence between X and Y. A correlation coefficient >0 is called a positive correlation and indicates that the variables X and Y tend to increase or decrease together. A correlation coefficient <0 is called a negative correlation and indicates that increases in one variable correspond to decreases in the other. There are no rules on what defines a high or a low correlation, and the interpretation of the correlation coefficient depends on the context and the data on which it is calculated.

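The Pearson correlation coefficient is not affected by changes in location or scale in either variable: adding a constant to X or Y, or multiplying either by a positive constant, leaves rXY unchanged. Multiplying one variable by a negative constant reverses the sign of rXY but not its magnitude.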
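A quick numerical check of this invariance, assuming positive scale factors, might look like the following. The helper `pearson_r`, the data, and the transformation constants are all illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    # Sample Pearson correlation from sums of deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r_base = pearson_r(xs, ys)
# Shift and rescale both variables with positive scale factors.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
# A negative scale factor flips the sign but keeps the magnitude.
r_flipped = pearson_r([-x for x in xs], ys)

print(r_base, r_rescaled, r_flipped)
```

In practice this means the coefficient is the same whether a variable is recorded in, say, centimetres or inches.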

Although rXY can be used to determine the degree of association between two variables, it is not a measure of the causal relationship between X and Y.

The value of rXY can be affected greatly by the range of the data values, and extreme observations (outliers) can have a dramatic effect on rXY. Thus, the full range of scores should always be used when calculating the correlation coefficient, and extreme observations should be treated with caution.
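The sensitivity to extreme observations can be illustrated with a small made-up data set. In the sketch below (the helper `pearson_r` and all values are hypothetical), six points show a weak negative correlation, and adding one extreme point far from the rest pushes the coefficient close to 1.

```python
import math

def pearson_r(xs, ys):
    # Sample Pearson correlation from sums of deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 2, 3, 1, 2]

r_clean = pearson_r(xs, ys)                        # weak negative correlation
r_with_outlier = pearson_r(xs + [30], ys + [30])   # one extreme point added

print(round(r_clean, 3), round(r_with_outlier, 3))
```

A scatterplot of the data is usually the simplest way to spot a point with this much influence before interpreting the coefficient.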

Figure 1 Correlation Coefficient Under Different Scenarios


Applications

The Pearson correlation coefficient can be used in a number of different applications.

  • Prediction. Knowing that a strong relationship exists between two variables allows one to make an accurate prediction about one of them using the other.
  • Validity. The Pearson correlation coefficient is often used to validate a new measurement scale. A high correlation between the new scale and an established one would support the claim that the new instrument is measuring what it is supposed to.
  • Reliability. The Pearson correlation coefficient may also be used to establish reliability of an instrument. A high correlation between successive measures on the same individuals, for example, would indicate that the instrument is reliable.

Hypothesis Testing about ρXY

When interest is in testing the null hypothesis that there is no linear association between two continuous variables (H0: ρXY = 0) against an alternative hypothesis that such an association exists (HA: ρXY ≠ 0), a Student's t approximation can be used to test this hypothesis. Under the assumption that the distribution of X and Y is bivariate normal, the test

...
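A widely used form of the t test for H0: ρXY = 0 computes t = r·√(n − 2)/√(1 − r²) and compares it with a Student's t distribution with n − 2 degrees of freedom. The sketch below assumes that standard form; the function name `correlation_t_statistic` and the input values are illustrative. It returns only the statistic and the degrees of freedom, because the Python standard library does not provide the t distribution needed for a p-value.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0 (standard form, assumed)."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("statistic is undefined for |r| = 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df

# Hypothetical sample: r = 6 / sqrt(60) from n = 5 paired observations.
t, df = correlation_t_statistic(6 / math.sqrt(60), 5)
print(round(t, 4), df)
```

The resulting t value would then be compared with the critical value of the t distribution with df degrees of freedom at the chosen significance level, for example from a statistical table.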
