
Correlation is a statistical measure of the association between two or more variables. Two or more variables are associated if they change together (covary). In statistics, such simultaneous movement is called dependence. Thus, correlation is simply an indication of a lack of independence between variables. Measures of correlation can also indicate the direction of the relationship between variables, that is, whether it is positive or negative. In addition, while correlation alone is insufficient to establish a causal relationship, it often signals that one may exist. Below, the basic properties and applications of current correlation measures are discussed.

The modern statistical concept of correlation was initially put forward by Sir Francis Galton in a series of papers on heredity in the late 1800s. Correlation was subsequently refined in various works by Karl Pearson and G. Udny Yule. These initial works culminated in a statistic now called the Pearson product-moment correlation coefficient, Pearson's r. Pearson's r is perhaps the most widely used of all statistics because of its value as an indicator of correlation and its relationship to multivariate analysis, in particular linear regression. However, it is only one of several indicators of statistical correlation. Today, the concept of correlation is a broad term that comprises a host of indices used to measure both parametric and nonparametric association. Of the latter, Kendall's τ and τb, Cramér's V, and Goodman and Kruskal's γ are especially prevalent in applied statistics.

Pearson's r indicates both the degree and direction of linear dependence between variables. Direction in a linear relationship can take one of two forms: positive or negative. For example, take two variables X and Y. If high scores on X correspond to low scores on Y, there is a negative relationship; if high scores on X correspond to high scores on Y, there is a positive relationship. Degree, by contrast, measures the strength of a relationship between variables. Such a relationship might be strong or weak: in a strong positive correlation, for instance, large values of X are consistently associated with large values of Y, and vice versa.

Correlation may be visualized via scatterplots, that is, by arranging the values of one variable on one axis and the corresponding values of the other variable on the other axis. In a scatterplot, points falling close to a straight line imply strong correlation, whereas a diffuse cloud of points is more suggestive of a weak correlation or a lack of one. Similarly, if most points fall in the bottom-left and top-right quadrants, a positive relationship is plausible, whereas if most fall in the top-left and bottom-right quadrants, a negative relationship is likely. A scatterplot may also reveal nonlinear relationships between variables.
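The quadrant heuristic just described can be sketched numerically: classify each point by the sign of the product of its deviations from the two means. The following is an illustrative Python sketch with made-up data; the function name and values are hypothetical, not from the original.

```python
# Tally scatterplot points by quadrant relative to the means of X and Y.
# Points in the bottom-left/top-right quadrants (positive product of
# deviations) suggest a positive relationship; points in the top-left/
# bottom-right quadrants (negative product) suggest a negative one.
def quadrant_counts(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    pos = sum(1 for xi, yi in zip(x, y) if (xi - mean_x) * (yi - mean_y) > 0)
    neg = sum(1 for xi, yi in zip(x, y) if (xi - mean_x) * (yi - mean_y) < 0)
    return pos, neg

X = [1, 2, 3, 4, 5]
Y = [1.2, 1.9, 3.1, 4.0, 5.2]
print(quadrant_counts(X, Y))  # (4, 0): points cluster bottom-left/top-right
```

A lopsided tally in one pair of quadrants hints at the direction of the relationship, but only the correlation coefficient itself quantifies its strength.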

The Pearson r can be intuitively derived from a standardization of the covariance between variables. The covariance between X and Y is

$$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N}$$

where N is the sample size, and \(\bar{X}\) and \(\bar{Y}\) are the means of the variables. Covariance combines the deviations of the scores from each variable's mean and accounts for the sample size in the denominator.
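The covariance formula can be computed directly. Below is a minimal Python sketch of the population (divide-by-N) form given above; the data and function name are illustrative, not from the original.

```python
# Population covariance of X and Y, matching the formula above:
# cov(X, Y) = sum((x_i - mean(X)) * (y_i - mean(Y))) / N
def covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]
print(covariance(X, Y))  # 4.0
```

Note that the sign of the covariance already indicates direction (here positive), but its magnitude depends on the units of X and Y, which is exactly the problem standardization solves.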

However, covariance does not control for potentially substantial differences in the amount of variation within each variable. Thus, to derive Pearson's r, the covariance is standardized: it is divided by the product of the standard deviations, as given

$$r = \frac{\mathrm{cov}(X, Y)}{s_X s_Y}$$

where \(s_X\) and \(s_Y\) denote the standard deviations of X and Y.
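Putting the two steps together, the full computation can be sketched in Python as follows. This is an illustrative sketch, not the source's own code; population (divide-by-N) forms are used throughout for consistency with the covariance formula, and sample (N − 1) forms yield the same r because the factors cancel.

```python
import math

# Pearson's r: the covariance standardized by the product of the
# two standard deviations, so that -1 <= r <= 1.
def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (sd_x * sd_y)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # ~1.0: perfect positive
print(pearson_r([1, 2, 3], [3, 2, 1]))               # ~-1.0: perfect negative
```

Because the deviations are scaled by each variable's own spread, r is unit-free and always lies between −1 and +1, which is what makes it comparable across datasets.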
