
Polychoric Correlation Coefficient

The polychoric correlation coefficient, ρ, measures correlation when the data consist of observations on two ordinal variables, each with an arbitrary number of response categories. Strictly speaking, it estimates the correlation between two unobserved variables, assumed to be bivariate normal, that underlie the observed ordinal variables. It generalizes the tetrachoric correlation coefficient, a statistic used to estimate correlation from two dichotomous variables.

When its assumptions hold, the polychoric correlation provides an estimate that is entirely free of the attenuation caused when two normally distributed variables are “crudely categorized,” that is, when they are reduced to sets of ordinal categories. This attenuation biases parameter estimates when the attenuated correlations are used in methods such as factor analysis or structural equation modeling. By contrast, parameter estimates from analyses of polychoric correlations tend to be unbiased. Although the coefficient rests on an assumption of underlying bivariate normality, simulation studies show that it is somewhat robust to violations of this assumption.

The polychoric correlation coefficient tends to yield a stronger (in absolute value) correlation estimate than a Pearson product-moment correlation applied to ordinal variables, especially when the number of response categories for each variable is small (fewer than five) and when the distributions of the ordinal variables are skewed. Yet the polychoric correlation coefficient, which traces its origins to Karl Pearson's work in the early 1900s, shares the same expected value as the Pearson product-moment correlation. By contrast, nonparametric correlation coefficients such as Spearman's rank-order correlation and Kendall's τb have different expected values, making them less attractive as substitutes for the Pearson product-moment correlation.

The polychoric correlation coefficient has been used prominently for factor analysis and structural equation modeling of ordinal data. For this purpose the statistic has definite advantages over some alternative approaches, but it also has substantial drawbacks. Continuing innovation in structural equation modeling, together with problems in using the polychoric correlation for this purpose, seems likely to make this application of the coefficient less prominent in the future.
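
The attenuation caused by crude categorization can be illustrated with a small simulation. The sketch below is illustrative only and is not part of the original entry: the function names (`pearson`, `categorize`, `simulate`), the cut points, the sample size, and the seed are all assumptions chosen for the example. It draws pairs from a bivariate normal distribution with a known correlation, reduces each variable to three skewed ordinal categories, and compares the Pearson correlation of the latent values with the Pearson correlation of the category scores.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def categorize(value, cuts):
    """Return the 1-based ordinal category of value, given ascending cut points."""
    return 1 + sum(value > c for c in cuts)


def simulate(rho, cuts, n=20000, seed=0):
    """Return (latent Pearson r, Pearson r on category scores).

    Latent pairs are standard bivariate normal with correlation rho;
    both variables are categorized with the same (hypothetical) cut points.
    """
    rng = random.Random(seed)
    latent_x, latent_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        latent_x.append(z1)
        latent_y.append(rho * z1 + math.sqrt(1 - rho * rho) * z2)
    obs_x = [categorize(v, cuts) for v in latent_x]
    obs_y = [categorize(v, cuts) for v in latent_y]
    return pearson(latent_x, latent_y), pearson(obs_x, obs_y)


# Three skewed categories: most responses fall in the lowest category.
latent_r, ordinal_r = simulate(rho=0.7, cuts=[0.5, 1.5])
print(f"latent r = {latent_r:.3f}, r on category scores = {ordinal_r:.3f}")
```

Under these assumed settings the correlation computed on the category scores should come out smaller in absolute value than the latent correlation, which is the gap the polychoric coefficient is meant to close.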

Estimating the Polychoric Correlation Coefficient

Imagine two variables X and Y, which might represent responses to two items on a questionnaire, where those responses are limited to a set of ordered and mutually exclusive categories. The items might be two Likert scale items, each with a set of response categories labeled strongly disagree, disagree, neither agree nor disagree, agree, and strongly agree. Researchers will often assign numbers to these categories, such as 1, 2, 3, 4, and 5. A researcher who then uses these numbers to compute statistics, such as a Pearson product-moment correlation between X and Y, is implicitly assuming that X and Y are measured on at least an interval scale. If a variable has only ordinal scale, however, then the specific numbers assigned to the categories signify only the ordering of the response categories and cannot be used for computation. Unlike the Pearson product-moment correlation, the polychoric correlation is derived rather than computed directly from the response category scores.
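
One common way to derive such an estimate is a two-step procedure: thresholds on the latent normal scale are first taken from the marginal proportions of each variable, and ρ is then chosen to maximize the likelihood of the observed cross-tabulation. The sketch below illustrates that general idea under stated assumptions and is not necessarily the procedure this entry goes on to describe. The function names, the example counts, the Simpson's rule integration, and the golden-section search (which assumes the likelihood has a single peak in ρ and that every category has a nonzero marginal count) are all choices made for illustration.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def bvn_cdf(h, k, rho, n=200):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with correlation rho."""
    if h == -math.inf or k == -math.inf:
        return 0.0
    if h == math.inf:
        return _STD.cdf(k)
    if k == math.inf:
        return _STD.cdf(h)

    # Identity: Phi2(h, k; rho) = Phi(h) * Phi(k) + integral from 0 to rho of
    # the bivariate normal density at (h, k), integrated over the correlation.
    def density(r):
        s = 1.0 - r * r
        return math.exp(-(h * h - 2 * r * h * k + k * k) / (2 * s)) / (
            2 * math.pi * math.sqrt(s)
        )

    # Composite Simpson's rule on n (even) intervals.
    step = rho / n
    total = density(0.0) + density(rho)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * density(i * step)
    return _STD.cdf(h) * _STD.cdf(k) + total * step / 3


def thresholds(margin_counts):
    """Step 1: latent cut points from cumulative marginal proportions."""
    total = sum(margin_counts)
    cuts, cum = [-math.inf], 0
    for c in margin_counts[:-1]:
        cum += c
        cuts.append(_STD.inv_cdf(cum / total))
    cuts.append(math.inf)
    return cuts


def log_likelihood(table, a, b, rho):
    """Multinomial log-likelihood of the table given thresholds a, b and rho."""
    ll = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue
            p = (bvn_cdf(a[i + 1], b[j + 1], rho) - bvn_cdf(a[i], b[j + 1], rho)
                 - bvn_cdf(a[i + 1], b[j], rho) + bvn_cdf(a[i], b[j], rho))
            ll += count * math.log(max(p, 1e-12))  # guard against rounding
    return ll


def polychoric_two_step(table, lo=-0.99, hi=0.99, tol=1e-6):
    """Step 2: golden-section search for the rho that maximizes the likelihood."""
    a = thresholds([sum(row) for row in table])
    b = thresholds([sum(col) for col in zip(*table)])
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = log_likelihood(table, a, b, x1), log_likelihood(table, a, b, x2)
    while hi - lo > tol:
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = log_likelihood(table, a, b, x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = log_likelihood(table, a, b, x1)
    return (lo + hi) / 2


# Hypothetical 3x3 cross-tabulation of X (rows) by Y (columns).
counts = [[40, 20, 5], [25, 45, 20], [5, 25, 40]]
print(round(polychoric_two_step(counts), 3))
```

Note that the category scores 1, 2, 3 never enter the calculation: only the cell frequencies and the ordering of the categories are used, which is what distinguishes this estimate from a Pearson correlation computed on the scores.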

...
