
Correspondence analysis is a method for interpreting tabular data visually in the form of spatial maps in which the rows and columns of the table are depicted as points. Two forms of the method are common: simple correspondence analysis (CA) and multiple correspondence analysis (MCA). Cross-tabulations and raw categorical data in the social sciences are prime candidates for visualization by CA. Typically, a cross-tabulation is tested for association between the row and column variables, for example with a chi-square test. By contrast, CA visualizes the actual structure of this association, whether it is statistically significant or not. MCA generalizes this method to many variables, usually questions in a survey, showing how the response categories interrelate. In this entry, each approach is explained using data from an international survey on the role of government.

The theory underlying CA has its origins, as early as the 1940s, in the scaling of categorical variables (i.e., assigning numerical scores to their category levels) to achieve an objective such as maximizing their pairwise correlations or maximizing between-row or between-column variances. The geometric approach, along with its name, derives from the French analyse des correspondances, developed and popularized by Jean-Paul Benzécri and colleagues starting in the early 1960s. It is this approach that is presented here.

Simple Correspondence Analysis

As a first example, consider data from the International Social Survey Programme's Role of Government survey in 2006, in particular a question concerning government's contribution to the public health system. Respondents were asked whether government should pay “much more,” “more,” “same as now,” “less,” or “much less” for health services, and the responses were cross-tabulated with the respondents' interest in politics, measured on five levels (from 1 = very much interested to 5 = not at all interested). Table 1 shows these counts for three different samples, from France and the former West and East Germany, the latter two still being kept separate for research purposes. In this table, the response categories “less” and “much less” have been combined because of very low frequencies of the latter response to the question. To simplify the present analysis, respondents with missing values have been removed. Missing data such as “can't choose,” “don't know,” or “refused to answer” can be coded as an additional level, either combined or separately, but these nonsubstantive responses often dominate the results, obscuring the interpretation of the substantive responses (Michael Greenacre & Jorg Blasius, 2006).

The first thing to note about CA is that it depicts the relative frequencies of response, called profiles. For example, the profile of French respondents with a high interest in politics (row F1 of Table 1) is [73/229, 73/229, 65/229, 18/229] = [0.319, 0.319, 0.284, 0.079]; hence, a profile is a set of proportions summing to 1. The profiles of the French sample as a whole and the two German samples, as well as of all respondents together, called average profiles, are given in the last rows of Table 1 in bold italics, and the data for group F1 can be compared with these averages. Thus, compared with all French respondents, those with a high interest in politics are proportionally more likely to say that government should spend much more on health (0.319 compared with 0.230), and proportionally less likely to say that government should spend the same as now (0.284 compared with 0.325). This pattern is similar when F1 is compared with the total sample.
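The profile calculation for row F1 can be reproduced in a few lines. The following is only a minimal sketch: the counts are the four quoted in the text for row F1, while the function name `profile` and the variable names are illustrative assumptions, not part of the original entry.

```python
# Counts for row F1 of Table 1 (French respondents with a high interest in
# politics), as quoted in the text. Column order: much more, more,
# same as now, less/much less (the last two categories combined).
f1_counts = [73, 73, 65, 18]


def profile(counts):
    """Return the relative frequencies of a row of counts (they sum to 1).

    `profile` is a hypothetical helper name used for illustration only.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute a profile for a row with no counts")
    return [c / total for c in counts]


f1_profile = profile(f1_counts)
print([round(p, 3) for p in f1_profile])  # [0.319, 0.319, 0.284, 0.079]
```

Applied to the column totals of a sample instead of a single row, the same function would give the average profile against which F1 is compared.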

...
