Correspondence analysis is a method for interpreting tabular data visually in the form of spatial maps in which the rows and columns of the table are depicted as points. The basic form of the method visualizes cross-tabulations found typically in the social sciences, for example, education groups cross-tabulated with political party voted for, or a table of counts of the number of consumers that associate each of a set of brands with a set of attributes. Multiple correspondence analysis generalizes this method to many variables, typically questions in a survey, showing how the response categories interrelate.

As a first example, we use data from Tawnya Covert's article “Consumption and Citizenship during World War II: Product Advertisements in Women's Magazines,” a study in the Journal of Consumer Culture on consumption during the period of the Second World War after Pearl Harbor, as observed through a sample of advertisements aimed at American women. From their wording, the advertisements could be categorized into three types of advertising appeal (unrestricted consumption, rationed supplies, and deferred spending), plus a fourth category gathering other appeals. Table 1 reproduces two tables from this article, stacked one on top of the other: cross-tabulations of product type by appeal and of year by appeal. Since there are no missing data, the column totals of the two tables are identical. The author interprets these data by calculating percentages in each row; for example, of the 540 advertisements for food, 439 (81.3%) correspond to unrestricted consumption, 11 (2.0%) to deferred spending, 82 (15.2%) to rationed supplies, and 8 (1.5%) to other appeals. A table of counts like this one is well suited to correspondence analysis, a method for visualizing count data.
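The row percentages quoted for food can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary and the variable names are assumptions made for the example and do not come from the original article.

```python
# Counts for the "Food" row of Table 1, keyed by advertising appeal.
food = {
    "unrestricted consumption": 439,
    "deferred spending": 11,
    "rationed supplies": 82,
    "other appeals": 8,
}

# Row total, then each count as a percentage of that total.
total = sum(food.values())
row_percentages = {appeal: round(100 * n / total, 1) for appeal, n in food.items()}

print(total)
for appeal, pct in row_percentages.items():
    print(f"{appeal}: {pct}%")
```

The same calculation applied to every row gives the row profiles that correspondence analysis displays as points.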

Table 1 Cross-Tabulations of Product Type by Appeal and Year by Appeal; World War II Data
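| | Unrestricted Consumption | Deferred Spending | Rationed Supplies | Other Appeals | Sum |
|---|---|---|---|---|---|
| Cosmetics | 219 | 0 | 0 | 1 | 220 |
| Personal hygiene | 254 | 1 | 14 | 4 | 273 |
| Household | 173 | 9 | 15 | 4 | 201 |
| Baby | 65 | 0 | 2 | 1 | 68 |
| Food | 439 | 11 | 82 | 8 | 540 |
| Small appliances | 1 | 11 | 2 | 1 | 15 |
| Large appliances | 1 | 55 | 9 | 7 | 72 |
| Clothes | 99 | 4 | 17 | 1 | 121 |
| Cigarettes | 45 | 0 | 0 | 0 | 45 |
| Linens | 13 | 9 | 18 | 2 | 42 |
| Mattresses | 14 | 5 | 3 | 0 | 22 |
| Silverware | 0 | 32 | 2 | 0 | 34 |
| Home decor | 45 | 24 | 4 | 3 | 76 |
| Miscellaneous | 42 | 45 | 5 | 26 | 118 |
| Sum | 1410 | 206 | 173 | 58 | 1847 |
| 1942 | 126 | 7 | 4 | 3 | 140 |
| 1943 | 549 | 59 | 65 | 18 | 691 |
| 1944 | 549 | 112 | 87 | 25 | 773 |
| 1945 | 186 | 28 | 17 | 12 | 243 |
| Sum | 1410 | 206 | 173 | 58 | 1847 |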
Source: Covert 2003, 327, 330.

Figure 1 Correspondence Analysis Map of the Cross-Tabulations in Table 1

Note: The map is determined by the first table, and the rows (years) of the second table are added as supplementary points.
Source: Based on data from Covert 2003, 327, 330.

Figure 1 is the correspondence analysis (CA) of the first table (product type by appeal), with the second table (years by appeal) also visualized as so-called supplementary, or passive, points. The basic properties of the method are explained through the interpretation of this map, followed by a description of the extension to multivariate categorical data, called multiple correspondence analysis (MCA).

Simple Correspondence Analysis

The simple form of correspondence analysis (CA) applies primarily to cross-tabulations such as those in Table 1. The method visualizes the information in the table by depicting the rows and columns as points in a spatial map (see Greenacre 2007). Just as this table is interpreted numerically by calculating proportions, or equivalently percentages, relative to the row (product) totals, CA visualizes these sets of relative frequencies as points in a space to facilitate comparison of the products. Silverware and large and small appliances are grouped together on the right side of Figure 1 because their proportions across the appeal categories are similar. Personal hygiene, baby, cosmetics, and the other products on the left side lie far from them because their proportions across the appeal categories are quite different. In fact, the horizontal axis in this map coincides with the largest differences in the data set. The value 0.5302 on this axis quantifies how much of the total interproduct difference is “explained” by this axis, being 80.8% of that total. The vertical axis is less important, as shown by its value of 0.0694 (10.6% of the total), but together the two axes explain 91.4% of the interproduct differences. This percentage is analogous to the explained variance concept in multiple regression: the two axes can be considered two new variables, with values for the products equal to their coordinates, and these two variables predict the proportions in the data with an accuracy of 91.4%, leaving only 8.6% of the “variance” unexplained.
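The quantities described in this section can be computed from the counts in Table 1. The sketch below uses one common formulation of CA, a singular value decomposition of the standardized residuals of the table, written with NumPy. It is not necessarily the procedure or software behind Figure 1, and all variable and function names are assumptions made for illustration. The squared singular values are the values attached to the axes, and their sum for this table should come to roughly 0.656, which is consistent with 0.5302 being 80.8% of the total. The sign of each axis is arbitrary, so the resulting map may appear mirrored relative to Figure 1.

```python
import numpy as np

# Product-by-appeal counts from Table 1. Columns: unrestricted consumption,
# deferred spending, rationed supplies, other appeals.
products = ["Cosmetics", "Personal hygiene", "Household", "Baby", "Food",
            "Small appliances", "Large appliances", "Clothes", "Cigarettes",
            "Linens", "Mattresses", "Silverware", "Home decor", "Miscellaneous"]
N = np.array([
    [219, 0, 0, 1], [254, 1, 14, 4], [173, 9, 15, 4], [65, 0, 2, 1],
    [439, 11, 82, 8], [1, 11, 2, 1], [1, 55, 9, 7], [99, 4, 17, 1],
    [45, 0, 0, 0], [13, 9, 18, 2], [14, 5, 3, 0], [0, 32, 2, 0],
    [45, 24, 4, 3], [42, 45, 5, 26],
], dtype=float)

# Year-by-appeal counts, treated as supplementary (passive) rows.
years = ["1942", "1943", "1944", "1945"]
N_years = np.array([[126, 7, 4, 3], [549, 59, 65, 18],
                    [549, 112, 87, 25], [186, 28, 17, 12]], dtype=float)

P = N / N.sum()        # correspondence matrix (relative frequencies)
r = P.sum(axis=1)      # row masses
c = P.sum(axis=0)      # column masses

# Standardized residuals; their squared singular values are the inertias
# carried by the successive axes.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = min(N.shape) - 1   # a 14 x 4 table has at most 3 non-trivial axes
inertias = sv[:k] ** 2
percent = 100 * inertias / inertias.sum()

# Row principal coordinates (product points) and column standard coordinates.
F = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
Gamma = Vt[:k].T / np.sqrt(c)[:, None]

def supplementary_rows(counts):
    """Place extra rows at the weighted average of the column standard
    coordinates, using each row's profile as the weights."""
    profiles = counts / counts.sum(axis=1, keepdims=True)
    return profiles @ Gamma

F_years = supplementary_rows(N_years)

# Coordinates on the first two axes for products and supplementary years.
for name, (x, y) in zip(products + years, np.vstack([F, F_years])[:, :2]):
    print(f"{name:18s} {x:8.3f} {y:8.3f}")
print("inertia per axis:", np.round(inertias, 4))
print("percent of total:", np.round(percent, 1))
```

Because the years are projected with the same formula that positions the products, they do not influence the axes; they are simply located in the space that the product table has already determined.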

...
