Correspondence Analysis

Correspondence analysis (CA) is a generalized principal component analysis tailored for the analysis of qualitative data. Originally, CA was created to analyze contingency tables, but CA is so versatile that it is used with a number of other data table types.

The goal of CA is to transform a data table into two sets of factor scores: one for the rows and one for the columns. The factor scores give the best representation of the similarity structure of the rows and the columns of the table. In addition, the factor scores can be plotted as maps, which display the essential information of the original table. In these maps, rows and columns are displayed as points whose coordinates are the factor scores; the dimensions of the map are called factors. Notably, the factor scores of the rows and the columns have the same variance on each factor, and therefore both rows and columns can be conveniently represented in a single map.
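As an illustration of this goal, the following is a minimal sketch of one common way to obtain the two sets of factor scores: a singular value decomposition of the standardized residuals of the table. It is not taken from this entry; the function name `correspondence_analysis`, the variable names, and the small 3 × 3 table are hypothetical and serve only to make the sketch runnable. Here "variance" is assumed to mean the variance weighted by the row or column masses (the marginal proportions).

```python
import numpy as np


def correspondence_analysis(X):
    """Sketch of CA: return row scores F, column scores G, eigenvalues,
    row masses r, and column masses c for a table of non-negative counts."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                 # table of proportions
    r = P.sum(axis=1)               # row masses (row marginal proportions)
    c = P.sum(axis=0)               # column masses (column marginal proportions)

    # Standardized residuals from the independence model r c^T.
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)

    # Singular values come back sorted in decreasing order; the last one is
    # the trivial (zero) dimension, so keep min(I, J) - 1 factors.
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(X.shape) - 1
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

    F = (U / np.sqrt(r)[:, None]) * s      # row factor scores
    G = (Vt.T / np.sqrt(c)[:, None]) * s   # column factor scores
    return F, G, s ** 2, r, c


# Hypothetical counts, used only to exercise the function.
table = np.array([[20, 10, 5],
                  [10, 25, 10],
                  [5, 10, 30]])

F, G, eigenvalues, r, c = correspondence_analysis(table)

# Mass-weighted variance of the row and column scores on each factor.
row_var = (r[:, None] * F ** 2).sum(axis=0)
col_var = (c[:, None] * G ** 2).sum(axis=0)

print("eigenvalues:", eigenvalues)
print("rows match:", np.allclose(row_var, eigenvalues))
print("columns match:", np.allclose(col_var, eigenvalues))
```

Under these assumptions, the weighted variance of the row scores and of the column scores on a given factor both equal that factor's eigenvalue, which is what allows the two sets of points to share one map.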

The modern version of CA and its geometric interpretation come from 1960s France and are associated with the French school of data analysis (analyse des données).

As a technique, it was often discovered (and rediscovered), and so variations of CA can be found under several different names, such as dual-scaling, optimal scaling, or reciprocal averaging. The multiple identities of CA are a consequence of its large number of properties: It can be defined as an optimal solution for many apparently different problems.

Notations

Matrices are denoted with uppercase letters typeset in a boldface font; for example, X is a matrix. The elements of a matrix are denoted with a lowercase italic letter matching the matrix name, with indices indicating the row and column positions of the element; for example, $x_{i,j}$ is the element located at the ith row and jth column of matrix X. Vectors are denoted with lowercase, boldface letters; for example, c is a vector. The elements of a vector are denoted with a lowercase italic letter matching the vector name and an index indicating the position of the element in the vector; for example, $c_i$ is the ith element of c. The italicized superscript T indicates that the matrix or vector is transposed.

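Table 1. The Punctuation Marks of Six French Writers

| Writer        | Period  | Comma   | All Other Marks |
|---------------|---------|---------|-----------------|
| Rousseau      | 7,836   | 13,112  | 6,026           |
| Chateaubriand | 53,655  | 102,383 | 42,413          |
| Hugo          | 115,615 | 184,541 | 59,226          |
| Zola          | 161,926 | 340,479 | 62,754          |
| Proust        | 38,177  | 105,101 | 12,670          |
| Giraudoux     | 46,371  | 58,367  | 14,299          |

Source: Adapted from Brunet, 1989.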

An Example: How Writers Punctuate

This example comes from E. Brunet, who analyzed the way punctuation marks were used by six French writers: Rousseau, Chateaubriand, Hugo, Zola, Proust, and Giraudoux. In the paper, Brunet gave a table indicating the number of times each of these writers used the period, the comma, and all the other marks (i.e., question mark, exclamation point, colon, and semicolon) grouped together. These data are reproduced in Table 1.

From these data we can build the original data matrix, which is denoted X. It has I = 6 rows and J = 3 columns and is equal to

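$$
\mathbf{X} =
\begin{bmatrix}
7{,}836 & 13{,}112 & 6{,}026 \\
53{,}655 & 102{,}383 & 42{,}413 \\
115{,}615 & 184{,}541 & 59{,}226 \\
161{,}926 & 340{,}479 & 62{,}754 \\
38{,}177 & 105{,}101 & 12{,}670 \\
46{,}371 & 58{,}367 & 14{,}299
\end{bmatrix}
$$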

In the matrix X, the rows represent the authors and the columns represent types of punctuation marks. At the intersection of a row and a column, we find the number of a given punctuation mark (represented by the column) used by a given author (represented by the row).
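The data matrix described above can be sketched in code as follows. The counts are copied from Table 1; the variable names (`writers`, `marks`, `row_profiles`) are hypothetical. The row profiles, each writer's counts divided by that writer's total, are a common first look at such a table and are shown here as an assumption about a useful next step, not as a computation taken from this entry.

```python
import numpy as np

writers = ["Rousseau", "Chateaubriand", "Hugo", "Zola", "Proust", "Giraudoux"]
marks = ["Period", "Comma", "All Other Marks"]

# Counts from Table 1: rows are writers, columns are punctuation marks.
X = np.array([
    [7836, 13112, 6026],
    [53655, 102383, 42413],
    [115615, 184541, 59226],
    [161926, 340479, 62754],
    [38177, 105101, 12670],
    [46371, 58367, 14299],
])

I, J = X.shape  # number of rows (writers) and columns (marks)

# NumPy indices start at 0, so the element x_{1,2} of the text
# (Rousseau, Comma) is X[0, 1].
rousseau_commas = X[0, 1]

# Row profiles: the share of each punctuation mark within each writer's total.
row_totals = X.sum(axis=1)
row_profiles = X / row_totals[:, None]

print(f"I = {I}, J = {J}, grand total = {X.sum()}")
for name, profile in zip(writers, row_profiles):
    print(f"{name:<14}", np.round(profile, 3))
```

Because the writers differ greatly in total output, comparing proportions rather than raw counts makes their punctuation habits easier to compare.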

...
