
Cross-tabulation is the statistical technique by which two or more discrete random variables (usually variables that can take on only a finite number of distinct values) are cross-classified to fully characterize their joint distribution and thereby the relations between them. The analysis of the resulting cross-classification of frequencies, or contingency table, is most often carried out to establish relations of (in)dependence between the variables and to gauge the strength of these relations, although several other objectives are equally well served by cross-tabular analysis. This type of statistical analysis is relatively simple to understand, easy to implement, and well suited to a variety of interesting questions. In what follows, its major features are discussed using a simple example from political science.

To illustrate the use of cross-tabular analysis, consider the following hypothetical example. A researcher is interested in understanding the cosponsorship decisions of legislators. From a random sample of legislators, she or he has collected data on the number of times legislators cosponsored with members of the parties they ran against in the previous election (i.e., their adversary parties), as well as a dichotomous measure of how electorally vulnerable legislators reported having felt in the previous election. She or he is interested in knowing (a) whether these two variables are discernibly related and, if they are, (b) how sizable the relation is.

A two-way (i.e., including two variables), two-by-two (each variable taking on two distinct values) contingency table reports the results of her or his data collection (Table 1). In this case, variables C (for Column and Cosponsorship) and R (for Row and Reported vulnerability) were laid out in this particular order following the convention of placing the categories of the response variable in the columns and the categories of the explanatory variable in the rows. This is mere convention, and transposing the table would not lead to different conclusions. Three sample distributions can be derived that fully summarize the component variables: (1) the sample joint distribution of R and C, (2) the sample marginal distribution of R (or C), and (3) the sample conditional distribution of R given C (or C given R).
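The layout just described can be sketched in code. The counts below are illustrative assumptions only (the actual figures in Table 1 are not reproduced here), but they show how a two-by-two table is organized and why transposing it changes nothing of substance:

```python
# Hypothetical 2x2 contingency table; counts are assumed for
# illustration and are NOT the data in Table 1.
# Keys are (row category, column category): rows (R) hold reported
# electoral vulnerability, columns (C) hold cosponsorship behavior.
table = {
    ("vulnerable", "cosponsored"): 40,
    ("vulnerable", "did not cosponsor"): 10,
    ("not vulnerable", "cosponsored"): 20,
    ("not vulnerable", "did not cosponsor"): 30,
}

# Transposing the table merely swaps the roles of rows and columns;
# every cell count, and hence every conclusion, is unchanged.
transposed = {(c, r): count for (r, c), count in table.items()}
```

Placing the explanatory variable in the rows and the response in the columns, as the text notes, is convention rather than necessity: the transposed dictionary carries exactly the same information.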

The joint distribution of R and C gives the probability of classifying an observation in a particular cell of the table and can be approximated by calculating the proportion of the data that fall into each cell; this is the sample joint distribution. The marginal distribution of R (or C) is the probability of classifying observations in a particular row (or column) of the table, and once more, it can be approximated by the sample marginal distribution of R (or C), obtained by summing across columns (or rows) and dividing by the total number of observations. Finally, the conditional distribution of R (or C) comprises the probabilities of classifying an observation in a particular row (or column) given that it has been classified in a particular column (or row), and it can be approximated by dividing the proportion of observations classified in a particular cell by the overall proportion of observations classified in the corresponding column (or row), yielding the sample conditional probability of R given C (or C given R).
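The three sample distributions above can be computed directly from the cell counts. The sketch below uses the same hypothetical counts as before (assumed, not taken from Table 1) and derives the sample joint distribution, the sample marginal distribution of R, and the sample conditional distribution of C given R:

```python
# Hypothetical 2x2 contingency table; counts are illustrative only.
counts = {
    ("vulnerable", "cosponsored"): 40,
    ("vulnerable", "did not cosponsor"): 10,
    ("not vulnerable", "cosponsored"): 20,
    ("not vulnerable", "did not cosponsor"): 30,
}
n = sum(counts.values())  # total number of observations

# Sample joint distribution: proportion of observations in each cell.
joint = {cell: k / n for cell, k in counts.items()}

# Sample marginal distribution of R: sum each row across columns,
# then divide by the total number of observations.
row_labels = {r for r, _ in counts}
marginal_R = {
    r: sum(k for (r2, _), k in counts.items() if r2 == r) / n
    for r in row_labels
}

# Sample conditional distribution of C given R: the joint proportion
# of a cell divided by the marginal proportion of its row.
cond_C_given_R = {
    (r, c): joint[(r, c)] / marginal_R[r] for (r, c) in counts
}
```

With these assumed counts, half the sample reported feeling vulnerable, and among those legislators 80% cosponsored with adversary parties versus 40% among the rest; comparing conditional distributions across rows in this way is exactly how a contingency table speaks to question (a), whether the variables are related.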

...
