
Data are often reported using one kind of grouping (“aggregation”), while investigators need information about a different sort of grouping. Can one disaggregate and reaggregate the data—take the results apart and put them back together—by statistical methods? That is what ecological inference tries to do.

For example, take a voting rights case where Hispanic plaintiffs seek redistricting. The question is whether the non-Hispanic majority generally votes as a bloc to defeat the candidate preferred by the Hispanics. For any precinct, the number of votes for each candidate is a matter of public record. So is the number of Hispanic voters. However, the secret ballot prevents us from knowing how the non-Hispanics voted, or which candidate the Hispanics preferred. The vote totals are aggregated by precinct, but the votes need to be reaggregated by ethnic group.
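The bookkeeping in this example can be written as one identity per precinct. The symbols below are illustrative notation introduced here, not taken from the entry, and they assume the contest is summarized by the vote share of a single candidate, called A:

```latex
% Illustrative notation for precinct i (an assumption, not from the original entry):
%   x_i = fraction of voters in precinct i who are Hispanic            (public record)
%   y_i = fraction of votes in precinct i cast for candidate A         (public record)
%   p_i = fraction of Hispanic voters in precinct i who voted for A     (unknown)
%   q_i = fraction of non-Hispanic voters in precinct i who voted for A (unknown)
y_i = x_i \, p_i + (1 - x_i) \, q_i, \qquad 0 \le p_i \le 1, \quad 0 \le q_i \le 1
```

Each precinct supplies one equation in two unknowns, so the precinct totals alone do not determine p_i and q_i.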

Across precincts, there is a statistical relationship between the percentage of votes for each candidate and the ethnic makeup of the precinct. Under certain assumptions, that relationship can be used to infer the number of Hispanics and non-Hispanics who voted for the various candidates. As it turns out, the required assumptions generally cannot be tested using available data on precinct vote totals and numbers of Hispanic voters. The assumptions can be tested with exit polls, but then ecological inference would be unnecessary: one could make the estimates directly from the polling data.
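The claim that aggregate data cannot test the assumptions can be made concrete with a small sketch. The three precincts and all the rates below are invented for illustration, and `scenario` is a hypothetical helper, not part of any library. Given each precinct's Hispanic share x and overall vote share y for a hypothetical candidate A, plus an assumed Hispanic support rate p per precinct, it solves for the non-Hispanic rate q that reproduces y exactly.

```python
import math

# Hypothetical precincts: (Hispanic share of voters x, vote share for candidate A y).
precincts = [(0.2, 0.48), (0.5, 0.60), (0.8, 0.72)]


def scenario(precincts, hispanic_rates):
    """Return (p_i, q_i) for each precinct, choosing q_i so that
    x_i * p_i + (1 - x_i) * q_i equals the observed y_i.
    Raises ValueError if the implied q_i is not a valid proportion."""
    pairs = []
    for (x, y), p in zip(precincts, hispanic_rates):
        q = (y - x * p) / (1 - x)
        if not (0.0 <= q <= 1.0):
            raise ValueError(f"infeasible: q={q:.3f} in precinct with x={x}")
        pairs.append((p, q))
    return pairs


# Scenario 1: Hispanic support is the same in every precinct.
constant = scenario(precincts, [0.8, 0.8, 0.8])
# Scenario 2: Hispanic support varies from precinct to precinct.
varying = scenario(precincts, [0.5, 0.7, 0.85])

for name, pairs in [("constant", constant), ("varying", varying)]:
    # Confirm both scenarios reproduce the published precinct totals.
    for (x, y), (p, q) in zip(precincts, pairs):
        assert math.isclose(x * p + (1 - x) * q, y)
    print(name, [(round(p, 3), round(q, 3)) for p, q in pairs])
```

Both scenarios match every precinct total, yet in one the group rates are constant (0.8 and 0.4 everywhere) while in the other Hispanic support ranges from 0.5 to 0.85 and non-Hispanic support from 0.2 to 0.5. Nothing in the aggregate data distinguishes them.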

A common method for ecological inference is “ecological regression.” This technique relies on the “constancy assumption,” that Hispanics vote alike no matter where they live. The same assumption applies to non-Hispanics. Demography is destiny; geography is accident. If the constancy assumption fails, ecological regression can make serious errors, as demonstrated in 1950 by W. S. Robinson. Using census data on states, he correlated the percentage of persons who were literate with the percentage who were foreign-born. Literacy rates were much higher in states with higher percentages of foreign-born persons.
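A minimal sketch of ecological regression, under stated assumptions. Write x_i for the Hispanic share of voters in precinct i, y_i for the share of votes going to a hypothetical candidate A, and p and q for the rates at which Hispanics and non-Hispanics support A. Under the constancy assumption these rates are the same in every precinct, so y_i = q + (p - q) x_i is a straight line in x_i. Fitting that line across precincts, the fitted value at x = 0 estimates q and the fitted value at x = 1 estimates p. The function name, the true rates 0.75 and 0.35, and the noise level are assumptions for illustration; the fit is ordinary unweighted least squares, and a practical analysis might weight precincts by size.

```python
import random


def ecological_regression(xs, ys):
    """Fit ys = intercept + slope * xs by unweighted least squares and read off
    the group rates implied by the constancy assumption."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    q_hat = intercept          # fitted vote share in an all-non-Hispanic precinct (x = 0)
    p_hat = intercept + slope  # fitted vote share in an all-Hispanic precinct (x = 1)
    return p_hat, q_hat


# Synthetic precincts in which constancy holds by construction (assumed values).
TRUE_P, TRUE_Q = 0.75, 0.35
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [x * TRUE_P + (1 - x) * TRUE_Q + rng.gauss(0.0, 0.01) for x in xs]

p_hat, q_hat = ecological_regression(xs, ys)
print(f"estimated Hispanic rate p = {p_hat:.3f}, non-Hispanic rate q = {q_hat:.3f}")
```

Because these data were generated with constant rates, the estimates land close to 0.75 and 0.35. The same arithmetic applied to data in which the rates drift with neighborhood composition would still return two numbers, with no warning that they are wrong.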

According to ecological regression, the foreign-born must have been substantially more literate than the native-born. In fact, however, foreign-born persons were less literate. Nativity (born abroad or born in the United States) is analogous to ethnicity. Literacy is analogous to voting. The constancy assumption is that literacy depends on nativity, not state of residence.

What accounts for the statistical relationship in Robinson's state-level data? Literacy rates among the native-born varied substantially from one state to another. Furthermore, when foreign-born persons immigrated to the United States, they tended to settle in states where the native-born were relatively literate. That created a strong relationship between the percentage of foreign-born in a state and literacy rates among the native-born. The constancy assumption was wrong. That is why ecological regression gave the wrong answers.
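The mechanism described above can be reproduced with a toy simulation. Every number here is invented for illustration and is not Robinson's census data: 11 hypothetical states of equal population with foreign-born shares from 0 to 30 percent, foreign-born literacy fixed at 0.80 in every state, and native-born literacy rising from 0.85 to 0.91 in states with more immigrants, mimicking settlement in states where the native-born were relatively literate. The regression is an unweighted least-squares fit of state literacy on the foreign-born share.

```python
def ecological_regression(xs, ys):
    """Unweighted least-squares fit ys = intercept + slope * xs.
    Returns (fitted rate at x = 1, fitted rate at x = 0)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope, intercept


# Hypothetical states, equal total populations assumed.
foreign_share = [0.03 * j for j in range(11)]           # 0.00, 0.03, ..., 0.30
FOREIGN_LIT = 0.80                                       # same in every state
native_lit = [0.85 + 0.2 * f for f in foreign_share]    # higher where immigrants settled

# State-level literacy rate: the only thing the aggregate table reports.
state_lit = [f * FOREIGN_LIT + (1 - f) * lit for f, lit in zip(foreign_share, native_lit)]

# Truth: native-born literacy pooled over all states, weighted by native population.
true_native = sum((1 - f) * lit for f, lit in zip(foreign_share, native_lit)) / sum(
    1 - f for f in foreign_share
)

est_foreign, est_native = ecological_regression(foreign_share, state_lit)
print(f"true:      foreign-born {FOREIGN_LIT:.3f}, native-born {true_native:.3f}")
print(f"estimated: foreign-born {est_foreign:.3f}, native-born {est_native:.3f}")
```

In this toy setup the regression puts foreign-born literacy near 0.94 and native-born near 0.85, reversing the true ordering (0.80 versus roughly 0.88). The reversal comes entirely from native-born literacy varying with the foreign-born share, which is exactly the failure of the constancy assumption.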

Ecological inference runs into trouble in many contexts where the behavior of individuals is related to the demographics of their neighborhoods. On the other hand, ecological inference often succeeds. Researchers must make judgments case by case, focusing on the assumptions behind the methods.

David A. Freedman and Philip B. Stark
