
Data Classification Schemes

In the graphic display of quantitative information, various classification schemes are used to organize data for meaningful interpretation. The human mind makes sense of the world by grouping and ordering what the senses perceive. The mind groups elements such as plants, animals, clothing, and so on in processes of qualitative judgment. It orders elements such as big trees, small trees, and so on in processes of quantitative judgment. Grouping and ordering are fundamental to classification schemes used to prepare data for mapping.

A set of data is a selected group of elements—a “population.” This population is first ordered (arranged along the number line from smallest value to largest) and then divided into classes of closely related numeric value. Each class is then represented on the map by a distinctive symbol. The process of assigning data elements to a class can be accomplished with simple arithmetic processes or with statistical processes.
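The order-then-divide process described above can be sketched in a few lines of Python. This is only an illustration: the population values, the class limits, and the symbol names are hypothetical and do not come from the figures.

```python
from bisect import bisect_left

def assign_classes(values, upper_limits):
    """Return a class index for each value.

    upper_limits holds the sorted, inclusive upper limit of every class
    except the last; anything above the final limit falls in the last class.
    """
    return [bisect_left(upper_limits, v) for v in values]

# Hypothetical population, ordered along the number line.
population = [12.5, 3.1, 8.4, 21.0, 15.2, 4.7]
ordered = sorted(population)

# Hypothetical class limits and one distinctive symbol per class.
upper_limits = [5.0, 15.0]
symbols = ["light", "medium", "dark"]

classes = assign_classes(ordered, upper_limits)
legend = [(value, symbols[c]) for value, c in zip(ordered, classes)]
```

How the class limits are chosen is what distinguishes one classification scheme from another.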

Figure 1 Classification by natural breaks

Source: Cartography by Alex Feldman; data by U.S. Census American FactFinder.

Figure 2 Classification by equal interval

Source: Cartography by Alex Feldman; data by U.S. Census American FactFinder.

A simple yet effective arithmetic process is the division of the data by natural breaks. Once the data are arranged along the number line, elements of similar value are assigned to the same class. Each class is then assigned a specific color symbol and mapped. This classification scheme yields a simple choropleth (value by place) map (Figure 1).
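One simple way to approximate natural breaks is to cut the ordered data at its widest gaps, so that values lying close together share a class. The sketch below takes that approach as an assumption; it is not necessarily the procedure used for Figure 1, and it is not an optimizing method such as Jenks. The function name and sample values are hypothetical.

```python
def natural_breaks_by_gaps(values, n_classes):
    """Split the ordered values at the (n_classes - 1) widest gaps."""
    ordered = sorted(values)
    # Gap between each value and its right-hand neighbor, with its position.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    # Keep the widest gaps, then restore left-to-right order.
    widest = sorted(gaps, reverse=True)[:n_classes - 1]
    cut_positions = sorted(i for _, i in widest)

    classes, start = [], 0
    for i in cut_positions:
        classes.append(ordered[start:i + 1])
        start = i + 1
    classes.append(ordered[start:])
    return classes

# Hypothetical data with three visible clusters.
sample = [1, 2, 3, 10, 11, 12, 30, 31]
natural_classes = natural_breaks_by_gaps(sample, 3)
```

Here the two widest gaps (between 3 and 10, and between 12 and 30) become the class boundaries.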

The data can also be classed by equal interval (Figure 2). In this method, the data are divided into an arbitrary number of classes (percentiles, deciles, quintiles, etc.), and the same number of data elements are grouped into each class. The arbitrary assignment into equal-interval classes, however, frequently obscures significant differences in data values. Compare Figures 1 and 2.
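The sketch below follows the description above, in which each class receives the same number of data elements (a scheme often also called quantile classification). The function name and sample values are hypothetical; the sample is chosen to show how an arbitrary boundary can place very different values in one class.

```python
def equal_count_classes(values, n_classes):
    """Split the ordered values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    classes = []
    for k in range(n_classes):
        # Integer arithmetic spreads any remainder across the classes.
        start = k * n // n_classes
        end = (k + 1) * n // n_classes
        classes.append(ordered[start:end])
    return classes

# Hypothetical data: ten values divided into quintiles.
sample = [1, 2, 3, 10, 11, 12, 30, 31, 32, 100]
quintiles = equal_count_classes(sample, 5)
```

In this sample the top quintile holds both 32 and 100, while 3 and 10 share the second quintile, which illustrates how significant differences in value can be hidden.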

Another simple yet effective classification method is the interquartile range. In this method, the data are grouped into four classes of equal size (quartiles). The two middle quartiles are collapsed into one class, the interquartile range. This interquartile range is assigned one symbol, while the first and last quartiles are assigned other symbols. The resulting map is a three-class map that emphasizes the lowest and highest values of the data set. The effectiveness of this method is seen in Figure 3, where the richest and poorest areas of Tennessee are easily identified.
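A minimal sketch of the three-class interquartile scheme, using the quartile cut points from Python's standard `statistics` module. The labels "low", "middle", and "high" and the sample values are hypothetical, and different quartile conventions can shift the cut points slightly.

```python
import statistics

def interquartile_classes(values):
    """Label each value as below, within, or above the interquartile range."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    labels = []
    for v in values:
        if v < q1:
            labels.append("low")      # first quartile
        elif v > q3:
            labels.append("high")     # last quartile
        else:
            labels.append("middle")   # collapsed interquartile range
    return labels

# Hypothetical data: twelve evenly spaced values.
sample = list(range(1, 13))
iqr_labels = interquartile_classes(sample)
```

Only three symbols are needed on the map, one for each label.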

For a more nuanced classification, the mean and standard deviation of the data set are calculated, each data element's distance from the mean is expressed in standard deviations, and elements with similar deviations are clustered into classes. While this is a preferred method for many mapping tasks, a reader with little knowledge of descriptive statistics cannot interpret the map thoroughly. A map classed by standard deviation can still be a powerful communication tool if it is properly symbolized. As Figure 4 demonstrates, the areas of greatest and least poverty are still discernible to the reader aware that color differences represent value differences.
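The following sketch shows one way to class data by standard deviation: each value's distance from the mean is measured in standard deviations and then binned. The four bands, their labels, and the sample values are assumptions made for illustration; the class boundaries used for Figure 4 are not given in the text.

```python
import statistics

def standard_deviation_classes(values):
    """Label each value by how many standard deviations it lies from the mean.

    Assumes the values are not all identical (the standard deviation
    must be nonzero).
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    labels = []
    for v in values:
        z = (v - mean) / sd  # distance from the mean in standard deviations
        if z < -1:
            labels.append("far below")   # more than 1 SD below the mean
        elif z < 0:
            labels.append("below")       # within 1 SD below the mean
        elif z <= 1:
            labels.append("above")       # at the mean or within 1 SD above
        else:
            labels.append("far above")   # more than 1 SD above the mean
    return labels

# Hypothetical data with mean 5 and standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sd_labels = standard_deviation_classes(sample)
```

A diverging color scheme centered on the mean is a common way to symbolize such classes, so that readers can spot the extremes without knowing the underlying statistics.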

Figure 3 Classification by interquartile range

Source: Cartography by Alex Feldman; data by U.S. Census American FactFinder.

Figure 4 Classification by standard deviation

Source: Cartography by Alex Feldman; data by U.S. Census American FactFinder.

...
