A histogram is a graph that uses bars to display count or frequency data. The independent variable consists of interval- or ratio-level data and is usually displayed on the abscissa (x-axis), with the frequency data on the ordinate (y-axis) and the height of each bar proportional to the count. If the data for the independent variable are put into “bins” (e.g., ages 0–4, 5–9, 10–14), then the width of each bar is proportional to the width of its bin. Most often, the bins are of equal size, but this is not a requirement. A histogram differs from a bar chart in two ways. First, the independent variable in a bar chart consists of either nominal (i.e., named, unordered categories, such as religious affiliation) or ordinal (ranks or ordered categories, such as stage of cancer) data. Second, to emphasize the fact that the independent variable is not continuous, the bars in a bar chart are separated from one another, whereas they abut each other in a histogram. After a bit of history, this entry describes how to create a histogram and then discusses alternatives to histograms.

A Bit of History

The term histogram was first used by Karl Pearson in 1895, but even then, he referred to it as a “common form of graphical representation,” implying that the technique itself was considerably older. Bar charts (along with pie charts and line graphs) were introduced over a century earlier by William Playfair, but he does not seem to have used histograms in his books.

Creating a Histogram

Consider the hypothetical data in Table 1, which tabulates the number of hours of television watched each week by 100 respondents. What is immediately obvious is that the raw numbers are impossible to comprehend at a glance. The first step in trying to make sense of these data is to put them in rank order, from lowest to highest. This shows that the lowest value is 0 and the highest is 64, but it does not yield much more in terms of understanding. Plotting the raw data would result in several problems. First, many of the bars would have heights of zero (e.g., nobody reported watching for one, two, or three hours a week), and most of the other bars would be only one or two units high (i.e., the number of people reporting that specific value). Second, and as a consequence, it would be difficult to discern any pattern. Finally, the x-axis would have many values, again interfering with comprehension.
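The ranking step and the problems with plotting raw values can be checked with a short Python sketch. The `hours` list is a transcription of Table 1; where the digits of the table run together, the grouping used here is an assumption, chosen to agree with the text (no values of 1, 2, or 3) and with the bin counts in Table 2. The variable names are illustrative only.

```python
from collections import Counter

# Values transcribed from Table 1 (hours of television per week, 100 respondents).
# Assumption: ambiguous digit runs are split so that they agree with Table 2.
hours = [
    41, 43, 14, 35, 31, 39, 22, 9, 32, 49,
    12, 27, 53, 7, 23, 29, 22, 22, 26, 14,
    33, 34, 12, 13, 16, 34, 25, 40, 5, 41,
    43, 30, 40, 44, 12, 55, 14, 25, 32, 10,
    30, 28, 25, 23, 0, 56, 24, 17, 15, 33,
    30, 15, 29, 20, 14, 40, 26, 24, 34, 49,
    50, 26, 13, 36, 47, 19, 9, 64, 35, 33,
    35, 39, 9, 25, 41, 5, 18, 54, 11, 59,
    36, 36, 37, 52, 29, 24, 22, 41, 36, 31,
    32, 10, 50, 45, 23, 24, 15, 5, 20, 52,
]

# Step 1: rank order, which reveals only the extremes.
ranked = sorted(hours)
lowest, highest = ranked[0], ranked[-1]

# Step 2: one bar per whole-hour value, as a plot of the raw data would need.
raw_counts = Counter(hours)
bar_heights = [raw_counts.get(v, 0) for v in range(lowest, highest + 1)]
empty_bars = sum(1 for h in bar_heights if h == 0)

print(f"lowest={lowest}, highest={highest}")
print(f"{len(bar_heights)} bars on the x-axis, {empty_bars} of them empty")
print(f"tallest bar: {max(bar_heights)}")
```

Running it reports how many bars the x-axis would need and how many of them would be empty, which is the clutter the paragraph above describes.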

Table 1 Fictitious Data on How Many Hours of Television Are Watched Each Week by 100 People
Subjects   Data
1–5        41  43  14  35  31
6–10       39  22   9  32  49
11–15      12  27  53   7  23
16–20      29  22  22  26  14
21–25      33  34  12  13  16
26–30      34  25  40   5  41
31–35      43  30  40  44  12
36–40      55  14  25  32  10
41–45      30  28  25  23   0
46–50      56  24  17  15  33
51–55      30  15  29  20  14
56–60      40  26  24  34  49
61–65      50  26  13  36  47
66–70      19   9  64  35  33
71–75      35  39   9  25  41
76–80       5  18  54  11  59
81–85      36  36  37  52  29
86–90      24  22  41  36  31
91–95      32  10  50  45  23
96–100     24  15   5  20  52

Table 2 The Data in Table 1 Grouped into Bins
Interval   Midpoint   Count   Cumulative total
0–4         2          1        1
5–9         7          7        8
10–14      12         12       20
15–19      17          7       27
20–24      22         13       40
25–29      27         12       52
30–34      32         14       66
35–39      37         10       76
40–44      42         10       86
45–49      47          4       90
50–54      52          6       96
55–59      57          3       99
60–64      62          1      100

The solution is to group the data into mutually exclusive and collectively exhaustive classes, or bins. The issue is how many bins to use. Most often, the answer is somewhere between 6 and 15, with the actual number depending on two considerations. The first is that the bin width should be easy to comprehend. Thus, bin widths of 2, 5, 10, or 20 units are recommended, whereas those of 3, 7, or 9 are not. The second consideration is esthetics; the graph should get the point across and not look too cluttered.
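As an illustration of the grouping step, the following Python sketch sorts the Table 1 values into bins 5 units wide, starting at 0, and tallies the count and running total for each bin, as in Table 2. The `hours` list is a transcription of Table 1, with ambiguous digit runs split on the assumption that they agree with Table 2. The function name `bin_counts` and the text rendering of the bars are hypothetical choices made for this example, not part of the entry.

```python
# Values transcribed from Table 1 (assumption: digit runs split to agree with Table 2).
hours = [
    41, 43, 14, 35, 31, 39, 22, 9, 32, 49,
    12, 27, 53, 7, 23, 29, 22, 22, 26, 14,
    33, 34, 12, 13, 16, 34, 25, 40, 5, 41,
    43, 30, 40, 44, 12, 55, 14, 25, 32, 10,
    30, 28, 25, 23, 0, 56, 24, 17, 15, 33,
    30, 15, 29, 20, 14, 40, 26, 24, 34, 49,
    50, 26, 13, 36, 47, 19, 9, 64, 35, 33,
    35, 39, 9, 25, 41, 5, 18, 54, 11, 59,
    36, 36, 37, 52, 29, 24, 22, 41, 36, 31,
    32, 10, 50, 45, 23, 24, 15, 5, 20, 52,
]

BIN_WIDTH = 5  # an easily comprehended width; yields 13 bins for a 0-64 range


def bin_counts(values, width):
    """Group non-negative integers into equal-width bins starting at 0.

    Returns one (low, high, midpoint, count, cumulative) tuple per bin.
    """
    n_bins = max(values) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[v // width] += 1  # each value falls into exactly one bin

    rows, running = [], 0
    for i, count in enumerate(counts):
        low = i * width
        high = low + width - 1
        running += count
        rows.append((low, high, (low + high) // 2, count, running))
    return rows


rows = bin_counts(hours, BIN_WIDTH)
for low, high, mid, count, cumulative in rows:
    # A row of '#' characters stands in for the bar; adjacent rows abut.
    print(f"{low:>2}-{high:<2} mid={mid:>2} {'#' * count:<14} {count:>2} cum={cumulative:>3}")
```

Integer division assigns every value to exactly one bin, which is what makes the classes mutually exclusive and collectively exhaustive. Changing `BIN_WIDTH` to 10 or 20 shows how coarser bins trade detail for a less cluttered picture.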

...