
The box-and-whisker plot, also called a boxplot, was invented by John Tukey. It is a graph of a data set that consists of a line extending from the minimum value to the maximum value and a box with lines drawn at the first quartile Q1, the median (the second quartile Q2), and the third quartile Q3, with outliers plotted as individual data points. It is useful for revealing the central tendency and variability of a data set, the shape of its distribution (particularly symmetry or skewness), and the presence of outliers. It is also a powerful graphical technique for comparing samples from two or more different treatments or populations.

Although boxplots are usually generated using statistical software, they also may be constructed by hand, using the following steps:

  • Draw a rectangular box whose left edge is at Q1 and the right edge is at Q3. The box width is therefore the interquartile range IQR = Q3 − Q1. Draw a vertical line segment inside the box at the median.
  • Place marks at distances 1.5 times the IQR from either end of the box: These are the inner fences. Similarly, place marks for the outer fences at distances 3 times IQR from either end.
  • Extend a horizontal line segment (a ‘whisker’) from each end of the box out to the most extreme observation that is still within the inner fence on that side.
  • Represent mild outliers (observations between the inner and outer fences) by circles. Represent extreme outliers (observations beyond the outer fences) by asterisks.
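The fence and whisker rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `boxplot_fences` is hypothetical, the quartiles are passed in by hand, the data are made up, and the handling of a value lying exactly on a fence (counted as inside the inner fence, or as mild on the outer fence) is an assumption the text does not settle.

```python
def boxplot_fences(data, q1, q3):
    """Whisker ends and outlier classes for a boxplot, given the quartiles."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    # Whiskers stop at the most extreme observations inside the inner fences.
    inside = [x for x in data if inner_low <= x <= inner_high]
    # Mild outliers lie outside the inner fences but not beyond the outer
    # fences (assumption: a value exactly on an outer fence counts as mild).
    mild = sorted(
        x for x in data
        if outer_low <= x < inner_low or inner_high < x <= outer_high
    )
    # Extreme outliers lie beyond the outer fences.
    extreme = sorted(x for x in data if x < outer_low or x > outer_high)
    return {
        "whiskers": (min(inside), max(inside)),
        "mild": mild,
        "extreme": extreme,
    }


# Made-up illustrative data; Q1 = 3 and Q3 = 7 are supplied by hand.
summary = boxplot_fences([1, 2, 3, 4, 5, 6, 7, 15, 40], q1=3, q3=7)
print(summary)
```

Here the IQR is 4, so the inner fences sit at −3 and 13 and the outer fences at −9 and 19: the whiskers run from 1 to 7, the value 15 would be drawn as a circle, and 40 as an asterisk.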

The median is the middle value in the ordered data list. It is the number that divides the bottom 50% of the data from the top 50%. The median is also the second quartile Q2. Use the following steps to find the median of a data set:

  • Arrange the data from smallest to largest.
  • If the number of observations is odd, then the median is the observation exactly in the middle of the ordered list.
  • If the number of observations is even, then the median is the average of the two middle observations in the ordered list.
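As a rough sketch, the three steps translate directly into Python. The function below is illustrative only and the sample values are made up; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median: sort, then take the middle value or average the two middle values."""
    ordered = sorted(values)          # step 1: arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd count, take the middle value
        return ordered[mid]
    # step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 4]))       # odd count: middle of 1, 4, 7
print(median([7, 1, 4, 10]))   # even count: average of 4 and 7
```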

The first quartile Q1 is the median of the lower half of the ordered data, and the third quartile Q3 is the median of the upper half of the ordered data. If the number of observations is odd, the median of the entire data set is included in both halves.
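The quartile rule just described can be sketched as follows. The helper names `_median` and `quartiles` are hypothetical and the sample values are made up. Note that statistical packages use several different quartile conventions, so library functions may return slightly different values from this rule.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty data set are undefined")
    # For odd n, (n + 1) // 2 makes both halves include the overall median;
    # for even n, the halves are simply the first and last n / 2 values.
    half = (n + 1) // 2
    lower = ordered[:half]
    upper = ordered[n - half:]
    return _median(lower), _median(ordered), _median(upper)


# Made-up data with an odd count: the halves are 2, 4, 6 and 6, 8, 10.
print(quartiles([2, 4, 6, 8, 10]))
```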

Example

Biological disturbances that are closely associated in adults suffering from endogenous depression (depression with no obvious external cause) are cortisol hypersecretion and shortened rapid eye movement (REM) period latency (the elapsed time from sleep onset to the first REM period). In a paper titled ‘Plasma cortisol secretion and REM period latency in adult endogenous depression,’ Gregory Asnis and colleagues reported on a comparison of REM period latency for patients with hypersecretion and patients with normal secretion. The data values are given below.

Hypersecretion Sample (n = 8)

  • 0.5, 1.0, 2.4, 5, 15, 19, 48, 83
  • minimum = 0.5
  • maximum = 83
  • median = 10
  • Q1 = 1.7
  • Q3 = 33.5
  • IQR = 31.8
  • 1.5(IQR) = 47.7

Normal Secretion Sample (n = 17)

  • 5, 5.5, 6.7, 13.5, 31, 40, 47, 47, 59, 62, 68, 72, 78, 84, 89, 105, 180
  • minimum = 5
  • maximum = 180
  • median = 59
  • Q1 = 31
  • Q3 = 78
  • IQR = 47
  • 1.5(IQR) = 70.5

Figure 1, the boxplot representing these data, displays several interesting features. Each sample has a mild outlier and an upper tail rather longer than the corresponding lower tail. Normal secretion REM period latency values appear to be substantially higher than those for hypersecretion; this was confirmed by a formal analysis.
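The mild outlier in each sample can be checked directly from the summary values listed above. The arithmetic below is a worked check derived from those values, not something reproduced from the original paper; it computes the upper inner and outer fences for each sample and locates the largest observation relative to them.

```latex
\begin{align*}
\text{Hypersecretion:}\quad
  & Q_3 + 1.5\,\mathrm{IQR} = 33.5 + 47.7 = 81.2, \\
  & Q_3 + 3\,\mathrm{IQR} = 33.5 + 95.4 = 128.9, \\
  & 81.2 < 83 < 128.9 \\[4pt]
\text{Normal secretion:}\quad
  & Q_3 + 1.5\,\mathrm{IQR} = 78 + 70.5 = 148.5, \\
  & Q_3 + 3\,\mathrm{IQR} = 78 + 141 = 219, \\
  & 148.5 < 180 < 219
\end{align*}
```

The values 83 and 180 therefore fall between the inner and outer fences and are mild outliers. The lower inner fences, 1.7 − 47.7 = −46.0 and 31 − 70.5 = −39.5, lie below every observation, so neither sample has a low outlier. The whiskers run from 0.5 to 48 for the hypersecretion sample and from 5 to 105 for the normal secretion sample.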

...
