
Numerical summaries used to describe a set of data generally include a measure of central tendency. While this provides a single estimate that describes where the data are located, it does not describe how spread out the data are about this central point. There are several numerical summaries that describe the variability in a data set. Four of the most common are the variance, the standard deviation, the interquartile range (IQR), and the range. They are illustrated in Table 1 using data collected on height from 22 subjects.

The Variance

The variance is approximately equal to the average squared distance of each observation about the mean and is generally denoted by $s^2$. This is most easily seen in its formula, which is given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

where

$x_i$ represents the individual observation from the $i$th subject; the mean of the data is given by $\bar{x}$;

$\Sigma$ is the summation sign, which indicates that you sum over everything that follows it. The limits below and above the sign indicate where you start and stop the summation, respectively. As it is written above, it says you should begin summing with the squared deviation of $x_1$ from the mean and stop with the squared deviation of $x_n$ from the mean; and

n represents the number of observations in your data set.

For illustrative purposes, consider the data in Table 1. To compute the variance of the height measurements, do the following:

  1. Calculate the mean height $\bar{x}$.
  2. For each observation, calculate the deviation from the mean, $x_i - \bar{x}$.
  3. Square the deviation of each observation from the mean, $(x_i - \bar{x})^2$.
  4. Sum over all the squared deviations.
  5. Divide this sum by n − 1, where n is the sample size.
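
Table 1 Measures of Variability for the Heights of 22 Students in an Introductory Statistics Class

| Observation | Height in Inches | Deviation From Mean | Deviation From Mean Squared |
|---|---|---|---|
| $x_1$ | 61.0 | −5.2 | 27.04 |
| $x_2$ | 62.0 | −4.2 | 17.64 |
| $x_3$ | 63.0 | −3.2 | 10.24 |
| $x_4$ | 63.0 | −3.2 | 10.24 |
| $x_5$ | 64.0 | −2.2 | 4.84 |
| $x_6$ | 64.5 | −1.7 | 2.89 |
| $x_7$ | 65.0 | −1.2 | 1.44 |
| $x_8$ | 65.0 | −1.2 | 1.44 |
| $x_9$ | 65.0 | −1.2 | 1.44 |
| $x_{10}$ | 65.0 | −1.2 | 1.44 |
| $x_{11}$ | 66.0 | −0.2 | 0.04 |
| $x_{12}$ | 66.0 | −0.2 | 0.04 |
| $x_{13}$ | 66.0 | −0.2 | 0.04 |
| $x_{14}$ | 67.0 | 0.8 | 0.64 |
| $x_{15}$ | 67.0 | 0.8 | 0.64 |
| $x_{16}$ | 67.0 | 0.8 | 0.64 |
| $x_{17}$ | 68.0 | 1.8 | 3.24 |
| $x_{18}$ | 68.0 | 1.8 | 3.24 |
| $x_{19}$ | 69.0 | 2.8 | 7.84 |
| $x_{20}$ | 69.5 | 3.3 | 10.89 |
| $x_{21}$ | 72.0 | 5.8 | 33.64 |
| $x_{22}$ | 74.0 | 7.8 | 60.84 |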

Table 1 gives the quantities described in Steps 2 and 3 above for each observation. For this set of data the procedure above gives

$$s^2 = \frac{1}{22-1}\sum_{i=1}^{22}(x_i - \bar{x})^2 = \frac{200.38}{21} \approx 9.54 \text{ in}^2$$
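The five steps listed earlier can be sketched in a few lines of Python. This is a minimal illustration, not part of the original entry: the function name `sample_variance` is an assumption made here, and the heights are copied from Table 1. Because the code keeps the unrounded mean, the result can differ slightly in later decimal places from a hand calculation that uses the rounded deviations in the table.

```python
# Heights in inches for the 22 students listed in Table 1.
heights = [61.0, 62.0, 63.0, 63.0, 64.0, 64.5, 65.0, 65.0, 65.0, 65.0,
           66.0, 66.0, 66.0, 67.0, 67.0, 67.0, 68.0, 68.0, 69.0, 69.5,
           72.0, 74.0]


def sample_variance(data):
    """Sample variance computed step by step (illustrative helper)."""
    n = len(data)
    mean = sum(data) / n                     # Step 1: mean
    deviations = [x - mean for x in data]    # Step 2: deviation from the mean
    squared = [d ** 2 for d in deviations]   # Step 3: squared deviations
    total = sum(squared)                     # Step 4: sum of squared deviations
    return total / (n - 1)                   # Step 5: divide by n - 1


variance = sample_variance(heights)
print(round(variance, 2))  # 9.54 (square inches)
```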

Additional Notes about the Variance

  • The small n represents the sample size when computing the variance for a sample. When computing the variance for a population, you sum the squared deviations from the population mean and divide that sum by the population size N. This is done only if you have data from a census, that is, when you have collected data from every member of a population.
  • The units of the variance are the square of the measurement units of the original data.
  • The variance is very sensitive to outliers. It is generally not the recommended measure of variability to use if there are outliers in the data or if the data are not symmetric.
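The first and third notes can be checked numerically. The sketch below uses Python's standard `statistics` module, where `variance` divides by n − 1 and `pvariance` divides by N. Treating the 22 heights as a complete population is done purely for comparison, and the value 740.0 is a hypothetical data-entry error introduced here to show how a single outlier inflates the variance; it is not part of the original data.

```python
import statistics

# Heights in inches for the 22 students listed in Table 1.
heights = [61.0, 62.0, 63.0, 63.0, 64.0, 64.5, 65.0, 65.0, 65.0, 65.0,
           66.0, 66.0, 66.0, 67.0, 67.0, 67.0, 68.0, 68.0, 69.0, 69.5,
           72.0, 74.0]

sample_var = statistics.variance(heights)       # divides by n - 1
population_var = statistics.pvariance(heights)  # divides by N

# Hypothetical outlier (assumption for illustration): 74.0 mistyped as 740.0.
with_outlier = heights[:-1] + [740.0]
outlier_var = statistics.variance(with_outlier)

print(f"sample variance (n - 1):      {sample_var:.2f}")
print(f"population variance (N):      {population_var:.2f}")
print(f"sample variance with outlier: {outlier_var:.2f}")
```

The population version is always the smaller of the first two because it divides the same sum by a larger number, and the single mistyped value changes the variance by several orders of magnitude.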

The Standard Deviation

The standard deviation is the most commonly used measure of variability for data that follow a bell-shaped (or normal) distribution. It is generally denoted by s, and is simply the square root of the variance. Formally, it is given by the formula

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where the quantities in the formula are defined in the same way as described above for the variance. If we consider the data in Table 1, we can calculate the standard deviation easily using the value that we calculated for the

...
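As a rough check on the relationship between the two measures, the following sketch takes the square root of the sample variance of the Table 1 heights and compares it with `statistics.stdev` from Python's standard library. The variable names are assumptions made for this illustration.

```python
import math
import statistics

# Heights in inches for the 22 students listed in Table 1.
heights = [61.0, 62.0, 63.0, 63.0, 64.0, 64.5, 65.0, 65.0, 65.0, 65.0,
           66.0, 66.0, 66.0, 67.0, 67.0, 67.0, 68.0, 68.0, 69.0, 69.5,
           72.0, 74.0]

variance = statistics.variance(heights)  # sample variance, divides by n - 1
std_dev = math.sqrt(variance)            # standard deviation = sqrt(variance)

# statistics.stdev computes the same quantity directly.
assert math.isclose(std_dev, statistics.stdev(heights))

print(f"standard deviation: {std_dev:.2f} inches")
```

Unlike the variance, the result is expressed in the original measurement units (inches), which makes it easier to interpret alongside the mean.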
