Variance

Neil J. Salkind

doi:10.4135/9781412961288


Edited by:
Neil J. Salkind
In: Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781412961288.n491
Subject: Research Design


When describing a distribution of scores, one should use at least three indices: the shape of the distribution (e.g., unimodal, normal, or skewed), a measure of central tendency (e.g., mean or median), and a measure of the spread of scores. The variance is an example of the last of these. The importance of a measure of spread can be seen in the following example:

Both distributions have the same mean ($\bar{X} = \bar{Y} = 100$), but the scores in distribution X cluster closer to the mean than those in distribution Y.

Several measures can be used to describe the spread of scores. The range (highest score minus the lowest score) is simple and easy to understand but takes into account only the two outermost scores. One aberrant score can greatly affect the value of the range and give a false impression of how scores actually cluster together. The semi-interquartile range gets around this problem by considering only the central 50% of scores but ignores half the scores and is not a useful measure in inferential statistics. The most commonly used measures of spread of scores are the variance and the standard deviation. The standard deviation is merely the square root of the variance, and thus, it is the variance that is the important indicator.
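
The contrast among these measures can be sketched in a few lines of Python. The scores below are hypothetical and are not the entry's distributions X and Y; the helper name `spread_summary` is likewise an assumption made for illustration. Quartiles are taken with the standard library's `statistics.quantiles`, which is one of several common quartile conventions.

```python
import statistics


def spread_summary(scores):
    """Return the range, semi-interquartile range, variance, and standard deviation."""
    ordered = sorted(scores)
    score_range = ordered[-1] - ordered[0]  # highest score minus lowest score
    # Quartiles of the scores; "inclusive" treats the data as the full set of scores.
    q1, _median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    semi_iqr = (q3 - q1) / 2  # half the width of the central 50% of scores
    variance = statistics.pvariance(ordered)  # sum of squared deviations divided by n
    std_dev = variance ** 0.5  # square root of the variance
    return {
        "range": score_range,
        "semi_iqr": semi_iqr,
        "variance": variance,
        "std_dev": std_dev,
    }


# Hypothetical scores: the second list replaces the top score with an aberrant one.
typical = [96, 98, 99, 100, 100, 101, 102, 104]
with_outlier = [96, 98, 99, 100, 100, 101, 102, 140]

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(label, spread_summary(data))
```

In this sketch the single aberrant score inflates the range and the variance, while the semi-interquartile range, which looks only at the central half of the scores, is unchanged.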

The variance is commonly referred to as the average squared deviation from the mean. Its formula (using notation for a sample of scores, X) is

$$S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{SS}{n}$$

where

$S^2$ (capital S squared) is the symbol for the variance;

$\bar{X}$ (“X bar”) is the mean of the scores;

$(X - \bar{X})$ indicates a deviation from the mean (how far away a score is from the mean);

$\sum$ (capital Greek letter sigma) is a direction “to sum” or “add”;

n is sample size; and

SS is the sum of the squared deviations from the mean (the numerator).
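
The definition translates directly into code. The following is a minimal sketch that divides SS by n, as the entry does; the function names and the sample scores are hypothetical and chosen only for illustration.

```python
def sum_of_squares(scores):
    """SS: the sum of the squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def variance(scores):
    """Variance as SS divided by n, the average squared deviation from the mean."""
    if not scores:
        raise ValueError("variance requires at least one score")
    return sum_of_squares(scores) / len(scores)


def standard_deviation(scores):
    """The standard deviation is the square root of the variance."""
    return variance(scores) ** 0.5


# Hypothetical sample of scores (not taken from the entry).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sum_of_squares(scores), variance(scores), standard_deviation(scores))
```

Python's standard library function `statistics.pvariance` uses the same divisor, n, and could be used in place of the hand-written `variance` here.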

Notice several important aspects of the variance. The mean is the most commonly used measure of central tendency, and the variance is calculated by taking deviations from the mean. Thus, the variance shows how spread out scores are around the mean. Deviation scores are squared because the sum of the deviations from the mean, $\sum (X - \bar{X})$, always equals zero. An interesting feature of the variance is that the sum of the squared deviations from the mean, $\sum (X - \bar{X})^2$, is smaller than the sum of the squared deviations taken from any other value.
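
Both properties can be checked numerically. The sketch below uses hypothetical scores and a handful of arbitrarily chosen alternative reference points; it illustrates the two claims for one data set and is not a proof.

```python
def sum_of_deviations(scores, center):
    """Sum of the raw (unsquared) deviations from a reference point."""
    return sum(x - center for x in scores)


def sum_of_squared_deviations(scores, center):
    """Sum of the squared deviations from a reference point."""
    return sum((x - center) ** 2 for x in scores)


# Hypothetical sample of scores (not taken from the entry).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(scores) / len(scores)

# Deviations above and below the mean cancel, so their sum is zero.
total_deviation = sum_of_deviations(scores, mean)

# Squared deviations taken from the mean, and from other reference points.
ss_at_mean = sum_of_squared_deviations(scores, mean)
ss_elsewhere = {
    c: sum_of_squared_deviations(scores, c) for c in (3, 4, 4.5, 5.5, 6, 7)
}

print(total_deviation, ss_at_mean, ss_elsewhere)
```

For any reference point c, the sum of squared deviations equals $SS + n(\bar{X} - c)^2$, so it exceeds SS by an amount that grows with the distance between c and the mean.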

Note also that because the sum of the squared deviations from the mean is divided by n, the variance itself is a type of mean: the mean of squared deviation scores. Finally, like the mean of the scores, the variance takes every score into account. This is generally considered a desirable quality, but in very skewed distributions or distributions with a few very aberrant scores, one might wish to use another measure.

As an example, here is the calculation of the variance for distribution X. The mean is