
Sums of Squares

In mathematics, the sum of squares can refer to any set of numbers whose values have each been squared and then added together (i.e., ΣX²). In the statistical analysis of quantitative data, however, the sum of squared differences, or deviations (normally the difference between a score and the mean), is of particular interest; this formulation is what researchers usually mean by the term sums of squares. This type of sum of squared values is extremely important in analyzing the variability in data and in understanding the relationships between variables.
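The distinction between the two quantities can be made concrete with a short sketch (the scores used here are illustrative, not from the original text):

```python
# Raw sum of squares ΣX² versus the deviation-based sum of squares Σ(X − X̄)².
scores = [2, 4, 6]

raw_ss = sum(x ** 2 for x in scores)                 # 4 + 16 + 36 = 56
mean = sum(scores) / len(scores)                     # 4.0
deviation_ss = sum((x - mean) ** 2 for x in scores)  # 4 + 0 + 4 = 8.0

print(raw_ss, deviation_ss)  # → 56 8.0
```

Only the second, deviation-based quantity measures spread around the mean; it is the one statistics texts mean by "sums of squares."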

If there is no variability in a set of numbers, then they are all the same. However, this is unlikely to be the case in any research. Indeed, researchers actively investigate changes in their data resulting from experimental manipulations. So a procedure is required to produce a value that expresses the amount of variability within a data set. The sum of squares calculation achieves this outcome. For example, consider the numbers 2, 4, and 6. The range gives us a crude measure of the spread of these numbers: between 2 and 6 we have a range of 4, but this ignores the distribution of the scores within that range. More subtly, to indicate the variation within the data, we can compare each score to the mean (X̄), which in this case is 4, to produce a difference or deviation from the mean. So the deviations are 2 − 4 = −2, 4 − 4 = 0, and 6 − 4 = 2. It might be assumed that simply adding up these deviations would provide a useful measure of total variation in the data set. Unfortunately, this is not the case, as this total will always be zero, with the deviations of numbers below the mean canceling out the deviations of the numbers above the mean, as in the previous example, where −2 + 0 + 2 = 0. A solution to this problem is to square the deviations before they are summed, which always produces a positive nonzero value when there is some variability in the data. For the example the calculation is (−2)² + 0² + 2² = 8. This is now a measure of the variability within the data, which can be expressed mathematically as Σ(X − X̄)², the sum of the squared deviations from the mean, or the sum of squares. This is an extraordinarily useful measure and is integral to most of the statistical techniques used to analyze quantitative data.
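The steps above can be traced in a few lines of Python (the function name is illustrative, not from the original text):

```python
def sum_of_squares(scores):
    """Return the sum of squared deviations from the mean, Σ(X − X̄)²."""
    mean = sum(scores) / len(scores)
    deviations = [x - mean for x in scores]
    # Raw deviations always cancel out to zero, which is why they are
    # squared before being summed.
    assert sum(deviations) == 0
    return sum(d ** 2 for d in deviations)

print(sum_of_squares([2, 4, 6]))  # → 8.0
```

For the worked example, the mean is 4, the deviations are −2, 0, and 2, and squaring before summing yields 4 + 0 + 4 = 8.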

Sums of squares are critical to quantitative data analysis because variability in the data can be calculated and attributed to different sources, which allows researchers to make appropriate judgments about the relationships between the variables under investigation.

Variance and Standard Deviation

The previous sum of squares formula lies at the heart of the most common statistic for expressing variability within research data, the standard deviation. With a population of numbers, the mean squared deviation can be calculated by dividing the sum of squares by the number of scores n. This mean square value is called the variance. However, in research, samples are normally selected to estimate population parameters, so, more usually, the sum of squares is divided by the degrees of freedom (n − 1) to produce the best estimate of the population variance.
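A minimal sketch of this relationship, checked against Python's standard statistics module (the variable names are illustrative):

```python
import math
import statistics

scores = [2, 4, 6]
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squares = 8.0

population_variance = ss / n          # divide by n: 8 / 3 ≈ 2.67
sample_variance = ss / (n - 1)        # divide by degrees of freedom: 8 / 2 = 4.0
sample_sd = math.sqrt(sample_variance)  # standard deviation = 2.0

# The standard library's functions follow the same definitions.
assert math.isclose(statistics.pvariance(scores), population_variance)
assert statistics.variance(scores) == sample_variance
assert statistics.stdev(scores) == sample_sd
```

Dividing by n − 1 rather than n gives the larger, unbiased estimate that is appropriate when a sample is used to stand in for a population.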

...
