Normalizing Data

Hervé Abdi; Lynne J. Williams

doi:10.4135/9781071812082


By: Hervé Abdi & Lynne J. Williams
In: The SAGE Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781071812082.n400
Subject: Research Methods & Evaluation (general), Research Design
Keywords: standard deviations; standard score


Researchers often want to compare scores or sets of scores obtained on different scales. For example, how do we compare a score of 85 in a cooking contest with a score of 100 on an IQ test? To do so, we need to “eliminate” the unit of measurement; this operation is called normalizing the data. There are two main types of normalization. The first type originates from linear algebra and treats the data as a vector in a multidimensional space. In this context, to normalize the data is to transform the data vector into a new vector whose norm (i.e., length) is equal to one. The second type originates from statistics and eliminates the unit of measurement by transforming the data into new scores with a mean of 0 and a standard deviation of 1. These transformed scores are known as z scores.

Normalization to a Norm of One

The Norm of a Vector

In linear algebra, the norm of a vector measures its length, which is equal to the Euclidean distance of the endpoint of this vector to the origin of the vector space. This quantity is computed (from the Pythagorean theorem) as the square root of the sum of the squared elements of the vector. For example, consider the following data vector denoted $y$ :

$y = \begin{bmatrix} 35 \\ 36 \\ 46 \\ 68 \\ 70 \end{bmatrix} .$

The norm of vector $y$ is denoted $\| y \|$ and is computed as

$\begin{array}{l} \| y \| = \sqrt{35^{2} + 36^{2} + 46^{2} + 68^{2} + 70^{2}} \\ = \sqrt{14{,}161} = 119. \end{array}$
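The computation above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names `sum_of_squares` and `norm_y` are illustrative choices, not notation from the entry.

```python
import math

# Data vector y from the example above
y = [35, 36, 46, 68, 70]

# Euclidean norm: square root of the sum of the squared elements
sum_of_squares = sum(v ** 2 for v in y)
norm_y = math.sqrt(sum_of_squares)

print(sum_of_squares)  # 14161
print(norm_y)          # 119.0
```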

Normalizing With the Norm

To normalize $y$, we divide each element by $\| y \| = 119.$ The normalized vector, denoted $\tilde{y}$, is equal to

$\tilde{y} = \begin{bmatrix} \frac{35}{119} \\ \frac{36}{119} \\ \frac{46}{119} \\ \frac{68}{119} \\ \frac{70}{119} \end{bmatrix} = \begin{bmatrix} 0.2941 \\ 0.3025 \\ 0.3866 \\ 0.5714 \\ 0.5882 \end{bmatrix} .$

The norm of vector $\tilde{y}$ is now equal to one:

$\begin{array}{l} \| \tilde{y} \| = \sqrt{0.2941^{2} + 0.3025^{2} + 0.3866^{2} + 0.5714^{2} + 0.5882^{2}} \\ = \sqrt{1} = 1. \end{array}$
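As a rough illustration, the normalization to a norm of one might be written as a small Python function. The function name `normalize_to_unit_norm` and the guard against a zero vector are assumptions added for this sketch; the entry itself does not discuss the zero-vector case.

```python
import math

def normalize_to_unit_norm(values):
    """Divide each element by the Euclidean norm of the vector."""
    norm = math.sqrt(sum(v ** 2 for v in values))
    if norm == 0:
        # Assumption of this sketch: a zero vector has no direction to keep
        raise ValueError("cannot normalize a vector whose norm is zero")
    return [v / norm for v in values]

y = [35, 36, 46, 68, 70]
y_tilde = normalize_to_unit_norm(y)

print([round(v, 4) for v in y_tilde])
# [0.2941, 0.3025, 0.3866, 0.5714, 0.5882]

# The norm of the normalized vector is one, up to floating-point rounding
norm_of_y_tilde = math.sqrt(sum(v ** 2 for v in y_tilde))
print(math.isclose(norm_of_y_tilde, 1.0))  # True
```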

Normalization Using Centering and Standard Deviation: z Scores

The Standard Deviation of a Set of Scores

Recall that the standard deviation of a set of scores expresses the dispersion of the scores around their mean. A set of $N$ scores, each denoted $Y_{n}$, whose mean is equal to $M$, has a standard deviation denoted $\hat{S}$, which is computed as

$\hat{S} = \sqrt{\frac{\sum_{n=1}^{N} {(Y_{n} - M)}^{2}}{N - 1}} .$

For example, the scores from vector $y$ (see Equation 1) have a mean of 51 and a standard deviation of

$\begin{array}{l} \hat{S} = \sqrt{\frac{{(35 - 51)}^{2} + {(36 - 51)}^{2} + {(46 - 51)}^{2} + {(68 - 51)}^{2} + {(70 - 51)}^{2}}{5 - 1}} \\ = \frac{1}{2} \sqrt{{(- 16)}^{2} + {(- 15)}^{2} + {(- 5)}^{2} + 17^{2} + 19^{2}} \\ = \frac{1}{2} \sqrt{1{,}156} = 17. \end{array}$
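The same arithmetic can be sketched in Python, step by step and then cross-checked against the standard library's `statistics.stdev`, which also uses the $N - 1$ denominator. The variable names are illustrative only.

```python
import math
import statistics

y = [35, 36, 46, 68, 70]
n = len(y)

# Mean of the scores
mean_y = sum(y) / n                                   # 51.0

# Sum of squared deviations from the mean
sum_sq_dev = sum((v - mean_y) ** 2 for v in y)        # 1156.0

# Standard deviation with the N - 1 denominator
sd_y = math.sqrt(sum_sq_dev / (n - 1))                # 17.0

print(mean_y, sum_sq_dev, sd_y)

# Cross-check against the standard library (sample standard deviation)
print(math.isclose(sd_y, statistics.stdev(y)))        # True
```

Note that `statistics.pstdev` divides by $N$ rather than $N - 1$ and would therefore give a different value from the one used in this entry.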

z Scores: Normalizing With the Standard Deviation

To normalize a set of scores using the standard deviation, we divide each score by the standard deviation of this set of scores. In this context, we almost always subtract the mean of the scores from each score prior to dividing by the standard deviation. The resulting normalized scores are known as z scores. Formally, a set of $N$ scores, each denoted $Y_{n}$, whose mean is equal to $M$ and whose standard deviation is equal to $\hat{S}$, is transformed into z scores as

$z_{n} = \frac{Y_{n} - M}{\hat{S}} .$

With elementary algebraic manipulations, it can be shown that a set of z scores has a mean equal to zero and a standard deviation equal to one. Therefore, z scores constitute a unit-free measure that can be used to compare observations measured with different units.
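The manipulations can be sketched as follows, assuming $\hat{S} > 0$ and the $N - 1$ definition of the standard deviation. The symbols $M_{z}$ and $\hat{S}_{z}$, for the mean and standard deviation of the z scores, are introduced here for convenience only. Because the deviations from the mean sum to zero,

$M_{z} = \frac{1}{N} \sum_{n=1}^{N} z_{n} = \frac{1}{N \hat{S}} \sum_{n=1}^{N} (Y_{n} - M) = \frac{1}{N \hat{S}} (N M - N M) = 0 .$

Using $M_{z} = 0$ and $\hat{S}^{2} = \frac{\sum_{n=1}^{N} {(Y_{n} - M)}^{2}}{N - 1}$,

$\hat{S}_{z}^{2} = \frac{\sum_{n=1}^{N} {(z_{n} - 0)}^{2}}{N - 1} = \frac{1}{\hat{S}^{2}} \cdot \frac{\sum_{n=1}^{N} {(Y_{n} - M)}^{2}}{N - 1} = \frac{\hat{S}^{2}}{\hat{S}^{2}} = 1 ,$

so that $\hat{S}_{z} = 1$.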

Example

For example, the scores from vector y (see Equation 1) have a mean of 51 and a standard deviation of 17. These scores can be transformed into the vector z of z scores as

$z = \begin{bmatrix} \frac{35 - 51}{17} \\ \frac{36 - 51}{17} \\ \frac{46 - 51}{17} \\ \frac{68 - 51}{17} \\ \frac{70 - 51}{17} \end{bmatrix} = \begin{bmatrix} - \frac{16}{17} \\ - \frac{15}{17} \\ - \frac{5}{17} \\ \frac{17}{17} \\ \frac{19}{17} \end{bmatrix} = \begin{bmatrix} - 0.9412 \\ - 0.8824 \\ - 0.2941 \\ 1.0000 \\ 1.1176 \end{bmatrix} .$

The mean of vector $z$ is now equal to zero, and its standard deviation is equal to one.
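The example can be reproduced with a short Python sketch. The helper name `z_scores` and the check for a zero standard deviation are assumptions made for illustration; `statistics.stdev` is used because it applies the $N - 1$ denominator.

```python
import math
import statistics

def z_scores(values):
    """Center the scores on their mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    if sd == 0:
        # Assumption of this sketch: identical scores cannot be standardized
        raise ValueError("standard deviation is zero; z scores are undefined")
    return [(v - mean) / sd for v in values]

y = [35, 36, 46, 68, 70]
z = z_scores(y)

print([round(v, 4) for v in z])
# [-0.9412, -0.8824, -0.2941, 1.0, 1.1176]

# Mean of zero and standard deviation of one, up to floating-point rounding
print(math.isclose(statistics.mean(z), 0.0, abs_tol=1e-12))  # True
print(math.isclose(statistics.stdev(z), 1.0))                # True
```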

...
