
Normalizing Data

Researchers often want to compare scores or sets of scores obtained on different scales. For example, how do we compare a score of 85 in a cooking contest with a score of 100 on an IQ test? To do so, we need to "eliminate" the unit of measurement; this operation is called normalizing the data. There are two main types of normalization. The first type originates from linear algebra and treats the data as a vector in a multidimensional space. In this context, to normalize the data is to transform the data vector into a new vector whose norm (i.e., length) is equal to one. The second type originates from statistics and eliminates the unit of measurement by transforming the data into new scores with a mean of 0 and a standard deviation of 1. These transformed scores are known as z scores.

Normalization to a Norm of One

The Norm of a Vector

In linear algebra, the norm of a vector measures its length, which is equal to the Euclidean distance of the endpoint of this vector to the origin of the vector space. This quantity is computed (from the Pythagorean theorem) as the square root of the sum of the squared elements of the vector. For example, consider the following data vector denoted y:

$$\mathbf{y} = \begin{bmatrix} 35 \\ 36 \\ 46 \\ 68 \\ 70 \end{bmatrix}.$$

The norm of vector $\mathbf{y}$ is denoted $\|\mathbf{y}\|$ and is computed as

$$\|\mathbf{y}\| = \sqrt{35^2 + 36^2 + 46^2 + 68^2 + 70^2} = \sqrt{14{,}161} = 119.$$
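The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `euclidean_norm` is our own illustrative choice, not part of any library.

```python
import math

def euclidean_norm(values):
    """Return the Euclidean norm (length) of a vector given as a sequence of numbers."""
    # Square each element, add the squares, then take the square root (Pythagorean theorem).
    return math.sqrt(sum(v * v for v in values))

y = [35, 36, 46, 68, 70]
sum_of_squares = sum(v * v for v in y)
print(sum_of_squares)      # 14161
print(euclidean_norm(y))   # 119.0
```

Since Python 3.8, `math.hypot(*y)` computes the same quantity directly.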

Normalizing With the Norm

To normalize $\mathbf{y}$, we divide each element by $\|\mathbf{y}\| = 119$. The normalized vector, denoted $\tilde{\mathbf{y}}$, is equal to

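$$\tilde{\mathbf{y}} = \begin{bmatrix} \frac{35}{119} \\ \frac{36}{119} \\ \frac{46}{119} \\ \frac{68}{119} \\ \frac{70}{119} \end{bmatrix} = \begin{bmatrix} 0.2941 \\ 0.3025 \\ 0.3866 \\ 0.5714 \\ 0.5882 \end{bmatrix}.$$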

The norm of vector $\tilde{\mathbf{y}}$ is now equal to one:

$$\|\tilde{\mathbf{y}}\| = \sqrt{0.2941^2 + 0.3025^2 + 0.3866^2 + 0.5714^2 + 0.5882^2} = \sqrt{1} = 1.$$
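Normalization to a norm of one can be sketched in Python as follows. The helper name `normalize_to_unit_norm` is hypothetical, and the guard against the zero vector is our own addition: a vector of all zeros has norm 0 and cannot be rescaled to length one.

```python
import math

def normalize_to_unit_norm(values):
    """Divide every element by the vector's Euclidean norm so the result has norm 1."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        # The zero vector has no direction, so it cannot be rescaled to length 1.
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in values]

y = [35, 36, 46, 68, 70]
y_tilde = normalize_to_unit_norm(y)
print([round(v, 4) for v in y_tilde])         # [0.2941, 0.3025, 0.3866, 0.5714, 0.5882]
print(math.sqrt(sum(v * v for v in y_tilde)))  # approximately 1.0 (floating-point rounding)
```

Note that the rounded values printed here, like those in the equation above, give a norm only approximately equal to one; the unrounded quotients give exactly one.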

Normalization Using Centering and Standard Deviation: z Scores

The Standard Deviation of a Set of Scores

Recall that the standard deviation of a set of scores expresses the dispersion of the scores around their mean. A set of $N$ scores, each denoted $Y_n$, whose mean is equal to $M$, has a standard deviation denoted $\hat{S}$, which is computed as

$$\hat{S} = \sqrt{\frac{\sum_{n=1}^{N} (Y_n - M)^2}{N - 1}}.$$

For example, the scores from vector $\mathbf{y}$ (see Equation 1) have a mean of $M = (35 + 36 + 46 + 68 + 70)/5 = 255/5 = 51$ and a standard deviation of

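$$\hat{S} = \sqrt{\frac{(35-51)^2 + (36-51)^2 + (46-51)^2 + (68-51)^2 + (70-51)^2}{5-1}} = \sqrt{\frac{(-16)^2 + (-15)^2 + (-5)^2 + 17^2 + 19^2}{4}} = \sqrt{\frac{1156}{4}} = \sqrt{289} = 17.$$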
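The same computation, with the $N - 1$ denominator, can be written in Python as a minimal sketch; `sample_std` is a hypothetical helper name. The standard library's `statistics.stdev` also divides by $N - 1$, so it serves as a cross-check, whereas `statistics.pstdev` divides by $N$ and would give a different value.

```python
import math
import statistics

def sample_std(values):
    """Standard deviation with the N - 1 denominator, as in the formula for S-hat."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    m = sum(values) / n                               # the mean M
    squared_deviations = [(v - m) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

y = [35, 36, 46, 68, 70]
print(sum(y) / len(y))      # 51.0
print(sample_std(y))        # 17.0
print(statistics.stdev(y))  # 17.0
```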

z Scores: Normalizing With the Standard Deviation

To normalize a set of scores using the standard deviation, we divide each score by the standard deviation of this set of scores. In this context, we almost always subtract the mean of the scores from each score before dividing by the standard deviation. The resulting scores are known as z scores. Formally, a set of $N$ scores, each denoted $Y_n$, whose mean is equal to $M$ and whose standard deviation is equal to $\hat{S}$, is transformed into z scores as

$$z_n = \frac{Y_n - M}{\hat{S}}.$$

With elementary algebraic manipulations, it can be shown that a set of z scores has a mean equal to zero and a standard deviation equal to one. Therefore, z scores constitute a unit-free measure that can be used to compare observations measured with different units.
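The algebra can be spelled out using only the definitions of $M$, $\hat{S}$, and $z_n$ given above, assuming $\hat{S} > 0$ (that is, not all scores are equal). The first line uses $\sum_{n} Y_n = NM$, which follows from the definition of the mean; the second line uses the fact that the z scores have mean zero, so their deviations from their own mean are the $z_n$ themselves:

$$
\begin{aligned}
\frac{1}{N}\sum_{n=1}^{N} z_n
  &= \frac{1}{N\hat{S}}\sum_{n=1}^{N} (Y_n - M)
   = \frac{1}{N\hat{S}}\Bigl(\sum_{n=1}^{N} Y_n - NM\Bigr)
   = \frac{1}{N\hat{S}}\,(NM - NM) = 0, \\[4pt]
\hat{S}_z^2
  &= \frac{\sum_{n=1}^{N} (z_n - 0)^2}{N - 1}
   = \frac{1}{\hat{S}^2}\cdot\frac{\sum_{n=1}^{N} (Y_n - M)^2}{N - 1}
   = \frac{\hat{S}^2}{\hat{S}^2} = 1 .
\end{aligned}
$$

Here $\hat{S}_z$ is our notation for the standard deviation of the z scores; taking the square root of $\hat{S}_z^2 = 1$ gives $\hat{S}_z = 1$.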

Example

For example, the scores from vector y (see Equation 1) have a mean of 51 and a standard deviation of 17. These scores can be transformed into the vector z of z scores as

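$$\mathbf{z} = \begin{bmatrix} \frac{35-51}{17} \\ \frac{36-51}{17} \\ \frac{46-51}{17} \\ \frac{68-51}{17} \\ \frac{70-51}{17} \end{bmatrix} = \begin{bmatrix} -\frac{16}{17} \\ -\frac{15}{17} \\ -\frac{5}{17} \\ \frac{17}{17} \\ \frac{19}{17} \end{bmatrix} = \begin{bmatrix} -0.9412 \\ -0.8824 \\ -0.2941 \\ 1.0000 \\ 1.1176 \end{bmatrix}.$$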

The mean of vector z is now equal to zero, and its standard deviation is equal to one.
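The worked example can be reproduced with a short Python sketch; `z_scores` is a hypothetical helper name. It centers each score on the mean and divides by the $N - 1$ standard deviation from `statistics.stdev`, matching $\hat{S}$ above, and refuses a set of identical scores, for which $\hat{S} = 0$ and z scores are undefined.

```python
import statistics

def z_scores(values):
    """Center each score on the mean, then divide by the N - 1 standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # N - 1 denominator, like S-hat
    if s == 0:
        raise ValueError("all scores are equal; z scores are undefined")
    return [(v - m) / s for v in values]

y = [35, 36, 46, 68, 70]
z = z_scores(y)
print([round(v, 4) for v in z])   # [-0.9412, -0.8824, -0.2941, 1.0, 1.1176]
print(statistics.mean(z), statistics.stdev(z))  # approximately 0.0 and 1.0
```

Because each set of scores is converted separately, the same function can be applied to scores from two differently scaled measures, after which the resulting z scores are directly comparable.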

...
