Principal Components Analysis

Neil J. Salkind

doi:10.4135/9781412961288

Edited by: Neil J. Salkind
In: Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781412961288.n334
Subject: Research Design
Keywords: principal component analysis

Also known as empirical orthogonal function analysis, principal components analysis (PCA) is a multivariate data analysis technique that is employed to reduce the dimensionality of large data sets and simplify the representation of the data field under consideration. PCA is used to understand the interdependencies among variables and to trim down redundant (or significantly correlated) variables that measure the same construct. Data sets with a considerable proportion of interrelated variables are transformed into a set of new hypothetical variables known as principal components, which are uncorrelated, or orthogonal, to one another. These new variables are ordered so that the first few components retain most of the variation present in the original data matrix. The components reflect both the common and the unique variance of the variables (as opposed to common factor analysis, which excludes unique variance), with the last few components identifying directions in which there is negligible variation, that is, near-linear relationships among the original variables. Thus, PCA reduces the number of variables under examination and allows one to detect and recognize groups of interrelated variables. PCA frequently is not the final step of an analysis and is often used in combination with other statistical techniques (e.g., cluster analysis) to uncover, model, and explain the leading multivariate relationships. The method was first introduced in 1901 by Karl Pearson and modified three decades later by Harold Hotelling for the objective of exploring correlation structures; it has since been used extensively in both the physical and social sciences.

Mathematical Origins and Matrix Constructs

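PCA describes the variation in a set of multivariate data in terms of a new set of variables that are uncorrelated with one another. Mathematically, the method can be described briefly as a linear transformation from the original variables, $x_1, \dots, x_p$, to new variables, $y_1, \dots, y_p$ (as described succinctly by Geoff Der and Brian Everitt), where

$$
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p \\
&\;\;\vdots \\
y_p &= a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pp}x_p
\end{aligned}
$$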

The coefficients, $a_{ij}$, defining each new variable are selected in such a way that the $y$ variables, or principal components, are orthogonal: the coordinate axes are rotated so that they remain at right angles to each other while the variance along each successive axis is maximized. The components are arranged in decreasing order of the variance they account for in the original data matrix. The number of possible principal components is equal to the number of input variables, but not all components will be retained in the analysis, because a primary goal of PCA is simplification of the data matrix (see the subsequent section on principal component truncation methods).
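The rotation described above can be illustrated with a minimal NumPy sketch. It assumes, as is common but not stated in this entry, that the coefficients are taken as the unit-length eigenvectors of the sample covariance matrix and that the data are mean-centered before projection. The function name `pca_components` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca_components(X):
    """Return (variances, A, Y) for an n x p data matrix X.

    Assumption: coefficients are eigenvectors of the sample covariance
    matrix, and the data are mean-centered before projection.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    S = np.cov(Xc, rowvar=False)            # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to decreasing variance
    variances = eigvals[order]
    A = eigvecs[:, order].T                 # row j holds a_j1, ..., a_jp
    Y = Xc @ A.T                            # column j holds the jth component
    return variances, A, Y

# Simulated data (illustrative only): two strongly related variables and one
# independent variable.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([
    z + 0.1 * rng.normal(size=200),
    2.0 * z + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

variances, A, Y = pca_components(X)
share = variances / variances.sum()         # proportion of variance per component
print("proportion of variance:", np.round(share, 3))
```

In this sketch the rows of `A` are mutually orthogonal, the columns of `Y` are uncorrelated, and the variances are sorted from largest to smallest, which mirrors the ordering of components described in the text.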

The original coordinates of the $i$th data point, $x_{ij}$, $j = 1, \dots, p$, become in the new system (as explained by Trevor Bailey and Anthony Gatrell):

$$
y_{ij} = a_{j1}x_{i1} + a_{j2}x_{i2} + \cdots + a_{jp}x_{ip}
$$

The $j$th new variable $y_j$ is normally referred to as the $j$th principal component, whereas $y_{ij}$ is termed the score of the $i$th observation on the $j$th principal component. The relationship between the $j$th principal component and the $k$th original variable is described by the correlation between them, given as

$$
r_{jk} = \frac{a_{jk}\sqrt{\lambda_j}}{\sqrt{s_{kk}}}
$$

where $\lambda_j$ is the variance of the $j$th principal component and $s_{kk}$ is the estimated variance of the $k$th original variable, or the $k$th diagonal element of the sample covariance matrix $S$. This relationship is referred to as the loading of the $k$th original variable on the $j$th principal component. Component loadings are essentially correlations between the variables and the component and are interpreted similarly to product-moment correlation coefficients (or Pearson's r). Values of component loadings range from -1.0 to 1.0. Loadings that are more strongly positive (or negative) indicate a stronger linkage of a variable with a particular component, and values closer to zero signify that the variable is not being represented by that component.
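As a hedged illustration of component loadings, the sketch below assumes the standard result that, for unit-length coefficient vectors obtained from the sample covariance matrix, the correlation between the jth component and the kth variable equals a_jk * sqrt(lambda_j) / sqrt(s_kk), where lambda_j is the variance of the jth component. The function name `component_loadings` and the simulated data are hypothetical.

```python
import numpy as np

def component_loadings(X):
    """Return (loadings, scores) for an n x p data matrix X.

    Assumption: loadings[j, k] = a_jk * sqrt(lambda_j) / sqrt(s_kk), with
    a_j the unit-length eigenvectors of the sample covariance matrix S.
    """
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]       # decreasing component variance
    lam = eigvals[order]
    A = eigvecs[:, order].T                 # row j holds a_j1, ..., a_jp
    s_kk = np.diag(S)                       # variances of the original variables
    loadings = A * np.sqrt(lam)[:, None] / np.sqrt(s_kk)[None, :]
    scores = Xc @ A.T                       # scores[i, j]: observation i, component j
    return loadings, scores

# Simulated data (illustrative only).
rng = np.random.default_rng(1)
z = rng.normal(size=300)
X = np.column_stack([
    z + 0.5 * rng.normal(size=300),
    -z + 0.5 * rng.normal(size=300),
    rng.normal(size=300),
])

loadings, scores = component_loadings(X)
print(np.round(loadings, 2))                # rows: components, columns: variables
```

Each entry of `loadings` can be checked against the Pearson correlation between the corresponding column of `scores` and column of `X`, which is why loadings are read like correlation coefficients.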

...
