
Principal components analysis (PCA) is the workhorse of exploratory multivariate data analysis, especially when a researcher wants to gain insight into, and an overview of, the relationships between a set of variables and to evaluate individuals with respect to those variables. The basic technique is designed for continuous variables, but variants have been developed that cater for variables of categorical and ordinal MEASUREMENT LEVELS, as well as for sets of variables with mixtures of different measurement levels. In addition, the technique is used in conjunction with other techniques, such as REGRESSION ANALYSIS. In this entry, we will concentrate on standard PCA, but overviews of PCA at work in different contexts can, for instance, be found in the books by Jolliffe (1986) and Jackson (1991), and an exposé of PCA for variables with different measurement levels is contained in Meulman and Heiser (2000).

Theory

Suppose that we have the scores of I individuals on J variables and that the relationships between the variables are such that no variable can be perfectly predicted from the remaining variables. Then these variables form the axes of a J-dimensional space, and the scores of the individuals on these J variables can be portrayed in this J-dimensional space. However, looking at a high-dimensional space is not easy; fortunately, most of the variability of the high-dimensional arrangement of the individuals can often be displayed in a low-dimensional space without much loss. As an example, we see in Figure 1 that the two-dimensional ellipse of scores of Sample A can be reasonably well represented in one dimension by the first principal component, and one only needs to interpret the variability along this single dimension. However, for the scores of Sample B, the one-dimensional representation is much worse (i.e., the variance accounted for by the first principal component is much lower in Case B than in Case A, and interpreting a single dimension might not suffice in Case B).
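The contrast between Samples A and B can be illustrated numerically. The following sketch (not from the original entry; the data are simulated for illustration) computes the fraction of total variance accounted for by the first principal component for a strongly correlated pair of variables, mimicking the elongated ellipse of Sample A, and for a nearly uncorrelated pair, mimicking Sample B:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_pc_variance_ratio(X):
    """Fraction of total variance captured by the first principal component."""
    Xc = X - X.mean(axis=0)             # center each variable
    cov = np.cov(Xc, rowvar=False)      # covariance matrix of the variables
    eigvals = np.linalg.eigvalsh(cov)   # eigenvalues in ascending order
    return eigvals[-1] / eigvals.sum()  # largest eigenvalue over total variance

# "Sample A": two strongly correlated variables (elongated ellipse)
x = rng.normal(size=1000)
sample_a = np.column_stack([x, x + 0.3 * rng.normal(size=1000)])

# "Sample B": two nearly uncorrelated variables (roughly circular cloud)
sample_b = rng.normal(size=(1000, 2))

print(first_pc_variance_ratio(sample_a))  # close to 1: one dimension suffices
print(first_pc_variance_ratio(sample_b))  # close to 0.5: one dimension loses much
```

For Sample A, a single component retains nearly all the variability; for Sample B, it retains only about half, which is the situation in which a one-dimensional display would not suffice.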

The coordinate axes of the low-dimensional space are commonly called components. If the components are such that they successively account for most of the variability in the data, they are called principal components. The coordinates of the individuals on the components are called component scores. To interpret components, the coordinates for the variables on these components need to be derived as well, and the common approach to do this is via EIGENVALUE-EIGENVECTOR techniques. If both the variables and the components are standardized, the variable coordinates are the correlations between variables and components. By inspecting these correlations, commonly known as component loadings, one may assess the extent to which the components measure the same quantities as (groups of) variables. In particular, when a group of variables has high correlations with a component, the component has something in common with all of them, and on the basis of the substantive content of the variables, one may try to ascertain what the common element between the variables may be and hypothesize that the component is measuring this common element.
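The eigenvalue-eigenvector computation of loadings and component scores can be sketched as follows (a minimal illustration with simulated data, not part of the original entry; the two-group structure of the four variables is a hypothetical example). When the variables are standardized, PCA works on the correlation matrix, and scaling each eigenvector by the square root of its eigenvalue yields loadings that are exactly the correlations between variables and standardized components:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: 200 individuals on 4 variables, where the first two and
# the last two variables form two internally correlated groups.
n = 200
g1, g2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    g1 + 0.4 * rng.normal(size=n),
    g1 + 0.4 * rng.normal(size=n),
    g2 + 0.4 * rng.normal(size=n),
    g2 + 0.4 * rng.normal(size=n),
])

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the variables
R = np.corrcoef(Z, rowvar=False)          # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # reorder so largest comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: correlations between the variables and the components
loadings = eigvecs * np.sqrt(eigvals)

# Standardized component scores: project Z onto the eigenvectors, rescale
scores = Z @ eigvecs / np.sqrt(eigvals)

# Check: the loading of variable 1 on component 1 equals their correlation
corr_check = np.corrcoef(Z[:, 0], scores[:, 0])[0, 1]
print(np.isclose(loadings[0, 0], corr_check))
```

Inspecting the first two columns of `loadings` shows high correlations of the first component with one variable group and of the second component with the other, which is the pattern one interprets substantively as described above.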

...
