
In his seminal work, William McDougall discussed how the meanings of “character” and “personality” can be analyzed into five distinguishable factors: intellect, character, temperament, disposition, and temper. He maintained that each of these factors is latent and comprises many variables. This can be translated into a statistical framework in which one observes many variables that are explained by a small number (five in this case) of latent factors. McDougall’s intuition anticipated the results of half a century of work to organize the language of personality into a coherent structure.

Latent (or not directly observable) variables abound in political science: public opinion, socioeconomic status, social capital, ideology, and utility. Instead of observing these quantities (called factors), researchers may have indicators of these concepts, that is, observable measures that relate to a latent concept with different degrees of strength. This entry introduces factor analysis (FA), paying specific attention to factor models and factor component analysis, rotation, and the application of FA in the social sciences.

Factor Models

The main idea behind a factor model is that a large number N of variables can be explained by a small number q of factors. In a factor model, a vector of N observations at time t, denoted $Y_N(t)$, is decomposed into a common component $X_N(t)$ and an idiosyncratic component $Z_N(t)$:

$$Y_N(t) = X_N(t) + Z_N(t).$$

The common components are a linear combination of the latent common factors $f(t)$, with weights given by the so-called factor loadings $L_N$:

$$X_N(t) = L_N f(t) = l_1 f_1(t) + l_2 f_2(t) + \cdots + l_q f_q(t),$$

where $l_j$ denotes the j-th column of $L_N$. In this notation, the matrix $L_N$ is N × q, whereas the vector of factors $f(t)$ is q × 1. Both $X_N(t)$ and $Z_N(t)$ are unobservable. Provided that the factors are uncorrelated with the idiosyncratic components, the covariance matrix of the observations can be decomposed accordingly into

$$C_Y = C_X + C_Z = L_N C_f L_N^T + C_Z.$$
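
The covariance decomposition can be checked numerically. The following is a minimal sketch, not a definitive implementation: it simulates data from a factor model and compares the sample covariance of the observations with the covariance implied by the model. The sizes, the seed, the loadings `L`, the factor covariance `C_f`, and the diagonal `C_Z` are all hypothetical values chosen for illustration, and the factors are drawn independently of the idiosyncratic terms.

```python
import numpy as np

rng = np.random.default_rng(0)

N, q, T = 6, 2, 200_000  # variables, factors, time periods (illustrative sizes)

# Hypothetical loadings L_N (N x q) and factor covariance C_f (q x q).
L = rng.normal(size=(N, q))
C_f = np.array([[1.0, 0.3],
                [0.3, 2.0]])
# Hypothetical idiosyncratic covariance C_Z, taken to be diagonal here.
C_Z = np.diag(rng.uniform(0.5, 1.5, size=N))

# Draw factors f(t) and idiosyncratic terms Z_N(t) independently of each other.
f = rng.multivariate_normal(np.zeros(q), C_f, size=T)  # T x q
Z = rng.multivariate_normal(np.zeros(N), C_Z, size=T)  # T x N

X = f @ L.T  # common component X_N(t) = L_N f(t), one row per period t
Y = X + Z    # observations Y_N(t) = X_N(t) + Z_N(t)

C_Y_implied = L @ C_f @ L.T + C_Z     # covariance implied by the model
C_Y_sample = np.cov(Y, rowvar=False)  # sample covariance of the simulated data

max_gap = np.max(np.abs(C_Y_sample - C_Y_implied))
print(f"largest absolute gap between sample and implied covariance: {max_gap:.3f}")
```

With a large number of simulated periods the gap should be small relative to the entries of the implied matrix; it reflects sampling error only, because the data were generated from the model itself.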

In public debates, for example, the covariance matrix $C_X$ captures the covariation among the semantic concepts that is attributable to the common factors, whereas the covariance matrix $C_Z$ captures the covariation between specific semantic meanings in the debates that cannot be explained by the factors.

Factor models are appealing for two reasons: they are a dimension-reduction tool and, at the same time, a meaningful representation of the principal components of the covariance matrix of the observations. While the first feature (dimension reduction) is common to all factor models, the second property is achieved only when the number of series, N, is large. The traditional or strict approach assumes that N is finite and that the covariance matrix of the idiosyncratic components, $C_Z$, is diagonal (for identification, some further conditions need to be imposed on the loadings). The parameters of the model can then be estimated by maximum likelihood. The more recent literature on approximate factor models differs from the traditional one in that $C_Z$ is allowed to be nondiagonal and the cross-section size N is large. Allowing for a nondiagonal $C_Z$ is important: it means that one allows for nonzero correlation among the idiosyncratic components of specific concepts in such debate analyses.

With sufficiently large N, the parameters in the model can be estimated by principal components. While FA is based on a statistical model, principal components analysis (PCA) is a tool for dimension reduction. More precisely, given a T × N data set $Y = [Y_1, \ldots, Y_N]$ with T observations and N variables, the principal components are the projections $P = [P_1, \ldots, P_q]$ of the T observations onto a subspace $W = [w_1, \ldots, w_q]$ of the original

...
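
The projection idea described above can be illustrated with a short numerical sketch. The text is truncated before it specifies how W is chosen, so the code below assumes the common convention that the columns of W are the q leading eigenvectors of the sample covariance matrix. The simulated data, sizes, seed, and variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

T, N, q = 500, 40, 2  # observations, variables, factors (illustrative sizes)

# Simulate data from a factor model with hypothetical loadings.
L = rng.normal(size=(N, q))
f = rng.normal(size=(T, q))
Z = 0.5 * rng.normal(size=(T, N))
X = f @ L.T  # true common component (T x N)
Y = X + Z

# Center each variable and form the N x N sample covariance matrix.
Yc = Y - Y.mean(axis=0)
C_Y = (Yc.T @ Yc) / (T - 1)

# eigh returns eigenvalues in ascending order, so sort to put the largest first.
eigvals, eigvecs = np.linalg.eigh(C_Y)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:q]]  # N x q basis of the retained subspace (assumed choice)

P = Yc @ W       # T x q principal components: projections of the observations onto W
X_hat = P @ W.T  # estimate of the (centered) common component

explained = eigvals[order[:q]].sum() / eigvals.sum()
corr = np.corrcoef(X_hat.ravel(), (X - X.mean(axis=0)).ravel())[0, 1]
print(f"variance share of the first {q} components: {explained:.2f}")
print(f"correlation between estimated and true common component: {corr:.2f}")
```

In this simulated setting the first q components account for most of the variance, and the reconstructed common component tracks the true one closely, which is consistent with the statement that principal components can estimate the model when N is sufficiently large.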
