
Partial least squares (PLS) regression is a recent technique that generalizes and combines features from principal components analysis and multiple regression. It is particularly useful when we need to predict a set of dependent variables from a (very) large set of independent variables (i.e., predictors). It originated in the social sciences (specifically, economics; Wold, 1966) but became popular first in chemometrics, due in part to Wold's son Svante (see, e.g., Geladi & Kowalski, 1986), and in sensory evaluation (Martens & Naes, 1989). But PLS regression is also becoming a tool of choice in the social sciences as a multivariate technique for nonexperimental and experimental data alike (e.g., neuroimaging; see McIntosh, Bookstein, Haxby, & Grady, 1996). It was first presented as an algorithm akin to the power method (used for computing eigenvectors) but was rapidly interpreted in a statistical framework (Frank & Friedman, 1993; Helland, 1990; Höskuldsson, 1988; Tenenhaus, 1998).

Prerequisite Notions and Notations

The I observations described by K dependent variables are stored in an I × K matrix denoted Y, and the values of the J predictors measured on these I observations are stored in the I × J matrix X.

Goal

The goal of PLS regression is to predict Y from X and to describe their common structure. When Y is a vector and X is full rank, this goal could be accomplished using ordinary least squares (OLS). When the number of predictors is large compared to the number of observations, X is likely to be singular, and the regression approach is no longer feasible (i.e., because of multicollinearity). Several approaches have been developed to cope with this problem. One approach is to eliminate some predictors (e.g., using stepwise methods); another one, called principal component regression, is to perform a principal components analysis (PCA) of the X matrix and then use the principal components of X as regressors on Y.
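The principal component regression approach just described can be sketched in a few lines. The following is an illustrative sketch only (the data and variable names are made up), using scikit-learn's PCA and LinearRegression: even though X has more predictors than observations, OLS on a few orthogonal principal components is well-posed.

```python
# Principal component regression (PCR) sketch: regress y on the first few
# principal components of X instead of on the (singular) X itself.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))            # J = 50 predictors, only I = 20 observations
y = X[:, 0] + 0.1 * rng.normal(size=20)  # y depends mainly on the first predictor

pca = PCA(n_components=3)                # keep only a few components
T = pca.fit_transform(X)                 # orthogonal scores replace X as regressors
pcr = LinearRegression().fit(T, y)       # OLS on the components is now well-posed
y_hat = pcr.predict(T)
```

Note that nothing in this procedure looks at y when choosing the components, which is exactly the weakness discussed next.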

The orthogonality of the principal components eliminates the multicollinearity problem. But the problem of choosing an optimum subset of predictors remains. A possible strategy is to keep only a few of the first components. But they are chosen to explain X rather than Y, and so nothing guarantees that the principal components, which “explain” X, are relevant for Y.

By contrast, PLS regression finds components from X that are also relevant for Y. Specifically, PLS regression searches for a set of components (called latent vectors) that perform a simultaneous decomposition of X and Y with the constraint that these components explain as much as possible of the covariance between X and Y. This step generalizes PCA. It is followed by a regression step in which the decomposition of X is used to predict Y.

Simultaneous Decomposition of Predictors and Dependent Variables

PLS regression decomposes both X and Y as a product of a common set of orthogonal factors and a set of specific loadings. So, the independent variables are decomposed as X = TPᵀ with TᵀT = I, with I being the identity matrix (some variations of the technique do not require T to have unit norms). By analogy with PCA, T is called the score matrix and P the loading matrix (in PLS regression, the loadings are not orthogonal). Likewise, Y is estimated as Ŷ = TBCᵀ, where B is a diagonal matrix with the “regression weights” as diagonal elements and C is the weight matrix of the dependent variables (see below for more details on these weights). The columns of T are the latent vectors. When their number is equal to the rank of X, they perform an exact decomposition of X. Note, however, that they only estimate Y (i.e., in general, Ŷ is not equal to Y).

...
