Discriminant analysis (DA) is a multivariate statistical method used for two purposes: separating observations into two or more distinct groups and classifying new observations into known groups. In DA, the categorical variable (group membership) is the dependent variable and the responses are the independent variables, so it is the reverse of a multivariate analysis of variance (MANOVA). Because the two procedures are computationally similar, the assumptions that apply to MANOVA also apply to DA. Briefly, the data are assumed to be multivariate normally distributed within each group, and the variance/covariance matrices are assumed to be homogeneous across groups. DA is also sensitive to outliers and to multicollinearity among the independent variables.
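The homogeneity-of-covariance assumption is commonly checked with Box's M test. The following is a minimal NumPy sketch of Box's M and its usual chi-square approximation; the function name `box_m` is hypothetical, and it assumes every group has more observations than variables so that each group covariance matrix is nonsingular. It returns the statistic and its degrees of freedom; a p-value can then be taken from the upper tail of the chi-square distribution (for example with `scipy.stats.chi2.sf`). Box's M is itself sensitive to non-normality, so a significant result is best read as a warning rather than proof that the covariance matrices differ.

```python
import numpy as np

def box_m(X, y):
    """Hypothetical helper: Box's M test for equal group covariance matrices.

    X: (N, p) array of responses; y: length-N array of group labels.
    Returns (chi-square approximation of M, degrees of freedom).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    N, p = X.shape
    g = len(groups)

    # Per-group sample covariance matrices (divisor n_k - 1) and group sizes
    covs = [np.atleast_2d(np.cov(X[y == k], rowvar=False)) for k in groups]
    ns = np.array([np.sum(y == k) for k in groups])

    # Pooled within-group covariance matrix (divisor N - g)
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - g)

    # M compares the pooled log-determinant with the group log-determinants;
    # slogdet avoids overflow/underflow in the determinants
    logdet_pooled = np.linalg.slogdet(pooled)[1]
    M = (N - g) * logdet_pooled - sum(
        (n - 1) * np.linalg.slogdet(S)[1] for n, S in zip(ns, covs)
    )

    # Box's correction factor for the chi-square approximation
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))) * (
        np.sum(1.0 / (ns - 1)) - 1.0 / (N - g)
    )
    chi2_stat = M * (1 - c)
    df = p * (p + 1) * (g - 1) / 2
    return chi2_stat, df
```

Working with log-determinants rather than determinants keeps the computation stable when there are many variables or large measurement scales.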

Separation

DA is used as an exploratory procedure in research to gain a better understanding of the reasons for observed differences between groups. For instance, a researcher may want to determine which body types are at greater risk for heart disease. The researcher records several measurements, or response variables (such as height, weight, and cholesterol level), on patients who have heart disease and on those who do not. The researcher wants to determine how much each measurement contributes to the separation of, or discrimination between, the groups. The researcher may also wish to classify new patients into risk-level groups based on their body measurements.
When DA is used for separation, new variables called canonical discriminant functions (CDFs) are created. Each CDF combines the existing response variables so as to maximize the variation between groups relative to the variation within groups. A CDF is a linear combination of the response variables of the form
$$S_i = l_{i1}x_1 + l_{i2}x_2 + \cdots + l_{ip}x_p$$

where $S_i$ is the score for the ith function and $l_{in}$ is the standardized coefficient for $x_n$, n = 1, …, p. The number of CDFs equals the number of independent variables or the number of groups minus one, whichever is smaller. For example, with three groups and five variables, two CDFs are generated. The first CDF explains the greatest percentage of the variation between groups. Each successive CDF is uncorrelated with the preceding functions and explains less of the variance. The characteristic root, or eigenvalue, associated with each CDF indicates the amount of variance explained by that function.
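One standard way to obtain the CDFs is to solve the eigenproblem of $W^{-1}B$, where $W$ and $B$ are the within-group and between-group sums-of-squares-and-cross-products matrices. The sketch below assumes NumPy and uses a hypothetical function name. It keeps min(p, g - 1) functions, reports each eigenvalue's share of the total, and scales each coefficient by the pooled within-group standard deviation of its variable to give standardized coefficients. As with any eigenvector, the sign of each function's coefficients is arbitrary.

```python
import numpy as np

def canonical_discriminant_functions(X, y):
    """Hypothetical helper returning (eigenvalues, proportion explained,
    standardized coefficients with one column per CDF)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    N, p = X.shape
    g = len(groups)
    grand_mean = X.mean(axis=0)

    W = np.zeros((p, p))  # within-group SSCP matrix
    B = np.zeros((p, p))  # between-group SSCP matrix
    for k in groups:
        Xk = X[y == k]
        dev = Xk - Xk.mean(axis=0)
        W += dev.T @ dev
        diff = (Xk.mean(axis=0) - grand_mean)[:, None]
        B += len(Xk) * (diff @ diff.T)

    # Solve B a = lambda W a by whitening with the Cholesky factor W = L L^T:
    # A = L^-1 B L^-T is symmetric and has the same eigenvalues as W^-1 B
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    A = L_inv @ B @ L_inv.T
    A = (A + A.T) / 2                       # remove rounding asymmetry
    eigvals, V = np.linalg.eigh(A)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    m = min(p, g - 1)                       # number of CDFs
    eigvals = eigvals[order][:m]
    raw = L_inv.T @ V[:, order][:, :m]      # each column satisfies a^T W a = 1

    # Rescale so scores have unit pooled within-group variance (times sqrt(N - g)),
    # then multiply by each variable's pooled SD, sqrt(W_nn / (N - g));
    # the two factors combine to sqrt(W_nn)
    standardized = raw * np.sqrt(np.diag(W))[:, None]
    explained = eigvals / eigvals.sum()
    return eigvals, explained, standardized
```

With three groups and five variables, as in the example above, the function returns two eigenvalues and a 5 × 2 matrix of standardized coefficients.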

The magnitude of the CDF coefficients indicates how important each variable is to group discrimination relative to the other variables. However, these coefficients may be misleading in the presence of correlation between the responses, that is, if there is a high degree of collinearity. Just as in regression analysis, addition or deletion of a variable can have a large effect on the magnitude of the other coefficients. Furthermore, when two variables are correlated, their contribution may be split between them or one may have a large weight and the other a small weight.

The canonical structure matrix offers another basis for interpretation. This matrix contains the correlations between the CDF scores and each individual variable. The larger the absolute value of a correlation, the more important that variable is to discrimination.
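Given any matrix of CDF coefficients, the structure matrix can be computed directly from the scores. This is a minimal sketch assuming NumPy, a hypothetical function name, and total-sample correlations; some software (SPSS, for example) reports pooled within-group correlations instead, which can give somewhat different values.

```python
import numpy as np

def structure_matrix(X, coef):
    """Hypothetical helper: correlations between each variable (rows)
    and each CDF's scores (columns), computed over the whole sample.

    X: (N, p) responses; coef: (p, m) coefficients, one column per CDF.
    """
    X = np.asarray(X, dtype=float)
    scores = X @ np.asarray(coef, dtype=float)   # CDF scores, one column per function
    p = X.shape[1]
    # Correlation matrix of the stacked columns [variables | scores]
    R = np.corrcoef(np.hstack([X, scores]), rowvar=False)
    return R[:p, p:]                             # variables x functions block
```

Ranking the rows by `np.abs(S[:, 0])` then orders the variables by their importance to the first function, following the interpretation given in the text.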

Classification

There are several ways to use DA for classification. One way is to use Fisher's linear classification functions (LCF) of the form

$$G_j = c_{j0} + c_{j1}x_1 + c_{j2}x_2 + \cdots + c_{jp}x_p$$

where j indicates the group, $c_{jn}$ is the unstandardized coefficient for the jth group and the nth variable, n = 1, 2, …, p, and $c_{j0}$ is the constant for the jth group. There is one LCF for each group, and subjects are classified into the group with the highest discriminant score, $G_j$.
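Fisher's classification functions can be built from the group mean vectors $\bar{x}_j$ and the pooled within-group covariance matrix $S$, with coefficients $S^{-1}\bar{x}_j$ and constant $c_{j0} = -\tfrac{1}{2}\bar{x}_j^\top S^{-1}\bar{x}_j$. The sketch below assumes NumPy, hypothetical function names, and equal prior probabilities for the groups; many programs add $\ln(\text{prior}_j)$ to $c_{j0}$ when priors are unequal.

```python
import numpy as np

def fit_lcf(X, y):
    """Hypothetical helper: Fisher's linear classification functions.

    Returns (group labels, coefficient matrix with row j = c_j1..c_jp,
    constants c_j0), assuming equal priors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    N = X.shape[0]
    means = np.array([X[y == k].mean(axis=0) for k in groups])   # (g, p)

    # Pooled within-group covariance matrix (divisor N - g)
    W = sum((X[y == k] - m).T @ (X[y == k] - m) for k, m in zip(groups, means))
    S_inv = np.linalg.inv(W / (N - len(groups)))

    coef = means @ S_inv                                  # row j: (S^-1 xbar_j)^T
    const = -0.5 * np.einsum("jp,jp->j", coef, means)     # c_j0 = -1/2 xbar_j' S^-1 xbar_j
    return groups, coef, const

def classify(X, groups, coef, const):
    """Assign each row of X to the group with the highest score G_j."""
    G = np.asarray(X, dtype=float) @ coef.T + const       # (N, g) matrix of G_j
    return groups[np.argmax(G, axis=1)]
```

For example, after `groups, coef, const = fit_lcf(X_train, y_train)`, calling `classify(X_new, groups, coef, const)` returns one predicted group label per new patient.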

...