
Discriminant analysis is a multivariate statistical technique used to predict group membership from a set of predictor variables. Its goal is to find optimal combinations of predictor variables, called discriminant functions, that maximally separate previously defined groups and yield the best possible predictions about group membership. Discriminant analysis has become a valuable tool in the social sciences because discriminant functions provide a means to classify a case into the group that it most closely resembles and help investigators understand the nature of differences between groups.

For example, a college admissions officer might be interested in predicting whether an applicant, if admitted, is more likely to succeed (graduate from the college) or fail (drop out or fail) based on a set of predictor variables such as high school grade point average, scores on the Scholastic Aptitude Test, age, and so forth. A sample of students whose college outcomes are known can be used to create a discriminant function by finding the linear combination of predictor variables that best separates Group 1 (students who succeed) from Group 2 (students who fail). This discriminant function can then be used to predict the college outcome of a new applicant whose actual group membership is unknown.

In addition, discriminant functions can be used to study the nature of group differences by examining which predictor variables best predict group membership. For example, which variables are the most powerful predictors of group membership? Or what pattern of scores on the predictor variables best describes the differences between groups? This entry discusses the data considerations involved in discriminant analysis, the derivation and interpretation of discriminant functions, and the process of classifying a case into a group.
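The admissions example can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular statistics package: it assumes two groups, equal prior probabilities, and a pooled within-group covariance matrix, and it uses the midpoint between the group means as the classification cutoff. The function names, the data values, and the new applicant are all hypothetical and were made up for illustration.

```python
import numpy as np

def fit_two_group_discriminant(group1, group2):
    """Return weights w and cutoff c of a two-group linear discriminant.

    Assumes equal priors and a shared (pooled) covariance matrix.
    """
    m1, m2 = group1.mean(axis=0), group2.mean(axis=0)
    n1, n2 = len(group1), len(group2)
    # Pooled within-group covariance matrix.
    pooled = ((n1 - 1) * np.cov(group1, rowvar=False)
              + (n2 - 1) * np.cov(group2, rowvar=False)) / (n1 + n2 - 2)
    # Weights of the linear combination that best separates the group means.
    w = np.linalg.solve(pooled, m1 - m2)
    # Cutoff halfway between the two group means on the discriminant scale.
    c = w @ (m1 + m2) / 2
    return w, c

def predict_group(x, w, c):
    """Assign a case to Group 1 if its discriminant score exceeds the cutoff."""
    return 1 if x @ w > c else 2

# Hypothetical data: columns are high school GPA and SAT score.
succeed = np.array([[3.6, 1250], [3.8, 1300], [3.4, 1180],
                    [3.9, 1400], [3.5, 1220]], dtype=float)
fail = np.array([[2.8, 1000], [3.0, 1100], [2.5, 950],
                 [3.1, 1150], [2.7, 1020]], dtype=float)

w, c = fit_two_group_discriminant(succeed, fail)

# Classify a hypothetical new applicant whose outcome is unknown.
new_applicant = np.array([3.3, 1210.0])
print("Predicted group:", predict_group(new_applicant, w, c))
```

With only ten made-up cases, the fitted weights carry no substantive meaning; the point is only to show how a linear combination of predictors and a cutoff together produce a group prediction.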

Data Considerations of Discriminant Analysis

The predictor variables used to create discriminant functions must be measured at the interval or ratio level of measurement. The distribution of each predictor variable should be univariate normal; that is, its frequency distribution should be approximately bell shaped. In addition, multivariate normality of the predictor variables is assumed when testing the significance of discriminant functions and calculating probabilities of group membership. The assumption of multivariate normality is met when each variable has a univariate normal distribution at any fixed values of all other variables. Although this assumption is demanding, discriminant analysis is relatively robust to violations of it, provided the violations are not caused by outliers. Discriminant analysis is very sensitive to the inclusion of outliers. Therefore, outliers must be removed or transformed before the data are analyzed.
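One common way to screen for multivariate outliers before the analysis is to compute each case's squared Mahalanobis distance from the centroid of the predictors. The entry does not prescribe a screening method, so the sketch below is an assumption on that point; the function name and the simulated data are hypothetical, and what counts as "too far" is left to the analyst.

```python
import numpy as np

def squared_mahalanobis(X):
    """Squared Mahalanobis distance of each row of X from the column means."""
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Solve cov @ z = centered.T rather than inverting cov explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.sum(centered * solved, axis=1)

# Simulated predictor scores (hypothetical), plus one deliberately extreme case.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 2))
scores = np.vstack([scores, [8.0, -8.0]])

d2 = squared_mahalanobis(scores)
suspect = int(np.argmax(d2))  # index of the case farthest from the centroid
print("Most extreme case:", suspect)
```

Cases with unusually large distances are candidates for closer inspection, after which they can be removed or transformed as described above.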

Another assumption of discriminant analysis is that no predictor variable may be expressed as a linear combination of other predictor variables. This requirement intuitively makes sense because when a predictor variable can be represented by other variables, the variable does not add any new information and can be considered redundant. Mathematically, such redundancy can lead to unreliable matrix inversions, which result in large standard errors of estimates. Therefore, redundant predictor variables must be excluded from the analysis.
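The redundancy problem can be made concrete with a small check. In the sketch below, a hypothetical third predictor is the exact sum of the first two, so the centered data matrix loses a dimension and its covariance matrix cannot be reliably inverted. The variable names, the data, and the relative tolerance of 1e-10 are illustrative assumptions, not part of the entry.

```python
import numpy as np

def effective_rank(X, rel_tol=1e-10):
    """Count singular values of the centered data that are not negligible.

    rel_tol is an arbitrary illustrative threshold relative to the
    largest singular value.
    """
    centered = X - X.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Hypothetical test scores for six students.
sat_math = np.array([620, 650, 590, 700, 610, 500], dtype=float)
sat_verbal = np.array([630, 650, 590, 700, 610, 520], dtype=float)
sat_total = sat_math + sat_verbal  # exact linear combination: redundant

X_redundant = np.column_stack([sat_math, sat_verbal, sat_total])
X_reduced = np.column_stack([sat_math, sat_verbal])

rank_redundant = effective_rank(X_redundant)
rank_reduced = effective_rank(X_reduced)
print("Columns:", X_redundant.shape[1], "effective rank:", rank_redundant)
print("Columns:", X_reduced.shape[1], "effective rank:", rank_reduced)
```

When the effective rank is smaller than the number of predictors, at least one predictor is a linear combination of the others and should be dropped before the discriminant functions are derived.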

...
