
The general linear model (GLM) provides a general framework for a large set of models whose common goal is to explain or predict a quantitative dependent variable by a set of independent variables that can be categorical or quantitative. The GLM encompasses techniques such as Student's t test, simple and multiple linear regression, analysis of variance, and analysis of covariance. The GLM is adequate only for fixed-effect models. To take random-effect models into account, the GLM needs to be extended into the mixed-effect model.
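
As a minimal illustration (hypothetical data, not from the source), a two-group comparison of the kind handled by Student's t test can be written as a GLM with an intercept column and one dummy-coded column (0 for group A, 1 for group B). Under this assumed coding, the least squares intercept equals the mean of group A and the slope equals the difference between the two group means, which is the quantity the t test evaluates. The helper name `fit_two_column_glm` is hypothetical.

```python
# Sketch: a two-group comparison expressed as a GLM, y = b0 + b1 * dummy + e.
# Data and function name are hypothetical, chosen only for illustration.

def fit_two_column_glm(x, y):
    """Least squares fit of y = b0 + b1 * x + e for a single predictor column x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

group_a = [4.0, 5.0, 6.0]   # coded 0, mean 5
group_b = [7.0, 9.0, 8.0]   # coded 1, mean 8
y = group_a + group_b
dummy = [0.0] * len(group_a) + [1.0] * len(group_b)

b0, b1 = fit_two_column_glm(dummy, y)
print(b0, b1)  # 5.0 3.0 -> intercept = mean(A), slope = mean(B) - mean(A)
```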

Notations

Vectors are denoted with boldface lower-case letters (e.g., y), and matrices are denoted with boldface upper-case letters (e.g., X). The transpose of a matrix is denoted by the superscript T, and the inverse of a matrix is denoted by the superscript −1. There are I observations. The values of a quantitative dependent variable describing the I observations are stored in an I by 1 vector denoted y. The values of the independent variables describing the I observations are stored in an I by K matrix denoted X. K is smaller than I, and X is assumed to have rank K (i.e., X has full column rank). A quantitative independent variable can be stored directly in X, but a qualitative independent variable needs to be recoded with as many columns as there are degrees of freedom for this variable. Common coding schemes include dummy coding, effect coding, and contrast coding.
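
A minimal sketch of how a qualitative variable with three levels becomes two columns of X (three levels, hence two degrees of freedom), under two of the coding schemes named above. The choice of the last level as the reference level, the added intercept column of ones, and the function names `dummy_code`, `effect_code`, and `with_intercept` are assumptions made for illustration; contrast coding is not shown.

```python
# Sketch: recode one qualitative variable into columns of X.
# Assumption: the LAST level in `levels` is the reference level.
# Function names are hypothetical, not part of any library.

def dummy_code(labels, levels):
    """Dummy coding: one 0/1 indicator column per non-reference level.
    Rows belonging to the reference level are all zeros."""
    coded = levels[:-1]  # number of columns = number of levels - 1
    return [[1.0 if lab == lev else 0.0 for lev in coded] for lab in labels]


def effect_code(labels, levels):
    """Effect coding: like dummy coding, except that rows belonging to
    the reference level are coded -1 in every column."""
    reference, coded = levels[-1], levels[:-1]
    rows = []
    for lab in labels:
        if lab == reference:
            rows.append([-1.0] * len(coded))
        else:
            rows.append([1.0 if lab == lev else 0.0 for lev in coded])
    return rows


def with_intercept(columns):
    """Prepend a column of ones so that X also carries an intercept."""
    return [[1.0] + row for row in columns]


labels = ["a", "b", "c", "a"]  # I = 4 observations of a 3-level factor
levels = ["a", "b", "c"]       # 3 levels -> 2 degrees of freedom -> 2 columns
X_dummy = with_intercept(dummy_code(labels, levels))
X_effect = with_intercept(effect_code(labels, levels))
print(X_dummy)   # [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
print(X_effect)  # [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, -1.0, -1.0], [1.0, 1.0, 0.0]]
```

The two schemes span the same column space, so they yield the same predicted values; they differ only in how the coefficients in b are interpreted.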

Core Equation

For the GLM, the values of the dependent variable are obtained as a linear combination of the values of the independent variables. The coefficients of the linear combination are stored in a K by 1 vector denoted b. In general, the values of y cannot be reproduced exactly by a linear combination of the columns of X, and the difference between the actual and the predicted values is called the prediction error. The values of the error are stored in an I by 1 vector denoted e. Formally, the GLM is stated as

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}. \tag{1}$$

The predicted values are stored in an I by 1 vector denoted ŷ, and therefore, Equation 1 can be rewritten as

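$$\mathbf{y} = \hat{\mathbf{y}} + \mathbf{e} \quad \text{with} \quad \hat{\mathbf{y}} = \mathbf{X}\mathbf{b}. \tag{2}$$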

Putting together Equations 1 and 2 shows that

$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\mathbf{b}. \tag{3}$$
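
To make the decomposition concrete, here is a small sketch with hypothetical numbers: given X and some coefficient vector b, the predictions are ŷ = Xb and the errors are e = y − ŷ, so y = ŷ + e holds by construction. The vector b used here is arbitrary and is not the least squares estimate; the helper name `mat_vec` is hypothetical.

```python
# Sketch: predictions y_hat = X b and errors e = y - y_hat for hypothetical values.

def mat_vec(X, b):
    """Multiply an I x K matrix (list of rows) by a K x 1 vector."""
    return [sum(x_ik * b_k for x_ik, b_k in zip(row, b)) for row in X]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # I = 4, K = 2
b = [1.0, 2.0]                                         # hypothetical coefficients
y = [1.5, 2.5, 5.5, 6.5]                               # hypothetical observations

y_hat = mat_vec(X, b)                              # [1.0, 3.0, 5.0, 7.0]
e = [y_i - yh_i for y_i, yh_i in zip(y, y_hat)]    # [0.5, -0.5, 0.5, -0.5]

# y = y_hat + e holds for every observation, whatever b is.
assert all(abs(y_i - (yh_i + e_i)) < 1e-12 for y_i, yh_i, e_i in zip(y, y_hat, e))
print(y_hat, e)
```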

Additional Assumptions

The independent variables are assumed to be fixed variables (i.e., their values will not change for a replication of the experiment analyzed by the GLM, and they are measured without error). The error is interpreted as a random variable; in addition, the I components of the error are assumed to be independently and identically distributed (i.i.d.), and their distribution is assumed to be a normal distribution with a zero mean and a variance denoted $\sigma_e^2$. The values of the dependent variable are assumed to be a random sample of a population of interest. Within this framework, the vector b is seen as an estimate of the population parameter vector β.
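
These assumptions can be read as a recipe for generating data, sketched below with Python's `random` module: X is held fixed across replications, and each replication draws a fresh error vector whose components are i.i.d. normal with mean 0 and variance σ²ₑ, giving y = Xβ + e. The design, the values of β and σₑ, the seed, and the function name `simulate_replication` are all hypothetical.

```python
import random

# Sketch of the GLM assumptions as a data-generating process.
# All numeric values and the function name are hypothetical.

def simulate_replication(X, beta, sigma_e, rng):
    """One replication of the experiment: X stays fixed, and
    y = X beta + e with e_i drawn i.i.d. from N(0, sigma_e^2)."""
    e = [rng.gauss(0.0, sigma_e) for _ in X]
    y = [sum(x_ik * beta_k for x_ik, beta_k in zip(row, beta)) + e_i
         for row, e_i in zip(X, e)]
    return y, e

rng = random.Random(2024)                       # seeded for reproducibility
X = [[1.0, float(i % 5)] for i in range(1000)]  # fixed design, I = 1000, K = 2
beta = [2.0, 0.5]                               # hypothetical population parameters
sigma_e = 1.0                                   # hypothetical error standard deviation

y1, e1 = simulate_replication(X, beta, sigma_e, rng)
y2, e2 = simulate_replication(X, beta, sigma_e, rng)

# Same X, different y: only the random error changes between replications.
mean_e = sum(e1) / len(e1)
var_e = sum((v - mean_e) ** 2 for v in e1) / (len(e1) - 1)
print(round(mean_e, 3), round(var_e, 3))  # near 0 and near sigma_e**2 = 1
```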

Least Squares Estimate

Under the assumptions of the GLM, the population parameter vector β is estimated by b, which is computed as

$$\mathbf{b} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}. \tag{4}$$
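
A pure-Python sketch of computing b. Rather than forming the inverse of XᵀX explicitly, it solves the equivalent normal equations (XᵀX)b = Xᵀy by Gaussian elimination, which is the usual numerical practice; in real work a linear algebra library routine such as NumPy's `numpy.linalg.lstsq` would normally be used instead. The helper names and the data are hypothetical, and the singularity threshold is an arbitrary choice for the sketch.

```python
# Sketch (helper names hypothetical): least squares estimate b obtained by
# solving the normal equations (X^T X) b = X^T y.

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    Bt = transpose(B)
    return [[sum(u * v for u, v in zip(row, col)) for col in Bt] for row in A]

def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [c_i] for row, c_i in zip(A, c)]  # augmented matrix [A | c]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        if abs(M[p][k]) < 1e-12:
            raise ValueError("X^T X is singular: X does not have full column rank")
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def least_squares(X, y):
    Xt = transpose(X)
    XtX = mat_mul(Xt, X)                                       # K x K
    Xty = [sum(u * v for u, v in zip(row, y)) for row in Xt]   # K x 1
    return solve(XtX, Xty)

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # I = 4, K = 2
y = [1.5, 2.5, 5.5, 6.5]                               # hypothetical data
b = least_squares(X, y)
print(b)  # approximately [1.3, 1.8]
```

The rank check mirrors the assumption stated in the Notations section: if X does not have full column rank, XᵀX cannot be inverted and b is not unique.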

This value of b minimizes the residual sum of squares (i.e., b is chosen so that $\mathbf{e}^\mathsf{T}\mathbf{e}$ is as small as possible).
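
The minimizing property follows from a standard derivation, sketched here for reference: expand the residual sum of squares as a function of b, set its gradient to zero, and solve. The full column rank of X is what guarantees that the stationary point is a unique minimum.

```latex
% Residual sum of squares as a function of b
% (b^T X^T y and y^T X b are equal scalars, hence the factor 2)
\mathbf{e}^\mathsf{T}\mathbf{e}
  = (\mathbf{y} - \mathbf{X}\mathbf{b})^\mathsf{T}(\mathbf{y} - \mathbf{X}\mathbf{b})
  = \mathbf{y}^\mathsf{T}\mathbf{y}
    - 2\,\mathbf{b}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{y}
    + \mathbf{b}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{X}\mathbf{b}

% Gradient with respect to b, set to zero (normal equations)
\frac{\partial\, \mathbf{e}^\mathsf{T}\mathbf{e}}{\partial \mathbf{b}}
  = -2\,\mathbf{X}^\mathsf{T}\mathbf{y} + 2\,\mathbf{X}^\mathsf{T}\mathbf{X}\mathbf{b} = \mathbf{0}
  \;\Longrightarrow\;
  \mathbf{X}^\mathsf{T}\mathbf{X}\mathbf{b} = \mathbf{X}^\mathsf{T}\mathbf{y}
  \;\Longrightarrow\;
  \mathbf{b} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}

% The Hessian 2 X^T X is positive definite because X has rank K,
% so this stationary point is the unique minimum. Equivalently,
% the residuals are orthogonal to every column of X:
\mathbf{X}^\mathsf{T}\mathbf{e} = \mathbf{X}^\mathsf{T}(\mathbf{y} - \mathbf{X}\mathbf{b}) = \mathbf{0}
```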

...
