
In statistical modeling, one of the main challenges is to select a suitable model from a candidate family to characterize the underlying data. Model selection criteria provide a useful tool in this regard. A selection criterion assesses whether a fitted model offers an optimal balance between goodness-of-fit and parsimony. Ideally, a criterion will identify as inadequate those candidate models that are either too simplistic to accommodate the data or unnecessarily complex.

The Akaike information criterion (AIC) was the first model selection criterion to gain widespread acceptance. AIC was introduced in 1973 by Hirotugu Akaike as an extension to the maximum likelihood principle. Conventionally, maximum likelihood is applied to estimate the parameters of a model once the structure of the model has been specified. Akaike's seminal idea was to combine estimation and structural determination into a single procedure.

The minimum AIC procedure is employed as follows. Given a family of candidate models of various structures, each model is fit to the data via maximum likelihood. An AIC is computed based on each model fit. The fitted candidate model corresponding to the minimum value of AIC is then selected.

AIC serves as an estimator of Kullback's directed divergence between the generating, or “true,” model (i.e., the model that presumably gave rise to the data) and a fitted candidate model. The directed divergence assesses the disparity or separation between two statistical models. Thus, when entertaining a family of fitted candidate models, by selecting the fitted model corresponding to the minimum value of AIC, one is hoping to identify the fitted model that is “closest” to the generating model.
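For two univariate normal densities, the directed divergence has a well-known closed form. The sketch below (the function name and parameterization are assumed for illustration, not taken from the article) computes KL(f || g) and makes the "directed" character of the measure concrete:

```python
import math

def kl_normal(mu_f, sigma_f, mu_g, sigma_g):
    # Kullback directed divergence KL(f || g) between the univariate
    # normal densities f = N(mu_f, sigma_f^2) and g = N(mu_g, sigma_g^2).
    return (math.log(sigma_g / sigma_f)
            + (sigma_f ** 2 + (mu_f - mu_g) ** 2) / (2.0 * sigma_g ** 2)
            - 0.5)

# The divergence is zero when the densities coincide, positive otherwise,
# and asymmetric in its arguments -- hence "directed" divergence.
same = kl_normal(0.0, 1.0, 0.0, 1.0)          # 0.0
forward = kl_normal(0.0, 1.0, 0.0, 2.0)
backward = kl_normal(0.0, 2.0, 0.0, 1.0)      # differs from forward
```

The asymmetry matters for interpreting AIC: the criterion targets the divergence measured from the generating model to the fitted candidate, not the reverse.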

Definition of AIC

Consider a candidate family of models denoted M1, M2, …, ML. Let θk (k = 1, 2, …, L) denote the parameter vector for model Mk, and let dk denote the dimension of model Mk, that is, the number of functionally independent parameters in θk. Let L(θk | y) denote the likelihood for θk based on the data y, and let θ̂k denote the maximum likelihood estimate of θk. The AIC for model Mk is defined as

AICk = −2 log L(θ̂k | y) + 2dk.

The first term in AICk, −2 log L(θ̂k | y), is based on the maximized likelihood L(θ̂k | y). This term, called the goodness-of-fit term, will decrease as the conformity of the fitted model Mk to the data improves. The second term in AICk, 2dk, called the penalty term, will increase with the complexity of the model Mk. Models that are too simplistic to accommodate the data are associated with large values of the goodness-of-fit term, whereas models that are unnecessarily complex are associated with large values of the penalty term. In principle, the fitted candidate model corresponding to the minimum value of AIC should provide an optimal tradeoff between fidelity to the data and parsimony.
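The minimum AIC procedure can be sketched for the familiar case of normal linear regression, where the maximized log-likelihood reduces to a function of the residual sum of squares. The candidate family, sample size, and noise level below are invented for illustration; they are not from the article:

```python
import numpy as np

def gaussian_aic(y, y_hat, d):
    # AIC for a model with Gaussian errors: the maximized log-likelihood
    # is a function of the residual sum of squares, and d counts the
    # functionally independent parameters (regression coefficients plus
    # the error variance).
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    return -2.0 * log_lik + 2.0 * d

# Hypothetical candidate family: polynomials of degree 1 through 4,
# each fit by least squares (maximum likelihood under Gaussian errors).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

aics = {}
for degree in range(1, 5):
    coefs = np.polyfit(x, y, degree)
    aics[degree] = gaussian_aic(y, np.polyval(coefs, x), d=degree + 2)

best = min(aics, key=aics.get)  # minimum AIC procedure: pick the smallest
```

Note how the two terms pull in opposite directions: adding a degree can only lower the residual sum of squares (shrinking the goodness-of-fit term) while adding 2 to the penalty term.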

The Assumptions Underlying the Use of AIC

AIC is applicable in a broad array of modeling frameworks because its justification requires only conventional large-sample properties of maximum likelihood estimators. However, if the sample size n is small in relation to the model dimension dk (e.g., dk ≈ n/2), AICk will be characterized by a large negative bias. As a result, AICk will tend to underestimate the directed divergence between the generating model and the fitted candidate model Mk. This underestimation is potentially problematic in applications in which the sample size is small relative to the dimensions of the larger models in the candidate family. In such settings, AIC may often select a larger model even though the model may be unnecessarily complex and provide a poor description of the underlying phenomenon. Small-sample variants of AIC have been developed to adjust for the negative bias of AIC. The most popular is the “corrected” AIC (AICc), which was first proposed in 1978 for the framework of normal linear regression by Nariaki Sugiura. A decade later, AICc was generalized, advanced, and popularized in a series of papers by Clifford M. Hurvich and Chih-Ling Tsai.
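For normal linear regression, the corrected criterion adds a term that grows as the model dimension approaches the sample size. A minimal sketch of the correction, assuming the standard AICc form for that framework (the helper name is hypothetical):

```python
def aicc(aic, n, d):
    # Small-sample corrected AIC: adds the Sugiura / Hurvich-Tsai
    # correction term for normal linear regression to an ordinary
    # AIC value; only defined when n > d + 1.
    if n - d - 1 <= 0:
        raise ValueError("correction requires n > d + 1")
    return aic + (2.0 * d * (d + 1)) / (n - d - 1)
```

With d fixed, the correction term vanishes as n grows, so AICc is asymptotically equivalent to AIC; but when d is large relative to n, the term is substantial and counteracts the negative bias described above.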

...
