
The Bayesian information criterion (BIC) is a statistic used for comparison and selection of statistical models. BIC is given by a simple formula that uses only elements of standard output for fitted models. It is calculated for each model under consideration, and models with small values of BIC are then preferred for selection. The BIC formula and the sense in which the model with the smallest BIC is the “best” one are motivated by one approach to model selection in Bayesian statistical inference.

Definition

Suppose that we are analyzing a set of data D of size n. Here n is the sample size if D consists of statistically independent observations, and the “effective sample size” in some appropriate sense when the observations are not independent. Suppose that alternative models Mk are considered for D, and that each model is fully specified by a parameter vector θk with pk parameters. Let p(D | θk; Mk) denote the likelihood function for model Mk, l(θk) = log p(D | θk; Mk) the corresponding log-likelihood, and θ̂k the maximum likelihood estimate of θk.

Let Ms denote a saturated model that fits the data exactly. One form of the BIC statistic for a model Mk is

BICk = G²k − dfk log n = 2[l(θ̂s) − l(θ̂k)] − dfk log n,    (1)

where

l(θ̂s) is the log-likelihood for the saturated model, G²k = 2[l(θ̂s) − l(θ̂k)] is the deviance statistic for model Mk, and

dfk = ps − pk is its degrees of freedom.

This version of BIC is most appropriate when the idea of a saturated model is natural, such as for models for contingency tables and structural equation models for covariance structures. The deviance and its degrees of freedom are then typically included in standard output for the fitted model. In other cases, other forms of BIC may be more convenient. These variants, all of which are equivalent for purposes of model comparison, are described at the end of this entry.
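As a concrete illustration (not part of the original entry), this version of BIC can be computed directly from the deviance and degrees of freedom reported in standard output for a fitted model. The function name and the example numbers below are hypothetical.

```python
import math

def bic_from_deviance(deviance: float, df: int, n: int) -> float:
    """Compute BIC_k = G2_k - df_k * log(n) for model M_k.

    deviance -- G2_k, the deviance of M_k against the saturated model
    df       -- df_k, the degrees of freedom of the deviance
    n        -- the (effective) sample size
    """
    return deviance - df * math.log(n)

# Hypothetical deviances for two competing models fitted to n = 200
# observations; the model with the smaller BIC is preferred.
bic_a = bic_from_deviance(deviance=35.2, df=12, n=200)  # sparser model
bic_b = bic_from_deviance(deviance=20.1, df=8, n=200)   # richer model
preferred = "A" if bic_a < bic_b else "B"
```

With these illustrative numbers the larger penalty saved by the sparser model outweighs its larger deviance, so model A has the smaller BIC.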

Motivation as an Approximate Bayes Factor

The theoretical motivation of BIC is based on the idea of a Bayes factor, which is a statistic used for comparison of models in Bayesian statistical analysis. First, define for model Mk the integrated likelihood

p(D | Mk) = ∫ p(D | θk; Mk) p(θk | Mk) dθk,    (2)

where p(θk | Mk) is the density function of a prior distribution specified for the parameters θk, and the integral is over the range of possible values of θk. Defining p(D | Ms) similarly for the saturated model, the Bayes factor between models Ms and Mk is the ratio BFk = p(D | Ms)/p(D | Mk). It is a measure of the evidence provided by the data in favor of Ms over Mk. The evidence favors Ms if BFk is greater than 1 and Mk if BFk is less than 1.

BICk is an approximation of 2 log BFk. The approximation is particularly accurate when each of the prior distributions p(θk | Mk) and p(θs | Ms) is a multivariate normal distribution with a variance matrix comparable to that of the sampling distribution of the maximum likelihood estimate of the parameters based on a hypothetical sample of size n = 1. An assumption of such prior distributions, which are known as unit information priors, thus implicitly underlies the BIC of Equation 1. Their motivation and the derivation of BIC are discussed in detail in the Further Reading list below.
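To see the quality of the approximation at work, the sketch below (an illustration assumed for this rewrite, not taken from the entry) compares 2 log BF with its BIC-based approximation in a simple conjugate case where the integrated likelihoods have closed forms: n observations from N(μ, 1), comparing M0: μ = 0 against M1: μ free with a unit information prior μ ~ N(0, 1).

```python
import math
import random

def two_log_bf_exact(xs):
    """Exact 2*log BF for M1 (mu free, prior N(0,1)) vs M0 (mu = 0),
    with known unit variance.  Conjugate integration of Equation 2 gives
    2 log BF = S^2/(n+1) - log(n+1), where S = sum(xs)."""
    n, S = len(xs), sum(xs)
    return S * S / (n + 1) - math.log(n + 1)

def two_log_bf_bic(xs):
    """BIC approximation: 2*(l1_hat - l0_hat) - log(n) = S^2/n - log(n),
    since the two models differ by one parameter."""
    n, S = len(xs), sum(xs)
    return S * S / n - math.log(n)

random.seed(0)
xs = [random.gauss(0.3, 1.0) for _ in range(500)]
exact = two_log_bf_exact(xs)
approx = two_log_bf_bic(xs)
# Under the unit information prior the two quantities are close for
# moderate n, mirroring the relation BIC ~ 2 log BF described above.
```

The closed forms follow from completing the square in μ inside the integral; with other priors the exact 2 log BF would differ from the BIC value by a larger, O(1) amount.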
