Mixtures of Experts

Neil J. Salkind

doi:10.4135/9781412952644

Edited by:
Neil J. Salkind
In: Encyclopedia of Measurement and Statistics
Chapter DOI: https://doi.org/10.4135/9781412952644.n287
Subject: Quantitative/Statistical Research, Test & Measurement
Keywords: distribution; latent variables

It is often the case that the analysis of the values of a set of random variables becomes simpler if one posits that these variables are related to another set of variables, called latent variables, whose values are unobserved. Consider, for example, the two-dimensional data in Figure 1. This data set is complicated in the sense that it is multimodal and, thus, cannot be summarized by a standard distribution such as those comprising the exponential family. A way of simplifying the analysis is by assuming that the distribution of these data is a combination of two simple distributions, namely, two Gaussian distributions. Each data item was generated as follows. First, a value for a latent variable was sampled from a Bernoulli distribution. Next, if the latent variable was set to 0, then the data item was sampled from the first Gaussian distribution (e.g., with mean vector $[3\ 3]^T$); if the latent variable was set to 1, then the data item was sampled from the second Gaussian distribution (with mean vector $[7\ 7]^T$). Although the value of the latent variable is not observed, it is easy to use Bayes's rule to compute the distribution of the variable given the data item.
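
The two-step generative process and the Bayes's rule computation can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Figure 1: the mean vectors come from the text, whereas the Bernoulli parameter (0.5), the identity covariance matrices, and all function names are assumptions introduced here.

```python
import math
import random

# Mean vectors come from the text; the mixing probability and the
# identity covariance are assumptions made for this illustration.
MEANS = [(3.0, 3.0), (7.0, 7.0)]
P_LATENT_1 = 0.5  # assumed Bernoulli parameter, P(z = 1)


def sample_item(rng):
    """Sample (z, y): first the latent variable, then the data item."""
    z = 1 if rng.random() < P_LATENT_1 else 0
    y = tuple(rng.gauss(m, 1.0) for m in MEANS[z])
    return z, y


def gaussian_density(y, mean):
    """Density of a 2-D Gaussian with identity covariance."""
    sq_dist = sum((yi - mi) ** 2 for yi, mi in zip(y, mean))
    return math.exp(-0.5 * sq_dist) / (2.0 * math.pi)


def posterior_latent_1(y):
    """P(z = 1 | y) by Bayes's rule."""
    joint_0 = (1.0 - P_LATENT_1) * gaussian_density(y, MEANS[0])
    joint_1 = P_LATENT_1 * gaussian_density(y, MEANS[1])
    return joint_1 / (joint_0 + joint_1)


rng = random.Random(0)
data = [sample_item(rng) for _ in range(5)]
for z, y in data:
    # True latent value, sampled item, and posterior probability that z = 1
    print(z, [round(v, 2) for v in y], round(posterior_latent_1(y), 3))
```

Under these assumptions, a data item near one of the two means receives a posterior probability close to 0 or 1, whereas an item midway between the means, such as (5, 5), receives a posterior probability of 0.5.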

This model is a latent variable model known as a mixture model. Mixture models provide a principled way of combining two or more simple distributions (e.g., unimodal distributions such as Gaussian distributions) into a single complicated (e.g., multimodal) distribution. As this example illustrates, mixture models are “piecewise estimators” in the sense that different components are used to summarize different subsets of the data. The subsets do not, however, have hard boundaries; as discussed below, a data item might be a member of multiple subsets simultaneously.

Mixtures-of-experts (ME) models are an extension of mixture models. They differ from conventional mixture models in that their mixture components are conditional probability distributions. Consequently, they are suitable for summarizing data sets in which the distribution of output or response variables depends on the values of input or covariate variables. Such data sets arise in the context of regression or classification tasks, for example.
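
The difference can be stated compactly. The notation below is an assumption introduced for illustration ($K$ components, mixing proportions $\pi_i$, and covariate-dependent mixing weights $g_i(x)$); it follows the usual formulation of these models rather than anything given explicitly in this entry.

```latex
% Conventional mixture model: fixed mixing proportions, unconditional components
p(y) = \sum_{i=1}^{K} \pi_i \, p_i(y),
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{K} \pi_i = 1

% Mixtures-of-experts model: components are conditional distributions, and
% the mixing weights are commonly allowed to depend on the covariates
p(y \mid x) = \sum_{i=1}^{K} g_i(x) \, p_i(y \mid x),
\qquad g_i(x) \ge 0, \quad \sum_{i=1}^{K} g_i(x) = 1
```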

Figure 1 Two-Dimensional Data to Be Summarized

ME models perform tasks using a “divide and conquer” strategy: complex tasks are decomposed into simpler subtasks. ME models can be characterized as fitting piecewise models to the data. The data are assumed to form a countable set of paired variables $\mathcal{X} = \{(x^{(t)}, y^{(t)})\}_{t=1}^{T}$, where x is a vector of explanatory variables, also referred to as covariates, and y is a vector of responses. ME models divide the covariate space, meaning the space of all possible values of the explanatory variables, into regions, and then they fit simple surfaces to the data that fall in each region. Unlike many other piecewise approximators, these models use regions that are not disjoint. The regions have “soft” boundaries, meaning that data points may lie simultaneously in multiple regions. In addition, the boundaries between regions are themselves simple parameterized surfaces whose parameter values are estimated from the data.
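
The idea of soft regions with simple surfaces fitted in each can be sketched as follows. This is an illustrative sketch under stated assumptions, not the estimation procedure itself: it uses a single covariate, two linear experts, and a softmax gating function (a common choice, but not one specified in this entry), and every parameter value and function name is hypothetical. In practice the parameters would be estimated from the data rather than fixed by hand.

```python
import math

# All parameter values below are hypothetical, chosen only for illustration.
# Gating function: one (slope, intercept) pair per expert; the softmax of the
# resulting scores gives the soft region memberships.
GATE_PARAMS = [(-2.0, 0.0), (2.0, 0.0)]
# Experts: simple linear surfaces y = slope * x + intercept.
EXPERT_PARAMS = [(-1.0, 0.0), (1.0, 0.0)]


def gate_weights(x):
    """Soft membership of x in each region (non-negative, sums to 1)."""
    scores = [a * x + b for a, b in GATE_PARAMS]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expert_means(x):
    """Value of each expert's linear surface at x."""
    return [a * x + b for a, b in EXPERT_PARAMS]


def predict(x):
    """Conditional mean of y given x: gate-weighted blend of the experts."""
    return sum(g * m for g, m in zip(gate_weights(x), expert_means(x)))


for x in (-3.0, -0.2, 0.0, 0.2, 3.0):
    g = gate_weights(x)
    # Covariate value, region memberships, and blended prediction
    print(x, [round(v, 3) for v in g], round(predict(x), 3))
```

With these hypothetical parameters, a covariate value far from zero belongs almost entirely to one region, whereas a value near zero belongs partly to both, so the prediction blends the two linear surfaces smoothly instead of switching abruptly at a hard boundary.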

ME models combine properties of generalized linear models with those of mixture models. Like generalized linear models, they are used to model the relationship between a set of covariates and a set of response variables. Unlike standard generalized linear models, however, they assume that the conditional distribution of the responses (given the covariates) is a finite mixture distribution. Because ME models assume a finite mixture distribution, they provide a motivated alternative to nonparametric models and provide a richer class of distributions than standard generalized linear models.

...
