
Maximum likelihood is a general method for estimating the parameters of a statistical model. Given a random variable Y with a known probability density function (pdf) f(y; θ), assume that we have a random sample y1, …, yn from Y, where θ is an unknown population parameter associated with Y. The likelihood function L(θ) is the product of the pdf evaluated at each of the n sample points:

$$L(\theta) = \prod_{i=1}^{n} f(y_i; \theta).$$

Maximum likelihood chooses the estimate of the parameter θ that maximizes the likelihood of the observed data. Joint pdfs and likelihoods appear quite similar, but the two differ in an important respect: a joint pdf is a function of the data with the parameter treated as known, while the likelihood is a function of the unknown parameter θ with the observed data held fixed. The value of θ that maximizes the likelihood function is the maximum likelihood estimate of θ. Common estimators such as ordinary least squares and the sample mean and proportion are in fact maximum likelihood estimators. Maximum likelihood possesses a number of desirable properties that account for its widespread use in statistical estimation. In political science, it is widely used to estimate logit and probit models, count models, and event history or survival models, among others. In this entry, the origins, properties, and possible applications of the method are discussed.
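To make the claim about the sample mean concrete, here is a minimal numerical sketch (not part of the original entry): it maximizes a normal log-likelihood over simulated data and checks that the resulting estimate of the mean matches the sample mean. The simulated data, starting values, and use of scipy's general-purpose optimizer are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data; any values would do for this illustration.
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=200)

def neg_log_likelihood(params, data):
    """Negative normal log-likelihood in (mu, log_sigma)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # parameterize on the log scale so sigma stays positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Minimizing the negative log-likelihood is the same as maximizing the likelihood.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(y,))
mu_hat = result.x[0]

print(f"MLE of the mean:  {mu_hat:.4f}")
print(f"Sample mean:      {y.mean():.4f}")  # the two agree
```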

Origins

R. A. Fisher invented the method of maximum likelihood in a series of papers published early in the 20th century. Fisher's work on maximum likelihood began with his derivation of the principle of an “absolute criterion” in a paper he published as a third-year undergraduate. While this paper contains the origins of maximum likelihood estimation (MLE), there is little in it that many readers would recognize as MLE. In later papers, he developed the concept of likelihood as distinct from probability. Then, in 1922, Fisher united several earlier streams of his research and was the first to use the term maximum likelihood for a class of estimators offered as an alternative to Bayesian or method-of-moments estimators. In the same paper, Fisher proposed that maximum likelihood estimators have the properties of efficiency, sufficiency, and consistency. Later work by other statisticians would establish the properties of MLE more rigorously.

A simple example is helpful for understanding the principles of MLE. Let us say we wish to estimate a population proportion p from a set of data. Assume we have a random sample y1, …, yn of n observations drawn from a binomial distribution with common parameter p, where 0 < p < 1 and each yi is either 1 for success or 0 for failure. For these n independent and identically distributed variables y1, …, yn, the density of each observation is

$$f(y_i; p) = \binom{1}{y_i}\, p^{y_i} (1 - p)^{1 - y_i}, \qquad y_i \in \{0, 1\}.$$

We next write the likelihood function, which is the density evaluated at the observed data, viewed as a function of the parameter p. Because the binomial coefficient does not depend on the parameter of interest, p, we can omit it from the derivation of the maximum likelihood estimator. The likelihood function is the product of the individual densities of the observed data points:

$$L(p) = \prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i} = p^{\sum_i y_i} (1 - p)^{\,n - \sum_i y_i}.$$
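As a numerical check on this example (a sketch with made-up 0/1 data, not part of the original entry), the log-likelihood log L(p) = Σi yi log p + (n − Σi yi) log(1 − p) can be evaluated over a grid of candidate values of p; its maximizer coincides with the sample proportion, which is the closed-form maximum likelihood estimate.

```python
import numpy as np

# Hypothetical 0/1 data: 7 successes in 10 trials.
y = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
n, s = len(y), y.sum()

# Bernoulli log-likelihood: s*log(p) + (n - s)*log(1 - p),
# evaluated on a grid of p values strictly inside (0, 1).
p_grid = np.linspace(0.001, 0.999, 999)
log_lik = s * np.log(p_grid) + (n - s) * np.log(1 - p_grid)

p_hat = p_grid[np.argmax(log_lik)]
print(f"Grid maximizer:     {p_hat:.3f}")
print(f"Sample proportion:  {s / n:.3f}")  # matches the MLE
```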
