
The Bayesian approach to statistics is a general paradigm for drawing inferences from observed data. It is distinguished from other approaches by the use of probabilistic statements about fixed but unknown quantities of interest (as opposed to probabilistic statements about mechanistically random processes such as coin flips). At the heart of Bayesian analysis is Bayes's theorem, which describes how knowledge is updated on observing data.

In epidemiology, diagnostic testing provides the most familiar illustration of Bayes's theorem. Say the unobservable variable y is a subject's true disease status (coded as 0/1 for absence/presence). Let q be the investigator-assigned probability that y = 1, in advance of diagnostic testing. One interpretation of how this probability statement reflects knowledge is that the investigator perceives the pretest odds q/(1 − q) as the basis of a ‘fair bet’ on whether the subject is diseased. If the subject is randomly selected from a population with known disease prevalence, then setting q to be this prevalence is a natural choice. Now say a diagnostic test with known sensitivity SN (probability of a positive test for a truly diseased subject) and specificity SP (probability of a negative test for a truly undiseased subject) is applied. Let q∗ denote the probability that y = 1 given the test result. The laws of probability, and Bayes's theorem in particular, dictate that the posttest disease odds, q∗/(1 − q∗), equal the product of the pretest odds and the likelihood ratio (LR). The LR is the ratio of probabilities of the observed test result under the two possibilities for disease status, that is, LR = SN/(1 − SP) for a positive test and LR = (1 − SN)/SP for a negative test. Thus, postdata knowledge about y (as described by q∗) is an amalgam of predata knowledge (as described by q) and data (the test result).
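The odds-form update described above can be sketched in a few lines of code. This is an illustrative sketch, not from the original text; the function name, variable names, and the numerical values in the usage example (10% prevalence, SN = 0.90, SP = 0.80) are all assumptions chosen for demonstration.

```python
def posttest_probability(q, sn, sp, test_positive):
    """Update a disease probability via Bayes's theorem in odds form.

    q: pretest probability of disease (e.g., population prevalence)
    sn, sp: test sensitivity and specificity
    test_positive: whether the observed test result was positive
    """
    pretest_odds = q / (1 - q)
    # The likelihood ratio depends on which result was observed:
    # LR = SN/(1 - SP) for a positive test, (1 - SN)/SP for a negative one.
    lr = sn / (1 - sp) if test_positive else (1 - sn) / sp
    posttest_odds = pretest_odds * lr
    # Convert odds back to a probability.
    return posttest_odds / (1 + posttest_odds)

# Hypothetical inputs: prevalence 10%, SN = 0.90, SP = 0.80, positive test.
p = posttest_probability(0.10, 0.90, 0.80, True)
```

With these inputs the pretest odds are 1/9 and the positive-test LR is 4.5, so the posttest odds are 0.5 and the posttest probability is 1/3: a positive result triples the initial 10% probability, but disease remains less likely than not.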

More generally, any statistical problem can be cast in such terms, with y comprising all relevant unobservable quantities (often termed parameters). The choice of q in the testing problem generalizes to the choice of a prior distribution, that is, a probability distribution over possible values of y, selected to represent predata knowledge about y. A statistical model describes the distribution of data given the unobservables (e.g., SN and SP describe the test result given disease status). Bayes's theorem then produces the posterior distribution, that is, the distribution of y given the observed data, according to

P(y = a | data) / P(y = b | data) = [P(data | y = a) / P(data | y = b)] × [P(y = a) / P(y = b)]

for any two possible values a and b for y. Succinctly, a ratio of posterior probabilities is the product of the likelihood ratio and the corresponding ratio of prior probabilities.
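For a discrete set of candidate values of y, this posterior can be computed directly by normalizing prior × likelihood, and the ratio identity can be checked numerically. The sketch below is illustrative, not from the original text; it reuses the hypothetical diagnostic-test numbers (prevalence 0.10, SN = 0.90, SP = 0.80), and the function and variable names are assumptions.

```python
def posterior(prior, likelihood):
    """Posterior over a discrete set of values of y.

    prior and likelihood are dicts keyed by candidate values of y;
    likelihood[a] is the probability of the observed data given y = a.
    """
    unnorm = {a: prior[a] * likelihood[a] for a in prior}
    total = sum(unnorm.values())
    return {a: u / total for a, u in unnorm.items()}

# Diagnostic-test example: y in {0, 1}, prior prevalence 0.10,
# observed positive test with SN = 0.90 and SP = 0.80, so
# P(T+ | y = 1) = 0.90 and P(T+ | y = 0) = 1 - SP = 0.20.
prior = {0: 0.90, 1: 0.10}
lik = {1: 0.90, 0: 0.20}
post = posterior(prior, lik)

# The ratio of posterior probabilities equals the likelihood ratio
# times the corresponding ratio of prior probabilities:
#   post[1] / post[0] == (lik[1] / lik[0]) * (prior[1] / prior[0])
```

Here post[1] works out to 1/3, matching the odds-form calculation for a positive test.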

The specification of prior distributions can be controversial. Sometimes, it is cited as a strength of the Bayesian approach, in that often predata knowledge is available and should be factored into the analysis. Sometimes, though, the prior specification is seen as more subjective than is desirable for scientific pursuits. In many circumstances, prior distributions are specified to represent a lack of knowledge; for instance, without information on disease prevalence, one might set q = 0.5 in the diagnostic testing scenario above. Or, for a continuous parameter (an exposure prevalence, say), an investigator might assign a uniform prior distribution, to avoid favoring any particular prevalence values in advance of observing the data.
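A uniform prior for a continuous prevalence parameter can be approximated on a grid of candidate values, each given equal prior weight. The following is an illustrative sketch, not from the original text; the data (7 exposed subjects out of 20) and all names are assumptions, and the binomial likelihood is a standard modeling choice rather than one specified by the entry.

```python
from math import comb

def grid_posterior(successes, n, grid):
    """Posterior over a prevalence parameter on a grid of values,
    under a uniform prior (each grid value equally likely a priori)
    and a binomial likelihood for the observed count."""
    weights = [
        comb(n, successes) * p**successes * (1 - p)**(n - successes)
        for p in grid
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data: 7 exposed subjects observed out of n = 20.
grid = [i / 100 for i in range(1, 100)]
post = grid_posterior(7, 20, grid)
```

Because the prior is flat, the posterior is proportional to the likelihood alone, so its mode falls at the sample proportion 7/20 = 0.35; a more informative prior would pull the posterior toward the values it favors.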

...
