
Statistics is a branch of applied mathematics used to inform scientific decision-making in the absence of complete information about phenomena of interest. The application of statistics is integral to the theory and practice of epidemiology because it allows an investigator to both describe characteristics of exposure and disease in targeted populations and make logical inferences about these populations based on samples of observations. As we will discuss, descriptive statistics are estimates used to characterize or describe the nature of a population in terms of measured variables, while inferential statistics are used to answer questions by testing specific hypotheses. This entry provides a general overview of important statistical concepts, distinguishes the categories of descriptive and inferential statistics, and describes how both descriptive and inferential statistics can inform scientific inquiry.

Fundamental Concepts of Statistics

To understand the statistical techniques discussed in this entry, a brief overview of key concepts is necessary.

Statistic

A statistic is a quantitative estimate of a population characteristic that is based on a sample of observations taken from that population. In many areas of scientific inquiry, it is difficult or impossible, due to time and resource constraints, to observe or survey the entire universe or target population of interest. Fortunately, statisticians have shown that with properly conducted random sampling, valid and suitable estimates (known as statistics) of population values (known as parameters) can serve as effective substitutes.

This holds true in large part because under conditions of well-formulated research design and random sampling, mathematical principles of probability can accurately estimate the probable degree of imprecision, or sampling error, around statistical estimates of population parameters. By estimating this degree of imprecision accurately, one can know how well a statistic may capture a characteristic of the population it targets.
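As a minimal sketch of this idea, the following Python example draws a random sample from a simulated population and reports the sample mean together with its estimated standard error, one common measure of sampling error. The population of ages, the sample size of 200, and the function name `mean_with_standard_error` are illustrative assumptions, not part of the original entry.

```python
import math
import random
import statistics

def mean_with_standard_error(sample):
    """Return the sample mean and its estimated standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, std_error

# Hypothetical population: 10,000 simulated ages (illustrative data only).
rng = random.Random(42)
population = [rng.uniform(0, 100) for _ in range(10_000)]
parameter = statistics.fmean(population)  # the population mean (a parameter)

# Simple random sample of 200 members, drawn without replacement.
sample = rng.sample(population, 200)
estimate, std_error = mean_with_standard_error(sample)  # the statistic and its imprecision

print(f"parameter={parameter:.2f} statistic={estimate:.2f} SE={std_error:.2f}")
```

In a real study the parameter is unknown; it is computed here only because the population is simulated, which makes it possible to compare the statistic against the value it estimates.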

Random Sampling

The importance of random sampling to the value and utility of statistical analysis cannot be overstated. While a complete discussion of random sampling and its variants is beyond the scope of this entry, put simply, random sampling implies that every member of the population of interest has an equal probability of being included in the sample of measurements. To the extent that this assumption is met in the process of data collection, statistical estimates will have desirable statistical properties. To the extent that this assumption is violated, bias is introduced into the resulting statistics, which may limit or completely invalidate the degree to which the statistics derived from the sample reflect the population parameters under investigation.
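The contrast between a sample that meets the equal-probability assumption and one that violates it can be sketched in Python as follows. The population (1,000 people, 30% exposed), the threefold selection weight for exposed members, and both function names are hypothetical choices made purely for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical sampling frame: 300 exposed (1) and 700 unexposed (0) people.
population = [1] * 300 + [0] * 700
true_prevalence = statistics.fmean(population)  # 0.3

def simple_random_sample(frame, n, rng):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    return rng.sample(frame, n)

def biased_sample(frame, n, rng):
    """Illustrative violation: exposed members are three times as likely to be drawn.

    Note: random.choices samples with replacement, which is acceptable for this sketch.
    """
    weights = [3 if member == 1 else 1 for member in frame]
    return rng.choices(frame, weights=weights, k=n)

srs_prevalence = statistics.fmean(simple_random_sample(population, 500, rng))
biased_prevalence = statistics.fmean(biased_sample(population, 500, rng))

print(f"true={true_prevalence:.3f} random={srs_prevalence:.3f} biased={biased_prevalence:.3f}")
```

Under these assumptions the simple random sample should land near the true prevalence, while the weighted draw systematically overstates exposure regardless of how large the sample is.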

Random Variables

Participant characteristics measured in research studies, such as patient gender, age, income, presence or absence of a disease, or disease stage, are also known as random variables. A random variable is an observable phenomenon with a definable, exhaustive set of possible values. To understand a random variable, one needs to understand its associated level of measurement. There are essentially two types of random variables:

  • Qualitative random variables, which take on discrete, categorical values and include those that are nominally measured (e.g., exposed vs. not exposed) and those that are ordinally measured (e.g., cancer stage I to IV);
  • Quantitative random variables (e.g., age, income), which take on values that are measured on a continuous and constant incremental scale. Patient age, for example, generally ranges between 0 and 100 years.

Frequency and Probability Distribution

Random variables take on measurable values with an observable frequency relative to the total number of observed elements. This relative frequency constitutes the probability of observing that value in the sample and is an estimate of the probability of that value in the population. Flips of a fair coin, for example, have two possible values (heads and tails) that occur with equal relative frequency (i.e., each with a probability of 0.5). The set of relative frequencies of the possible values of a random variable is known as a probability distribution. Under conditions of random sampling, the probability distribution of a sample of observed values of a random variable has some important

...
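The relative-frequency idea described under Frequency and Probability Distribution can be sketched in Python as below. The simulated fair coin, the 10,000 flips, and the helper name `relative_frequencies` are assumptions introduced for illustration only.

```python
import random
from collections import Counter

def relative_frequencies(observations):
    """Map each observed value to its share of the total number of observations."""
    counts = Counter(observations)
    total = len(observations)
    return {value: count / total for value, count in counts.items()}

# Simulate 10,000 flips of a fair coin (illustrative data only).
rng = random.Random(1)
flips = [rng.choice(["heads", "tails"]) for _ in range(10_000)]

# The empirical distribution: each relative frequency estimates a probability.
dist = relative_frequencies(flips)
print(dist)
```

The relative frequencies always sum to 1, and with a fair coin each one should fall close to the theoretical probability of 0.5 as the number of flips grows.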
