Regression

Sarah Boslaugh

doi:10.4135/9781412953948

Edited by:
Sarah Boslaugh
In:Encyclopedia of Epidemiology
Chapter DOI:https://doi.org/10.4135/9781412953948.n396
Subject:Epidemiology & Biostatistics, Public Health (general), Public Health Research Methods
Keywords:epidemiologic studies; research questions

Many analyses of epidemiologic data are conducted using statistical methods common to other research fields, including the social and biologic sciences. In epidemiologic studies, however, the focus of the research question, and thus the methods used to study it, tend to differ. The combination of human, social, environmental, and biological factors that may be present in epidemiologic studies can lead to a complexity not seen in large randomized trials or when the environment can be controlled. This entry discusses regression methods used in epidemiology and the conceptual framework that underlies these methods.

Predictive Analysis versus Associative Analysis

Epidemiologic research questions tend to fall into two general categories: (1) What factors best explain or predict the occurrence of another factor (or outcome)? (2) What is the association between exposure(s) and outcome(s)? The analytic methods used for these research questions are known as predictive or associative, respectively.

Regression analysis is used for both prediction and association. However, the selection of factors included in the model differs based on the type of analysis performed. While most statistics books focus on prediction, epidemiologic studies are primarily concerned with questions of association. Because measuring associations is the most common use of regression analysis in epidemiology, this methodology is the focus of this entry. Analysis for questions of prediction is discussed primarily to show how it differs from analysis for estimating measures of association.

Prediction

Research questions focused on prediction take two main forms. They may seek to identify any factors that may influence the detection of a health outcome, or they may seek to identify which of the factors are most predictive of development of the health outcome in affected individuals.

An example of the first use of predictive analysis is the identification of victims of intimate partner violence. A study may be done to identify indicators of partner violence victimization for women seen in a primary care setting. In such a study, a group of factors is found to identify victims. These include injuries, multiple nonspecific physical symptoms (e.g., pain, fatigue, headaches, diarrhea), and psychiatric diseases (e.g., depression, anxiety, post-traumatic stress disorder) as well as characteristics of the victim (e.g., young), perpetrator (e.g., young, excessive alcohol use), and relationship (e.g., wife makes more money than husband). Many of these factors are common among women seen in primary care (e.g., young age, depression). The more of the characteristics identified as predictive in the regression a woman has, the more likely it is that she is a victim of partner violence. The aim of such an analysis is not to identify the ‘best’ predictor but to understand what factors, alone or in combination, predict partner violence so that these factors can be communicated to both physicians and patients. Some of the factors so identified may be outcomes rather than causes of partner violence, but this is not a concern when the purpose of the study is to identify potential victims of partner violence rather than to make causal statements about it.
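The idea that each additional indicator raises the predicted likelihood can be sketched with a logistic model. The example below is only an illustration under stated assumptions: the intercept, the indicator names, and the coefficient values are invented for this sketch and are not estimates from any study. Because every assumed coefficient is positive, the predicted probability rises as indicators are added.

```python
import math

# Hypothetical log-odds coefficients, invented purely for illustration.
# They are NOT estimates from any study.
INTERCEPT = -3.0
COEFFICIENTS = {
    "injury": 1.2,
    "nonspecific_symptoms": 0.8,
    "depression": 0.6,
    "young_age": 0.4,
}


def predicted_probability(indicators):
    """Return the logistic-model probability given the indicators that are present."""
    unknown = set(indicators) - set(COEFFICIENTS)
    if unknown:
        raise ValueError(f"unknown indicators: {sorted(unknown)}")
    # Log-odds are the intercept plus the coefficient of each present indicator.
    log_odds = INTERCEPT + sum(COEFFICIENTS[name] for name in indicators)
    # The inverse logit converts log-odds to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


def cumulative_probabilities():
    """Add indicators one at a time and record the probability after each step."""
    present = []
    probabilities = [predicted_probability(present)]
    for name in COEFFICIENTS:
        present.append(name)
        probabilities.append(predicted_probability(present))
    return probabilities


if __name__ == "__main__":
    for count, probability in enumerate(cumulative_probabilities()):
        print(f"{count} indicator(s) present: probability = {probability:.3f}")
```

In a real analysis the coefficients would be estimated from data, and some could be negative, in which case the presence of that factor would lower rather than raise the predicted probability.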

The goal of the second type of predictive model is to identify the most important factors that predict the outcome. For example, it is known that infection with Hepatitis C virus is a risk factor for development of liver cancer. However, not all individuals who are infected with Hepatitis C virus develop this cancer. Now that we can test for Hepatitis C, it may help to understand which factors best predict liver cancer among those infected. This requires examining other factors, which may include coinfection with Hepatitis B virus, gender, viral genotype, liver enzyme level, use of alcohol or tobacco, and environmental and occupational exposures. Using predictive modeling, we can identify which factors best predict development of liver cancer and then monitor the group with these characteristics more carefully to identify early disease and focus care to more aggressively reduce the risk of liver cancer. Here, we are interested in identifying factors that are predictive of liver cancer, not in how the factors are associated with or cause cancer.
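One way to make "best predict" concrete is to rank candidate factors by how well each one separates individuals who develop the outcome from those who do not. The sketch below uses the concordance statistic (the area under the ROC curve) for each factor taken alone. This is a simpler stand-in chosen to keep the example dependency-free; it is not a fitted regression model and is not presented as the method used in the entry. The cohort, the factor names, and all values are invented for illustration.

```python
# Invented toy cohort of people infected with Hepatitis C.
# Each row: (hbv_coinfection, alcohol_use, male, developed_liver_cancer).
# These values are made up for illustration and come from no real study.
FACTORS = ("hbv_coinfection", "alcohol_use", "male")
COHORT = [
    (1, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (0, 1, 1, 1),
    (0, 0, 1, 0),
    (0, 1, 0, 0),
    (0, 0, 0, 0),
    (1, 0, 1, 0),
    (0, 0, 0, 0),
    (0, 1, 1, 0),
]


def concordance(scores, outcomes):
    """Chance that a random case scores higher than a random non-case (ties count 0.5)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                total += 1.0
            elif case_score == non_case_score:
                total += 0.5
    return total / (len(cases) * len(non_cases))


def rank_factors(cohort, factor_names):
    """Return (factor, concordance) pairs sorted from most to least discriminating."""
    outcomes = [row[-1] for row in cohort]
    results = []
    for index, name in enumerate(factor_names):
        scores = [row[index] for row in cohort]
        results.append((name, concordance(scores, outcomes)))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_factors(COHORT, FACTORS):
        print(f"{name}: concordance = {value:.3f}")
```

In practice the comparison would usually be made on predictions from a regression model fitted to several factors at once and checked on data not used for fitting. A ranking of this kind says nothing about whether a factor causes the outcome.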

...
