Logic regression is an adaptive regression and classification tool to address problems arising when data of mostly binary covariates are analyzed and the interactions between these predictors are of main interest to predict future outcomes or to identify variables that are associated with a particular outcome. Logic regression should not be confused with logistic regression: For logistic regression, the response is binary; for logic regression, the covariates are binary, but the response and the regression model can have any form. Binary covariates arise in many medical settings, such as the diagnosis of disease using phenotypic features, the identification of factors that contribute to emergency room crises, and the identification of genotypes that are associated with a particular disease. Often, the interaction between those binary predictors is of particular interest. Given a set of binary covariates, logic regression creates new predictors by considering Boolean (“logic”) combinations of the binary covariates and has the capability to embed those into a regression framework. As an example, this allows for statements such as “the odds of suffering an adverse response in the emergency room are three times higher for subjects above 65 years of age who have high blood pressure or breathing problems.” The logic regression framework includes many forms of classification and regression (such as linear and logistic regression, the Cox proportional hazards model, and more). In general, any type of model can be considered, as long as an objective (scoring) function can be defined. The model search is carried out using simulated annealing, a stochastic search algorithm commonly used in high-dimensional data problems. Model selection is performed via cross-validation or permutation tests, which implicitly address multiple comparisons problems. 
A Markov chain Monte Carlo-based extension of logic regression, which creates ensembles of plausible covariate combinations and measures of variable importance, has also been implemented. The logic regression software is freely available as a contributed package for the statistical environment R and can be downloaded from the Comprehensive R Archive Network.
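As a rough illustration of how a Boolean combination of binary covariates can act as a single predictor, the following Python sketch builds the logic term from the emergency room example and computes the odds ratio of an adverse response for subjects with and without that term. The data in `TOY_ROWS` are entirely hypothetical and were chosen so that the ratio comes out at three; the function names `logic_term` and `odds_ratio` are illustrative and are not part of the logic regression package. In a logistic regression with this logic term as its only predictor, the exponentiated coefficient of the term equals this sample odds ratio.

```python
def logic_term(age_over_65, high_bp, breathing_problems):
    """Boolean combination: age > 65 AND (high blood pressure OR breathing problems)."""
    return int(bool(age_over_65) and (bool(high_bp) or bool(breathing_problems)))


def odds_ratio(rows):
    """Odds ratio of the outcome for logic term = 1 versus logic term = 0.

    Each row is (age_over_65, high_bp, breathing_problems, outcome), all coded 0/1.
    """
    counts = {(term, outcome): 0 for term in (0, 1) for outcome in (0, 1)}
    for age_over_65, high_bp, breathing_problems, outcome in rows:
        term = logic_term(age_over_65, high_bp, breathing_problems)
        counts[(term, outcome)] += 1
    # Cross-product ratio of the 2x2 table (term x outcome).
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


# Hypothetical toy data, invented for illustration only.
TOY_ROWS = (
    [(1, 1, 0, 1)] * 3 + [(1, 0, 1, 1)] * 3  # term = 1, adverse response
    + [(1, 1, 1, 0)] * 4                     # term = 1, no adverse response
    + [(0, 1, 1, 1)] * 5                     # term = 0, adverse response
    + [(1, 0, 0, 0)] * 6 + [(0, 0, 0, 0)] * 4  # term = 0, no adverse response
)

print(odds_ratio(TOY_ROWS))  # 3.0
```

Note that the subjects coded `(0, 1, 1, ...)` have both high blood pressure and breathing problems but are not above 65, so the logic term is 0 for them; the predictor is the combination, not any single covariate.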

Description

In many medical and public health-related settings, a number of binary variables are collected, and the aim is either the prediction of a particular response or the selection of covariates associated with the response. The former includes, for example, predicting which incoming patients should be admitted to critical care based on a set of medical markers and records, and determining which conditions are responsible for emergency room crises. Genomic studies, in particular single nucleotide polymorphism (SNP) association studies, are an instance of the latter. For example, researchers studied the association between 88 SNPs and the development of restenosis among 779 subjects, with the main question of interest being the search for a combination of SNPs that best explains the variation in the phenotype. In these settings, the interaction of several of those binary variables often predicts the outcome, or explains the relationship of relevant covariates to the outcome, better than the individual covariates alone. From a statistical perspective, this is a very challenging task, because in a typical setting the number of possible interactions between the predictors can be immense.
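To give a sense of how quickly the number of candidate interactions grows, the sketch below counts only one simple family of Boolean terms: conjunctions (ANDs) of a fixed number of distinct covariates, where each covariate may appear either as is or negated. It assumes, purely for illustration, that the 88 SNPs are treated as 88 binary covariates; the text above does not say how the SNPs were coded. Terms that mix AND and OR, which logic regression also considers, would add to these counts. The function name `count_conjunctions` is hypothetical.

```python
from math import comb


def count_conjunctions(n_covariates, order):
    """Number of AND-terms using `order` distinct covariates, each possibly negated."""
    # Choose which covariates appear, then choose plain or negated for each one.
    return comb(n_covariates, order) * 2 ** order


# Assumption for illustration: 88 binary covariates, one per SNP.
for order in (2, 3, 4):
    print(order, count_conjunctions(88, order))
```

Even under this restricted count, there are already 15,312 two-way and 877,888 three-way conjunctions for 88 covariates, which is why an exhaustive search is impractical and a stochastic search is used instead.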

...
