Influence Statistics

Neil J. Salkind

doi:10.4135/9781412961288



Edited by:
Neil J. Salkind
In: Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781412961288.n186
Subject: Research Design
Keywords: statistics


Influence statistics measure the effects of individual data points or groups of data points on a statistical analysis. The effect of individual data points on an analysis can be profound, and so the detection of unusual or aberrant data points is an important part of nearly every analysis. Influence statistics typically focus on a particular aspect of a model fit or data analysis and attempt to quantify how the model changes with respect to that aspect when a particular data point or group of data points is included in the analysis. In the context of linear regression, where the ideas were first popularized in the 1970s, a variety of influence measures have been proposed to assess the impact of particular data points.

The popularity of influence statistics soared in the 1970s because of the proliferation of fast and relatively cheap computing, which made it easy to examine the effects of individual data points on an analysis even for relatively large data sets. Seminal works by R. Dennis Cook; David A. Belsley, Edwin Kuh, and Roy E. Welsch; and R. Dennis Cook and Sanford Weisberg led the way for an avalanche of new techniques for assessing influence. Along with these new techniques came an array of names for them: DFFITS, DFBETAS, COVRATIO, Cook's D, and leverage, to name but a few of the more prominent examples. Each measure was designed to assess the influence of a data point on a particular aspect of the model fit: DFFITS on the fitted values from the model, DFBETAS on each individual regression coefficient, COVRATIO on the estimated covariance matrix of the regression coefficients, and so on. Each measure can be readily computed using widely available statistical packages, and their use as part of an exploratory analysis of data is very common.

This entry first discusses types of influence statistics, then describes their calculation and limitations, and concludes with an example.

Types

Influence measures are typically categorized by the aspect of the model to which they are targeted. Some commonly used influence statistics in the context of linear regression models are discussed and summarized next. Analogs are also available for generalized linear models and for other more complex models, although these are not described in this entry.

Influence with respect to the fitted values of a model can be assessed using a measure called DFFITS, a scaled difference between the fitted value for a data point from the model fit with that point and the fitted value from the model fit without it:

$$\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{\sqrt{\mathrm{MSE}_{(i)}\, h_{ii}}},$$

where the notation in the numerator denotes fitted values for the ith response from models fit with and without the ith data point, respectively, $\mathrm{MSE}_{(i)}$ is the mean square for error in the model fit without data point i, and $h_{ii}$ is the ith leverage, that is, the ith diagonal element of the hat matrix, $H = X(X^{T}X)^{-1}X^{T}$. Although DFFITS resembles a t statistic, it does not have a t distribution, and the size of $\mathrm{DFFITS}_i$ is judged relative to a cutoff proposed by Belsley, Kuh, and Welsch. A point is regarded as potentially influential with respect to fitted values if $|\mathrm{DFFITS}_i| > 2\sqrt{p/n}$, where n is the sample size and p is the number of estimated regression coefficients.
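The calculation can be illustrated with a minimal sketch for simple linear regression (one predictor plus an intercept, so p = 2). The function names `fit_line` and `dffits` and the small data set are hypothetical and chosen only for illustration; they do not come from the entry. Each point is literally deleted and the model refit, which mirrors the "with and without" definition rather than the faster closed-form shortcuts that statistical packages typically use.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def dffits(x, y):
    """DFFITS for each point, computed by refitting without that point."""
    n, p = len(x), 2  # p = 2 estimated coefficients: intercept and slope
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)  # fit with all points
    values = []
    for i in range(n):
        x_del = x[:i] + x[i + 1:]
        y_del = y[:i] + y[i + 1:]
        a0, a1 = fit_line(x_del, y_del)  # fit without point i
        # Mean square for error of the deleted fit: n - 1 points, p coefficients
        sse = sum((yj - a0 - a1 * xj) ** 2 for xj, yj in zip(x_del, y_del))
        mse_del = sse / (n - 1 - p)
        # Leverage h_ii; for simple regression the hat-matrix diagonal
        # reduces to this closed form
        h_ii = 1 / n + (x[i] - xbar) ** 2 / sxx
        fitted_full = b0 + b1 * x[i]
        fitted_del = a0 + a1 * x[i]
        values.append((fitted_full - fitted_del) / math.sqrt(mse_del * h_ii))
    return values

# Illustrative data (made up): seven points near the line y = x, plus one
# point far out in x that falls well below that line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 9.0]

cutoff = 2 * math.sqrt(2 / len(x))  # 2 * sqrt(p / n) with p = 2
scores = dffits(x, y)
flagged = [i for i, d in enumerate(scores) if abs(d) > cutoff]
```

Here `flagged` holds the zero-based indices of points whose absolute DFFITS exceeds the Belsley, Kuh, and Welsch cutoff. In this made-up data the last point combines high leverage with a poor fit to the trend of the others, so it is among those flagged.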

...
