
Jackknife

The jackknife or “leave one out” procedure is a cross-validation technique first developed by M. H. Quenouille to estimate the bias of an estimator. John Tukey then expanded the use of the jackknife to include variance estimation and coined the name jackknife because, like a jack-knife—a pocket knife akin to a Swiss army knife and typically carried by Boy Scouts—this technique can serve as a “quick and dirty” replacement for many more sophisticated and specific tools. Curiously, despite its remarkable influence on the statistical community, Tukey's seminal work is available only from an abstract (which does not even mention the name jackknife) and from an almost-impossible-to-find unpublished note (although some of this note found its way into Tukey's collected works).

The jackknife estimation of a parameter is an iterative process. First, the parameter is estimated from the whole sample. Then each element is, in turn, dropped from the sample, and the parameter of interest is estimated from this smaller sample. This estimate is called a partial estimate (also called a jackknife replication). A pseudovalue is then computed as the difference between the whole-sample estimate and the partial estimate. These pseudovalues reduce the (linear) bias of the partial estimate (because the bias is eliminated by the subtraction between the two estimates). The pseudovalues are then used in lieu of the original values to estimate the parameter of interest, and their standard deviation is used to estimate the standard error of the parameter, which can then be used for null hypothesis testing and for computing confidence intervals. The jackknife is strongly related to the bootstrap (i.e., the jackknife is often a linear approximation of the bootstrap), which is currently the main technique for computational estimation of population parameters.
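The procedure described above—whole-sample estimate, partial estimates, pseudovalues, and a pseudovalue-based standard error—can be sketched in Python. The estimator and data below are illustrative, not from the source; the classic check is the plug-in (biased) variance, whose jackknife bias correction recovers the unbiased sample variance:

```python
import math

def jackknife(sample, estimator):
    """Jackknife a statistic: return the bias-corrected estimate and its
    standard error. `estimator` maps a list of numbers to a scalar."""
    n = len(sample)
    theta_full = estimator(sample)  # estimate from the whole sample
    # Partial estimates (jackknife replications): drop each element in turn
    partials = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    # Pseudovalues: subtraction removes the (linear) bias
    pseudo = [n * theta_full - (n - 1) * t for t in partials]
    estimate = sum(pseudo) / n  # jackknife estimate of the parameter
    # Standard deviation of the pseudovalues gives the standard error
    var = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return estimate, math.sqrt(var)

def biased_var(x):
    """Plug-in (maximum-likelihood) variance, which divides by n."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
est, se = jackknife(data, biased_var)
# `est` equals the unbiased sample variance (divide by n - 1): 32/7
```

For this particular statistic the jackknife correction is exact: the bias-corrected estimate coincides with the usual unbiased variance, which divides by n − 1 instead of n.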

As a potential source of confusion, a somewhat different (but related) method, also called jackknife, is used to evaluate the quality of the predictions of computational models built to predict the value of dependent variable(s) from a set of independent variable(s). Such models can originate, for example, from neural networks, machine learning, genetic algorithms, statistical learning models, or any other multivariate analysis technique. These models typically use a very large number of parameters (frequently more parameters than observations) and are therefore highly prone to overfitting (i.e., predicting the data within the sample almost perfectly because of the large number of parameters, but predicting new observations poorly). In general, these models are too complex to be analyzed by current analytical techniques, and therefore the effect of overfitting is difficult to evaluate directly. The jackknife can be used to estimate the actual predictive power of such models by predicting the dependent variable values of each observation as if this observation were a new observation. To do so, the predicted value(s) of each observation is (are) obtained from the model built on the sample of observations minus the observation to be predicted. The jackknife, in this context, is a procedure that is used to obtain an unbiased prediction (i.e., a random effect) and to minimize the risk of overfitting.
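This second, predictive use of the jackknife can be sketched with a deliberately simple model—an ordinary least-squares line—standing in for the complex models the text mentions. Each observation is predicted from a model fitted on all the other observations (the fitting routine and data below are illustrative assumptions, not from the source):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((xs[i] - mx) * (ys[i] - my) for i in range(n))
    b = sxy / sxx
    return my - b * mx, b

def loo_predictions(xs, ys):
    """Jackknifed (leave-one-out) predictions: each y_i is predicted
    from a model fitted on the sample minus observation i, so every
    observation is treated as if it were a new observation."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
preds = loo_predictions(xs, ys)
```

Comparing `preds` against the observed `ys` (e.g., via their mean squared difference) gives an estimate of out-of-sample predictive power that is not inflated by overfitting, since no observation participates in the model that predicts it.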

...
