A good understanding of degrees of freedom (df) is important in statistics, but most statistics textbooks do not really explain what the term means. In most cases, degrees of freedom are thought of as a parameter used to define statistical distributions and conduct hypothesis tests. For instance, the sampling distribution for the t statistic is a continuous distribution called the t distribution. The shape of the t distribution depends on one parameter, the degrees of freedom. In a sample of size n, the t distribution has n − 1 df.
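
As a concrete illustration (a minimal sketch in Python using NumPy and SciPy; the data and the hypothesized mean of 5 are made up), a one-sample t statistic computed from n observations is referred to a t distribution with n − 1 df:

```python
# A minimal sketch: the t statistic from a sample of size n is compared
# against a t distribution with n - 1 degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5.2, scale=2.0, size=20)  # sample of n = 20 observations
n = len(x)

t_stat = (x.mean() - 5.0) / (x.std(ddof=1) / np.sqrt(n))  # test H0: mu = 5
df = n - 1                                                # df of the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), df)                 # two-sided p-value

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
```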

Degrees of freedom can also be thought of in other ways. The degrees of freedom indicate the number of independent pieces of information that are allowed to vary in a system. A simple example is given by imagining a four-legged table. When three of the legs are free to be any length, the fourth leg must be a specific length if the table is to stand steadily on the floor. Thus, the degrees of freedom for the table legs are three. Another example involves dividing a sample of n observations into k groups. Once k − 1 of the cell counts are fixed, the kth cell count is determined by the total number of observations. Therefore, there are k − 1 df in this design, as the sketch below shows.
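
The cell-count example can be verified directly (a minimal sketch; the particular counts are arbitrary):

```python
# A minimal sketch: with the total n fixed, only k - 1 cell counts are free.
n = 100                  # total number of observations
counts = [23, 41, 17]    # k - 1 = 3 freely chosen cell counts
last = n - sum(counts)   # the kth count is fully determined by n
print(last)              # 19 -- no freedom left for the fourth cell
```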

Generally, every time a statistic is estimated, 1 df is lost. A sample of n observations has n df. A statistic calculated from that sample, such as the mean, also has n df. The sample variance is given by the following equation:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean. The degrees of freedom for the sample variance are n − 1 because the number of independent pieces of information that are allowed to vary is restricted: once the sample mean is computed and held fixed, it cannot vary, and 1 df is lost. Another way to see that there are n − 1 degrees of freedom is that the sample variance is restricted by the condition that the sum of errors

$$\sum_{i=1}^{n} (x_i - \bar{x})$$

is zero. More generally, when m linear functions of the sample data are held constant, there are n − m df.
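
Both restrictions are easy to check numerically (a minimal sketch using NumPy; the data are made up): the deviations from the sample mean sum to zero, and dividing the sum of squared deviations by n − 1 matches NumPy's variance with ddof=1.

```python
# A minimal sketch: the deviations from the sample mean sum to zero,
# so only n - 1 of them are free; the sample variance divides by n - 1.
import numpy as np

x = np.array([4.0, 7.0, 6.0, 9.0, 4.0])
deviations = x - x.mean()
print(deviations.sum())                    # ~0 (up to floating-point error)

s2 = np.sum(deviations**2) / (len(x) - 1)  # divide by n - 1, not n
print(s2, np.var(x, ddof=1))               # ddof=1 gives the same value
```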

Simple regression offers another way to look at degrees of freedom. Often, we want to compare results from a regression model (the full model) with another model that includes fewer parameters and, therefore, has fewer degrees of freedom (the reduced model). The difference in degrees of freedom between the full and reduced models is the number of estimated parameters in the full model, p(f), minus the number of estimated parameters in the reduced model, p(r). The full regression model is

$$y = \beta_0 + \beta_1 x + \varepsilon$$

There are two parameters to be estimated in this model, so p(f) = 2. The reduced model is constructed under the null hypothesis that β1 equals zero. Therefore, the reduced model is

$$y = \beta_0 + \varepsilon$$

There is only one parameter to be estimated in this model, so p(r) = 1. This means that there is p(f) − p(r) = 1 additional piece of information available to the full model beyond the reduced model. A test statistic used to compare the two models (the F-change statistic, for instance) will therefore have 1 numerator df.
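
A minimal sketch of this comparison (simulated data; the intercept 2.0, slope 0.8, and noise level are arbitrary choices) fits both models by least squares and forms the F-change statistic with 1 numerator df:

```python
# A minimal sketch: comparing a full model y = b0 + b1*x + e against the
# reduced model y = b0 + e on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.8 * x + rng.normal(scale=2.0, size=n)

# Full model: intercept and slope via least squares (p_f = 2 parameters).
X_full = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X_full, y, rcond=None)[0]
rss_full = np.sum((y - X_full @ beta) ** 2)

# Reduced model: intercept only (p_r = 1 parameter); the fit is just ybar.
rss_reduced = np.sum((y - y.mean()) ** 2)

# F-change statistic: numerator df = p_f - p_r = 1, denominator df = n - p_f.
df_num = 2 - 1
df_den = n - 2
F = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
print(f"F = {F:.2f} on ({df_num}, {df_den}) df")
```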
