Covariance

Paul J. Lavrakas

doi:10.4135/9781412963947


Edited by: Paul J. Lavrakas
In: Encyclopedia of Survey Research Methods
Chapter DOI: https://doi.org/10.4135/9781412963947.n113
Subject: Survey Research


Covariance is a measure of association between two random variables. It has several applications in the design and analysis of surveys.

The covariance of two random variables, X and Y, is equal to the expected product of the deviations of the random variables from their means:

$$\operatorname{cov}(X, Y) = E\big[(X - E[X])\,(Y - E[Y])\big]$$
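As a minimal sketch of this definition, the covariance of a discrete pair (X, Y) can be computed as the probability-weighted average of the products of deviations from the means. The function name `covariance` and the optional `probs` argument are illustrative assumptions, not part of the entry; when `probs` is omitted, the paired outcomes are assumed to be equally likely.

```python
def covariance(xs, ys, probs=None):
    """Return E[(X - E[X]) * (Y - E[Y])] for paired outcomes (xs[i], ys[i]).

    probs[i] is the probability of the i-th pair; if omitted, all pairs
    are treated as equally likely (an assumption made for this sketch).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    if probs is None:
        probs = [1.0 / n] * n
    if len(probs) != n:
        raise ValueError("probs must have one entry per pair")

    # Expected values of X and Y
    mean_x = sum(p * x for p, x in zip(probs, xs))
    mean_y = sum(p * y for p, y in zip(probs, ys))

    # Expected product of the deviations from the means
    return sum(p * (x - mean_x) * (y - mean_y)
               for p, x, y in zip(probs, xs, ys))
```

A positive result indicates that X and Y tend to deviate from their means in the same direction, and a negative result that they tend to deviate in opposite directions.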

Under a design-based perspective on surveys, the sample inclusion indicators are random variables, and covariance is present when the probabilities of inclusion are correlated.

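For a simple random sample of n units from a population of size N, the covariance between the sample means $\bar{x}$ and $\bar{y}$ is estimated as:

$$\widehat{\operatorname{cov}}(\bar{x}, \bar{y}) = \left(1 - \frac{n}{N}\right) \frac{1}{n} \cdot \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$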

This is equivalent to the variance formula when $x_i$ and $y_i$ are the same for each unit in the sample. For complex sample surveys, standard variance estimation techniques, such as Taylor series linearization, balanced repeated replication, or jackknife replication, can be used to compute covariance.
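The simple random sampling case can be sketched in a few lines. The sketch below assumes the standard textbook estimator with a finite population correction, (1 − n/N) · s_xy / n, where s_xy is the sample covariance with divisor n − 1; the function name `srs_cov_of_means` is hypothetical. It does not cover complex designs, which need the replication or linearization methods named above.

```python
def srs_cov_of_means(x, y, N):
    """Estimate cov(x_bar, y_bar) under simple random sampling without
    replacement of n = len(x) units from a population of size N.

    Assumes the usual estimator (1 - n/N) * s_xy / n, where s_xy is the
    sample covariance computed with divisor n - 1.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2 or n > N:
        raise ValueError("need 2 <= n <= N")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sample covariance of the unit-level values
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

    # Finite population correction times s_xy / n
    return (1 - n / N) * s_xy / n
```

Passing the same list for both arguments, as in `srs_cov_of_means(x, x, N)`, returns the estimated variance of the sample mean, which illustrates the equivalence with the variance formula.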

Covariance can be written as a function of the correlation ρ(x, y):

$$\operatorname{cov}(x, y) = \rho(x, y)\,\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}$$

where var(x) and var(y) are the variances of x and y, respectively. The covariance of x and y is equal to zero when x and y are uncorrelated, as is the case when they are derived from two independent samples or from independent strata within the same sample. However, in many situations in sample surveys, the covariance is present and should not be ignored.
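The relationship between covariance and correlation can be illustrated with two small helper functions. Both names are hypothetical and the numbers in the usage note are made up for illustration; the sketch assumes both variances are strictly positive when converting back to a correlation.

```python
import math


def cov_from_correlation(rho, var_x, var_y):
    """Covariance implied by a correlation rho and the two variances."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    if var_x < 0 or var_y < 0:
        raise ValueError("variances must be non-negative")
    return rho * math.sqrt(var_x * var_y)


def correlation_from_cov(cov_xy, var_x, var_y):
    """Correlation implied by a covariance and two positive variances."""
    if var_x <= 0 or var_y <= 0:
        raise ValueError("variances must be positive")
    return cov_xy / math.sqrt(var_x * var_y)
```

For instance, with illustrative variances of 4 and 9, a correlation of 0.5 corresponds to a covariance of 3, and a correlation of zero gives a covariance of zero regardless of the variances.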

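For example, suppose a nonresponse bias analysis is conducted to determine the impact of a low response rate on survey estimates. The bias in an estimate is

$$B(\bar{y}_R) = \bar{y}_R - \bar{y}$$

where $\bar{y}_R$ is the estimate based on only the respondents and $\bar{y}$ is the estimate from the entire sample. The variance of the bias is

$$\operatorname{var}\big(B(\bar{y}_R)\big) = \operatorname{var}(\bar{y}_R) + \operatorname{var}(\bar{y}) - 2\operatorname{cov}(\bar{y}_R, \bar{y})$$

In general, the variance of a linear combination of random variables, $X_1$ through $X_n$, with constant coefficients $a_1$ through $a_n$, is

$$\operatorname{var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) + 2 \sum_{i<j} a_i a_j \operatorname{cov}(X_i, X_j)$$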
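The general rule for a linear combination can be sketched as a double sum over a covariance matrix, which is equivalent to summing the squared-coefficient variance terms and twice the pairwise covariance terms. The function name and the matrix representation (a list of lists with variances on the diagonal) are assumptions made for this illustration.

```python
def variance_of_linear_combination(coeffs, cov_matrix):
    """Variance of sum_i coeffs[i] * X_i.

    cov_matrix[i][j] holds cov(X_i, X_j); the diagonal entries are the
    variances. The matrix is assumed to be square and symmetric.
    """
    k = len(coeffs)
    if len(cov_matrix) != k or any(len(row) != k for row in cov_matrix):
        raise ValueError("cov_matrix must be k x k, with k = len(coeffs)")

    # Double sum of a_i * a_j * cov(X_i, X_j) over all i and j
    return sum(coeffs[i] * coeffs[j] * cov_matrix[i][j]
               for i in range(k) for j in range(k))
```

With coefficients `[1, -1]` this reduces to the variance of a difference, var(X1) + var(X2) − 2 cov(X1, X2), which is the form used for the variance of the bias.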

Suppose the percentage of females in the population is estimated as 48% based on only respondents but as 50% from the full sample, for a bias of −2%. Using the appropriate variance estimation method, the variances are found to be 1.2 for the estimate from respondents and 1.0 for the full sample, with a covariance of 0.9. Taking into consideration the correlation between estimates from the full sample and estimates from respondents only, the variance of the bias is 0.4 (= 1.2 + 1.0 − (2 × 0.9)). Using a t-test to test the null hypothesis that the bias is equal to zero, the p-value is found to be < 0.001, indicating significant bias in the estimated percentage of females. However, if the covariance term is ignored, the variance of the bias is calculated as 2.2, and the bias is no longer determined to be statistically significant.
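The arithmetic in this example can be reproduced with a short sketch. The entry does not give the degrees of freedom of its t-test, so the code below substitutes a two-sided normal approximation; that substitution is an assumption, and the resulting p-values will not match a t-test exactly. All function and variable names are hypothetical.

```python
import math


def variance_of_difference(var_a, var_b, cov_ab=0.0):
    """var(A - B) = var(A) + var(B) - 2 * cov(A, B)."""
    return var_a + var_b - 2.0 * cov_ab


def two_sided_p_normal(estimate, variance):
    """Two-sided p-value for H0: estimate = 0, using a normal
    approximation (an assumption; the entry itself uses a t-test)."""
    z = estimate / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Figures from the example: estimates in percent, variances in squared
# percentage points.
bias = 48.0 - 50.0
var_with_cov = variance_of_difference(1.2, 1.0, cov_ab=0.9)
var_ignoring_cov = variance_of_difference(1.2, 1.0)

p_with_cov = two_sided_p_normal(bias, var_with_cov)
p_ignoring_cov = two_sided_p_normal(bias, var_ignoring_cov)

print(f"variance with covariance:     {var_with_cov:.1f}")
print(f"variance ignoring covariance: {var_ignoring_cov:.1f}")
print(f"significant at 0.05 with covariance:     {p_with_cov < 0.05}")
print(f"significant at 0.05 ignoring covariance: {p_ignoring_cov < 0.05}")
```

Under this approximation the conclusion flips in the same way the example describes: the bias is significant at the 0.05 level when the covariance is included and not significant when it is ignored.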

Ignoring the covariance term leads to an overestimation of the variance of the difference of the estimates when the two estimates are positively correlated. This result is important in other survey contexts, such as comparing estimates between two time periods for a longitudinal survey or from different subdomains involving clustering. Covariance also has several other applications in surveys, including intraclass correlations, goodness-of-fit tests in a regression analysis, and interviewer effects.

Wendy Van de Kerckhove
