
Covariance is a measure of association between two random variables. It has several applications in the design and analysis of surveys.

The covariance of two random variables, X and Y, is equal to the expected product of the deviations between the random variables and their means:

$$\operatorname{cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))\big]$$

Under a design-based perspective on surveys, the sample inclusion indicators are random variables, and covariance is present when the probabilities of inclusion are correlated.

For a simple random sample of n units from a population of size N, the covariance between the sample means $\bar{x}$ and $\bar{y}$ is estimated as:

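$$\operatorname{cov}(\bar{x}, \bar{y}) = \left(1 - \frac{n}{N}\right)\frac{1}{n}\,\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$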

This is equivalent to the variance formula when $x_i$ and $y_i$ are the same for each unit in the sample. For complex sample surveys, standard variance estimation techniques, such as Taylor series linearization, balanced repeated replication, or jackknife replication, can be used to compute covariance.
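The simple random sample case can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual estimator with a finite population correction (the sample covariance divided by n and multiplied by 1 − n/N); the function name `srs_cov_of_means` and the data values are hypothetical, not taken from the entry.

```python
from statistics import mean


def srs_cov_of_means(x, y, N):
    """Estimate cov(x_bar, y_bar) for a simple random sample of n units from N.

    Assumes the standard estimator: (1 - n/N) * s_xy / n, where s_xy is the
    sample covariance with divisor n - 1.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2 or N < n:
        raise ValueError("need at least 2 sampled units and N >= n")
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance of the unit-level values.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Scale to the covariance of the two sample means.
    return (1 - n / N) * s_xy / n


# Hypothetical data: 4 sampled units from a population of 40.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(srs_cov_of_means(x, y, N=40))  # approximately 0.75
print(srs_cov_of_means(x, x, N=40))  # passing x twice gives the variance of x_bar
```

Passing the same variable for both arguments reduces the estimate to the variance of the sample mean, which mirrors the equivalence noted above.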

Covariance can be written as a function of the correlation $\rho(x, y)$:

$$\operatorname{cov}(x, y) = \rho(x, y)\,\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}$$

where var(x) and var(y) are the variances of x and y, respectively. The covariance of x and y is equal to zero when x and y are uncorrelated, as is the case when they are derived from two independent samples or from independent strata within the same sample. However, in many situations in sample surveys, the covariance is present and should not be ignored.

For example, suppose a nonresponse bias analysis is conducted to determine the impact of a low response rate on survey estimates. The bias in an estimate is

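$$B(\bar{y}_R) = \bar{y}_R - \bar{y}$$

where $\bar{y}_R$ is the estimate based on only the respondents and $\bar{y}$ is the estimate from the entire sample. The variance of the bias is

$$\operatorname{var}\big(B(\bar{y}_R)\big) = \operatorname{var}(\bar{y}_R) + \operatorname{var}(\bar{y}) - 2\operatorname{cov}(\bar{y}_R, \bar{y})$$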

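In general, the variance of a linear combination of random variables $X_1$ through $X_n$, with constant coefficients $a_1$ through $a_n$, is

$$\operatorname{var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) + 2\sum_{i<j} a_i a_j \operatorname{cov}(X_i, X_j)$$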
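The variance of a linear combination can also be written as a double sum of coefficient products times covariances, with the variances on the diagonal of the covariance matrix. The sketch below assumes that form; the function name `var_linear_combination` and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def var_linear_combination(coeffs, cov_matrix):
    """Variance of sum_i a_i * X_i, given coefficients and a covariance matrix.

    cov_matrix[i][j] is cov(X_i, X_j); the diagonal holds the variances.
    The double sum counts each off-diagonal pair twice, which supplies the
    factor of 2 on the covariance terms.
    """
    n = len(coeffs)
    if len(cov_matrix) != n or any(len(row) != n for row in cov_matrix):
        raise ValueError("cov_matrix must be n x n, matching coeffs")
    return sum(
        coeffs[i] * coeffs[j] * cov_matrix[i][j]
        for i in range(n)
        for j in range(n)
    )


# Illustrative values (not from the entry): var(X1) = 4, var(X2) = 9, cov = 1.
cov = [[4, 1],
       [1, 9]]
print(var_linear_combination([1, -1], cov))  # difference X1 - X2: 4 + 9 - 2*1 = 11
print(var_linear_combination([1, 1], cov))   # sum X1 + X2: 4 + 9 + 2*1 = 15
```

With coefficients 1 and −1 the covariance term enters with a minus sign, which is the case that applies to a difference between two estimates.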

Suppose the percentage of females in the population is estimated as 48% based on only respondents but as 50% from the full sample, for a bias of −2 percentage points. Using the appropriate variance estimation method, the variances are found to be 1.2 for the estimate from respondents and 1.0 for the full sample, with a covariance of 0.9. Taking into consideration the correlation between estimates from the full sample and estimates from respondents only, the variance of the bias is 0.4 (= 1.2 + 1.0 − (2 × 0.9)). Using a t-test to test the null hypothesis that the bias is equal to zero, the p-value is found to be < 0.001, indicating significant bias in the estimate of females. However, if the covariance term is ignored, the variance of the bias is calculated as 2.2, and the bias is no longer determined to be statistically significant.
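The arithmetic in this example can be checked with a short Python sketch. It assumes the variances and covariance are expressed in squared percentage points, and it compares the test statistic with the conventional two-sided 5% critical value of 1.96 from the normal approximation; the entry does not state the degrees of freedom, so that comparison is an assumption. The helper name `var_of_difference` is hypothetical.

```python
import math


def var_of_difference(var_a, var_b, cov_ab=0.0):
    """Variance of (a - b): var(a) + var(b) - 2 * cov(a, b)."""
    return var_a + var_b - 2.0 * cov_ab


bias = 48.0 - 50.0  # respondents-only estimate minus full-sample estimate

var_with_cov = var_of_difference(1.2, 1.0, 0.9)  # 1.2 + 1.0 - 1.8, about 0.4
var_without_cov = var_of_difference(1.2, 1.0)    # covariance ignored: 2.2

# Test statistic for H0: bias = 0 (bias divided by its standard error).
stat_with_cov = bias / math.sqrt(var_with_cov)        # roughly -3.16
stat_without_cov = bias / math.sqrt(var_without_cov)  # roughly -1.35

CRITICAL = 1.96  # assumed normal-approximation cutoff, two-sided 5% level
print(round(var_with_cov, 3), round(stat_with_cov, 2), abs(stat_with_cov) > CRITICAL)
print(round(var_without_cov, 3), round(stat_without_cov, 2), abs(stat_without_cov) > CRITICAL)
```

Including the covariance shrinks the standard error from about 1.48 to about 0.63, which is what moves the test statistic past the critical value.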

Ignoring the covariance term leads to an overestimation of the variance of the difference of the estimates when the two estimates are positively correlated. This result is important in other survey contexts, such as comparing estimates between two time periods for a longitudinal survey or from different subdomains involving clustering. Covariance also has several other applications in surveys, including intraclass correlations, goodness-of-fit tests in a regression analysis, and interviewer effects.

Wendy Van de Kerckhove
