
Effective Sample Size

Complex sample surveys rarely result in a set of independent and identically distributed observations, because of sample design features such as stratification, clustering, and unequal weighting that are necessary for efficient data collection. Such features affect the resulting variance of survey estimates. The effective sample size is one of several useful measures of the effect of the complex sample design on the resulting precision of the estimates.

A general definition of the effective sample size is the size of a simple random sample selected with replacement that would yield the same variance for an estimate as the variance obtained under the sample design actually used to collect the data. A simple random sample selected with replacement yields a set of independent observations and is the simplest comparison sample design. It follows that there is no single effective sample size for any one study, since the variance for each outcome, analysis domain, and type of estimate (e.g., mean or regression coefficient) will be different. For example, the effective sample size of the mean, $n_{\mathrm{eff}}$, is the sample size such that $S^2/n_{\mathrm{eff}} = \mathrm{Var}(\bar{y})$, where $S^2$ is the population variance of the variable in question and $\mathrm{Var}(\bar{y})$ is the variance of the estimated mean under the sample design used to collect the data. Consequently, $n_{\mathrm{eff}} = S^2/\mathrm{Var}(\bar{y})$.

A related concept is the design effect (DEFF), which is the ratio of the variance under the sample design used to collect the data to the variance of a simple random sample selected with replacement of the same sample size. Assuming that the sampling fraction for the simple random sample is small, the design effect of the mean is $\mathrm{DEFF} = \mathrm{Var}(\bar{y})/(S^2/n)$, where $n$ is the sample size from the sample design used to collect the data. Substituting $n_{\mathrm{eff}} = S^2/\mathrm{Var}(\bar{y})$ gives $n_{\mathrm{eff}} = n/\mathrm{DEFF}$. This latter expression is often used as the definition of the effective sample size. However, the definition presented here relates more directly to the underlying concept of the effective sample size, whereas its relationship to the DEFF is a consequence of that concept.
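As a minimal sketch of the two relationships above, the following Python computes the effective sample size of a mean both from its definition and from the design effect. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from any survey.

```python
def effective_sample_size(pop_variance: float, design_variance: float) -> float:
    """n_eff = S^2 / Var(ybar): size of a with-replacement simple random
    sample that would give the same variance as the actual design."""
    if pop_variance <= 0 or design_variance <= 0:
        raise ValueError("variances must be positive")
    return pop_variance / design_variance


def design_effect(pop_variance: float, design_variance: float, n: int) -> float:
    """DEFF = Var(ybar) / (S^2 / n), assuming a small sampling fraction."""
    if n <= 0:
        raise ValueError("n must be positive")
    if pop_variance <= 0 or design_variance <= 0:
        raise ValueError("variances must be positive")
    return design_variance / (pop_variance / n)


# Hypothetical inputs (assumptions for illustration only).
S2 = 4.0          # population variance of the study variable
n = 1000          # actual sample size under the complex design
var_ybar = 0.008  # variance of the estimated mean under the complex design

n_eff = effective_sample_size(S2, var_ybar)  # about 500
deff = design_effect(S2, var_ybar, n)        # about 2

# The two routes agree up to floating-point rounding: n_eff = n / DEFF.
print(n_eff, deff, n / deff)
```

With these inputs the complex design doubles the variance relative to simple random sampling, so the 1,000 interviews carry roughly the information of 500 independent observations.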

To better understand the effective sample size, it is useful to consider the four major aspects of complex sample design that impact the variance of an estimate and hence the DEFF and neff.

  • Stratification. Stratification is the process of dividing the population into mutually exclusive and exhaustive groups (strata) and then selecting a separate independent sample from each stratum. When the observations within each stratum are more homogeneous than those between the strata, the variance of the resulting estimate will be reduced. If the observations are approximately linearly related to the stratification variable, then the variance of the mean will be reduced to approximately $D_s = 1 - r^2$ times its unstratified value, where $r$ is the correlation between the variable under study and the stratification variable.
  • Clustering. When clusters, or groups, of observations are selected together rather than single observations, the variance of an estimate is usually increased, since the observations within a cluster are most often positively correlated. In a two-stage sample design, where clusters are sampled first followed by individual observations within each cluster, the variance of the estimated mean is inflated by a factor of approximately $D_c = 1 + (m - 1)\rho_y$, where $m$ is the number of observations selected per cluster from the analysis domain and $\rho_y$ is the intracluster correlation between two observations in a cluster. This model assumes that the same number of observations is selected within each cluster and that the intracluster correlation is constant across clusters. For regression coefficients, the inflation, or possible deflation, in variance is approximately $D_c = 1 + (m - 1)\rho_y\rho_x$, where $\rho_y$ and $\rho_x$ are the intracluster correlation coefficients for the dependent variable and the independent variable, respectively. For certain designs and regression models, it is possible for $\rho_x$ to be negative, resulting in a decrease in the variance of the estimated coefficient.
  • Unequal weighting. When the sample is selected with unequal probabilities, the variability in the weights increases the variance of the estimated mean above that of an equal probability sample of the same size, unless the selection probabilities are approximately proportional to the values of the associated observations or are otherwise optimally allocated to minimize the variance. The amount of this increase, often called the “effect of unequal weighting,” is approximately

...
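The stratification and clustering approximations in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the formulas are read as $D_s = 1 - r^2$, $D_c = 1 + (m - 1)\rho_y$ for a mean, and $D_c = 1 + (m - 1)\rho_y\rho_x$ for a regression coefficient, and the function names and input values are hypothetical.

```python
def stratification_factor(r: float) -> float:
    """Approximate variance factor D_s = 1 - r^2 for a mean, where r is the
    correlation between the study variable and the stratification variable."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return 1.0 - r ** 2


def clustering_factor_mean(m: float, rho_y: float) -> float:
    """Approximate variance factor D_c = 1 + (m - 1) * rho_y for a mean,
    assuming m observations per cluster and constant intracluster correlation."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 + (m - 1.0) * rho_y


def clustering_factor_regression(m: float, rho_y: float, rho_x: float) -> float:
    """Approximate variance factor D_c = 1 + (m - 1) * rho_y * rho_x
    for a regression coefficient."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 + (m - 1.0) * rho_y * rho_x


# Hypothetical inputs (assumptions for illustration only).
n = 1000      # total sample size
m = 20        # observations selected per cluster
rho_y = 0.05  # intracluster correlation of the study variable

d_c = clustering_factor_mean(m, rho_y)  # about 1.95
n_eff_clustered = n / d_c               # about 513

# A negative rho_x pushes the regression factor below 1 (variance deflation).
d_c_reg = clustering_factor_regression(m, rho_y, rho_x=-0.1)

print(stratification_factor(0.5), d_c, n_eff_clustered, d_c_reg)
```

Even a small intracluster correlation can matter: with 20 observations per cluster and a correlation of 0.05, the variance of the mean nearly doubles, so the effective sample size is roughly half the nominal one.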
