Effective Sample Size
Complex sample surveys rarely result in a set of independent and identically distributed observations, because of sample design features such as stratification, clustering, and unequal weighting that are necessary for efficient data collection. Such features affect the resulting variance of survey estimates. The effective sample size is one of several useful measures of the effect of the complex sample design on the resulting precision of the estimates.
A general definition of the effective sample size is the sample size for a simple random sample selected with replacement that yields the same variance for an estimate as the variance obtained from the sample design used to collect the data. A simple random sample selected with replacement yields a set of independent observations and is the simplest comparison sample design. It is immediately obvious that there is not a single effective sample size for any one study, since the variance for each outcome, analysis domain, and type of estimate (e.g., mean or regression coefficient) will be different. For example, the effective sample size, $n_{\mathrm{eff}}$, of the mean is the sample size such that $S^2/n_{\mathrm{eff}} = \mathrm{Var}(\bar{y})$, where $S^2$ is the population variance of the variable in question and $\mathrm{Var}(\bar{y})$ is the variance of the estimate under the sample design used to collect the data. Consequently, $n_{\mathrm{eff}} = S^2/\mathrm{Var}(\bar{y})$.
A related concept is the design effect (deff), which is the ratio of the variance under the sample design used to collect the data to the variance of a simple random sample selected with replacement of the same sample size. Assuming that the sampling fraction for the simple random sample is small, the design effect of the mean is $\mathrm{DEFF} = \mathrm{Var}(\bar{y})/(S^2/n)$, where $n$ is the sample size from the sample design used to collect the data. Thus, we see that $n_{\mathrm{eff}} = n/\mathrm{DEFF}$. This latter expression is often used as the definition of the effective sample size. However, the definition presented herein more directly relates to the underlying concept of the effective sample size, whereas its relationship to the DEFF is a consequence of the concept.
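As a small numerical illustration of the two definitions above, the following Python sketch computes the effective sample size of a mean both directly, as the population variance divided by the design variance, and through the design effect. The function names and the input values are hypothetical and chosen only for illustration; in practice the design variance would come from a design-based variance estimator.

```python
def effective_sample_size(s2: float, var_mean: float) -> float:
    """Effective sample size of a mean: S^2 / Var(ybar)."""
    if s2 <= 0 or var_mean <= 0:
        raise ValueError("s2 and var_mean must be positive")
    return s2 / var_mean


def design_effect(var_mean: float, s2: float, n: int) -> float:
    """Design effect of a mean: Var(ybar) / (S^2 / n)."""
    if s2 <= 0 or var_mean <= 0 or n <= 0:
        raise ValueError("s2, var_mean and n must be positive")
    return var_mean / (s2 / n)


# Hypothetical inputs (not taken from any real survey).
s2 = 25.0        # population variance of the study variable
n = 1000         # actual sample size under the complex design
var_mean = 0.05  # variance of the estimated mean under the complex design

deff = design_effect(var_mean, s2, n)
neff_direct = effective_sample_size(s2, var_mean)
neff_from_deff = n / deff

print(f"DEFF = {deff:.3f}")
print(f"n_eff (direct) = {neff_direct:.1f}")
print(f"n_eff (n/DEFF) = {neff_from_deff:.1f}")
```

With these made-up inputs the design variance is twice the simple random sampling variance, so the 1,000 sampled cases carry roughly the information of 500 independent observations. The two routes to the effective sample size agree up to floating-point rounding.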
To better understand the effective sample size, it is useful to consider the four major aspects of complex sample design that impact the variance of an estimate and hence the DEFF and $n_{\mathrm{eff}}$.
- Stratification. Stratification is the process of dividing the population into mutually exclusive and exhaustive groups (strata) and then selecting a separate independent sample from each stratum. When observations within the same stratum are more homogeneous than observations from different strata, the variance of the resulting estimate will be reduced. If the observations are approximately linearly related to the stratification variable, then the variance of the mean will be reduced by a factor of approximately $D_s = 1 - r^2$, where $r$ is the correlation between the variable under study and the stratification variable.
- Clustering. When clusters, or groups, of observations are selected together rather than single observations, the variance of an estimate is usually increased, since the observations within a cluster are most often positively correlated. In a two-stage sample design, where clusters are sampled first followed by individual observations within each cluster, the amount of increase in the variance of the estimated mean is approximately $D_c = 1 + (m - 1)\rho_y$, where $m$ is the number of observations selected per cluster from the analysis domain and $\rho_y$ is the intracluster correlation between two observations in a cluster. This model assumes that the same number of observations is selected within each cluster and that there is a constant intracluster correlation within all clusters. For regression coefficients, the inflation, or possible deflation, in variance is approximately $D_c = 1 + (m - 1)\rho_y\rho_x$, where $\rho_y$ and $\rho_x$ are the intracluster correlation coefficients for the dependent variable and the independent variable, respectively. For certain designs and regression models, it is possible for $\rho_x$ to be negative, resulting in a decrease in the variance of the estimated coefficient.
- Unequal weighting. When the sample is selected with unequal probabilities, the variance of the estimated mean is increased above that of an equal probability sample of the same sample size due to the variability in the weights unless the selection probabilities are approximately proportional to the values of the associated observations or otherwise optimally allocated to minimize the variance. The amount of this increase, often called the “effect of unequal weighting,” is approximately

...
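The stratification and clustering approximations described above can be sketched in Python as follows. The function names and input values are hypothetical. The final step treats the clustering factor alone as the design effect, which is an assumption made here to keep the example small and is not a statement about how the separate factors combine in a real design.

```python
def stratification_factor(r: float) -> float:
    """Approximate variance factor for a stratified mean: 1 - r^2."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be a correlation in [-1, 1]")
    return 1.0 - r ** 2


def clustering_factor_mean(m: int, rho_y: float) -> float:
    """Approximate variance factor for a clustered mean: 1 + (m - 1) * rho_y."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 + (m - 1) * rho_y


def clustering_factor_regression(m: int, rho_y: float, rho_x: float) -> float:
    """Approximate factor for a regression coefficient: 1 + (m - 1) * rho_y * rho_x."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 + (m - 1) * rho_y * rho_x


# Hypothetical inputs (illustration only).
d_s = stratification_factor(r=0.5)                    # variance reduced
d_c = clustering_factor_mean(m=10, rho_y=0.05)        # variance inflated
d_c_reg = clustering_factor_regression(m=10, rho_y=0.05, rho_x=-0.2)

# Assumption: clustering is the only design feature, so DEFF is taken as d_c.
n = 1000
neff_clustered = n / d_c

print(f"D_s = {d_s:.3f}")
print(f"D_c (mean) = {d_c:.3f}")
print(f"D_c (regression) = {d_c_reg:.3f}")
print(f"n_eff under clustering only = {neff_clustered:.1f}")
```

In this sketch a negative value for the intracluster correlation of the independent variable pushes the regression factor below 1, which matches the possible deflation in variance noted for regression coefficients.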