Dummy Coding
Dummy coding is used when categorical variables (e.g., sex, geographic location, ethnicity) are of interest in prediction. It provides one way of using categorical predictor variables in various kinds of estimation models, such as linear regression. Dummy coding uses only 1s and 0s to convey all the necessary information on group membership. With this kind of coding, the researcher enters a 1 to indicate that a person is a member of a category, and a 0 otherwise.
Dummy codes are a series of numbers assigned to indicate group membership in a set of mutually exclusive and exhaustive categories. Category membership is indicated in one or more columns of 0s and 1s. For example, a researcher could code sex as 1 = female, 0 = male or as 1 = male, 0 = female. In this case the researcher would have a single column variable indicating status as male or female. In general, with k groups there will be k-1 coded variables. Each of the dummy-coded variables uses 1 degree of freedom, so k groups have k-1 degrees of freedom, just as in analysis of variance (ANOVA). Consider the following example, in which there are four observations within each of the four groups:
Group | G1 | G2 | G3 | G4 |
Observation 1 | 1 | 2 | 5 | 10 |
Observation 2 | 3 | 3 | 6 | 10 |
Observation 3 | 2 | 4 | 4 | 9 |
Observation 4 | 2 | 3 | 5 | 11 |
Mean | 2 | 3 | 5 | 10 |
For this example we need to create three dummy-coded variables. We will call them d1, d2, and d3. For d1, every observation in Group 1 will be coded as 1 and observations in all other groups will be coded as 0. For d2, observations in Group 2 will be coded as 1 and all others as 0. For d3, observations in Group 3 will be coded as 1 and all others as 0. There is no d4; it is not needed because d1 through d3 carry all the information needed to determine which observation is in which group.
Here is how the data look after dummy coding:
Values | Group | d1 | d2 | d3 |
1 | 1 | 1 | 0 | 0 |
3 | 1 | 1 | 0 | 0 |
2 | 1 | 1 | 0 | 0 |
2 | 1 | 1 | 0 | 0 |
2 | 2 | 0 | 1 | 0 |
3 | 2 | 0 | 1 | 0 |
4 | 2 | 0 | 1 | 0 |
3 | 2 | 0 | 1 | 0 |
5 | 3 | 0 | 0 | 1 |
6 | 3 | 0 | 0 | 1 |
4 | 3 | 0 | 0 | 1 |
5 | 3 | 0 | 0 | 1 |
10 | 4 | 0 | 0 | 0 |
10 | 4 | 0 | 0 | 0 |
9 | 4 | 0 | 0 | 0 |
11 | 4 | 0 | 0 | 0 |
Note that every observation in Group 1 has the dummy-coded value of 1 for d1 and 0 for the others. Those in Group 2 have 1 for d2 and 0 otherwise, and for Group 3, d3 equals 1 with 0 for the others. Observations in Group 4 have all 0s on d1, d2, and d3. These three dummy variables contain all the information needed to determine which observations are included in which group. If you are in Group 2, then d2 is equal to 1 while d1 and d3 are 0. The group with all 0s is known as the reference group, which in this example is Group 4.
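The coding scheme above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name dummy_code and its reference argument are hypothetical names chosen for this sketch, not part of any statistical package. It reproduces the d1, d2, and d3 columns from the table, with Group 4 as the reference group.

```python
# Example data from the entry: four observations in each of four groups.
values = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
groups = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]


def dummy_code(labels, reference):
    """Return (coded_levels, rows) with one 0/1 column per non-reference level.

    Hypothetical helper for illustration: k distinct labels yield k-1 columns,
    and observations in the reference level receive all 0s.
    """
    levels = sorted(set(labels))
    coded_levels = [level for level in levels if level != reference]
    rows = [[1 if label == level else 0 for level in coded_levels]
            for label in labels]
    return coded_levels, rows


# Group 4 is the reference group, so only d1, d2, and d3 are created.
coded_levels, dummies = dummy_code(groups, reference=4)

# Print each observation as: value, group, d1, d2, d3.
for value, group, row in zip(values, groups, dummies):
    print(value, group, *row)
```

Passing a different value for reference would leave a different group with all 0s; the number of dummy variables stays at k-1.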
Dummy Coding in ANOVA
Using nominal data in prediction requires a numeric coding scheme such as dummy coding, because predictors must be represented quantitatively and nominal categories are not inherently quantitative. Once the data are coded properly, the analysis can be interpreted in a manner similar to traditional ANOVA.
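To make the link to group means concrete, the following sketch regresses the values from the earlier four-group example on d1, d2, and d3 using ordinary least squares. It is an illustration only: the helpers solve and ols are hypothetical names written for this sketch with the standard library, and a statistical package would normally be used instead. With Group 4 as the reference group, the intercept should match the Group 4 mean, and each coefficient should match the difference between that group's mean and the Group 4 mean.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


values = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
groups = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]

# Design matrix columns: intercept, d1, d2, d3 (Group 4 is the reference).
design = [[1, int(g == 1), int(g == 2), int(g == 3)] for g in groups]
b0, b1, b2, b3 = ols(design, values)

# Group means, for comparison with the fitted coefficients.
means = {g: sum(v for v, gg in zip(values, groups) if gg == g) / groups.count(g)
         for g in sorted(set(groups))}

print("intercept:", round(b0, 6), "Group 4 mean:", means[4])
print("d1:", round(b1, 6), "Group 1 mean - Group 4 mean:", means[1] - means[4])
print("d2:", round(b2, 6), "Group 2 mean - Group 4 mean:", means[2] - means[4])
print("d3:", round(b3, 6), "Group 3 mean - Group 4 mean:", means[3] - means[4])
```

Because the dummy variables carry 3 degrees of freedom for the 4 groups, this regression corresponds to the between-groups comparison in a one-way ANOVA on the same data.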
Suppose we have three groups of people (single, married, and divorced) and we want to estimate their life satisfaction. In the following table, the first column identifies the single group (observations of single status are dummy coded as 1 and 0 otherwise), and the second column identifies the married group (observations of married status are dummy coded as 1 and 0 otherwise). The divorced group is left over, meaning this group is the reference group. However, the overall results will be the same no matter which group we select as the reference group.
...