Dummy Coding

Jie Chen

doi:10.4135/9781071812082



In: The SAGE Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781071812082.n173
Subject: Research Methods & Evaluation (general), Research Design
Keywords: degrees of freedom; dummy coding; reference groups



Dummy coding is used when categorical variables (e.g., sex, geographic location, ethnicity) are of interest in prediction. It provides one way of using categorical predictor variables in various kinds of estimation models, such as linear regression. Dummy coding uses only 1s and 0s to convey all the necessary information on group membership. With this kind of coding, the researcher enters a 1 to indicate that a person is a member of a category and a 0 otherwise.

Dummy codes are a series of numbers assigned to indicate membership in mutually exclusive and exhaustive categories. Category membership is indicated in one or more columns of 0s and 1s. For example, a researcher could code sex as 1 = female, 0 = male or as 1 = male, 0 = female; either way, a single column variable indicates whether each person is male or female. In general, k groups require k − 1 coded variables. Each dummy-coded variable uses 1 degree of freedom, so k groups have k − 1 degrees of freedom, just as in analysis of variance (ANOVA). Consider the following example, in which there are four observations within each of four groups:
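The k − 1 rule can be sketched in a few lines of Python. This is a minimal illustration, not any statistics package's API: the function name `dummy_code` and the choice to pass the reference category explicitly are assumptions made for the example.

```python
def dummy_code(labels, reference):
    """Hypothetical helper: return {category: [0/1 indicators]} for every
    category except the reference one.

    With k distinct categories this yields k - 1 columns; members of the
    reference category score 0 on every column.
    """
    categories = list(dict.fromkeys(labels))  # distinct labels, first-seen order
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in labels")
    return {
        cat: [1 if label == cat else 0 for label in labels]
        for cat in categories
        if cat != reference
    }


# Sex coded as 1 = female, 0 = male: k = 2 categories, so one column.
sex = ["female", "male", "male", "female"]
print(dummy_code(sex, reference="male"))  # {'female': [1, 0, 0, 1]}

# Four groups (k = 4) need only k - 1 = 3 dummy columns.
print(len(dummy_code([1, 1, 2, 3, 4], reference=4)))  # 3
```

Swapping `reference="female"` would give the other coding mentioned above (1 = male, 0 = female) without changing the information carried.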


Group	G1	G2	G3	G4
	1	2	5	10
	3	3	6	10
	2	4	4	9
	2	3	5	11
Mean	2	3	5	10
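The group means in the last row can be checked directly; the dictionary below simply transcribes the table (the names `groups` and `group_means` are illustrative).

```python
from statistics import mean

# Observations transcribed from the table: four scores in each of four groups.
groups = {
    "G1": [1, 3, 2, 2],
    "G2": [2, 3, 4, 3],
    "G3": [5, 6, 4, 5],
    "G4": [10, 10, 9, 11],
}

# Arithmetic mean of each group's four observations.
group_means = {name: mean(scores) for name, scores in groups.items()}
print(group_means)  # G1: 2, G2: 3, G3: 5, G4: 10
```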

For this example we need to create three dummy-coded variables, which we will call d1, d2, and d3. For d1, every observation in Group 1 is coded 1 and observations in all other groups are coded 0. Likewise, d2 is 1 if the observation is in Group 2 and 0 otherwise, and d3 is 1 for observations in Group 3 and 0 otherwise. There is no d4, and none is needed: an observation that scores 0 on d1, d2, and d3 must belong to Group 4, so d1 through d3 already determine which observation is in which group.
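Why no d4 is needed can be made concrete by decoding the three dummy codes back into a group number. The function `group_from_dummies` is a hypothetical helper written for this sketch; it assumes the groups are mutually exclusive and exhaustive, as in the example.

```python
def group_from_dummies(d1, d2, d3):
    """Hypothetical helper: recover the group (1-4) from three dummy codes.

    Assumes mutually exclusive, exhaustive groups: at most one of d1-d3
    equals 1, and all three are 0 for Group 4 (the group without a dummy).
    """
    if d1 + d2 + d3 > 1:
        raise ValueError("at most one dummy code can equal 1")
    if d1 == 1:
        return 1
    if d2 == 1:
        return 2
    if d3 == 1:
        return 3
    return 4  # all zeros


# A d4 would always equal 1 - d1 - d2 - d3, so it would add no information.
for codes in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]:
    print(codes, "->", group_from_dummies(*codes))
```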

Here is how the data look after dummy coding:


Values	Group	d1	d2	d3
1	1	1	0	0
3	1	1	0	0
2	1	1	0	0
2	1	1	0	0
2	2	0	1	0
3	2	0	1	0
4	2	0	1	0
3	2	0	1	0
5	3	0	0	1
6	3	0	0	1
4	3	0	0	1
5	3	0	0	1
10	4	0	0	0
10	4	0	0	0
9	4	0	0	0
11	4	0	0	0

Note that every observation in Group 1 has a value of 1 on d1 and 0 on the other two dummy variables. Observations in Group 2 have 1 on d2 and 0 on d1 and d3, and observations in Group 3 have 1 on d3 and 0 on the others. Observations in Group 4 have 0s on all of d1, d2, and d3. These three dummy variables therefore contain all the information needed to determine which observations belong to which group. The group coded with all 0s is known as the reference group, which in this example is Group 4.
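The dummy-coded table can be generated from the raw values and group labels. This is a sketch in plain Python; the constant `REFERENCE` is an illustrative name, and changing it would move the all-zeros row pattern to a different group.

```python
# Data from the first table: values listed group by group, four per group.
values = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
groups = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4

REFERENCE = 4  # the group left without its own dummy variable
dummy_groups = [g for g in (1, 2, 3, 4) if g != REFERENCE]  # groups 1, 2, 3

rows = []
for value, group in zip(values, groups):
    # d_j = 1 if the observation is in group j, else 0
    dummies = tuple(int(group == j) for j in dummy_groups)
    rows.append((value, group) + dummies)

print("Value\tGroup\td1\td2\td3")
for row in rows:
    print("\t".join(str(x) for x in row))

# Every reference-group row is all zeros on d1-d3.
assert all(r[2:] == (0, 0, 0) for r in rows if r[1] == REFERENCE)
```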

Dummy Coding in ANOVA

Using nominal data in prediction requires dummy codes or a similar coding scheme, because predictors must be represented quantitatively and nominal categories have no inherent numeric values. Once the data are coded properly, a regression on the dummy codes can be interpreted in a manner similar to traditional ANOVA.
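The ANOVA connection can be checked with the four-group example. The sketch below assumes an ordinary least squares regression of the values on an intercept plus d1, d2, and d3, solved through the normal equations with exact fractions; `solve` is a hypothetical helper written out so the example needs only the standard library. The intercept comes out as the reference (Group 4) mean, each dummy coefficient as that group's mean minus the Group 4 mean, and the F test uses 3 and 12 degrees of freedom, matching a one-way ANOVA on the same data.

```python
from fractions import Fraction

values = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
groups = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4

# Design matrix: intercept column plus d1-d3 (Group 4 is the reference group).
X = [[1] + [int(g == j) for j in (1, 2, 3)] for g in groups]
y = values


def solve(a, b):
    """Hypothetical helper: solve a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * pv for x, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Ordinary least squares via the normal equations X'X b = X'y.
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
b0, b1, b2, b3 = solve(xtx, xty)
print("intercept (Group 4 mean):", b0)  # 10
print("d1, d2, d3 (differences from Group 4 mean):", b1, b2, b3)  # -8 -7 -5

# F test: 3 dummy codes -> 3 df for the model, 16 - 4 = 12 df for residuals.
fitted = [b0 + b1 * d1 + b2 * d2 + b3 * d3 for _, d1, d2, d3 in X]
grand = Fraction(sum(y), len(y))
ss_model = sum((f - grand) ** 2 for f in fitted)
ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
df_model, df_resid = p - 1, len(y) - p
F = (ss_model / df_model) / (ss_resid / df_resid)
print("SS_model =", ss_model, "SS_resid =", ss_resid, "F =", F)  # 152 8 76
```

Exact fractions are used only so the printed results match the hand calculation; a regression routine from a statistics package would give the same estimates in floating point.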

Suppose we have three groups of people (single, married, and divorced) and we want to estimate their life satisfaction. In the following table, the first column identifies the single group (observations of single status are coded 1 and 0 otherwise), and the second column identifies the married group (observations of married status are coded 1 and 0 otherwise). The divorced group is left over, so it is the reference group. However, the overall results will be the same no matter which group we choose as the reference.
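The life-satisfaction data are not reproduced here, so this sketch illustrates the claim with the four-group example from earlier in the entry instead. It refits an ordinary least squares model once with each group as the reference: the intercept changes (it is always the reference group's mean), but the overall fit, measured here by R², does not. The helpers `ols` and `fit_with_reference` are hypothetical names written for this example.

```python
from fractions import Fraction

values = [1, 3, 2, 2, 2, 3, 4, 3, 5, 6, 4, 5, 10, 10, 9, 11]
groups = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4


def ols(X, y):
    """Hypothetical helper: least-squares coefficients via the normal equations."""
    p = len(X[0])
    m = []  # augmented matrix [X'X | X'y]
    for i in range(p):
        row = [Fraction(sum(r[i] * r[j] for r in X)) for j in range(p)]
        row.append(Fraction(sum(r[i] * yi for r, yi in zip(X, y))))
        m.append(row)
    for col in range(p):  # Gauss-Jordan elimination
        piv = next(r for r in range(col, p) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * c for a, c in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def fit_with_reference(ref):
    """Dummy-code the four groups with `ref` as reference; return (coefs, R^2)."""
    coded = [g for g in (1, 2, 3, 4) if g != ref]
    X = [[1] + [int(g == j) for j in coded] for g in groups]
    b = ols(X, values)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    grand = Fraction(sum(values), len(values))
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_resid = sum((v - f) ** 2 for v, f in zip(values, fitted))
    return b, 1 - ss_resid / ss_total


# Intercept = reference-group mean (2, 3, 5, 10); R^2 = 19/20 every time.
for ref in (1, 2, 3, 4):
    b, r2 = fit_with_reference(ref)
    print("reference = Group", ref, "intercept =", b[0], "R^2 =", r2)
```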

...
