
Categorical Variable

Categorical variables are qualitative data whose values are assigned to a set of distinct groups or categories. The groups may carry alphabetic labels (e.g., male, female) or numeric labels (e.g., male = 0, female = 1), but these labels contain no mathematical information beyond the frequency counts of group membership. Instead, categorical variables often provide valuable social information that is not quantitative by nature (e.g., hair color, religion, ethnic group).
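As a rough illustration of the point that frequency counts are the only arithmetic a categorical variable supports, the sketch below tallies category membership and reports the most common category. The hair-color values are invented for the example and are not drawn from any study.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (made-up data).
hair_color = ["brown", "black", "brown", "blond", "red", "brown", "black"]

# Frequency count per category: the one numeric summary the labels support.
counts = Counter(hair_color)

# The mode is the category with the largest count.
mode_category, mode_count = counts.most_common(1)[0]

# Proportions are derived from the counts, not from the labels themselves.
proportions = {category: n / len(hair_color) for category, n in counts.items()}

print(counts)
print(mode_category, mode_count)
```

Averaging the labels would be meaningless here, which is why the summary stops at counts, proportions, and the mode.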

In the hierarchy of measurement levels, categorical variables fall at the two lowest levels, the nominal and ordinal scales, depending on whether the variable groups exhibit an intrinsic ranking. A nominal measurement level consists purely of categorical variables that have no ordered structure for intergroup comparison. If the categories can be ranked according to a collectively accepted protocol (e.g., from lowest to highest), then these variables are ordered categorical, a subset of the ordinal level of measurement.

Categorical variables at the nominal level of measurement have two properties. First, the categories are mutually exclusive. That is, an object can belong to only one category. Second, the data categories have no logical order. For example, researchers can measure research participants’ religious backgrounds, such as Jewish, Protestant, Muslim, and so on, but they cannot order these categories from lowest to highest. It should be noted that when categories get numeric labels such as male = 0 and female = 1 or control group = 0 and treatment group = 1, the numbers are merely labels and do not indicate that one category is “better” on some aspect than another. The numbers are used as symbols (codes) and do not reflect either quantities or a rank ordering. Dummy coding is the quantification of a variable with two categories (e.g., boys, girls) by assigning each category a numeric code. Dummy coding allows the researcher to conduct specific analyses such as the point-biserial correlation coefficient, in which a dichotomous categorical variable is related to a continuous variable. One example of the use of point-biserial correlation is to compare males with females on a measure of mathematical ability.
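The dummy coding and point-biserial correlation described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function names `dummy_code` and `point_biserial` are hypothetical, the group labels and scores are made up, and the coefficient is computed with the common textbook form r = (M1 - M0) / s * sqrt(n1 * n0 / n^2), where s is the population standard deviation of all scores.

```python
from math import sqrt
from statistics import mean, pstdev

def dummy_code(labels, reference):
    """Map a two-category variable to 0/1 codes; `reference` is coded 0."""
    categories = set(labels)
    if len(categories) != 2 or reference not in categories:
        raise ValueError("expected exactly two categories, one being the reference")
    return [0 if label == reference else 1 for label in labels]

def point_biserial(codes, scores):
    """Point-biserial r between 0/1 codes and a continuous variable."""
    group1 = [s for c, s in zip(codes, scores) if c == 1]
    group0 = [s for c, s in zip(codes, scores) if c == 0]
    n1, n0, n = len(group1), len(group0), len(scores)
    # Population standard deviation of all scores (divides by n, not n - 1).
    s_n = pstdev(scores)
    return (mean(group1) - mean(group0)) / s_n * sqrt(n1 * n0 / n**2)

# Made-up data for illustration only.
group = ["control", "treatment", "control", "treatment", "control", "treatment"]
score = [10.0, 14.0, 12.0, 15.0, 11.0, 16.0]

codes = dummy_code(group, reference="control")
r_pb = point_biserial(codes, score)
print(codes, round(r_pb, 4))
```

Swapping which category receives the code 1 only flips the sign of the coefficient, which reflects the point that the codes are symbols rather than quantities.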

Categorical variables at the ordinal level of measurement have the following properties: (a) the data categories are mutually exclusive, (b) the data categories have some logical order, and (c) the data categories are scaled according to the amount of a particular characteristic. Grades in courses (i.e., A, B, C, D, and F) are an example. The person who earns an A in a course has a higher level of achievement than one who gets a B, according to the criteria used for measurement by the course instructor. However, one cannot assume that the difference between an A and a B is the same as the difference between a B and a C. Similarly, researchers might set up a Likert-type scale to measure level of satisfaction with one's job and assign a 5 to indicate extremely satisfied, 4 to indicate very satisfied, 3 to indicate moderately satisfied, and so on. A person who gives a rating of 5 feels more job satisfaction than a person who gives a rating of 3, but it is not meaningful to say that one person has 2 units more satisfaction with a job than another, or to state exactly how much more satisfied one person is than another.
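The grade example can be sketched as an explicit rank ordering. The code below is an illustrative sketch with hypothetical helper names (`outranks`, `median_grade`) and made-up grades. It uses only operations that depend on order (comparison, maximum, median) and deliberately avoids subtracting ranks, since the spacing between adjacent categories is not defined.

```python
from statistics import median_low

# Categories listed from lowest to highest; only the order carries meaning.
GRADE_ORDER = ["F", "D", "C", "B", "A"]
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}

def outranks(first, second):
    """Return True if `first` is a higher grade than `second`."""
    return RANK[first] > RANK[second]

def median_grade(grades):
    """Return the middle grade; median_low always returns an observed rank."""
    ranks = [RANK[g] for g in grades]
    return GRADE_ORDER[median_low(ranks)]

# Made-up grades for illustration only.
grades = ["B", "A", "C", "B", "D"]

highest = max(grades, key=RANK.get)
middle = median_grade(grades)
print(highest, middle, outranks("A", "B"))
```

The integer ranks here are positions in a list, not measurements, so a result such as "A minus B equals 1" would have no interpretation.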

...
