Categorizing Continuous Data

Researchers sometimes wish to report or analyze a continuous variable by recoding a range of values into a few categories, such as taking all the scores from a test and categorizing them as high or low. This can make data easier to interpret, but it is often not recommended for statistical analysis, where the most precise measurements allow the most robust statistical tests to be used. This entry first discusses continuous and categorical variables and then provides an example to illustrate the use of transformed categorical variables.
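The recoding described above can be illustrated with a minimal Python sketch. The scores, the cutoff of 60, and the function name `categorize` are all hypothetical values chosen for illustration; the entry does not prescribe any particular cutoff.

```python
def categorize(score, cutoff):
    """Recode a numeric score as 'high' (at or above the cutoff) or 'low'."""
    return "high" if score >= cutoff else "low"


# Hypothetical test scores and cutoff, for illustration only.
scores = [42, 67, 55, 90, 73, 38]
cutoff = 60

categories = [categorize(s, cutoff) for s in scores]
print(categories)  # ['low', 'high', 'low', 'high', 'high', 'low']
```

Note that the scores 67 and 90 both become "high": after recoding, the 23-point difference between them is no longer visible. This loss of precision is one reason categorizing is often discouraged for statistical analysis.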

Continuous Variables

Quantitative research uses numbers to represent the values of a variable. Age, for example, is a variable because it can vary from one participant to the next within a sample. Many variables are preexisting known measurements, such as age in years, length of an object in centimeters, distance in kilometers, and time in minutes or seconds. A variable is considered continuous when a range of numbers or scores can represent the concept. Age in years, for instance, might take any value from 0 to 130, with every value in between being possible.

Other concepts have numeric values assigned to them. When participants in a study answer the questions on a questionnaire (more technically, the items on an instrument), each response is allocated a numerical value so that a composite score for the instrument can be calculated. For instance, a Likert-type scale item asks respondents to select from a range of choices regarding how much they agree or disagree with a given statement. “I love numbers” could be the statement, and a response of strongly disagree might be scored as 1, somewhat disagree as 2, slightly disagree as 3, unsure as 4, and so on, all the way through to strongly agree as 7.

Additional items addressing other aspects of the same concept would be included, for example, another 19 items. Then the numeric scores of all 20 items would be tallied or otherwise calculated to arrive at an overall composite score on a concept such as “Quantitative Research Efficacy.” Participants could achieve a score that ranges from 20 (if they answered strongly disagree to all 20 items) up to and including 140 (if they answered strongly agree to all 20 items). This composite score would be a measure of their overall Quantitative Research Efficacy. Every numeric point along the way is a potential value of that variable: 20, 21, 22, 23 . . . 140, meaning it is a continuous variable. There are numerous well-established instruments to measure attributes of people, such as the Beck Depression Inventory (BDI) or the Spielberger State-Trait Anxiety Inventory.
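The composite score arithmetic can be sketched in Python as follows. This assumes the simplest case, in which the composite is the plain sum of the 20 item scores. The response labels for scores 5 and 6 ("slightly agree" and "somewhat agree") are assumptions that mirror the disagree side of the scale, and the names `LIKERT_SCORES`, `N_ITEMS`, and `composite_score` are hypothetical.

```python
# Numeric value for each response on a 7-point Likert-type item.
# Labels for 5 and 6 are assumed; the others follow the example in the text.
LIKERT_SCORES = {
    "strongly disagree": 1,
    "somewhat disagree": 2,
    "slightly disagree": 3,
    "unsure": 4,
    "slightly agree": 5,   # assumed label
    "somewhat agree": 6,   # assumed label
    "strongly agree": 7,
}

N_ITEMS = 20  # number of items on the hypothetical instrument


def composite_score(responses):
    """Sum the numeric values of one participant's item responses."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(LIKERT_SCORES[r] for r in responses)


lowest = composite_score(["strongly disagree"] * N_ITEMS)
highest = composite_score(["strongly agree"] * N_ITEMS)
mixed = composite_score(["unsure"] * 10 + ["slightly agree"] * 10)

print(lowest, highest, mixed)  # 20 140 90
```

The minimum and maximum match the range given above: 20 items scored 1 each give 20, and 20 items scored 7 each give 140.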

Categorical Variables

Categorical variables, in contrast, are those variables where a participant can be in only one category at any given point in time because the categories are mutually exclusive, and the values for that variable cannot be expressed by a continuous range of numbers. For instance, eye color, hair color, and skin color are examples of categorical variables, as are country of residence, postal code, and type of transport vehicle (e.g., car, motorcycle, bicycle, bus, tram, train). A person can have only one eye color, be in only one place, or use only one type of transport at any given time. Smoker versus nonsmoker is a simple example of a categorical variable.

...
