Categorical Data Analysis
Social science data, when quantified, can be subjected to a variety of statistical analyses, the most common of which is regression analysis. However, linear regression requires a continuous dependent variable and rests on several other assumptions, which often makes it a less suitable analytic tool because much social science data is categorical.
Social science data can be categorical in two common ways. First, a variable is categorical when it records nominal or discrete groups. In political science, a political candidate in the United States can be a Democrat, Republican, or Independent. In sociology, one's occupation is studied as a discrete outcome. In demography, one's contraceptive choice such as the pill or the condom is categorical. Education researchers may study discrete higher education objectives of high school seniors: university, community college, or vocational school.
Second, a variable can take on an ordinal scale such as the Likert scale or a scale that resembles it. Such a scale is widely used in psychology in particular and in the social sciences in general for measuring personality traits, attitudes, and opinions. It typically has five ordered categories ranging from most important to least important or from strongly agree to strongly disagree, with a neutral middle category. An ordinal scale has two major features: There exists a natural order among the categories, and the distance between adjacent categories is not necessarily the same throughout the scale. Variables not measuring personality or attitudes can also be represented with an ordinal scale. For example, one's educational attainment can be simply measured with the three categories of “primary,” “secondary,” and “higher education.” These categories follow a natural ordering, but the distance between “primary” and “secondary” and that between “secondary” and “higher education” is not necessarily equal. As this example suggests, an ordinal variable has at least three ordered categories. Although there is no upper limit on the total number of categories, researchers seldom use a scale of more than seven ordered categories.
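The two features of an ordinal scale can be illustrated with a minimal Python sketch, using the educational attainment example. The three category labels come from the text; the helper names (`rank`, `is_higher`) and the sample responses are hypothetical and exist only for this illustration. The sketch uses the ordering for comparison and for a median, but never subtracts or averages ranks, because the distances between categories are not assumed to be equal.

```python
# Natural ordering of the categories, lowest to highest (labels from the text).
ATTAINMENT_ORDER = ["primary", "secondary", "higher education"]


def rank(level: str) -> int:
    """Return the position of a level in the ordering (0 = lowest)."""
    return ATTAINMENT_ORDER.index(level)


def is_higher(a: str, b: str) -> bool:
    """True when level a is ordered above level b."""
    return rank(a) > rank(b)


# Hypothetical survey responses, invented for this sketch.
responses = ["secondary", "primary", "higher education", "secondary"]

# Sorting and the (lower) median rely only on order, not on distance.
sorted_responses = sorted(responses, key=rank)
median_level = sorted_responses[(len(sorted_responses) - 1) // 2]

print(is_higher("secondary", "primary"))
print(median_level)
```

Order-based summaries such as the median are meaningful here, whereas a mean of the ranks would implicitly treat the gaps between categories as equal.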
The examples of categorical data above illustrate their prevalence in the social sciences. Categorical data analysis, in practice, is the analysis of categorical response variables.
Historical Development
The work by Karl Pearson and G. Udny Yule on the association between categorical variables at the turn of the 20th century paved the way for later development in models for discrete responses. Pearson's contribution is well known through his namesake chi-square statistic, whereas Yule was a strong proponent of the odds ratio in analyzing association. However, despite important contributions by noted statisticians such as R. A. Fisher and William Cochran, categorical data analysis as we know it today did not develop until the 1960s.
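As a brief illustration of the two measures named above, the following Python sketch computes Pearson's chi-square statistic, the sum over cells of (observed − expected)² / expected, and the odds ratio, ad / bc, for a 2 × 2 table. The counts are hypothetical and the function names are assumptions made for this example; it is a sketch of the standard textbook formulas, not a reconstruction of any historical calculation.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts, invented for this sketch.
table = [[30, 10],
         [20, 40]]

chi2 = pearson_chi_square(table)
theta = odds_ratio(table)
print(round(chi2, 4), theta)
```

When rows and columns are independent, the chi-square statistic is 0 and the odds ratio is 1; larger departures from those values indicate stronger association.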
The postwar decades saw a rising interest in explaining social issues and a burgeoning need for skilled social science researchers. As North American universities grew to accommodate postwar baby boomers, so did their faculties and the ranks of university-based social science researchers. Categorical scales come naturally for measuring attitudes, social class, and many other attributes and concepts in the social sciences. Throughout the 1960s, social surveys were conducted with increasing frequency, and other quantifiable data were obtained. Growing methodological sophistication in the social sciences met the rising need for analytic methods suited to the newly available categorical data. It is not surprising that many of the major statisticians who developed regression-type models for discrete responses were academicians with social science affiliations or ties, such as Leo Goodman, Shelby Haberman, Frederick Mosteller, Stephen Fienberg, and Clifford Clogg. These methodologists focused on log-linear models, whereas their counterparts in the biomedical sciences concentrated their research on regression-type models for categorical data. Together, the two groups have taken categorical data analysis to a new level.
...