Standardized Score
It is often important to compare scores from different types of data that are measured in different units, or to compare scores within a sample or a population. A common example is a student who wants to understand how a test score compares with the scores of the rest of the class. Students might also want to understand how their scores compare across different classes, each of which could be scored using a somewhat different method. In either case, the challenge is somewhat like attempting to compare apples with oranges. A standardized score is calculated on an arbitrary (but universal) scale, which has the effect of turning the apples and oranges into pears: scores that can be more easily evaluated and interpreted. In other words, a raw score can be converted from one measurement scale to another to facilitate data comparison. This entry discusses comparative and standardization methods.
Comparative Methods
Several different methods are used to create scores that are comparable with one another; however, some of the more familiar methods have limited comparative accuracy. For example, suppose 23 students are each taking mathematics, history, and psychology midterm exams. One student, Ben, scores 53 in math, 77 in history, and 88 in psychology. Comparing these scores, it seems to Ben that he is not very good at math. But these are raw scores, which tell us only the number of questions that were answered correctly on each examination. Without a frame of reference, it is not possible to determine Ben's standing in each class relative to the other students or to understand how his performance compares across his different classes.
One way to understand Ben's performance compared with the other students is to use the range of examination scores within each class. In this example, the scores of all the students who took the math examination ranged from a low of 48 to a high of 57. Ben's score was therefore just above the middle of that range, which might indicate that his score is close to average for the class. The history exam scores ranged from 75 to 99, so Ben's score was nearly at the bottom of that range and appears to be a very poor score compared with the other students. And the psychology scores ranged from 86 to 90, which puts Ben's score precisely in the middle of the range of his classmates' scores.
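The range comparison described above can be sketched in a few lines of Python. This is only an illustration, not a method given in the entry: the helper name `range_position` is hypothetical, and it expresses a score as a fraction of the distance from the lowest to the highest score in the class (0 is the bottom of the range, 1 is the top), using the figures from Ben's three exams.

```python
def range_position(score, low, high):
    """Fraction of the way from the lowest to the highest score (0 = bottom, 1 = top)."""
    return (score - low) / (high - low)


# (Ben's raw score, lowest score in class, highest score in class), from the example.
exams = {
    "math": (53, 48, 57),
    "history": (77, 75, 99),
    "psychology": (88, 86, 90),
}

positions = {name: range_position(*values) for name, values in exams.items()}

for name, position in positions.items():
    print(f"{name}: {position:.2f}")
# math: 0.56
# history: 0.08
# psychology: 0.50
```

The values match the verbal description: just above the middle in math, near the bottom in history, and exactly in the middle in psychology. The calculation uses only the two extreme scores, so it says nothing about how the other students are distributed between them.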
A somewhat more useful way of creating scores for comparative purposes is to calculate the percentage score for each examination (i.e., the number of correct answers divided by the total possible correct answers, then multiplied by 100). This type of calculation might provide more equitable scores for comparison within and between classes. Because percentage scores (not to be confused with percentile scores) depend on the total number of questions on the examination, converting Ben's raw score of 53 on his math examination to a percentage might show that his score compares differently, both within and between his classes, than it does when simple range values are used. Some might immediately think that calculating percentages solves the comparison problem. It does not necessarily do so and, most likely, it is not adequate for accurately understanding Ben's performance within each class or his overall performance across his classes. Ben's math exam had a total of 150 questions, so his raw score of 53 converts to 35%. His history test had 100 questions, so his raw score of 77 converts to 77%. And his psychology test had 90 questions, turning his raw score of 88 into 98%. When the scores are compared as percentages of the total instead of as raw scores, it seems that Ben's math score is even worse than he originally thought. But Ben's math instructor creates very difficult exams, and on this exam everyone in the class scored from 32% to 38%. Consistent with the comparison using the range of scores, his score of 35% falls precisely in the center of the class percentages, which could mean he is actually an average student. On the history exam, the other students scored from 75% to 99%. Ben's score of 77% is well below average, which is still consistent with his standing using the range of scores.
On the psychology exam, however, the other students scored from 92% to 100%, which indicates that Ben's score of 98% is close to the top of the class, a fairly substantial increase from the comparison using the range of scores. The fact that the two comparison methods disagree makes it very difficult to determine which set of comparisons is correct. Furthermore, although the upper and lower ends of either the raw or the percentage scores indicate the spread of the scores on each exam, neither method provides information about the variability of the scores. If only one student scored lower than Ben on the math exam and the rest scored very close to the top of the range, Ben's score would not be average after all. Although his score would be numerically in the middle of both the range and the percentage scores, he would actually have the second-lowest score of all 23 students. And because the range and the percentage scores are determined only within each group, both types of scores are insufficient for comparing Ben's scores across his classes. These simple examples illustrate some of the important pitfalls of these methods of comparison.
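The percentage conversion and the variability pitfall can be illustrated with a short Python sketch. Ben's raw scores and the exam lengths come from the example above. The list `math_scores` is invented purely for illustration (the entry does not give the other students' scores): it assumes one student at 48, Ben at 53, and the remaining 21 students between 55 and 57.

```python
def percentage(raw, total):
    """Percentage score: correct answers divided by total questions, times 100."""
    return 100 * raw / total


# (raw score, number of questions) for each of Ben's exams, from the example.
ben = {"math": (53, 150), "history": (77, 100), "psychology": (88, 90)}
ben_pct = {name: round(percentage(raw, total)) for name, (raw, total) in ben.items()}
print(ben_pct)  # {'math': 35, 'history': 77, 'psychology': 98}

# Hypothetical math class of 23 raw scores (an assumption, not data from the entry).
math_scores = [48, 53] + [55] * 5 + [56] * 8 + [57] * 8

# Midpoint of the range: Ben's 53 sits just above it.
midpoint = (min(math_scores) + max(math_scores)) / 2
print(midpoint)  # 52.5

# Rank counted from the bottom of the class: only one score is lower than Ben's.
rank_from_bottom = sum(score < 53 for score in math_scores) + 1
print(rank_from_bottom)  # 2
```

Under these assumed scores, Ben is near the middle of the range yet second from the bottom of the class, which is the situation the paragraph describes: the endpoints of a range do not reveal how the scores in between are distributed.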
...