Interpreting Standardized Test Scores: Strategies for Data-Driven Instructional Decision Making


Craig A. Mertler

  • Dedication

    For Kate and Addy…

    Thanks for putting up with me while I wrote another one!

    The two of you are my entire world!




    Purpose of the Text

    I have been involved in public PreK-12 education for nearly 20 years. I have been a high school teacher, a supervisor for elementary-level student teachers, a researcher at various levels, a consultant to individual schools as well as school districts, and, currently, a professor of educational research and measurement. I have worked extensively with teachers and district-level administrators, particularly on topics related to both classroom assessment and large-scale assessment. In my work with teachers and administrators, it has become apparent to me that these professionals—the educators for whom scores resulting from standardized tests are so potentially vital and informative—have literally never received formal training regarding how to interpret these scores and, more importantly, how to use them to aid their instructional decision making. They admittedly do not like standardized tests (although who among us really does?). They tend to administer them because their respective states, or the federal government in the case of the No Child Left Behind (NCLB) Act and its associated Adequate Yearly Progress (AYP) requirements, force them to do so. Many teachers are so overwhelmed with the test reports they receive back on their students—I have actually heard teachers comment that “there is so much information here that I don't even know where to begin!”—that they make the conscious decision to do nothing with them, simply filing them away in students’ cumulative folders.

    The basic purpose of this book is to provide teachers and administrators with a manual, of sorts, designed to help them understand the nature of standardized tests and, in particular, the scores that result from them. The ultimate purpose of the book is to help them develop the skills necessary to incorporate these test scores into various types of instructional decision making—a process known as data-driven decision making—necessitated by the needs of their students.

    Audience for Whom the Text is Intended

    This book was written with teachers and administrators as the primary audience. Specifically, this audience includes preservice teachers (seeking initial certification or licensure), K-12 classroom teachers (seeking either master's or doctoral degrees), and K-12 administrators (typically seeking doctoral degrees). I believe that this book is appropriate for educators in all areas of education (e.g., elementary and secondary; mathematics, science, social studies, languages, music, art, physical education, special education). Considering the stress that is being placed on educational accountability in this country, the importance of understanding student performance on standardized tests and knowing how to use that information only continues to grow. This book could serve as a supplement to any course that incorporates standardized testing as a topic, including but not limited to courses in classroom assessment, educational psychology, content methods, reading, special education, curriculum, literacy, administration, the principalship, and the superintendency. In addition to students in undergraduate and graduate education courses, individual practitioners (e.g., classroom teachers and building or district administrators) seeking professional development opportunities can also benefit from this book.

    Organization and Pedagogical Features of the Text

    The treatment of the content is fairly expansive, due to the narrow focus of the book. It does, by necessity, incorporate some measurement-related conceptual information; however, I have tried to make these concepts and subsequent discussions as applied as is possible (e.g., I have tried to avoid the excessive use of statistical formulas!). The coverage of the material is a new presentation of existing knowledge, focusing on the applicability of this knowledge to and by the K-12 professional educator. This is something that has not typically been done when presenting material related to understanding and using standardized test scores. The basic content outline is as follows:

    • Section I: Overview of Standardized Testing: Concepts and Terminology
      • Module 1: What Is “Standardized” Testing?
      • Module 2: The Importance of Standardized Testing
      • Module 3: Standardized Test Administration and Preparation
    • Section II: Standardized Test Scores and Their Interpretations
      • Module 4: Test Reports
      • Module 5: Criterion-Referenced Test Scores and Their Interpretations
      • Module 6: Norm-Referenced Test Scores and Their Interpretations
    • Section III: Using Standardized Test Scores in Instructional Decision Making
      • Module 7: Group-Level Decision Making
      • Module 8: Student-Level Decision Making
      • Module 9: Value-Added Analysis and Interpretation
    • Section IV: Case Studies: Interviews With Teachers and Administrators
    • Interview Transcripts
      • Teachers
      • Administrators

    There are very few textbooks that focus their coverage solely on standardized testing. Most books that do discuss the topic give it a brief chapter, somewhere near the end of the book. Along those lines, most instructors give it brief mention, if any at all. This book instead gives the topic in-depth, applied coverage, and it does so through four main pedagogical means.

    First, numerous samples of printouts resulting from well-known standardized tests are presented and discussed in detail. This list, which includes achievement, aptitude, and diagnostic tests, comprises the following standardized tests:

    • Dynamic Indicators of Basic Early Literacy Skills (DIBELS),
    • Iowa Tests of Basic Skills (ITBS),
    • Gates-MacGinitie Reading Tests (GMRT),
    • Ohio Achievement Tests (third and eighth grades),
    • Otis-Lennon School Ability Test (OLSAT),
    • Stanford Achievement Test 10 (SAT10),
    • TerraNova (2nd ed.), and
    • Wechsler Intelligence Scale for Children IV (WISC-IV)

    Second, following the presentation of the process to be used for incorporating test results into instructional decision making (in Section III), several specific examples are provided and thoroughly discussed. Highlighted in these discussions are explanations of the purposes of a given test, a description of either (1) a student and her/his scores (including an actual test report) or (2) an entire class and their scores (including a class test report), and an account of how a teacher, or group of teachers, would proceed through the process of using the test scores to aid in making decisions about future instruction.

    Third, each module contains several “Activities for Application and Reflection.” The nature of these activities is quite varied; some are appropriate for seasoned teachers and administrators (by capitalizing on their classroom and other school-based experiences) while others have been designed to address the needs of preservice teachers in helping them understand the process of data-driven instructional decision making.

    Finally, in Section IV, I present case studies consisting of interviews that I conducted with district-level administrators, building administrators, and classroom teachers. These individuals, all from one school district, have been engaged in a process of incorporating test scores into decision making for several years. Several end-of-module activities and discussion starters are tied directly to these interview transcripts.

    A Final Note …

    I honestly do not know anyone who loves standardized testing! But the standardized testing movement is not going away anytime soon. An examination of its impact on this country's educational system over the past 40 years will confirm that. Therefore, I approach it from this perspective … and I strongly suggest that all professional educators adopt a similar attitude. Anytime we are given the responsibility of making decisions about children, we need as much information as possible in order for those decisions to be as accurate as possible. We ask students questions; we ask them to read to us; we require them to write for us; we test them over units of instruction; we observe them; we encourage them to be creative; we engage them in performance-based tasks; etc. The results from standardized tests are just another source of information—about student learning, about our teaching, and about our curriculum. Please use them as such—add them to your long list of various sources of information about student learning. They can only help improve the accuracy of the decisions that we make about our students, as well as our own instruction. Best of luck as you embark on this new, or perhaps not so new, endeavor!


    Acknowledgments

    I would like to acknowledge the contributions of several individuals to this project.

    First, I would like to recognize and sincerely thank my editorial team at Sage Publications, namely Dr. Diane McDaniel (Acquisitions Editor), along with Erica Carroll and Ashley Plummer (Editorial Assistants) and Sarah Quesenberry (Production Editor). After two projects with her, I can definitively attest to the fact that Dr. McDaniel is the most professional editor with whom I have had the pleasure to work, especially in terms of collaboratively developing a project from its initial conception to its ultimate completion.

    I would like to recognize and thank the teachers and administrative staff at Bowling Green City Schools in Bowling Green, Ohio, for allowing me to work with them since 2001 on this concept of utilizing standardized test scores as a contributing source of information for instructional decision making. The completion of this book is just one more step in our continuing journey!

    I would also like to thank those individuals who served as reviewers of both the original prospectus and the initial draft of this book—their comments and feedback were greatly appreciated and extremely helpful:

    • Prospectus Reviewers:
      • Rosemarie L. Ataya, University of South Florida
      • Gordon Brooks, Ohio University
      • Nancy Cerezo, Saint Leo University
      • Marietta Daulton, Walsh University
      • Leland K. Doebler, The University of Montevallo
      • Ramona A. Hall, Cameron University
      • Linda Karges-Bone, Charleston Southern University
    • Manuscript Draft Reviewers:
      • Nancy A. Cerezo, Saint Leo University
      • Ollie Daniels, Barry University
      • Marietta Daulton, Walsh University
      • Jack Dilendik, Moravian College
      • Linda Karges-Bone, Charleston Southern University
      • Terry Hunkapiller Stepka, Arkansas State University
      • And one reviewer who wished to remain anonymous

    Finally, I would like to thank my wife, Kate, and our son, Addy, for their continued support of my extensive writing projects, and for Kate's feedback on various aspects of the book—always keeping me grounded with a classroom teacher's perspective.

    Craig A. Mertler
  • Glossary

    Ability test: A type of standardized test used to determine an individual's cognitive ability, such as potential or capacity to learn; often referred to as an aptitude test
    Achievement test: A type of standardized test used to measure how much students have learned in specific, clearly defined content areas including, but not limited to, reading, mathematics, science, and social studies
    Age-equivalent score: A norm-referenced test score that indicates the age in the norm group for which a certain raw score was the median performance
    Aptitude test: A type of standardized test used to determine an individual's cognitive ability, such as potential or capacity to learn; sometimes referred to as an ability test
    Confidence interval: A range of scores within which we are reasonably confident that the student's true ability or achievement score lies
    Constructed-response test items: Test items for which students must recall from their own memories, or otherwise create, their responses
    Criterion-referenced test scores: Test scores that compare a student's performance to some preestablished criteria or objectives
    Cross-sectional analysis: The practice of comparing one cohort of students (i.e., this year's class of students, or perhaps an entire grade level) to another cohort (i.e., last year's class or grade level); this usually happens across different school years
    Cut scores: Test score values that serve as the cutoff points between adjacent categories along some performance continuum
    Data-driven instructional decision making: A process by which educators examine the results of standardized tests to identify student strengths and deficiencies
    Derived scores: New score scales that result from transforming raw scores in order to know how a particular student's raw score compares to the specific norm group; also known as transformed scores
    Deviation IQ score: A type of normalized standard score that provides the location of a raw score in a normal distribution having a mean of 100 and a standard deviation equal to 15 or 16 (depending on the specific test)
    Diagnostic test: A specialized version of an achievement test used to identify the specific areas of weakness a student may be encountering
    Difficulty index: A value equal to the proportion of students who answer a particular test item correctly
    Grade-equivalent score: A norm-referenced test score that indicates the grade in the norm group for which a certain raw score was the median performance and is intended to estimate a student's developmental level
    Group bias: A type of test bias that occurs when a test contains information or words that favor one racial, ethnic, or gender group over another
    High-stakes tests: Standardized tests whose results can have substantial consequences for students, teachers, and schools
    Item discrimination: A measure of how well students who scored high on the entire test perform on an individual item as compared to the performance on that item by students who scored low on the entire test
    Linear standard score: A type of norm-referenced score that tells how far a raw score is located from the mean of the norm group, with the distance being expressed in standard deviation units
    Longitudinal analysis: The practice of tracking individual student and cohort performances along multiyear routes, focusing on academic gains made over time for the same students
    National percentile band: A confidence interval presented around a student's obtained percentile rank score
    Norm group: The national sample of students that serves as the basis for the comparison for the scores attained by a given local group of students on a norm-referenced standardized test
    Normal curve equivalent score: A type of normalized standard score that has a mean of 50 and a standard deviation of 21.06
    Normal distribution: A distribution of test scores that serves as the basis for transformed scores; also known as a normal curve or a bell-shaped curve
    Normalized standard scores: A type of norm-referenced score where the raw score has been transformed in order to obtain the same area beneath a “curve” representing the distribution of scores as is found in a normal distribution
    Norm-referenced test scores: Test scores that compare individual student scores to the performance of other similar students
    Percentile rank: A norm-referenced test score that indicates the percentage of the norm group that scored below a given raw score
    Precision of performance score: A criterion-referenced test score that involves measuring the degree of accuracy with which a student completes a task
    Quality of performance score: A criterion-referenced test score that consists of ratings that indicate the level at which a student performs
    Raw score: A criterion-referenced test score, typically presented as the number or percentage of items answered correctly
    Reliability: The degree to which the scores on a given test are consistent
    Sample bias: A type of test bias that occurs when certain cultural groups do not have adequate representation in the norm group (the group to which student test performance will ultimately be compared)
    SAT/GRE scores: A type of normalized standard score that is reported on a scale that has a mean of 500 and a standard deviation of 100
    Selected-response items: Test items that have only one correct answer, which appears as part of the question; the student's task is simply to identify, or select, the correct option
    Speed of performance score: A criterion-referenced test score reported as the amount of time it takes for a student to complete a task or the number of tasks a student can complete in a fixed amount of time
    Standard error of measurement (SEM): The average amount of measurement error across students in the norm group; also known simply as standard error
    Standard setting: The process used to establish various cut scores
    Standardized scores: A category of norm-referenced test scores that are obtained when raw scores are transformed to fit a distribution whose characteristics are known and fixed, usually a normal distribution
    Standardized test: Any test that is administered, scored, and interpreted in a standard, consistent manner
    Stanine: A type of normalized standard score that provides the location of a raw score in a specific segment of the normal distribution; stanines range in value from 1 to 9, where the mean is equal to 5 and the standard deviation is equal to 2
    State-mandated tests: Standardized tests that are typically developed and implemented to meet some sort of legislative mandate within a particular state and have been implemented for accountability purposes
    Test bias: A situation that occurs if and when a standardized test is in some fashion unfair to one or more minority groups
    Test norms: Specific descriptions of how a representative national sample of students (i.e., a norm group) performed on an actual final test
    Testwiseness skills: Student abilities in the use of test-taking strategies during a particular standardized test
    Transformed scores: New score scales that result from transforming raw scores in order to know how a particular student's raw score compares to the specific norm group; also known as derived scores
    T-score: A type of linear standard score that provides the location of a raw score in a distribution that has a mean of 50 and a standard deviation of 10
    Validity: The extent to which a test (and, more specifically, the resulting information it provides about a given student) is sufficient and appropriate to make the various educational decisions for which the information is intended
    Value-added analysis: A newer method of measuring teaching and learning that analyzes annual test scores to reveal the progress students are making each year at both the individual and group levels
    Z-score: A type of linear standard score that exists on a continuum that has a mean of zero and a standard deviation of 1
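
    Several of the score scales defined in this glossary differ only in their mean and standard deviation, so the arithmetic behind them can be sketched briefly. The Python below is an illustration only, not a procedure taken from the book: the function names, the SCALES table, and the sample numbers are all hypothetical. It also assumes the z-score already comes from a roughly normal distribution; publishers of real tests derive normalized scores (NCE, deviation IQ, stanines) from percentile ranks in their own norm tables, which a simple rescaling may not reproduce exactly.

    ```python
    import math

    # (mean, standard deviation) of each scale, as stated in the glossary.
    # The dictionary itself is a hypothetical helper for this sketch.
    SCALES = {
        "z": (0.0, 1.0),
        "T": (50.0, 10.0),
        "NCE": (50.0, 21.06),
        "deviation_IQ": (100.0, 15.0),  # some tests use 16 instead of 15
        "SAT_GRE": (500.0, 100.0),
    }

    def z_score(raw, norm_mean, norm_sd):
        """Distance of a raw score from the norm-group mean, in SD units."""
        return (raw - norm_mean) / norm_sd

    def rescale(z, scale):
        """Place a z-score on another scale: new mean + new SD * z."""
        mean, sd = SCALES[scale]
        return mean + sd * z

    def stanine(z):
        """Approximate stanine (mean 5, SD 2), limited to the range 1..9."""
        return max(1, min(9, math.floor(2 * z + 5.5)))

    def percentile_rank(raw, norm_scores):
        """Percentage of the norm group scoring below the given raw score."""
        below = sum(1 for s in norm_scores if s < raw)
        return 100.0 * below / len(norm_scores)

    def difficulty_index(item_results):
        """Proportion of students answering an item correctly (1 = correct)."""
        return sum(item_results) / len(item_results)

    def confidence_band(score, sem):
        """Band of one standard error of measurement on each side of a score."""
        return (score - sem, score + sem)

    # Hypothetical student: raw score 60, norm-group mean 50 and SD 10.
    z = z_score(60, 50, 10)   # 1.0, one SD above the mean
    t = rescale(z, "T")       # 60.0 on the T-score scale
    ```

    Keeping the scale parameters in one table makes the point that a T-score, a deviation IQ score, and an SAT/GRE score with the same z-score all describe the same relative standing in the norm group.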



    About the Author

    Dr. Craig A. Mertler is currently a Professor of assessment and research methodologies in the College of Education and Human Development at Bowling Green State University, Ohio. Dr. Mertler teaches graduate courses in quantitative research methods, introductory statistical analyses, multivariate statistical analyses, classroom assessment, and standardized test interpretation. He also teaches undergraduate courses in educational assessment methods. He is currently the author of five books (including Action Research: Teachers as Researchers in the Classroom, 2006), two invited book chapters, 14 refereed journal articles, two instructors’ manuals, and numerous nonrefereed articles and manuscripts. He has also presented numerous research papers at professional meetings around the country as well as internationally. Dr. Mertler conducts workshops for both preservice and inservice teachers on the broad topic of classroom assessment—and specifically on interpreting standardized test scores—as well as on classroom-based action research. His primary research interests include classroom teachers’ assessment literacy, assessment practices of classroom teachers, and Web-based survey methodology. Prior to teaching and researching at the university level, Dr. Mertler taught high school biology and Earth science, coached track and volleyball, and also advised various student groups.
