A Local Assessment Toolkit to Promote Deeper Learning: Transforming Research Into Practice


Karin Hess



    by Jay McTighe,

    Co-author of Understanding by Design

    The title of this book is revealing. While its pages offer a rich collection of tools and strategies for effective classroom assessment, its reference to deeper learning reminds us that our assessments should serve a larger purpose than simply measurement.

    The book would be worthwhile if it simply focused on summative assessments for evaluation and grading. Dr. Hess articulates the principles and technical requirements of sound assessment with the authority and assurance of a scholar to ensure that evaluative assessments and concomitant grades provide fair and valid measures of targeted goals. But the book goes much further in examining the benefits of assessment (and related instructional) practices that promote deeper learning.

    To underscore the primacy of learning-focused assessment, the book begins with an exploration of models of learning by Kolb and McCarthy. Hess then “walks her talk” by employing these models overtly throughout the book. The book is structured to support both individual readers and group reading (e.g., via professional learning communities or study groups). Additionally, she offers suggestions for using the book as a guide in workshops, and the modular structure offers flexibility to readers and leaders alike.

    Before launching into assessment-specific tools and strategies, Hess describes her conception of intellectual rigor and introduces the Cognitive Rigor Matrix (CRM), an innovative framework for determining the level of complexity of assessment items and tasks. The Matrix can be used to analyze current assessments, serve as a design tool for constructing more rigorous performance tasks, and guide differentiated instruction. For leaders, the CRM can also be used as a framework for classroom observations and “walk-throughs.”

    The subsequent section on text complexity is particularly rich, offering a unique window into large-scale assessment design. Hess then translates these technical processes into state-of-the-art tools and protocols that classroom teachers can apply to determine the text complexity of their assignments and assessments.

    The heart of the book centers on the design of high-quality performance assessment tasks and associated rubrics. Authentic tasks engage students in applying their knowledge and skills to a realistic issue or challenge, providing evidence of their understanding. Like the game in athletics or the play in theater, quality performance tasks provide clear and relevant targets for learning that can engage and motivate learning, as well as measure it. Hess’s many years of experience are on full display as she offers an array of tools and suggested practices for the design and use of rigorous performance assessments.

    Another unique contribution of the book reflects Dr. Hess’s groundbreaking work of identifying Learning Progressions (LPs) in various disciplines. LPs provide descriptive continuums that chart the learning pathways from novice to expert performance. She illustrates the value of their use as informative guides to planning instruction and assessment based on a logical progression of how learners migrate from basic skills and concepts toward deeper understanding and transfer abilities.

    Moving beyond the qualities of individual assessments, Hess makes a strong case for the importance of developing a comprehensive and balanced assessment system. As in a patchwork quilt, the whole is greater than the sum of the parts. An assessment system is needed to assess all the outcomes that matter and provide the feedback necessary for systemic, continuous improvement.

    The book concludes with an Appendix containing a comprehensive list of all the tools referenced throughout the book. As with a large collection of mechanic’s tools, you are not likely to use all of the assets contained in this full-featured toolkit. Throughout each module are tools for teachers and students. Additionally, the appendices contain numerous models and tips to support development of assessments and rubrics. Over time, as your assessment practices expand, you will find a treasure trove of practical resources to promote aligned assessment, deep learning, and continuous improvement. Be prepared to have your own learning deepened as you devour this impressive volume. Most certainly your students will be the beneficiaries.


    They say it takes a village to raise a child. I think sometimes that can also be true when writing a book. Throughout my career, I’ve benefited tremendously from the insights and goodwill of many people in “my village”—picking up gems of wisdom along the way from my own children and grandchildren, from students in my classroom, from many incredible educational gurus, and from my work with colleagues and school leaders who trusted me to lead longer-term efforts in their states or schools. This book represents more than my vision alone. It attempts to provide some collective thinking about how learning occurs and the many ways that local assessment practices can be implemented across districts, so that in every classroom, assessment is employed to enhance student learning, confidence, and independence.

    I’ve always believed that learning directly from the best thinkers in education would make my ability to translate their ideas into practice much stronger. Although there are too many to mention all of them (trust me, it’s a long list), several strongly influenced and advanced my thinking to the “next level” as I attended their workshops, read their books and articles, and had thought-provoking conversations with them. I hope I have remained true to their original thinking while striving to integrate and advance their ideas into my work.

    Two thought leaders helped me refocus my thinking about best practices in teaching when I was a middle school teacher. After reading Nancie Atwell’s classic book, In the Middle, and taking a course with Sandra Kaplan, “Differentiating Curriculum for Gifted Learners,” I was convinced that differentiation, student-centered learning, and targeted small group instruction could be much more effective in reaching all learners, not just the gifted students. That first course with Sandra Kaplan eventually led me to complete my master of science degree in the education of the gifted.

    Later, while working at the New Jersey Department of Education, several more national experts rocked my world—among the best were Grant Wiggins, Jay McTighe, and Heidi Hayes Jacobs.

    Before there was the dynamic duo of Wiggins and McTighe, there was Wiggins and there was McTighe. Grant was talking about this wacky idea he called authentic assessment. At one of his institutes, he asked us, “What if during the last two weeks of school, you let students identify something they wanted to learn and gave them access to the whole school building and all of the teachers as resources to complete their independent studies?” I raised my hand and volunteered to be an observer at one of those elementary schools to see it firsthand. Grant always helped me to see the bigger picture—a vision of where assessment could take you and your students.

    What I learned from Jay was how to design teacher-friendly tools to actually do this work. Call me a pack rat, but I still have that folder he mailed to me years ago with several practical planning templates for developing performance tasks. In large part, I envisioned my book being modeled after the Understanding by Design Professional Development Workbook—clear, practical, and revolutionary in its potential impact on teaching, learning, and assessment.

    Most people think of curriculum mapping or 21st century skills when they think of Heidi Hayes Jacobs; but that was not what first moved the needle for me. Long before those books, Heidi stood in a school auditorium in Newark, New Jersey, providing a visual metaphor for typical classroom discourse. She tossed a volleyball into the audience to simulate a teacher posing a question. The ball came back to her, simulating a response. After several of these exchanges, she asked, “Who touched the ball the most?” The answer was obvious—the teacher was the most engaged person in that classroom!

    From that day forward, I began to attend more to how to structure classroom discourse for deeper thinking and greater student engagement. It should be no surprise, then, that many years later, when I read Norman Webb’s reports and papers about test alignment studies and discovered his thinking about four depth-of-knowledge (DOK) levels when students engage with content, I made one more important leap forward. It began with designing tools for teacher committees to use when conducting alignment studies of state assessments and led to thinking more about tests designed for deeper thinking and engagement. I further refined my work when integrating the concepts of depth of knowledge with formative uses of assessment. Let me just say that I’ve learned much from Dylan Wiliam’s research in the area of formative assessment.

    Collaborations on long-term curriculum and assessment projects while working with colleagues at the Center for Assessment (Dover, NH), the New Hampshire Department of Education, the New York City Department of Education, and the Center for Collaborative Education (Boston) provided me with the “big picture” in finding ways to bridge local assessment with large-scale assessment practices.

    Multiyear projects with school leaders and school districts challenged me to develop, rethink, and refine the ideas and materials included in this toolkit. These critical friends always help me to “keep it real” for teachers and kids.

    • In Vermont—working with Sue Biggam (Vermont Reads Institute), Bob Stanton (Lamoille Area Professional Development Academy/LAPDA), Jen Miller-Arsenault (Washington Central Supervisory Union), and Michaela Martin (Orleans North Supervisory Union)
    • In Connecticut—working with literacy consultants at EASTCONN, Donna Drasch and Helen Weingart
    • In Wyoming—working with R. J. Kost (Park County School District #1) and Kelly Hornsby (Campbell County School District #1)
    • In Arkansas—working with Megan Slocum and Marcia Smith (Springdale School District)
    • In Oregon—having a unique multiyear email relationship with Susan Richmond (Hillsboro School District) as she worked to bring these ideas to teachers in her district and in the process, pushed a lot of my thinking about implementation challenges.

    Finally, a special note of thanks for the people at Corwin who always provided helpful direction, insightful comments, and unlimited patience, making me feel as though this book must be the only book being worked on! I’m especially grateful for the support and responsiveness of Jessica Allan, Diane DiMura, Lucas Schleicher, Mia Rodriguez, and Tori Mirsadjadi, who knew exactly how to take my ideas and fashion them into the final “package.”

    About the Author


    Karin Hess has more than 40 years of deep experience in curriculum, instruction, and assessment. She is a recognized international leader in developing practical approaches for using cognitive rigor and learning progressions as the foundation for formative, interim, and performance assessments at all levels of assessment systems. For almost 15 years at the Center for Assessment, Dr. Hess distinguished herself as a content, assessment, and alignment expert in multiple content areas, K–12. She has effectively guided many states and U.S. territories in the development of grade-level standards and test specifications for general education (e.g., New England Common Assessment Program/NECAP; Smarter Balanced Assessment Consortium/SBAC) and alternate assessments for students with significant cognitive disabilities (e.g., National Center and State Collaborative/NCSC). During this time, she also contributed to Maine’s early thinking about how to structure requirements for graduation exhibitions and provided in-depth guidance for local development and use of performance assessments for proficiency-based graduation systems in Rhode Island, Wyoming, and New Hampshire. Dr. Hess’s experiences as a state director of gifted education for New Jersey and as a district curriculum director, building principal, and classroom teacher (15 years) enable her to understand the practical implications of her work while maintaining fidelity to research, technical quality, and established best practices. Dr. Hess has also worked as a program evaluator for the Vermont Mathematics Project and as the developer and editor of Science Exemplars (www.exemplars.com), creating, piloting, and annotating student work samples in science. Karin has authored and co-authored numerous books, book chapters, articles, and white papers related to cognitive rigor, text complexity, assessment, and student-centered learning. 
Her ongoing work has included guiding the development and implementation of New Hampshire’s K–12 Model Competencies for ELA, Mathematics, and Science and supporting school districts in many states in creating and analyzing use of high-quality performance assessments and performance scales for competency-based learning systems.


    College- and career-readiness (CCR) standards set expectations for all students to demonstrate deep conceptual understanding through the application of content knowledge and skills in new situations. Unfortunately, content standards provide limited or no guidance as to how, when, or to what degree specific skills and concepts should be emphasized by educators in the classroom. Without a clear direction and use of rich, engaging learning tasks, important CCR skills and dispositions will be, at best, inconsistently or randomly addressed by teachers or forgotten in the design of systemwide programs, curriculum, and instruction. We know that what gets tested is what gets the greatest instructional attention. If assessments of CCR standards only test acquisition and basic application of academic skills and concepts, there will be little incentive for schools to focus instruction and local assessment on deeper understanding and transfer of learning to real-world contexts (Hess & Gong, 2014). And, if deeper understanding and transfer of learning is only an expectation for some students, then we have not done our job in preparing all students for life after high school. Knowing how to think deeply about content and about themselves as learners should be a CCR goal for every student.

    From my work with teachers, school districts, and states across the country, I’ve come to understand that effective local assessment systems are built upon several critical components:

    • Ensuring there is a range of assessments (formative, interim, summative) of high technical quality at every grade level
    • Providing robust, embedded professional development and leadership to support the use of assessment data that informs instruction and advances learning for each student
    • Promoting a belief system that places helping each student reach full potential at the center of assessment

    The tools, protocols, and examples presented in the Local Assessment Toolkit follow three guiding principles with the overarching goal of deeper learning for all students.

    Guiding Principle #1: Assessment Quality Matters

    Every day in every classroom, we set expectations for learning and in some way attempt to assess that learning. We buy tests; we buy curricular materials with tests; we develop tests; we spend class time on “test prep” prior to administering large-scale tests; and we worry a lot about test results. Sometimes, we design performance tasks and extended projects to elicit learning at a deeper level than most commercially available assessments can tap. We probably believe that most assessments we use have a higher level of precision and technical quality than they actually do. (I know that I did when I was in the classroom.) In other words, we put a lot of faith in the quality of the assessments we’re using, even when we’re not sure what they specifically assess; whether they do it sufficiently, effectively, and fairly; or how best to use the results to truly advance and deepen learning.

    Individual educators, and sometimes groups of educators, use data from assessments to assign grades, identify and address learning gaps, or plan next steps for instruction. But when do we actually examine the technical quality of our local assessments? And when does a “collection of tests” across teachers, grade levels, and content areas evolve to become a high-quality, cohesive assessment system? Knowing why an assessment is or is not a high-quality assessment is the first step in knowing how to interpret and use the results—even if this means replacing some of your current assessments with more effective ones. When I begin working with assessment teams in a school district and give them honest feedback on their current use of assessments, it’s not unusual for me to hear these comments: “I wish you were here 3 years ago when we began this work!” or “You’re making us crazy . . . but in a good way.”

    Why do they say these things? Because assessment quality really does matter.

    Guiding Principle #2: Learning Is at the Center of Assessment Design and System Coherence

    I spent more than 25 years working in schools before I began working full time with schools. Schools are busy places. I get it. There is little time to stop and evaluate the quality of each assessment or to examine how well the overall assessment system supports our collective beliefs about student learning or promotes a vision for how students might engage more deeply while they are learning. It’s not enough to buy or be handed “good assessments” without understanding what makes them high-quality assessments. And it’s not enough to simply assess the skills and concepts that are most easily assessed, while ignoring the importance of deeper understanding.

    An assessment system and the assessments in it should align with the philosophy underlying the Assessment Triangle, first presented in Knowing What Students Know (National Research Council [NRC], 2001). The assessment triangle focuses on the triad of cognition, observation, and interpretation, as the precursors to the design and use of assessments. In other words, how students typically learn and develop expertise in a content domain should guide assessment design and use of assessment data. Based on my understanding and interpretation of the Assessment Triangle, this is how cognition, observation, and interpretation interact to drive assessment design:

    • COGNITION: Presents a (cognitive) model of knowing how students learn and develop competence in a subject domain over time
    • OBSERVATION: Guides development of the kind of tasks or situations that allow one to observe student learning and performance along a learning continuum
    • INTERPRETATION: Offers a method of knowing how to draw inferences about learning from the performance evidence

    Guiding Principle #3: Deep Learning Is an Essential Goal for Every Student

    Most students have become accustomed to “learning” as listening to the teacher, reading texts, and practicing what the teacher showed them how to do. Maybe we haven’t systemically changed how we teach and assess because teaching for deeper understanding and assessing it is really, really complicated. Deeper learning to develop expertise in a domain of knowledge and performance requires months, or even years, of sustained, deliberate practice. Development of expertise also requires feedback to guide and optimize practice activities. A student with strong interpersonal skills will best understand and apply such feedback to academic learning.

    [Figure: Design Questions Related to the Elements of the Assessment Triangle. Source: National Research Council (2001).]

    Education for Life and Work (NRC, 2012) provides a cognitive science perspective on deeper learning, showing how different it is from a more traditional approach.

    We define “deeper learning” as the process through which an individual becomes capable of taking what was learned in one situation and applying it to new situations (also called transfer). Through deeper learning (which often involves shared learning and interactions with others in a community), the individual develops expertise in a particular domain of knowledge and/or performance. . . . While other types of learning may allow an individual to recall facts, concepts, or procedures, deeper learning allows the individual to transfer what was learned to solve new problems. (pp. 5–6)

    Deeper learning is supported through rich instructional and assessment practices that create a positive, collaborative learning environment in which students gain content knowledge while developing their intrapersonal and interpersonal skills. For example, developing metacognitive skills—the ability to reflect on one’s own learning and make strategic adjustments accordingly—deepens academic content learning as well.

    When we value deeper learning, we provide the time and strategic supports necessary for all students to be successful. This shift requires that we change how we view the learner, the curriculum, and the learning environment. When we value deeper learning across our school systems, we provide all teachers with the time and tools to collaborate and to deepen their understanding of quality assessment and how assessment results can help us to meet the needs of diverse learners. The Local Assessment Toolkit has taken the NRC research and recommendations to heart and put them into user-friendly practice. Using many of these tools and protocols, I’ve seen teams of teachers dramatically shift perspectives about learning and daily practices in instruction and assessment that can eventually shift the schoolwide norms.

    Do you believe that deeper learning is an essential goal for all students and that all students can learn to think deeply? If you do, then the strategies and tools in this book should greatly enhance your work and move it forward. If you don’t believe it’s possible for all students to understand in more meaningful ways, then you can either give the book to a colleague (just kidding) or try one strategy to see if the results change your mind, even if only a little bit at first. This is usually the way change in daily practice begins—one strategy at a time. Once you and your students see that they can produce high-quality work, learning in your classrooms will never be the same. Students will come to know they are capable of that kind of work all the time. They will not be afraid to take on more challenging and complex assignments because they know they will be supported in the struggle. Deeper learning—not grades—becomes the motivation to do better, do more, and go farther.

    This last idea was confirmed for me many years ago, both in a workshop with Rick Stiggins when he was discussing research about what motivates learners to want to improve their performance and in his 1997 book, where he states, “If students are to come to believe in themselves, then they must first experience some believable (credible) form of academic success as reflected in a real and rigorous assessment. A small success can rekindle a small spark of confidence, which in turn encourages more trying. . . . Our goal, then, is to perpetuate this cycle” (p. 45). During the workshop I attended, he identified the top three motivators for students improving their performance. Here they are, in reverse order:

    • Number 3 Motivator—When a supportive adult who knows the student well provides appropriate scaffolding, challenge, and encouragement, students come to understand that learning is about being confused and then figuring it out. (Readers of this book fall into this category. Congratulations! You are the third most effective motivator for your students.)
    • Number 2 Motivator—When a student identifies with another student who is successful at a similar task, he believes that he can also be successful. (“Hey, I’m as smart as she is. I can do that, too!”) This will only happen when students play on the same team, when they work and struggle together, and when they know that success is the result of everyone contributing to produce high-quality work and deeper understanding. This is the power of collaboration and is true for both students and teachers.
    • Number 1 Motivator—When a student can see her own progress over time, it validates that she can learn deeply. This means that the evidence of learning needs to be specific and concrete. A student needs to understand the success criteria for the learning target, match it to her performance to meet the target, and be able to identify what went well and where more work is needed. This is about students having some control of their learning pathway and knowing what and how to learn. This is the power of self-assessment, and it holds true for both students and teachers.

    The Design, Purpose, and Suggested Uses of the Local Assessment Toolkit

    The Local Assessment Toolkit represents many years working with educators to develop and refine practical tools and protocols that build local capacity by establishing a common language and deeper knowledge of assessment system components and how the system as a whole can become greater than its parts. Because assessment systems include formative uses of assessment, I see the “system” as including instruction and the many related instructional supports. For example, I cannot talk about increasing rigor without suggesting ways to support (scaffold) all students in getting there (e.g., providing strategies for classroom discourse that uncovers thinking or graphic organizers that help students to organize ideas). I cannot talk about designing assessments for and of deeper thinking without providing strategies for interpreting and acting on evidence seen in student work. You cannot do one well without the other.

    As the title suggests, the design of assessments and the assessment system should always be based on linking research with day-to-day practices that promote authentic learning and deeper understanding. While these tools can be employed to support any high-quality assessment work, my focus is on knowing how the brain processes information so that thinking can be revealed and understanding deepened. I’ve linked the tools, protocols, and strategies throughout the toolkit with several areas of research: cognitive science, psychometrics and assessment design, research related to building expertise in the academic disciplines, and interpersonal and intrapersonal skill development.

    These tools have had many users—from undergraduate- and graduate-level classes, to in-service professional development activities, to guidance for test item writers and designers of research studies. However, the Local Assessment Toolkit was primarily designed as a professional development guide for long-term use by school leaders. The most effective leadership cadre is composed of committed administrators, district curriculum and assessment facilitators, instructional coaches, and teacher leaders. At the end of the day, the outside consultant gets on a plane and goes home. Sometimes the consultant comes back several times or acts as a long-distance mentor to the school leaders; but the people who carry the vision forward and ensure fidelity of the day-to-day implementation are the school leaders. They are the “front runners” who will try things out first in their classrooms and inspire their colleagues by sharing and reflecting on their results. They are the coaches and supervisors who will visit classrooms, looking for and supporting deeper-learning teaching and assessment practices. They are the administrators who can structure time and focus, making collaboration possible for designing and using high-quality assessments. Long-term implementation is about cultivating leadership density—recognizing the potential for leadership within your staff, and weaving connections among teams within the system, so that teachers can acknowledge their own leadership in shaping the vision for the direction in which the school or district is heading. Building leadership density and providing structures for meaningful collaboration are two critical components needed for implementing systemic change (Hess, 2000). Based on my research, a third component necessary for ensuring systemic change is involving every educator and every student in the work.

    About the Modules

    The five topic-based modules guide assessment teams and professional developers in the use of field-tested, teacher-friendly tools, strategies, and protocols similar to what testing contractors might use in test development, but modified for a local school setting. Each module applies best practices in assessment design and use to what we know about student thinking and understanding. Above all, this assessment toolkit is about learning.

    The modules do not have to be used sequentially; however, earlier modules focus on components essential to instruction and assessment design (rigor, alignment, student engagement)—a good starting point. The later modules focus on assessment system development and supporting district or schoolwide use. Most assessment teams that I have worked with find that as they work collaboratively, taking the time to process and test the ideas and examples across classrooms (such as collecting and collaboratively analyzing student work samples), their assessment literacy deepens. Modules are often revisited to gain new insights about student learning or used for professional learning when new staff are brought on board. Never forget that this work is ongoing and always cyclical and iterative in nature. Here are a few key points about the organization of the Local Assessment Toolkit:

    • Modules have been organized in what I hope is a thoughtful approach to helping school and district leadership teams analyze and perhaps redesign local assessment systems for greater effectiveness and deeper learning.
    • Modules are designed to be used both for individual learning and to frame a series of professional development opportunities, such as local PLC activities. Collaborative learning and implementation will be the most effective way to change the system.
    • I always begin with Module 1 (cognitive rigor) and then decide where to focus next. As you unpack the complexities of each module, you’ll realize that slowing down implementation allows time for the system to catch up. While some schools may begin with a focus on questioning and classroom discourse, others might focus on developing and using performance assessments. I always recommend setting two goals: a short-term goal that will yield immediate results (e.g., every teacher tries one new strategy in Module 1 to reflect on and share with peers) and a longer-term goal for the school (e.g., school-based validation teams begin to codevelop assessments using Module 3 tools).
    • Each module is framed by an essential question and divided into two parts. Part 1 provides a discussion and examples of the “what” (defining key learning and assessment ideas) and the “why” (a research-based rationale for implementation). Part 2 can be thought of as the “how” (how to practice the ideas) and the “what if” (suggesting ways to apply and adapt the ideas to improve your local system).
    • Part 2 of each module also includes a suggested workshop plan using the support materials and resources in the module. The 4-stage structure of the suggested professional learning plan synthesizes Kolb’s Experiential Learning Cycle with McCarthy’s 4MAT learning style system to address the why, the what, the how, and the what if of implementation. A brief description of Kolb and McCarthy’s models and my rationale for using them to plan professional development sessions are provided at the end of this section.
    • Throughout each module, there are a variety of support materials. Icons are used to indicate specific uses of the information and protocols, and to differentiate teacher versus student tools.


    Support Materials in Each Module


    When you see the clipboard, it indicates a potential workshop activity or time for readers to stop and reflect.


    Teacher tools are indicated by a compass icon with a number. Teacher tools include a variety of instructional and assessment planning worksheets and protocols. Tools in each module correspond to a different focus (e.g., Module 1—Cognitive Rigor; Module 2—Text Complexity; Module 3—PLC Tools).


    Kid tools are classroom examples to be used by students to uncover thinking and reasoning. They include everything from graphic organizers to student peer- and self-assessment tools.


    Throughout the toolkit, video clips and other professional resources are suggested to further illustrate a concept or demonstrate a protocol. These are resources that school leaders might want to include in locally designed professional learning or PLC activities.

    Overview of Module Content
    Module 1: (Cognitive Rigor) Are My Students Thinking Deeply or Just Working Harder?
    Infusing rigor into instruction and assessment: Laying the groundwork for deeper learning for all students

    Module 1 lays the groundwork for developing a common understanding of cognitive rigor and dispels seven common misconceptions about DOK/rigor. Activities, examples, and tools illustrate how to use the Hess Cognitive Rigor Matrices when planning and supporting or coaching instructional and assessment activities.

    Module 2: (Text Complexity) Is the Task Appropriate to the Text?
    Examining and using increasingly complex texts

    Module 2 provides tools for understanding and qualitatively analyzing literary and informational texts before deciding how to best use them. Instructional planning tools and frames for text-based questions are used to explore ways to probe for deeper understanding of print and nonprint texts.

    Module 3: (Assessment Design and Use) What Does This Test Really Measure?
    Designing and refining high-quality assessments for deeper learning

    Module 3 includes numerous tools, protocols, and content-specific examples for designing assessments for formative, interim, or summative use. PLC tools apply research-based technical indicators for rubric design, task validation, and development of common performance assessments. Processes for analyzing student work and selecting and annotating anchor papers are used to build understandings about student learning.

    Module 4: (Learning Progressions) Where Do I Start, What Do I Teach Next, Which Supports Work Best?
    Using learning progressions as a schema for planning instruction and measuring progress

    Module 4 can be thought of as a course called Learning Progressions 101. Information in this module is designed to help educators clarify what a learning progression is and what it is not. For example, your best guess about how learning might develop is not a learning progression; arranging standards across grade levels is not a learning progression. True learning progressions are supported by targeted instruction and validated with empirical research and student work analysis. The step-by-step strategies and school-based examples in this module provide guidance in using learning progressions to plan and target instruction, create pre-, mid-, and postassessments, and measure progress along a learning continuum.

    Module 5: (Building a Comprehensive Assessment System) Is This a Collection of Tests or an Assessment System?
    Building and sustaining a comprehensive local assessment system for deeper learning

    Modules 1 through 4 have focused mostly on individual assessment design and use, not on building a coherent assessment system. The tools in Module 5 are useful in identifying system strengths and gaps by mapping current assessments and identifying what content, skills, and depth of understanding are being emphasized, in terms of what is actually taught and assessed. Alignment tools are provided for local review of assessments in use and building a comprehensive assessment system.

    A 4-stage workshop planning framework using the materials in each module

    When I decided to organize the assessment toolkit as a guide for school leaders, I began to think about a practical framework to support users of the materials in designing their own professional learning activities. I decided to incorporate what I have been using for more than 30 years—a hybrid planning tool that integrates key ideas from David Kolb’s experiential learning model (1984) with Bernice McCarthy’s 4MAT® learning styles system for curricular planning (1987). McCarthy introduced me to both Kolb’s model and her own while I was working at the New Jersey Department of Education. While at the department, I had an in-depth opportunity to learn from McCarthy and then, under her mentorship, to provide training on how to use the 4MAT® system, which was built upon Kolb’s experiential learning model. It is my hope that by synthesizing a few key ideas from these models into my work, their good ideas will remain relevant for future generations of educators.

    Kolb’s experiential learning cycle

    David Kolb’s research advanced a model of experiential learning that combines two key dimensions: how we PERCEIVE new or reintroduced information (along a vertical continuum from concrete to abstract) and how we then PROCESS that information (along an intersecting, horizontal continuum from reflection to experimentation).

    One dimension of Kolb’s model is how we PERCEIVE (take in) new or reintroduced information. At the top of the model is perceiving via concrete experiences; at the other end of the continuum is perceiving via abstract conceptualization. While two learners might have the same learning experience, such as taking a hike, how they perceive—or take in information about—the experience of the hike might be very different based on individual learning preferences. Concrete experiences are very individual and personalized; learners use senses, feelings, and prior knowledge to initially make sense of them. Abstract conceptualization is more generalized; learners intellectually define and categorize what is being perceived. All learners use both ways of perceiving at different times during learning.

    A second, intersecting dimension of Kolb’s model is how we PROCESS information once we’ve taken it in.

    Image 10

    Image 9

    At the far left of this axis is processing by active experimentation. At the other end of the continuum is processing by reflective observation. Combining the two axes yields four different learning style preferences as part of an ongoing experiential learning cycle.

    Kolb’s experiential learning theory is typically represented by a four-part learning cycle in which the learner “touches all the bases.” Effective learning is seen when a person progresses through a cycle of (1) having a concrete experience followed by (2) observation of and reflection on that experience which leads to (3) the formation of abstract concepts (analysis) and generalizations (conclusions) which are then (4) used to test hypotheses in future situations, resulting in new experiences (McLeod, 2013).

    Image 11
    Kolb’s Experiential Learning Cycle

    McCarthy’s 4MAT® System

    McCarthy employed Kolb’s experiential learning model to identify four distinct learning style preferences, each one defined by the intersection of two adjacent dimensions. A type 1 learner (represented by the upper-right quadrant) prefers reflecting on concrete experiences. This is a learner who needs to know WHY the new learning is personally important. McCarthy’s lesson and curricular planning begins with teachers designing a concrete experience for students to reflect on. A type 2 learner (represented by the lower-right quadrant) wants to know and reflect on WHAT the books, the experts, and the research say about this new concept or principle. These learners prefer to generalize what they are learning. A type 3 learner (represented by the lower-left quadrant) starts to get fidgety simply talking and reading about concepts. These learners are most interested in knowing HOW it works in the real world. Type 3 learners like to tinker and actively practice applying the new learning. The type 4 learner (represented by the upper-left quadrant) tends to be more interested in moving beyond what the experts say and how it typically works. These learners begin to ask, “WHAT IF I experiment with this concept or principle? What might happen? What will I learn from the next new experience?” The complete 4MAT® model is an integration of learning styles and left-right brain hemisphericity in a sequence that follows a natural and dynamic cycle of learning through eight stages (McCarthy, 1987).

    McCarthy’s work stresses that traditional instructional approaches are mostly supportive of type 2 and type 3 learners. Because all learners tend to have a preferred primary learning style (types 1–4), curriculum and instruction should be designed to continually honor all learners, and all learning styles. This is as true for adult learners as it is for our students.

    My workshop planning frame does not add McCarthy’s left-right brain components to each quadrant, but draws from Kolb and McCarthy to create a four-stage plan for designing professional learning. In part 2 of each module, tools and activities are suggested for each stage, beginning with a concrete and personal experience that can be shared and reflected upon.

    Image 1

    Generally, this is how the suggested workshop activities might flow:

    • Stage 1—Moving From Concrete Experience to Reflective Observation:
    • Workshop leaders create an experience. Participants reflect on what they think and their prior knowledge and compare their ideas with others in small groups. Workshop leaders listen in, perhaps recording some ideas or identifying some common factors. Workshop leaders think about how to make connections with the next phase of the learning cycle—integrating ‘expert’ information. (In a personal email as I was preparing this manuscript, Bernice reminded me to stress that the experience the teacher or workshop leader creates in Stage 1 must be one the teacher can use to connect the content—the essence of the learning—with the students.)
    • Stage 2—Moving From Reflective Observation to Abstract Conceptualization:
    • Workshop leaders introduce definitions, show models and examples, and may provide a short reading or video that ties ideas together with concepts. During this stage, guided practice might be used with some of the tools or strategies in order to generalize ideas and begin to build broader conceptual schemas.
    • Stage 3—Moving From Abstract Conceptualization to Active Experimentation:
    • In pairs or small groups, participants now practice using tools with some of their own examples, such as a local performance task, unit of study, or common assessment. Participants compare earlier impressions (from stage 1) with new observations and insights, based on collaborative analyses and discussion.
    • Stage 4—Moving From Active Experimentation to New Concrete Experiences:
    • Teams brainstorm ways to apply or expand new learnings with instruction and assessment in their own curriculum. They create action plans and try new strategies, reflecting once again on how things worked.

    Kolb’s Experiential Learning Cycle

    A Workshop Plan With Suggested Activities


    Stage 1: WHY is this new learning important to me? And what do I already know?

    Moving From Concrete Experience to Reflective Observation

    Create a common concrete experience, asking participants to make connections, drawing on personal knowledge and experience.

    Small groups compare and reflect on common ideas.


    Stage 2: WHAT do the research and experts say?

    Moving From Reflective Observation to Abstract Conceptualization

    Help participants connect their personal reflections to broader, more abstract generalizations.

    Provide expert advice and review the research via interactive lecture, readings, and video clips.

    Use models and examples or nonexamples to further develop concepts and build schemas. Stop frequently to allow participants to consolidate the new learning.


    Stage 3: HOW does this work? How can I apply this?

    Moving From Abstract Conceptualization to Active Experimentation

    Provide guided practice using tools and protocols to examine examples and strategies.

    Suggest revising or updating a current assessment, lesson plan, or unit of study.


    Stage 4: WHAT IF I experimented with this? What might work in my classroom or school?

    Moving From Active Experimentation Back to Concrete Experiences

    Encourage participants to apply use of tools or protocols in their own work.

    Structure review and reflection activities.

    Give “homework”—experiment with a new strategy in your classroom. Reflect on how it worked and what you might do next.

    Source: Stages adapted from Kolb (1984) and McCarthy (1987).

  • Appendices

    Appendix A: Summary of Hess Tools to Guide Local Assessment Development, Instructional Planning, and PLC Activities

    Table 64

    Appendix B: Instructional and Formative Assessment Strategies to Uncover Thinking

    Table 53

    Appendix C: Troubleshooting Tips When Designing Assessment Items and Tasks
    Things to Avoid When Developing Selected Response (SR) and Constructed Response (CR) Items and Task Prompts

    Image 66a

    Appendix D: Sample “What I Need to Do” Rubrics—Science, ELA, Mathematics, Blank Template

    Table 55

    Appendix E: Student Profile: Science Inquiry Learning Progression

    Student: _________________________________________________________________________________

    DOB: ____________________________________________________________________________________

    Date of Entry: _____________________________________________________________________________

    Reentry: _________________________________________________________________________________

    Table 56

    The Individual Student Profile for Science Inquiry Learning provides a guide for instructional planning, progress monitoring, and documentation of essential learning of science inquiry skills and concepts within and across Grades PreK–5. The science skills and concepts listed were developed using student work samples across multiple classrooms. They have been integrated with consideration of developing literacy and numeracy skills at these grade levels. The intent is that each student will have a “profile” (folder/portfolio) with the student’s work samples and evidence.

    At the end of each school year, samples of student work in science could accompany this record when the Profile is passed on to the next year’s teacher. When including a sample of student work, label it with the inquiry indicator letter (e.g., “C” for Conducting Investigations) and the corresponding skills or concepts number(s) assessed. (Note that numbers are for ease of use and relate to a general progression, not a specific intended skill sequence. For example, PreK–K skills generally develop before the Grade 1 skills and concepts, but not always in the numbered order.) Also list the assessment tool (by name or description) under column E with coding notes (e.g., “Ice Melt Performance Task”—A10, A11, C13, C14, D12, D13). Be sure the student work is dated (e.g., 10/2009), and indicate which content domain (Earth & Space, Physical, Life Science, STEM) is being assessed.

    DIRECTIONS for Documenting Progress
    • / in the box indicates the skill or concept has been introduced, but the student has not yet demonstrated conceptual understanding or consistently applied the skill in the context of an investigation. It may be necessary to scaffold instruction, reteach the concept using another approach or another context or investigation, or reassess acquisition of skills/concepts at earlier levels if not yet mastered. Administering formative assessments prior to conducting extended investigations is highly recommended to guide instructional planning and appropriate timing of the summative assessments.
    • X in the box indicates the student has met expectations for this grade level, meaning that there is sufficient evidence (assessment data from multiple formats—teacher observations, formative assessments, performance tasks, etc.) to support this conclusion.

    Table 57

    Appendix F: Student Learning Progression Literacy Profile—Grades 7–8

    Student: _________________________________________________________________________________

    DOB: ____________________________________________________________________________________

    Date of Entry: ________________________ Reentry: ___________________________________________

    Table 58

    The Student Learning Progression Literacy Profile (LPLP) provides a general guide for instructional planning, progress monitoring, and documentation of essential learning of literacy skills and concepts within and across grades. The skills and concepts listed have been integrated with consideration of a research-based learning progression for literacy and the Common Core State Standards at the designated grade levels. At the end of each school year, samples of student work could accompany this record if the Profile is passed on to the next year’s teacher or used for reporting to parents.

    • Grade-level literacy teams can begin using the Literacy Profile by examining descriptions of Progress Indicators (e.g., M.RL.k: identify use of literary techniques, such as flashback and foreshadowing, and narrative strategies, such as dialogue and sensory details, and explain how they advance the plot or impact meaning) alongside the corresponding grade-level CC standards (e.g., 8.RL-3, 4) in order to develop appropriate instructional building blocks for each unit of study (selecting texts that increase in complexity, developing lesson sequences that move students along the learning continuum). Units of study typically encompass multiple Progress Indicators from several LPF strands (e.g., Making Meaning at the Word Level, Reading Literary Texts, and Writing Literary Texts).
    • Next, develop or identify the major common assessments for each unit of study used during the school year, asking this question: How can we best collect evidence of learning at different entry points along the learning progression? As a starting point, these should include the summative and performance assessments used across all classrooms at the grade level and typically taken by most students, assessing multiple skills described along the learning progression.
    • Additional evidence of learning, using ongoing assessments (preassessments, formative assessments, teacher observations, etc.), mid-assessments, and classroom-specific unit assessments can be documented in the profile throughout the school year. The depth and breadth of assessments used will vary according to intended purpose.
    DIRECTIONS for Documenting Progress Along the Learning Progressions
    • / in the box to the left of the Progress Indicator indicates the skill or concept has been introduced, but the student has not yet demonstrated conceptual understanding or consistently applied the skills or concepts in the context of applying them to various texts and text types. It may be necessary to scaffold instruction; reteach the concept using another approach or another context or text; or reassess acquisition of skills or concepts at earlier levels if not yet mastered. Administering ongoing formative assessments is highly recommended to guide instructional planning and appropriate timing of the summative or interim assessments.
    • X in the box to the left of the Progress Indicator indicates the student has met expectations for this grade level, meaning that there is sufficient assessment evidence (assessment data from multiple formats—teacher observations, formative assessments, student work from performance tasks, etc.) to support this conclusion.

    When collecting samples of student work (e.g., for parent conferences, progress monitoring), label the student work with the Literacy Profile indicator strand letters (“HD”—Habits & Dispositions, “RL”—Reading Literary Texts, “WI”—Writing Informational Texts, etc.) and include the Progress Indicator code for the corresponding skills or concepts assessed with that assessment task. Also be sure the student work is dated. (Note that the coding and ordering of the Progress Indicators (a, b, c, etc.) in the profile are for ease of use with the Learning Progressions Framework (LPF) for ELA & Literacy* and relate to a general progression, NOT a specific intended, lock-step skill sequence.) For example, many of the same skills and concepts will generally develop and be practiced again and again with different and increasingly more complex texts across a school year. Beginning with an optimal lesson sequencing planning tool (such as the LPF and Literacy Profile) can provide insights into how to best support students with smaller learning steps in order to attain the end-of-year skills and concepts articulated in the Common Core State Standards.

    * Hess, Karin. (Ed. & principal author). (2011). Learning progressions frameworks designed for use with the Common Core State Standards in English language arts & literacy K–12. Available at http://www.nciea.org/publications/ELA_LPF_12%202011_final.pdf.

    Image 71

    Appendix G: Writing Persuasively Learning Progression (Strand 7, LPF)

    Table 60

    Appendix H: LPF STRAND 7 (Grades K–2) Sample Lesson Planning Steps Using Learning Progressions

    Image 73a

    Appendix I: An Expanded Glossary for Understanding and Designing Comprehensive Local Assessment Systems

    As schools move from having a “collection of tests” to designing a high-quality, comprehensive assessment system, it is essential to establish a common language and deeper understanding of the various system components and how they interrelate. In this expanded glossary, I have included some of the most common terminology associated with proficiency-based learning and comprehensive local assessment systems. Many of these concepts come from the world of educational measurement and large-scale testing (e.g., state assessment programs); others come from educational research and best practices literature. Even school systems with limited resources can implement many of these ideas to increase the quality of their assessments, the local assessment system, and use of assessment data.

    These definitions and descriptions are the way I use them in my work, which may not reflect how each state, each testing organization, each publisher or author, or even each school system uses them. Much has been written about most of these terms—sometimes whole books! In some cases, I have “adopted” definitions provided by experts in the field. Others represent a synthesis of ideas, including my own. Educational leadership teams should explore and establish a working vocabulary that best supports the ongoing design and refinements to their local proficiency-based assessment system.


    Accommodations: Accommodations are allowable supports that ensure two things: (1) all students will be able to demonstrate what they know and (2) the assessment will maintain validity—even with the allowable accommodations. A modification, by contrast, is a change to an assessment item that changes what is actually being assessed, such as reading a text to a student when the assessment was originally designed to assess decoding skills. An accommodation of reading a text aloud would not change an item asking for interpretation of theme or author’s point of view. Large-scale assessments frequently include these approved accommodations: scribing a student’s exact words for writing or for providing reasoning in a math problem-solving task; reading aloud the text of a math problem or science prompt, or providing bilingual dictionaries to ELL students, so that they fully understand what they are being asked to do; providing additional time or shorter testing sessions; enlarging the font size of print; and allowing students to take the assessment in a location other than the classroom. Reorganizing reading test questions by chunking text passages or grouping math problems by domain (e.g., all subtraction questions grouped together) are proven (and allowable) accommodations that support students with learning disabilities (Hess, McDivitt, & Fincher, 2008). Accommodations should be “designed into” common assessments, rather than become an afterthought. PLC Tools #9 and #16B (Module 3) help assessment developers ensure that the assessment is “fair” for all students.

    Accountability: Test-based accountability is a process by which results of testing are used to hold schools, school systems, and/or teachers responsible for student performance or stated outcomes. According to Linn (2011), “the primary goals of test-based educational accountability systems are (1) to increase student achievement and (2) to increase equity in performance among racial-ethnic subpopulations and between students who are poor and their more affluent peers” (p. 1). The quantitative approach through test-based accountability is not the only approach to holding schools and teachers accountable. Qualitative approaches, such as using data from school visits, classroom observations, and student interviews, have enjoyed wider use in some other countries than they have in the United States. The qualitative and quantitative approaches both have strengths and limitations. A hybrid system that capitalizes on the strengths of each approach is preferable to either of the two approaches alone.

    Achievement Level (Performance Level) Descriptors (ALDs/PLDs): Two terms—Achievement Level Descriptor and Performance Level Descriptor—are commonly used to name and describe a range of performance levels on a given assessment or when making overall decisions about whether a learner has demonstrated proficiency, such as for course completion or meeting high school graduation requirements. For large-scale assessments (e.g., licensing tests, college entrance exams, state assessments), defining ALDs for “proficient performance” is a policy decision set by a governing board. Simply put, ALDs describe what test takers must know and be able to do to be classified within a particular performance level. Schools often use a similar approach when describing performance levels on scoring rubrics with terms such as Emergent, Approaching, Proficient, or Advanced. In competency-based assessment systems, ALDs can be used to guide judgments about proficiency when examining work samples that constitute each student’s “body of evidence.”

    Alignment: Alignment is generally defined as a measure of the extent to which a state’s standards and assessments “agree” and the degree to which they work in conjunction with each other to guide and support student learning. It is not a question that yields a simple yes or no response; rather, alignment is a considered judgment based on a number of complex factors that collectively determine the degree to which the assessment tools used and evidence collected will gauge how well students are demonstrating achievement of the standards. In other words, how effective is the end-of-year summative assessment—and the assessment system as a whole—in measuring the depth and breadth of knowledge and skills set forth in the content standards in relation to the performance goals and expectations for each grade level? (Hess, 2013). Module 5 describes why alignment is important and how to design and conduct a local alignment study.

    Anchor Charts: Anchor charts are visual displays, co-created with students during instruction and used later by students to support their learning as they apply ideas or strategies that may be difficult to remember at first. This makes the charts an excellent scaffolding strategy to support executive functioning and language acquisition. An anchor chart has one clear focus—such as a big idea or topic, a concept, a strategy, or a vocabulary term—which is listed at the top as a title. Key related ideas and examples are organized by purposefully using simple tables, arrows, color coding, spacing, bullets, and visuals. Module 2 includes examples of text structure anchor charts.

    Assessment Anchors/Anchor Papers/Exemplars: High-quality common assessments in all content areas include annotated assessment anchors. These are composed of student work samples, usually collected and analyzed when piloting new assessments. It can take more than one administration of an assessment to identify examples at each performance level, and some anchor papers are often replaced with better models as time goes on and performance (and instruction) improves. During calibration and scoring practice, assessment anchors are used to interpret the rubric criteria and illustrate what student work products at different performance levels are expected to look like. Collaborative student work analysis protocols are used to identify the best set of training examples, which should include (a) an unambiguous sample (exemplar) at each performance level, (b) multiple ways to achieve a proficient or advanced level, and (c) a few examples to illustrate how to score “line” papers—those that seem to fall between almost proficient and proficient. Many teachers also use anchor papers to help students understand expectations for performance, by collaboratively analyzing and scoring work samples. I recommend beginning with the extreme examples (lowest vs. highest) when asking students, “What do you see in this example?” or “What can be improved upon?” Module 3 explains how to annotate student work and develop anchor papers using PLC Tools #13, #14, and #15.

    Assessment Item and Task Types: A robust assessment system will include a variety of assessment types for each assessment purpose, rather than rely on one standardized assessment given at the end of the school year. Item and task types should include a combination of short answer, short- and longer-constructed response items, systematic observations and conferencing, end-of-unit assessments, common performance tasks, extended response tasks or projects, and peer and self-assessments. Appendix C provides examples of common issues with item development and how to correct them.

    Assessment Purposes and Uses: Formative, interim or benchmark, and summative assessments can be used for a variety of purposes within the assessment system. When designing or adopting assessments used to make high-stakes decisions (e.g., identifying students requiring additional supports or determining achievement of proficiency in a content area), it is critical to identify each assessment’s specific purpose(s) and intended uses as you design your system. It is a mistake to assume that one assessment will be able to meet all purposes. Module 3 describes and provides examples of assessment types and uses. Assessment purposes include the following:

    • Screening All Students—generally includes standardized assessments with a broad scope of lower-level (DOK 1, 2) skills or tasks; intended to determine whether further diagnostic testing or observation may be needed
    • Diagnosing Individual Student Needs—typically used after a screening tool identifies potential learning needs; generally diagnostic tools have many items with a narrow focus (e.g., identifying letters and sounds); strongest examples include eliciting student explanations that uncover thinking and reasoning related to skills demonstrated (DOK 3), as well as acquisition of basic skills and concepts (DOK 1, 2)
    • Monitoring Progress—may be used both formatively and summatively; best examples are aligned with ongoing instruction and have a mix of item types, tasks, and range of cognitive demand (DOK 1–4); usually include common interim or benchmark assessments at specified periodic times during the school year (e.g., performance tasks each quarter, monthly running records)
    • Informing Instruction—most closely aligned with current instruction and used formatively, not for grading; best examples are embedded in ongoing instruction and have a mix of item types, tasks, and range of cognitive demand (DOK 1–4); most useful when student work analysis is used to examine misconceptions, common errors, conceptual reasoning, and transfer of learning to new contexts
    • Communicating Student or Program Outcomes—disaggregated data from these assessments are generally used summatively; interpretations of performance draw on a combination of assessments (DOK 2–4), given over time to develop individual learning profiles, to determine student or group progress (proficiency), and/or to evaluate program effectiveness

    Assessment (Test) Blueprint: An assessment blueprint is used to analyze an existing assessment or to design comparable (parallel) assessments, such as when using task models to design performance tasks. Information in the test blueprint identifies the number and types of assessment items (multiple-choice, constructed response, performance task, etc.), the content standards assessed, the overall intended cognitive demand (performance standards), and scoring emphasis. For example, a test blueprint can identify whether more score points are given to items assessing certain (high-priority) standards or to particular item types (e.g., problem-solving tasks receiving more or fewer score points than items assessing routine mathematics operations). Module 5 describes how to develop and use test blueprints, using Alignment Tools #29 and #31.
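    As a rough illustration, each blueprint row can be modeled as an item's type, standard assessed, DOK level, and score points, from which scoring emphasis can be tallied. This is a minimal sketch, not a format from the book; the standards, item types, and point values below are invented examples.

```python
# Sketch of a test blueprint as a simple data structure (illustrative only;
# standards, item types, and point values are invented).
blueprint = [
    {"item": 1, "type": "multiple-choice",      "standard": "NBT.1", "dok": 1, "points": 1},
    {"item": 2, "type": "multiple-choice",      "standard": "NBT.2", "dok": 2, "points": 1},
    {"item": 3, "type": "constructed-response", "standard": "OA.3",  "dok": 2, "points": 2},
    {"item": 4, "type": "performance task",     "standard": "OA.3",  "dok": 3, "points": 4},
]

def score_points_by(blueprint, key):
    """Total the score points per standard (or per item type) to reveal scoring emphasis."""
    totals = {}
    for row in blueprint:
        totals[row[key]] = totals.get(row[key], 0) + row["points"]
    return totals

print(score_points_by(blueprint, "standard"))  # emphasis by standard
print(score_points_by(blueprint, "type"))      # emphasis by item type
```

Tallying in this way makes it easy to see, for instance, that the single performance task carries as many score points as all the other items combined.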

    Assessment System Blueprint: An assessment system blueprint compiles information about “high priority” assessments used within and across grade levels to determine student proficiency in each content area. The assessments listed in the system blueprint are chosen because they complement each other, in terms of the assessment information they generate. Module 5 describes why system blueprints are useful and how to develop them using Alignment Tool #30.

    Authentic Applied Contexts: An instructional and assessment approach that is considered to be the most engaging and relevant to the learner is one that allows students to explore, discuss, and meaningfully construct concepts and relationships in contexts that involve real-world problems and issues. Project-based learning and performance tasks are typically designed with authentic tasks and audiences in mind. PLC Tool #20 and the GRASPS example (Module 3) provide protocols for developing performance assessments using authentic contexts.

    Authentic Intellectual Work: Newmann, Bryk, and Nagaoka (2001) argue that any sound student assessment system should be based on a broader vision of what students should learn and be able to do. The “contemporary demands of productive work, responsible citizenship, and successful management of personal affairs extend well beyond giving correct answers and following proper procedures for the work traditionally assigned in school” (p. 9). Three key characteristics of authentic intellectual work are

    • Construction of knowledge that involves interpretation, evaluation, analysis, synthesis, and organization of prior knowledge to solve new problems or arrive at new understandings
    • Disciplined inquiry or using prior knowledge to gain in-depth understanding that enables one to communicate what he or she comes to know and understand in multiple ways
    • Value beyond school in which knowledge has “utilitarian, aesthetic, or personal value”


    Benchmark: A standard or established outcome by which something can be measured or judged. Quarterly or midyear common benchmark or interim assessments can be used to monitor progress toward meeting annual learning goals.

    Benchmark or Interim Assessment: Benchmark (also called interim) assessments are generally standards-based and designed to align with a pacing calendar and grade-level content standards. They are typically used to measure progress on large units of a district’s curriculum and to determine which students are “on track” to meet specified annual academic goals. Interim assessments are designed to inform decisions at both the classroom level and beyond, such as at the school, grade, or district level. Thus, they may be given at the classroom level to provide information for the teacher; but the results can also be meaningfully aggregated and reported at a broader level. One caution I offer about the use of short, economical fixed-form assessments as interim assessments is that they may only provide useful data on the “average” student. In most cases, they will not provide much useful information for students performing well below or well above the standards measured.

    Characteristics of benchmark or interim assessments include the following:

    • They are typically administered several times per year (e.g., fall, winter, spring, quarterly) or regularly within specified grade spans (e.g., at the end of Grade K, 2, 4, etc.).
    • The timing of the administration is likely to be controlled by the school or district rather than by the individual teacher.
    • They are often designed using task shells, or compiled into task banks, to ensure quality and comparability as common assessments.
    • They may serve a variety of purposes, including predicting a student’s ability to succeed on a large-scale summative assessment at the end of the year, evaluating a particular educational program or pedagogy, or diagnosing gaps in student learning for groups or individual students.

    Benchmark Texts: Similar to annotated assessment anchors and anchor papers, benchmark texts are annotated to illustrate areas of text complexity. Lists of literary benchmark texts for different grade levels are sometimes available from text publishers. However, no single list will ever include all texts—especially informational texts. Collaborative processes can be used by PLC teams to identify benchmark texts at each grade level. These texts can then be used to calibrate future text analyses or compare texts of differing complexity. Often lists of benchmark texts are updated with better models as time goes on. Many teachers also use benchmark texts to help students understand how different texts are constructed. Module 2 includes protocols for how to analyze texts for use as benchmark texts.

    Benchmarking: The process of identifying assessment anchors for a given assessment task. Module 3 includes PLC Tools #13, #14, and #15 for developing anchor sets and using assessment anchors.

    Big Ideas: The concepts of big ideas and enduring understandings have been best operationalized by McTighe and Wiggins in the Understanding by Design framework (2005). Big ideas are broadly defined as the domain-specific core concepts, principles, theories, and reasoning that should serve as the focal point of curriculum and assessment if the learning goal is to have students make broader and deeper connections among specific skills and concepts taught. Big ideas tie the learning from multiple units of study together and are the unifying threads connecting the learning targets in learning progressions. Module 4 describes how big ideas and enduring understandings guide development of learning progressions.

    Body of Evidence: High-priority assessments identified in the Assessment System Blueprint are administered over an extended time in order to gather sufficient evidence that students are making progress toward achieving proficiency in a given content area. Best practices in developing the interpretations of student learning include use of common assessments, collaborative scoring, juried reviews of student work and portfolio products (e.g., involving experts, peers, community), and student self-assessment and reflection.

    Body-of-Evidence Verification: As described by Great Schools Partnership, determining proficiency using a body of evidence requires a review and evaluation of student work and assessment scores. The review and evaluation process may vary in both format and intensity, but verifying proficiency requires that educators use common criteria to evaluate student performance consistently from work sample to work sample or assessment to assessment. For example, teachers working independently may use agreed-upon criteria to evaluate student work, a team of educators may review a student portfolio using a common rubric, or a student may demonstrate proficiency through an exhibition of learning that is evaluated by a review committee using the same consistently applied criteria. I suggest developing and using common rubrics, aligned with agreed-upon Achievement Level Descriptors for different grade levels or grade spans, to facilitate discussions about determining proficiency when examining a student’s body of evidence.


    Close Reading: Close reading is characterized by the use of text evidence to support analysis, conclusions, or interpretations of text. Close reading goes beyond simply locating evidence, summarizing, or recalling explicit information presented in the text. Close reading should be practiced with shorter texts or excerpts of longer texts. Fisher, Frey, and Hattie (2016) describe four key strategies that support close reading: repeated reading to build fluency and deepen understanding, annotating chunks of text to mark thinking, teacher questioning to guide analyses, and discussion that helps to elaborate on analysis and thinking.

    Cognitive Demand: Cognitive demand describes the potential range of mental processing required to complete a given task, within a given context or scenario. Determining the intended cognitive demand of a test item or task requires more than simply identifying the “verbs” and the “nouns” describing the learning outcomes. Task developers must consider the reasoning and decision making required to complete a task successfully. “Tasks that ask students to perform a memorized procedure in a routine manner lead to one type of opportunity for student thinking; tasks that require students to think conceptually and that stimulate students to make connections lead to a different set of opportunities for student thinking” (Stein & Smith, 1998, p. 269). During instruction, the cognitive demand of highly complex tasks can be lessened using strategic scaffolding strategies without significantly changing the constructs being assessed. This might include strategies such as chunking texts for a reading assessment, group data collection for a science investigation, and facilitated discussions as a prewriting activity. Module 1 provides an in-depth discussion of common misconceptions about rigor, depth-of-knowledge (DOK), and cognitive demand.

    Cognitive Labs: Using cognitive labs is the least used of three strategies for determining how well a new or draft assessment will “perform”—meaning how effective is this assessment (test items, reading passages, and tasks) in eliciting the intended evidence of learning? A cognitive lab approach does not require the time and number of students that field testing and task piloting require; therefore, it is a good option for smaller schools with limited resources. To set up cognitive labs, teachers identify a small (fewer than 20) sample of students at varying ability levels to take a draft assessment. Sometimes several teachers will each take a few students to work with, or different students will complete different parts of the assessment. Upon completion of the assessment task, students are interviewed about the task and results are analyzed to determine whether revisions are needed. Cognitive labs can also be used as a formative assessment-conferencing strategy. Module 3 describes this think-aloud process in detail, using PLC Tools #17–#19.

    Cognitive Psychology and Developmental Psychology: Cognitive psychology refers to the study of human mental processes and their role in perception, thinking, memory, attention, emotion, problem solving, and behavior. Developmental psychology is a scientific approach that aims to explain how children and adults change over time.

    Cognitive Rigor: Cognitive rigor encompasses the complexity of the content, the cognitive engagement with that content, and the scope of the planned learning activity (Hess, Carlock, Jones, & Walkup, 2009). Module 1 provides an in-depth discussion of what makes learning and assessment tasks more or less complex.

    Cognitive Rigor Matrix (CRM): The Hess Cognitive Rigor Matrices (CRMs) are content-specific tools designed to enhance increasingly rigorous instructional and assessment planning and practices at classroom, district, and state levels. Descriptors in each CRM can guide a teacher’s use of questioning during a lesson, shifting student roles to be more student directed. Module 1 includes eight content-specific CRM Tools, #1–#5D, used for examining cognitive rigor. Hess CRMs are available online at http://www.karin-hess.com/free-resources.

    Common Assessment/Common Assignment: Common assessments are designed and used to collect comparable evidence of learning within and across grade levels. They can include performance tasks (e.g., common writing prompts, parallel tasks with common scoring rubrics, such as internships or capstone projects), district interim or benchmark assessments (given at particular times during the school year to measure progress), and other district-level and state-level assessments. Administration guidelines must accompany common assessments (and common scoring guides) to ensure fidelity of implementation. If assessments are scored locally, calibration training and scoring practice are also essential. Modules 3 through 5 provide guidance in developing and using common assessments.

    Competency: The term competency is often used interchangeably with the term proficiency. As defined by Achieve, Inc., competencies include explicit, measurable, transferable learning objectives that empower students.

    Competency-Based Pathways (CBP): Achieve, Inc. has adapted from iNACOL/CCSSO a working definition of CBP (Domaleski et al., 2015) to include these indicators:

    • Students advance upon demonstrated mastery and can demonstrate their learning at their own point of readiness.
    • Assessment is meaningful and a positive learning experience for students and requires students to actually demonstrate their learning.
    • Students receive rapid, differentiated support based on their individual learning needs.
    • Learning outcomes emphasize competencies that include the application and creation of knowledge.
    • The process of reaching learning outcomes encourages students to develop skills and dispositions important for success in college, careers, and citizenship.

    Conjunctive and Compensatory Models (of Accountability Systems): Two methods for combining information from multiple assessments, subject areas, or grade levels (e.g., test scores, artifacts of learning) include using a conjunctive or a compensatory model, or a combination of both. A conjunctive model requires a minimum level of performance on each of several measures, meaning that poor performance on one measure may result in a failure to meet established targets. In a compensatory model, good performance on one measure may offset poor performance on another (see Brennan, 2006, p. 570 for NCME and ACE information).
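    The two decision rules can be sketched in a few lines of code. This is a minimal illustration, not a procedure from the book; the measures, per-measure cut scores, and the combined cut of 210 are invented for the example.

```python
# Sketch contrasting conjunctive and compensatory decision rules
# (illustrative only; measures and cut scores are invented).
CUT_SCORES = {"reading": 70, "writing": 70, "math": 70}

def conjunctive_pass(scores):
    """Conjunctive: the student must meet the minimum on EVERY measure."""
    return all(scores[m] >= cut for m, cut in CUT_SCORES.items())

def compensatory_pass(scores, total_cut=210):
    """Compensatory: a strong score on one measure can offset a weak one."""
    return sum(scores[m] for m in CUT_SCORES) >= total_cut

student = {"reading": 90, "writing": 65, "math": 80}  # weak in writing
print(conjunctive_pass(student))   # False: writing falls below its cut score
print(compensatory_pass(student))  # True: the total of 235 clears the combined cut
```

A combined model might, for example, apply the compensatory total while still requiring a (lower) minimum floor on each individual measure.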

    Construct-Driven Assessment Design: Assessments that are construct driven require three things to be specified to guide the test and item development process: (a) the knowledge, skills, and other attributes to be assessed; (b) the expected performance and procedures that will illuminate the intended knowledge and skills assessed; and (c) descriptions of the tasks or situations that apply the knowledge and performance specified for assessment or task design. This approach is intended to strengthen overall test validity and interpretation of scores. Test blueprints and test specifications typically contain this information with samples of test items to illustrate how tasks integrate content knowledge and performance expectations. Module 5 describes how to develop and use test blueprints to ensure that assessments emphasize what instruction focuses on.

    Constructed (Open-Ended) Response Item: A constructed or open-ended test question or task is one that requires the student to generate rather than select an answer from a list of possible responses. These items may include constructing or filling in a table or diagram (DOK 1, 2) or are questions that require some supporting evidence (DOK 3) for the answer given (e.g., text evidence to support interpretation of a theme in reading, data to support conclusions from a science investigation).

    Criterion-Referenced Test (CRT): A CRT measures an individual’s performance against a well-specified set of standards (distinguished from tests that compare students in relation to the performance of other students, known as norm-referenced tests).

    Cut Score: A cut score is a specified point on a score scale, such that scores at or above that point are interpreted or acted on differently from scores below that point. In standards-based assessments, cut scores may typically delineate passing from failing, proficient from basic performance, and so on.


    Data: Data includes factual information (such as measurements or statistics) used as a basis for reasoning, discussion, calculation, or judgments. Data can be qualitative or quantitative. Actionable data must be current or timely, reliable, and valid.

    Demographic Data: Demographic data focus on the gender, socioeconomic background, race, and ethnicity of students in a school or district. Disaggregating assessment data by demographics helps to understand what impact the educational system is having on different groups of students. This analysis is often used to interpret and compare how various subgroups of students are performing in relation to the overall population and delineates the context in which the school operates, which is crucial for understanding all other data and potential underlying issues.

    Depth-of-Knowledge (DOK): Norman Webb’s Depth-of-Knowledge Levels (Webb, 2002) describe the depth of content understanding and scope of a learning activity, which manifests in the skills required to complete a task from inception to finale (e.g., planning, researching, drawing conclusions). The Hess CRM tools integrate Webb levels with Revised Bloom’s Taxonomy. Webb’s four DOK levels are

    • Level 1: Recall and reproduction
    • Level 2: Basic skills and concepts
    • Level 3: Strategic thinking and reasoning
    • Level 4: Extended thinking

    Depth-of-Knowledge (DOK) “ceilings and targets”: An important consideration in the development of test items or performance tasks is to use the highest Depth-of-Knowledge (DOK) demand implicit in an assessment blueprint as the “ceiling” for assessment, not a target. A “DOK target” has a narrower focus (e.g., assess only at DOK 2), whereas a “DOK ceiling” is the highest potential Depth-of-Knowledge level to be assessed, with DOK levels up to the ceiling also assessed. The DOK ceiling is determined by the intended cognitive demand of the combination of standards assessed. Why is this distinction important? When only the highest DOK levels (e.g., only DOK 3 and 4) are assessed as targets, the assessment as a whole may end up being too difficult for many students, and important information about learning along the achievement continuum is lost. Multiple items or performance tasks covering a range of DOK levels will provide the most useful instructional information for teachers. Examples of possible assessment “ceilings” in science might be (a) DOK 1 ceiling: perform a simple procedure to gather data (DOK 1—measure temperature); (b) DOK 2 ceiling: organize and represent data collected over a period of time, making comparisons and interpretations (DOK 1—measure temperature + DOK 2—graph and compare data); and (c) DOK 3 ceiling: answer a research question related to the environment using data collected to draw and support conclusions (DOK 1—measure temperature + DOK 2—graph and compare data + DOK 3—conduct an investigation to explain the effect of varying temperatures of the river in different locations) (Hess, 2008a).
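    The difference between targeting a single DOK level and assessing up to a DOK ceiling can be sketched as a simple item-selection rule. This is an illustrative sketch only; the item pool below is invented, loosely following the science examples above.

```python
# Sketch of "DOK target" vs. "DOK ceiling" item selection
# (illustrative only; the item pool is invented).
items = [
    {"id": "A", "dok": 1},  # measure temperature
    {"id": "B", "dok": 2},  # graph and compare data over time
    {"id": "C", "dok": 3},  # investigate and explain temperature effects
]

def select_by_target(items, target):
    """Target: assess ONLY at one DOK level (narrow focus)."""
    return [i["id"] for i in items if i["dok"] == target]

def select_by_ceiling(items, ceiling):
    """Ceiling: assess the full range of DOK levels up to and including the ceiling."""
    return [i["id"] for i in items if i["dok"] <= ceiling]

print(select_by_target(items, 3))   # ['C']: only the most complex task remains
print(select_by_ceiling(items, 3))  # ['A', 'B', 'C']: the full achievement continuum
```

The ceiling rule keeps the lower-level items, which is what preserves information about students who are not yet successful on the most complex tasks.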

    Disaggregated Data: Assessment data can be broken down by specific targeted student subgroups (representative of a school or district), using criteria such as current grade or age, race, previous achievements, gender, ethnicity, and socioeconomic status, for example. Typically data are disaggregated to examine and make program decisions about assessment equity and fairness, curricular programs, instruction, and quality of local support and intervention programs.
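    Mechanically, disaggregation is a grouping operation: partition the records by a demographic or criterion field, then summarize each subgroup. The sketch below is illustrative only; the records and the grouping field ("grade") are invented, and any other field (gender, ethnicity, program status, etc.) could be substituted.

```python
# Sketch of disaggregating assessment results by a subgroup variable
# (illustrative only; the records and grouping field are invented).
from collections import defaultdict

records = [
    {"student": "s1", "grade": 4, "score": 82},
    {"student": "s2", "grade": 4, "score": 74},
    {"student": "s3", "grade": 5, "score": 91},
    {"student": "s4", "grade": 5, "score": 67},
]

def mean_score_by(records, field):
    """Average score for each value of the chosen demographic or criterion field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r["score"])
    return {k: sum(v) / len(v) for k, v in groups.items()}

print(mean_score_by(records, "grade"))  # {4: 78.0, 5: 79.0}
```

Comparing each subgroup mean against the overall mean is the comparison described above between subgroups and the full population.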

    Disciplined Inquiry: The concept of disciplined inquiry was advanced by Newmann, King, and Carmichael (2007), who argue that students can learn as adults do in various occupations. “To reach an adequate solution to new problems, the competent adult has to construct knowledge because these problems cannot be solved by routine use of information or skills previously learned” (pp. 3–4). Disciplined inquiry involves developing a knowledge base of relevant vocabulary, facts, concepts, and theories and developing a deeper understanding by proposing and testing ideas.

    Dispositions/Habits of Mind: Costa and Kallick (2008) describe 16 attributes—or Habits of Mind—that human beings display when they behave intelligently. They are considered the characteristics of what intelligent people do when confronted with problems whose resolutions are not immediately apparent. Habits of Mind are seldom performed in isolation, and the list is not limited to these 16 behaviors identified by Costa and Kallick: persisting; managing impulsivity; listening with understanding and empathy; thinking flexibly; thinking about thinking (metacognition); striving for accuracy; questioning and posing problems; applying past knowledge to new situations; thinking and communicating with clarity and precision; gathering data through all senses; creating, imagining, innovating; responding with wonderment and awe; taking responsible risks; finding humor; thinking interdependently; and remaining open to continuous learning. States now promoting competency-based learning have included many of these 16 Habits of Mind as constructs to be assessed within the context of projects and performance assessments.


    Executive Functioning: Executive Functioning is a broad term describing the neurologically based skills involving mental control and self-regulation. Executive function is employed when performing activities required by complex performance tasks, such as planning, managing time, organizing materials and information, strategizing, and paying attention to and remembering details. Many strategic scaffolding strategies can be used to support students with poor executive functioning when completing complex tasks (e.g., providing assignment checklists, focusing questions, visual calendars, breaking tasks into manageable parts).

    Expeditionary Learning/Learning Expeditions: Learning expeditions are interdisciplinary studies, usually lasting six to twelve weeks. They may include a combination of case studies, in-depth projects, fieldwork, working with experts, service learning, and a culminating event that features high-quality student work. Module 3 includes guidance in using learning expeditions as student-designed performance assessments.


    Feedback: Grant Wiggins used to say that there is a big difference between giving advice and giving feedback. Most people don’t really want advice, but will listen to, reflect on, and ultimately make use of specific feedback to improve their performance. Feedback that describes performance in relation to criteria for success can lead to more effort and deeper learning. Wiliam and Leahy (2015) describe the research underlying two different kinds of feedback (corrective and reinforcing) and stress the importance of using feedback to “move learning forward.”

    Field Testing: Field testing is one of three common strategies used for determining how well a new or draft assessment will “perform”—meaning how effective is this assessment (test items, reading passages, overall difficulty, and tasks) in eliciting the intended evidence of learning? Field testing requires that a large representative sample of students (across gender, income, ability, race/ethnicity, etc.) take an assessment—such as a state reading or science assessment—before it is administered to all students at that grade level in the following year. Student responses from the field test are disaggregated by subgroup and analyzed to determine which test items and tasks can be used as currently written, which items or tasks need to be revised and field-tested a second time before they can be used, and which ones should not be used at all in the operational test. This strategy is used most often by testing companies, either by embedding new (field test) items into an existing test at that grade level, or administering the test to students at the intended grade level who will not be taking this assessment in the future.

    Because this approach is costly and time-consuming, it is not ideal for schools to employ. That said, one way I have seen larger schools or consortia of schools successfully use field testing is with development of performance tasks, such as “testing” new writing prompts. A representative sample of at least 100 students at the intended grade level (across several schools or districts) is given the task to complete. Work samples are collected and collaboratively analyzed by teachers. The goal is NOT to score every paper, but to see if a large enough sample (of student work) exhibits the intended assessment evidence. If you find that after looking at the first 25 papers, for example, the prompt was unclear or students did not provide enough evidence to be scored, then you can probably assume that some revision will be needed before the task can be widely used as a common task. If most student samples are scorable but fall into the lower performance levels, the committee may determine that instruction was not adequate or expectations were unreasonable for this grade level. Either way, this task might not be ready for “prime time.” Module 3 contrasts this process with piloting and using cognitive labs as a means for validating new assessments brought into the local assessment system.

    Flexible Pathways (to Graduation): In Vermont, the Flexible Pathways Initiative, created by Act 77 of 2013, encourages and supports the creativity of school districts as they develop and expand high-quality educational experiences as an integral part of secondary education in the evolving 21st century classroom. Flexible pathways promote opportunities for students to achieve postsecondary readiness through high-quality educational experiences that acknowledge individual goals, learning styles, and abilities and can increase the rates of secondary school completion and postsecondary training. Evidence of flexible pathways can include dual enrollment and early college programs, internships and work-based learning initiatives, virtual or blended learning opportunities, and use of Personalized Learning Plans (PLPs) to design student-centered learning.

    Formative Assessment: Formative assessment is as much a process or instructional strategy as it is a measurement tool. Also known as “short-cycle” assessment, formative assessment practices are embedded in instruction and used frequently during a teaching or learning cycle (e.g., to preassess readiness, to target mastery of specific skills, to check conceptual understanding). The primary purposes of using assessment data formatively are to (a) diagnose where students are in their learning along a learning progression, (b) identify gaps in knowledge and student understanding, (c) determine how to help some or all students move ahead in their learning, and (d) provide opportunities for peer and self-assessment as part of the learning process. Formative assessment tasks may be designed for all students (e.g., unit preassessment, planned probing questions during a lesson) or may vary from one student to another depending on the teacher’s judgment about the need for specific information about a student at a given point in time. Assessment information gathered from a variety of activities (observations, quick checks for understanding, small-group problem solving and discussion, conferencing, common performance tasks, exit tickets, etc.) can be used formatively. Formative uses of assessment can uncover a range of understanding (DOK 1–4), including deeper levels of reasoning and thinking that lead to student meta-cognition and taking action about their own learning. Learning—not grading—is the primary focus of formative assessment. Hess’s Tool #10 was designed to examine formative assessments in use. Module 3 contrasts formative, interim, or summative uses of assessment and describes how to use PLC Tool #10 to develop formative assessments and interpret results.


    Kolb’s Experiential Learning Model: David Kolb’s research (1984) advanced a model of experiential learning that combined two key dimensions in a four-stage cycle: how we perceive new or reintroduced information (along a continuum from concrete to abstract) and how we then process that information (along an intersecting continuum from reflection to experimentation). In Kolb’s theory, the impetus for the development of new concepts is provided by new experiences: Learning involves the acquisition of abstract concepts that can be applied flexibly in a range of situations (McLeod, 2013). Each module includes a suggested cycle of workshop activities based on Kolb’s experiential learning model.


    Learning Outcomes/Objectives/Goals: Several terms are used interchangeably by educators to describe what students will understand and be able to perform with regard to the content and skills being taught. An effective learning goal is composed of a clearly stated progression of learning targets that demonstrate eventual attainment of the desired performance or proficiency. Smaller-grained learning targets guide day-to-day instruction and formative uses of assessment as a student makes progress toward the broader learning goal.

    Learning Progression (LP): Learning progressions, progress maps, developmental continuums, and learning trajectories are all terms that have been used to generally mean research-based descriptions of how students develop and demonstrate deeper, broader, and more sophisticated understanding over time. A learning progression can visually and verbally articulate a hypothesis about how learning will typically move toward increased understanding for most students. Learning progressions are based in empirical research and therefore are not the same as curricular progressions or grade-to-grade standards. This is because LPs also include typical misconceptions along the way to reaching learning goals. When developing LPs, cognitive scientists compare novice performers to expert performers. Novice performers generally have not yet developed schemas that help them to organize and connect information. Novice performers use most of their working memory just trying to figure out what all the parts of the task are, not in engaging with the task or assignment. Module 4 describes how to use learning progressions to develop assessments and student profiles to monitor progress.

    Learning Target (LT): As defined by Moss and Brookhart (2012, p. 3), learning targets guide learning and therefore should use wording that students can understand and use for peer and self-assessment. LTs can be thought of as “lesson-sized chunks” of information, skills, and reasoning processes that students will come to know deeply over time. In proficiency-based systems, learning targets can be stated as short descriptive or bulleted phrases that create levels of a performance scale. A series of LTs detail the progression of knowledge and skills students must understand and be able to perform to demonstrate achievement of the broader learning goal or proficiency. Module 4 describes how to use learning progressions to develop daily learning targets and formative assessments aligned with progressions.

    • Foundational LTs: Foundational learning targets contain essential prerequisite knowledge and basic processes that may not be explicitly stated in academic standards but are necessary for building the foundational understandings required to reach the overarching learning goal or proficiency. Unit preassessments often focus on foundational learning targets to determine whether students are ready to move on.
    • LTs of Increasing Cognitive Complexity: A series of learning targets should form a progression of learning with those at the higher end of a performance scale (i.e., Proficient and Advanced levels) describing increasingly more sophisticated understanding and cognitive complexity that exceed expectations of a single academic standard.

    Local Assessment System: A comprehensive local assessment system includes multiple components, identifying high-priority local and state-level assessments used in the educational system. The most important characteristic of an assessment system is that it is designed to provide cohesive and actionable information about student performance, using multiple measures. The various components of the system across the different educational levels provide complementary information so that decisions are based on valid inferences. Module 5 provides guidance and tools for designing and refining a local comprehensive assessment system. Four critical components make up high-quality comprehensive local assessment systems:

    • Technically sound assessments of academic achievement and assessments of district-based goals for learning (e.g., community service)
    • A theory of action that illustrates how curricular programs, instruction, assessments, and assessment data interact
    • Adequate protocols, professional development, and leadership supporting implementation of assessment principles and practices
    • Explicit and well-coordinated mechanisms (feedback loops) for managing assessments, assessment results, and addressing student and program needs


    Metacognition: Metacognitive skills are one of the Habits of Mind or dispositions identified in the research as essential college and career readiness skills (Hess & Gong, 2014). They are evidenced when students are able to reflect on their own learning, frame and monitor their own learning goals, and seek out and use evidence of their own progress from one or more sources to improve their performance. Teachers can design instructional tasks that require students to use metacognitive skills to self-assess and to act on feedback from peers.

    Multiple (Assessment) Opportunities: Sufficient opportunities or occasions are provided for each student to meet proficiency or other learning expectations or requirements. This can be accomplished in a variety of ways with different assessments, while still scoring performance on a common scale or rubric.

    Multiple-Choice (MC) Items: A multiple-choice item—also called selected response—consists of a problem, known as the stem, and a list of suggested solutions, known as alternatives. Traditional MC items have included one correct or best alternative, which is the answer, and three incorrect or inferior alternatives, known as distractors. Next-generation assessments now often include more than four alternatives with several possible correct answers among them. Appendix C includes strong and weak examples of MC items.


    Norm-Referenced Test (NRT): A norm-referenced test is used to compare individual student performance with a larger (norming) group, usually based on a national sample representing a diverse cross-section of students. NRT results typically are measured in percentile ranks. Examples of NRTs include Iowa Tests of Basic Skills and the Stanford Achievement Test. (Norm-referenced tests differ from criterion-referenced tests, which measure performance compared with a standard or benchmark.)
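The percentile-rank reporting described for NRTs can be illustrated with a minimal sketch. The norming scores below are hypothetical, and publishers vary in their exact conventions (e.g., counting scores at or below versus strictly below the student's score):

```python
def percentile_rank(score, norming_sample):
    """Percent of the norming group scoring at or below this raw score
    (one common convention; publishers' definitions vary slightly)."""
    at_or_below = sum(s <= score for s in norming_sample)
    return 100 * at_or_below / len(norming_sample)

# Hypothetical norming-group raw scores
norms = [12, 15, 18, 18, 20, 22, 25, 27, 30, 33]
print(percentile_rank(22, norms))  # 60.0 -- student scored at or above 60% of the norming group
```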


    Performance Indicator: This is a term that seems to have many different uses and interpretations in the literature. For this reason, it might be useful to think of a “performance indicator” the way we do for learning targets. A series of LTs—or performance indicators—detail the progression of knowledge and skills students must understand and be able to perform to demonstrate achievement of the broader learning goal or proficiency. Performance indicators are often used in rubrics to further define criteria for success and thus represent possible assessment evidence.

    Performance Task/Performance Assessment: Performance assessments can be used for a variety of purposes—instructional–formative and evaluative–summative. Performance tasks are generally defined as multistep assignments with clear criteria, expectations, and processes that measure how well a student transfers knowledge and applies complex skills to create or refine an original product (Center for Collaborative Education [CCE], 2012). Often common assessments include a performance component because they are designed to integrate multiple skills and concepts within authentic contexts. Performance tasks produce a variety of real-world products such as essays, demonstrations, presentations, artistic performances, solutions to complex problems, and research or investigation projects. They are assessments that may be completed individually or with others. Module 3 includes a variety of tools and examples to support the design of high-quality performance assessments for formative, interim, and summative use.

    Personalization: Personalization is a learning process in which schools help students assess their own talents and aspirations, plan a pathway toward their own purposes, work cooperatively with others on challenging tasks, maintain a record of their explorations, and demonstrate learning against clear standards in a wide variety of media, all with the close support of adult mentors and guides (Clarke, 2003). In some states and school districts, every school student is assigned a responsible adult, in addition to a school counselor, to provide this support.

    Piloting: Piloting is one of three common strategies used for determining how well a new or draft assessment will “perform”—meaning how effective is this assessment (test items, reading passages, overall difficulty, and tasks) in eliciting the intended evidence of learning? An assessment piloting process requires fewer students than does field-testing a new assessment and is therefore a time-efficient strategy for “trying out” new assessments. In a locally designed pilot, students from a minimum of two or three representative classrooms (within or across schools) will take an assessment. Then educators collect and collaboratively review the results (student work, scoring rubrics, administration guidelines, etc.) to determine whether or not this assessment elicited the intended evidence and can be reliably scored. Student work analysis is used to refine both task prompts and rubrics. For strong assessments, scoring anchors may also be identified during the piloting phase.

    Portfolios and Exhibitions: These performance assessment models typically address a wide range of content-area and cross-curricular standards, including critical thinking and problem solving, reading and writing proficiency, and/or work habits, dispositions, and character traits (e.g., teamwork, preparedness, responsibility, persistence). In course-based portfolio and exhibition assessments, individual teachers use common, agreed-upon criteria to evaluate a body of work that students have completed over the course of an instructional period. For cross-curricular portfolios and exhibitions, groups of content-area teachers or review committees evaluate the work. It should be noted that portfolios do not have to require students to create new work, but may require that students collect and present past work, evidence of growth, self-reflection, and accomplishments over time. Exhibitions can also incorporate examples of past work that has been used as a foundation for new products. Module 3 includes a variety of tools and examples to support the design of high-quality performance assessments, including student-designed assessments.

    Prerequisites/Preassessment: Prerequisite (foundational) knowledge and skills encompass the discrete learning upon which more complex tasks are built. I recommend that preassessments begin with assessing the core prerequisite skills needed to build up in order to be successful at completing more complex and cognitively demanding tasks. Module 4 describes how learning progressions can be used to develop preassessments based on prerequisite skills or readiness for learning.

    Proficiency: The term proficiency is often used interchangeably with the term competency. The Rhode Island Department of Education defines proficiency as the measure of a student’s knowledge and skill demonstrated in a consistent manner across multiple disciplines in various settings over time. The Vermont Agency of Education elaborates on this definition, stating, “Proficiencies include explicit, measurable, transferable learning objectives that empower students.”

    Proficiency-Based Learning (PBL): PBL (also called competency-based learning) is described by Sturgis and Patrick (2010) as embodying these four characteristics:

    • Learning outcomes emphasize proficiencies that include application and creation of knowledge, along with the development of important skills and dispositions.
    • Student progress is measured and supported.
    • Assessment is meaningful and a learning experience for students; students receive timely, differentiated support and feedback based on their individual learning needs; and students advance upon mastery, not seat time; learning is the constant and time is the variable.
    • Learning occurs with the student at the center; students take ownership of their learning; and learning can happen anywhere and anytime.

    Project-Based Learning: Project-based learning is designed to make learning more meaningful and relevant to students. Projects require that students go beyond the textbook to study complex topics based on real-world issues (e.g., examining water quality in their communities or the history of their town). Students often work in groups to gather and analyze information from multiple sources, including interviews with experts and collecting survey data. Project-based classwork is generally more demanding than traditional book-based instruction, where students may just memorize facts presented in a single source. Students are expected to utilize original documents and data, applying principles covered in traditional courses to real-world situations. Projects have multiple components and can last weeks or may cover entire courses. Student work is assessed in stages (e.g., gathering and analyzing information, integrating what was learned, presenting information) and presented to “authentic” audiences beyond the teacher, including parents and community groups.


    Qualitative data: Qualitative data are based on information gathered from sources such as one-on-one interviews, focus groups, surveys, or systematic observations done over time.

    Quantitative data: Quantitative data are based on “hard numbers” such as enrollment figures, dropout rates, and test scores.


    Reliability: Test reliability is defined as the consistency of test scores over different test administrations, multiple raters, or different test questions. Reliability answers the question “How likely is it that a student would obtain the same score if they took the same test a second time (test–retest reliability) or if someone else scored this student’s test (interrater reliability)?” In statistics, interrater reliability or agreement—or concordance—is the degree of agreement or consistency with which two or more judges rate the work or performance of assessment takers. Strong interrater reliability is easiest to achieve with selected response test items (questions having only one right answer). Interrater reliability for scoring more complex assessments (projects, portfolios, observations, or performance tasks) is strengthened by the use of clear scoring guides, annotated student work, and periodic calibration practice. Assessments can be reliable and still not be valid (assessing what is intended to be assessed).
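The interrater agreement statistics named in this entry can be sketched in a few lines. This illustration uses hypothetical rubric scores (not data from the Toolkit) to compute simple percent agreement and Cohen's kappa, the standard statistic that corrects two raters' agreement for chance:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of work products the two raters scored identically."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for the agreement
    two raters would reach by chance, given their score distributions."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters assign the same level at random
    expected = sum(count_a[c] * count_b[c]
                   for c in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two teachers scoring the same ten essays on a hypothetical 4-level rubric
a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
print(percent_agreement(a, b))        # 0.8
print(round(cohens_kappa(a, b), 3))   # 0.714
```

A kappa well below the raw agreement rate signals that much of the raters' agreement could be due to chance, which is why calibration sessions often report both.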

    Rubrics/Scoring Guides: Rubrics and scoring guides provide a set of rules or guidelines for assigning scores to test takers. Rubrics are often used to elaborate on how to score longer constructed response items, performance tasks, and extended projects. More simplified scoring guides tend to be used for shorter open-ended test items. Scoring criteria fall into several types: Form criteria (e.g., following formatting guidelines for documenting sources, editing grammar use, turning in work on time—DOK 1); Accuracy of Content criteria (e.g., calculations, definitions, applying concepts or terms appropriately—DOK 1, DOK 2); Process criteria (e.g., gathering and organizing information, identifying trends, graphing data—DOK 2); Impact criteria (e.g., effectiveness in solving a problem or convincing an audience—DOK 3 or DOK 4); and Knowledge Production criteria (e.g., generating new questions for investigation or new insights, reflections on new learning—DOK 3 or DOK 4). There are also several different types of scoring rubrics, each with unique strengths and weaknesses. Module 3 describes differences among rubric criteria and rubric design. PLC Tool #11 guides you through a rubric quality review. Rubric types include the following:

    • Analytic rubrics apply several different, distinct criteria to evaluate products (e.g., process skills, content accuracy, editing skills). A score is given for each separate criterion, thus providing multiple scores and specific feedback to students on strengths and weaknesses of the performance. Analytic rubrics offer opportunities to yield “weighted scoring” such as giving more scoring weight to reasoning and justification (Knowledge Production) than to steps being followed to solve the problem (Process).
    • Holistic rubrics combine several criteria to yield one generalized—or holistic—score. While these scores can be used to describe overall performance, they are not as useful in providing specific feedback to students on strengths and weaknesses of the performance. Holistic rubrics are frequently used for scoring performance tasks in large-scale assessments because they may be seen as more efficient (taking less time) and tend to be easier to get scoring agreement across raters (interrater reliability) than do analytic rubrics.
    • Task-specific rubrics are typically designed to score one specific constructed response item or performance task (e.g., essay, open-ended questions on state assessments) and therefore include examples of specific text evidence from a passage or a particular mathematical representation or method expected in the student response. Task-specific scoring rubrics are most useful when scoring large amounts of student work with very particular success criteria or when defining what partial credit looks like.
    • Generalized rubrics are typically designed to score multiple, similar (comparable) performance tasks at different times during the learning process (e.g., argumentative essay, scientific investigation). Because these are generalized for use with many tasks, it is highly recommended that annotated student work samples include explanations of specific scoring evidence that help to interpret rubric criteria. Generalized scoring rubrics are especially useful when building student understanding of success criteria and learning goals and are most useful when scoring student work products over time with similar success criteria, such as in a portfolio.
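The “weighted scoring” mentioned for analytic rubrics above amounts to a weighted average of the separate criterion scores. A minimal sketch, with hypothetical criterion names and weights:

```python
def weighted_rubric_score(scores, weights):
    """Combine analytic rubric criterion scores into one weighted result.
    `scores` maps criterion name -> rubric level; `weights` maps
    criterion name -> relative weight."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in scores) / total_weight

# Hypothetical 4-point analytic rubric: reasoning (Knowledge Production)
# counts double the other criteria
scores  = {"process": 3, "content_accuracy": 4, "reasoning": 2}
weights = {"process": 1, "content_accuracy": 1, "reasoning": 2}
print(weighted_rubric_score(scores, weights))  # 2.75
```

Note how the doubled reasoning weight pulls the combined score below the simple average of 3.0, reflecting the student's weaker performance on the most heavily weighted criterion.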


    Scaffolding Strategically: Scaffolding is the purposeful use of supports to achieve a balance between cognitive complexity and student autonomy, as the overall cognitive demand of the task increases. Strategic scaffolding means the intentional steps designed into the instruction that ensure that all students can eventually complete the same complex task independently. The primary difference between scaffolding and differentiating is that differentiating means different—different assignments, different options, student choice. Differentiation is achieved by changing the content, the process skills, and/or the products of learning. Modules 1 and 2 include a variety of strategic scaffolding strategies to support deeper learning.

    Scales—Performance, Proficiency, Scoring, Analytical: Various terms are used to describe a continuum of performance descriptors that articulate distinct levels of demonstrated knowledge and skills relative to a learning outcome or proficiency statement. The term Proficiency Scales is defined by the Vermont Agency of Education as a single criterion rubric that is task neutral and includes explicit performance expectations for each possible rating (adapted from Gallaudet University). Moore, Garst, and Marzano (2015) use the term Performance Scale to mean a continuum that articulates distinct levels of knowledge and skills relative to a specific standard. Great Schools Partnership calls these scales “scoring scales.” [Personally, I prefer to use the broader term of performance scale so as not to imply this is only for scoring; and I do not recommend developing scales for every standard or even for a set of prioritized standards. Tracking single standards would be somewhat unmanageable and quite limiting, in that rich performance tasks and proficiency-based learning require the integration of multiple standards. Consequently, my use of performance or proficiency scales is a hybrid of other models.]

    Schema: “A schema is a cognitive framework or concept that helps organize and interpret information. Schemas can be useful because they allow us to take shortcuts in interpreting the vast amount of information that is available in our environment. . . . Schemas can (also) contribute to stereotypes and make it difficult to retain new information that does not conform to our established ideas about the world. . . . In Piaget’s theory, a schema is both the category of knowledge as well as the process of acquiring that knowledge. As experiences happen and new information is presented, new schemas are developed and old schemas are changed or modified” (Cherry, 2016).

    Standard: A standard represents a long-term learning goal, such as for the end of a year’s learning. Standards describe “learning destinations” and differ from empirically based learning progressions that describe typical pathways of learning to arrive at the destination. Module 4 contrasts the differences between learning progressions and standards.

    Standardized Assessment: When an assessment is administered using specified conditions, protocols, and procedures (e.g., time allotted, allowable accommodations or use of materials such as calculators), it is called a standardized assessment. Common assessments used locally across classrooms and schools should be accompanied by an administration guide which helps teachers to know under what conditions the assessment should be used. Standardized administration guides may include specific prerequisites (e.g., administer after students have completed the unit on Pythagorean theorem) to ensure that all students have had an adequate opportunity to learn the content prior to being tested. Module 3 includes examples and a planning template for developing administration guides for ensuring that common performance assessments are administered similarly by different educators, making assessment results more reliable.

    Stanine: A stanine (short for “standard nine”) is a score on a nine-unit standard scale in which 1, 2, or 3 indicates below-average performance; 4, 5, or 6 indicates average performance; and 7, 8, or 9 indicates above-average performance. Stanines are still used in some standardized tests.
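The usual conversion from a raw score to a stanine runs through the score's z-score; the formula below is the common textbook conversion, not something specific to this Toolkit, and the mean and standard deviation shown are hypothetical:

```python
def stanine(raw_score, mean, sd):
    """Convert a raw score to a stanine (1-9) via its z-score.
    Common conversion: stanine = round(2z + 5), clipped to the 1..9 range."""
    z = (raw_score - mean) / sd
    return max(1, min(9, round(2 * z + 5)))

# Hypothetical test with mean 50 and standard deviation 10
print(stanine(70, mean=50, sd=10))  # 9  (two SDs above the mean)
print(stanine(50, mean=50, sd=10))  # 5  (at the mean)
print(stanine(20, mean=50, sd=10))  # 1  (far below the mean, clipped)
```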

    Student-Centered Learning (SCL): Student-centered learning involves shifting the traditional role of teacher and student so that students are more engaged and more responsible for their own learning. The Nellie Mae Education Foundation has supported schools and districts in implementing SCL practices. In contrast to more traditional, adult-directed approaches to instruction, SCL adheres to four broad principles (Hess & Gong, 2014):

    • Learning is personalized: Each student is well known by adults and peers and benefits from individually paced learning tasks, tailored to his or her needs and interests. Collaboration with others and engaging, authentic, increasingly complex tasks deepen learning.
    • Learning is competency based: Students move ahead when they demonstrate competency, and they have multiple means and opportunities to do so. Differentiated supports ensure that all students have what they need to achieve college and career readiness goals.
    • Learning takes place anytime, anywhere: Students learn outside the typical school day and year in a variety of settings, taking advantage of learning technologies and community resources, and receiving credit for learning, wherever it happens.
    • Students exert ownership over learning: Students understand that they improve by applying effort strategically. They have frequent opportunities to reflect on and understand their strengths and learning challenges. They take increasing responsibility for learning and assessment, and they support and celebrate each other’s progress.

    Success Criteria: Success criteria are the established learning targets for a given task, performance, or project. Success criteria are stated using “kid-friendly” wording that students can understand and use for peer and self-assessment. Success criteria are incorporated into scoring rubrics using several broad criteria (e.g., use of research skills) with performance indicators that further define expectations specific to the given task (e.g., conduct a key word search, check validity of sources). Module 3 discusses how scoring guides and rubrics incorporate success criteria.

    Summative Assessment: A summative assessment is given at the end of a period of learning (e.g., unit of study, end of semester) and generalizes how well a student has performed. Summative assessments are typically used for grading and making high-stakes decisions. Modules 3 and 5 discuss formative, interim, and summative uses of assessment.

    Systematic Observation: Systematic observation is a formative assessment strategy used to document knowledge and skills of a group of students over a period of time, rather than assessing all students at the same time (on demand). Generally this approach works well for areas that are difficult to assess with pencil and paper tests or when multiple opportunities are provided for students to demonstrate acquisition of skills and knowledge over time. Systematic observation captures the often “missed opportunities” for collecting assessment data during an instructional activity and can document progress being made over time in meeting broader learning goals for the school year. Module 3 provides a recording template for documenting systematic (over time) observations.


    Task Shell/Task Model: Task shells (also referred to as task models) provide general guidelines for what a performance task should include if it is to effectively measure the stated learning objective(s). For example, a template for test or task developers for science performance tasks might include the following components: (a) a scenario or data related to a real-world phenomenon; (b) a prompt asking students to state a hypothesis (or testable question) based on the data; (c) a list of steps needed to conduct the investigation; and (d) results of the investigation, stating conclusions based on evidence collected. Test developers use task shells to ensure that (parallel) performance tasks will elicit comparable evidence of learning year to year, and can be reliably scored using the same performance criteria. Module 3 describes how task shells can be used to develop common performance assessments.

    Technical Quality of Assessments: High-quality assessments, no matter the form, format, or intended purpose, should be evaluated in terms of several research-based criteria. Hess’s Tool #9 provides PLC teams with a protocol for evaluating and giving feedback on the quality of assessments included in the local assessment system and incorporates the application of these criteria when developing high-quality assessments. Module 3 describes how to use PLC Tools #9 and #16B to develop, refine, and “validate” local performance assessments.

    • Clarity—Clarity of expectations and intended student products is the starting point of all high-quality assessment development.
    • Validity—Determinations of test validity are based on the degree of alignment to both content standards and intended rigor being assessed. Interpretations of student learning cannot be considered “valid” when assessments (items and tasks) are poorly aligned with intended learning outcomes or proficiencies.
    • Reliability—Test reliability is the consistency with which two or more judges rate the same work products or performances. Interrater reliability for more robust assessment tasks, such as performance tasks and portfolios, is generally highest when calibration training includes annotated work samples that help to differentiate levels of performance on scoring rubrics.
    • Opportunities for Student Engagement—While not required in all types of assessments, systems that profess to be student centered should review performance tasks and other student-designed learning opportunities for how well they engage students in making decisions about the approach to and quality of their work.
    • Fairness—“Fair” assessments clearly address the expectations specified by the performance indicators so that all students are afforded an equitable opportunity to demonstrate corresponding skills and knowledge. Assessments should be prescriptive enough to require the demonstration of the expected skills and knowledge so that student interpretation will not dilute the intended demand of the performance indicator.

    Text Complexity: Being able to read increasingly complex texts has always been a driving goal of reading instruction. According to Hiebert (2012), who has written extensively about text complexity, many qualitative and quantitative dimensions—including topic complexity and author’s discourse style—contribute to making a text more or less complex. Hess and Biggam (2004) identify additional qualitative dimensions affecting text complexity, including formatting and layout, genre features, text structure, level of reasoning required, and reader background knowledge. But the variable that consistently predicts reading comprehension is vocabulary: The core vocabulary accounts for at least 90% of the words in most texts. Module 2 provides guidance and Text Complexity Tools #6, #7, and #8 for examining text complexity and planning for instruction or assessment.
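The core-vocabulary claim suggests a simple check one could run on any passage: what share of its running words a given core list covers. A minimal sketch, with a tiny hypothetical core list standing in for a published core-vocabulary list:

```python
import re

def core_coverage(text, core_words):
    """Share of a text's running words (tokens) that appear in a given
    core vocabulary list -- a rough proxy for vocabulary demand."""
    tokens = re.findall(r"[a-z']+", text.lower())
    in_core = sum(t in core_words for t in tokens)
    return in_core / len(tokens)

# Hypothetical core list; real lists contain thousands of word families
core = {"the", "cat", "sat", "on", "a", "mat", "and", "slept"}
print(core_coverage("The cat sat on the mat and slept.", core))  # 1.0
```

A passage whose coverage falls well below the typical 90% would place unusually heavy vocabulary demands on readers.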

    Text Features: Text features are used to organize, add to, or elaborate on information presented in informational texts; examples include headings and subheadings, captioned photos, labeled diagrams, graphs and charts, and inset text. Features used in informational texts differ from those used in literary texts, such as illustrations that emphasize character actions or traits, or white space that indicates time lapses in the chronology.

    Text Pattern Signals: Text pattern signals are words or phrases embedded in texts that help to indicate—or signal—the organizational features of the text and suggest to the reader where the text may be “heading.” Signals, in combination with the context of their use and various semantic cues, determine text structure—not signals alone (Hess, 2008c; Vacca & Vacca, 1989). Module 2 includes sample instructional strategies for using signal words for each text structure.

    Text Structures: Text structures are the internal organizational structures used within paragraphs or longer texts, appropriate to genre and purpose. This is different from genre characteristics that help students to determine whether a text is a fable, fairy tale, or myth based on how it is structured. Increasingly complex structures tend to follow this general progression: sequence (procedure), chronology (time order), description, definition, compare–contrast, cause–effect, problem–solution, proposition–support, critique, and inductive–deductive (Hess, 2008c). Module 2 includes instructional strategies for teaching about text structures and developing text structure anchor charts.

    Transfer: “In cognitive theory, knowing means more than the accumulation of factual information and routine procedures; it means being able to integrate knowledge, skills, and procedures in ways that are useful for interpreting situations and solving problems” (National Research Council, 2001, p. 62). The ability to transfer our knowledge and skill effectively involves the capacity to take what we know and use it creatively, flexibly, fluently, in different settings or problems, and on our own (Wiggins & McTighe, 2005, 2012). When a student is able to understand concepts and apply skills beyond what is considered to be “routine understanding” (DOK 1–2), we call that transfer. Module 1 introduces the concept of transfer in the discussion of the Hess Cognitive Rigor Matrix. The theme of designing instruction and assessment for deeper thinking and transferability of learning is carried through all modules of the Local Assessment Toolkit.

    Transferrable Skills: Also referred to as dispositions, applied learning, soft skills, or “cross-cutting” skills, transferrable skills are not content-specific skills. They should not be taught in isolation; they are best taught and assessed within the context of each content domain (Hess & Gong, 2014). Many states and school districts have identified specific transferrable skills and work habits as part of their proficiency-based systems. For example, the Vermont Agency of Education includes these as their list of transferrable skills:

    • Clear and effective communication
    • Creative and practical problem solving
    • Informed and integrative thinking
    • Responsible and involved citizenship
    • Self-direction


    Universal Design: The idea of universal design in assessment comes from architectural practices. Think about how a person might get from the first floor to the second floor of a building. A ladder would only provide access for some people. A set of stairs would work for more people, but not for those using wheelchairs or crutches. An elevator is the most accessible structure and therefore more “universally designed” for access to all who want to get to the second floor. Thompson, Johnstone, and Thurlow (2002) lay out guidelines to ensure that large-scale tests meet the principles of universal design. Application of those guidelines in the construction of assessments ensures that all students have a fair and equitable opportunity to demonstrate their learning. Hess’s Tool #9 incorporates the application of these principles when developing high-quality local assessments. Module 3 describes how to use PLC Tool #9 to develop and refine local performance assessments for “fairness” and universal access. Universally designed assessments incorporate these seven elements:

    • Inclusive assessment population (Assessments are developed in the context of the entire population.)
    • Precisely defined constructs (measuring exactly what they are intended to measure)
    • Accessible, non-biased items (Items are reviewed for content quality, clarity, and lack of ambiguity, and sometimes for sensitivity to gender or cultural issues.)
    • Amenable to accommodations
    • Simple, clear, and intuitive instructions and procedures
    • Maximum readability and comprehensibility (e.g., use of simple, clear, commonly used words or eliminating any unnecessary words)
    • Maximum legibility (e.g., spacing, formatting of text and visuals)


    Validity: Validity refers to the degree to which tests measure what they purport to measure. Alignment studies examine how well an assessment’s design and test items match the intent of the standards to be assessed. For example, does this writing assessment actually assess a student’s ability to compose and communicate ideas in writing, or simply the student’s ability to edit a composition? An assessment can be valid (assessing what is intended to be assessed) and still not be reliable in terms of scoring. Module 3 describes how to create validation teams to develop, validate, and refine local assessments. PLC Tools #9, #16A, and #16B are used to guide the task validation process.

    References


    ACT, Inc. (2006). Reading between the lines: What the ACT reveals about college and career readiness in reading. Iowa City, IA: Author.
    Ainsworth, L. (2014). Common formative assessment 2.0: How teacher teams intentionally align standards, instruction, and assessment. Thousand Oaks, CA: Corwin.
    Allen, J. (1999). Words, words, words: Teaching vocabulary in Grades 4–12. York, ME: Stenhouse.
    American Association for the Advancement of Science. (2001). Atlas of science literacy (Vol. 1). Washington, DC: American Association for the Advancement of Science and the National Science Teachers Association.
    Anderson, L., Krathwohl, D., Airasian, P., Cruikshank, K., Mayer, R., Pintrich, P., Raths, J., & Wittrock, M. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s Taxonomy of educational objectives. New York, NY: Addison Wesley Longman.
    Andrade, H. (2016). Arts assessment for learning: What is formative assessment? [Video]. New York, NY: NYC Department of Education. Retrieved from http://artsassessmentforlearning.org/about-assessment
    Bader, E. J. (2014, July 7). “Alternative high”: Raising the bar on public education. Retrieved from http://www.truth-out.org/news/item/24793-alternative-high-raising-the-bar-on-public-education
    Beck, I., McKeown, M., & Kucan, L. (2002). Bringing words to life: Robust vocabulary instruction. New York, NY: Guilford Press.
    Beck, I., McKeown, M., & Kucan, L. (2008). Bringing words to life: Robust vocabulary instruction ( ed.). New York, NY: Guilford Press.
    Becker, A. (2013). Journey. Somerville, MA: Candlewick Press.
    Becker, A. (2014). Quest. Somerville, MA: Candlewick Press.
    Bjork, R. (2012). Desirable difficulties: Slowing down learning [Video]. Retrieved from https://www.youtube.com/watch?v=gtmMMR7SJKw&feature=youtu.be
    Black, P., Burkhardt, H., Daro, P., Lappan, G., Pead, D., & Stephens, M. (2011). High-stakes examinations that support student learning: Recommendations for the design, development and implementation of the PARCC assessments (ISDDE Working Group on Examinations and Policy). Retrieved from http://www.mathshell.org/papers/pdf/ISDDE_PARCC_Feb11.pdf
    Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
    Bloom, B. S. (1968). Learning for mastery. Evaluation Comment (UCLA-CSIEP), 1(2), 1–12.
    Bloom, B. S. (Ed.), Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). The taxonomy of educational objectives, the classification of educational goals, handbook I: Cognitive domain. New York, NY: David McKay.
    Brennan, R. L. (Ed.). (2006). Educational measurement (4th ed.). Westport, CT: American Council on Education and Praeger.
    Brookhart, S. (2009, November). The many meanings of “multiple measures.” Educational Leadership, 67(3), 6–12.
    Brown, P., Roediger, H. L., & McDaniel, M. A. (2014). Make it stick: The science of successful learning. Cambridge, MA: Belknap Press.
    Brulles, D., Brown, K., & Winebrenner, S. (2016). Differentiated lessons for every learner: Standards-based activities and extensions for middle school. Waco, TX: Prufrock Press.
    Center for Collaborative Education. (2012). Quality performance assessment: A guide for schools and districts. Boston, MA: Author.
    Cherry, K. (2016, June 22). What is schema in psychology? Retrieved from https://www.verywell.com/what-is-a-schema-2795873
    Clark, B. (1983). Growing up gifted (2nd ed.). Columbus, OH: Charles E. Merrill.
    Clark, J. (2003). Changing systems to personalize learning. Providence, RI: The Education Alliance at Brown University.
    Clements, D. H., & Sarama, J. (2009). Learning trajectories in early mathematics—Sequences of acquisition and teaching. In Canadian Language and Literacy Research Network, Encyclopedia of Language and Literacy Development (pp. 1–7). Retrieved from https://www.scribd.com/document/22929814/Trajectorias-Aprendizagem-Clemets-Sarama-2009
    Corcoran, T., Mosher, F. A., & Rogat, A. D. (2009). Learning progressions in science: An evidence-based approach to reform. Philadelphia, PA: Consortium for Policy Research in Education.
    Costa, A., & Kallick, B. (2008). Learning and leading with habits of mind. Alexandria, VA: ASCD.
    Danielson, C. (2013). The framework for teaching evaluation instrument. Princeton, NJ: The Danielson Group.
    Darling-Hammond, L. (2010). Performance counts: Assessment systems that support high-quality learning. Washington, DC: Council of Chief State School Officers and Stanford, CA: Stanford Center for Opportunity Policy in Education.
    Daro, P., Mosher, F. A., & Corcoran, T. (2011). Learning trajectories in mathematics: A foundation for standards, curriculum, assessment, and instruction. Consortium for Policy Research in Education. Retrieved from http://www.cpre.org/learning-trajectories-mathematics-foundation-standards-curriculum-assessment-and-instruction
    DePascale, C. (2011). Multiple measures, multiple meanings. Paper presented at the 2011 Reidy Interactive Lecture Series, Boston, MA.
    Dickenson, S. V., Simmons, D. C., & Kame’enui, E. J. (1998). Text organization: Research bases. In D. C. Simmons & E. J. Kame’enui (Eds.), What reading research tells us about children with diverse learning needs (pp. 239–278). Mahwah, NJ: Erlbaum.
    Domaleski, C., Gong, B., Hess, K., Marion, S., Curl, C., & Peltzman, A. (2015). Assessment to support competency-based pathways. Washington, DC: Achieve. Retrieved from http://www.karin-hess.com/free-resources
    Driver, R., Squires, A., Rushworth, P., & Wood-Robinson, V. (2002). Making sense of secondary science: Research into children’s ideas. Abingdon, OX, England: RoutledgeFalmer.
    Duschl, R., Schweingruber, H., & Shouse, A. (Eds.), & Board on Science Education, Center for Education, & Division of Behavioral and Social Sciences and Education. (2007). Taking science to school: Learning and teaching science in Grades K–8. Washington, DC: The National Academies Press.
    Education Department of Western Australia. (1994). First steps: Oral language developmental continuum. Melbourne: Longman Australia.
    Edutopia. (2008). How should we measure student learning? 5 keys to comprehensive assessment with Linda Darling Hammond [Video]. Retrieved from https://www.edutopia.org/comprehensive-assessment-introduction
    EL Education. (2012). Austin’s butterfly: Building excellence in student work [Video]. Retrieved from https://vimeo.com/search?q=Austin%E2%80%99s+Butterfly
    EL Education. (2013). Citing evidence from informational and literary texts [Video]. Retrieved from https://vimeo.com/54871334
    Engle, J. (1988). Students questioning students: A technique to invite student involvement. Presentation at the Fifth Annual Forum in Gifted Education, Rutgers University, New Brunswick, NJ.
    Fisher, D., Frey, N., & Hattie, J. (2016). Visible learning for literacy: Implementing the practices that work best to accelerate student learning. Thousand Oaks, CA: Corwin.
    Flowers, C., Browder, D., Wakeman, S., & Karvonen, M. (2007). Links for academic learning: The conceptual framework. National Alternate Assessment Center (NAAC) and the University of North Carolina at Charlotte.
    Foorman, B. R. (2009). Text difficulty in reading assessment. In E. H. Hiebert (Ed.), Reading more, reading better (pp. 231–250). New York, NY: Guilford Press.
    Francis, E. (2016). Now that’s a good question! Alexandria, VA: ASCD.
    Frey, N., Fisher, D., & Everlove, S. (2009). Productive group work: How to engage students, build teamwork, and promote understanding. Alexandria, VA: ASCD.
    Goertz, M. E. (2011). Multiple measures, multiple uses. Paper presented at the 2011 Reidy Interactive Lecture Series, Boston, MA.
    Great Schools Partnership. (2014). The glossary of education reform. Retrieved from http://edglossary.org/proficiency
    Gregory, G., & Kaufeldt, M. (2015). The motivated brain: Improving student attention, engagement, and perseverance. Alexandria, VA: ASCD.
    Hammond, W. D., & Nessel, D. (2011). The comprehension experience: Engaging readers through effective inquiry and discussion. Portsmouth, NH: Heinemann.
    Hattie, J. (2002, October). What are the attributes of excellent teachers? Presentation at the New Zealand Council for Educational Research Annual Conference, University of Auckland.
    Hattie, J. (2006, July). Large-scale assessment of student competencies. Paper presented as part of the Symposium, Working in Today’s World of Testing and Measurement: Required Knowledge and Skills (Joint ITC/CPTA Symposium), 26th International Congress of Applied Psychology, Athens, Greece.
    Hattie, J. (2009). Visible learning. A synthesis of over 800 meta-analyses relating to achievement. Abingdon, OX, England: Routledge.
    Hawai’i Department of Education. (2010). Learning progressions Hawaii progress maps [Video] (project described in Hess, 2011b). Tri-State Enhanced Assessment Grant. Minneapolis: University of Minnesota, National Center on Educational Outcomes (NCEO). Retrieved from http://youtu.be/8vltv2PaZVU
    Hess, K. (1987). Enhancing writing through imagery. Unionville, NY: Royal Fireworks Press.
    Hess, K. (2000). Beginning with the end in mind: A cross-case analysis of two elementary schools’ experiences implementing Vermont’s framework of standards and learning opportunities (Unpublished Dissertation). University of Vermont, Burlington.
    Hess, K. (2004a). Applying Webb’s depth-of-knowledge levels in reading and writing (White paper developed for the New England Common Assessment Program [NECAP]). Dover, NH: Center for Assessment.
    Hess, K. (2004b). Applying Webb’s depth-of-knowledge levels in mathematics (White paper developed for the New England Common Assessment Program [NECAP]). Dover, NH: Center for Assessment.
    Hess, K. (2008a). Applying Webb’s depth-of-knowledge levels in science (White paper developed for the New England Common Assessment Program [NECAP]). Dover, NH: Center for Assessment.
    Hess, K. (2008b). Developing and using learning progressions as a schema for measuring progress. Retrieved from http://www.karin-hess.com/learning-progressions
    Hess, K. (2008c). Teaching and assessing understanding of text structures across the grades: A research synthesis. Dover, NH: Center for Assessment.
    Hess, K. (2009). Student profile: Science inquiry learning Grades preK–5. In K. Hess, Linking research with practice: A local assessment toolkit. Underhill, VT: Educational Research in Action.
    Hess, K. (2010a). Using learning progressions to monitor progress across grades: A science inquiry learning profile for PreK–4. Science & Children, 47(6), 57–61.
    Hess, K. (Ed.). (2010b). Learning progressions frameworks designed for use with the Common Core State Standards in mathematics K–12. National Alternate Assessment Center at the University of Kentucky and the National Center for the Improvement of Educational Assessment. Retrieved from http://www.karin-hess.com/learning-progressions
    Hess, K. (Ed.). (2011a). Learning progressions frameworks designed for use with the Common Core State Standards for ELA & literacy, K–12. National Alternate Assessment Center at the University of Kentucky and the National Center for the Improvement of Educational Assessment. Retrieved from http://www.karin-hess.com/learning-progressions
    Hess, K. (2011b). Learning progressions in K–8 classrooms: How progress maps can influence classroom practice and perceptions and help teachers make more informed instructional decisions in support of struggling learners. (Synthesis Report 87). Minneapolis: University of Minnesota, National Center on Educational Outcomes. Retrieved from http://www.karin-hess.com/learning-progressions
    Hess, K. (2011c). Text-based assessment targets planning worksheets for Grades 3–12. Underhill, VT. Retrieved from http://www.karin-hess.com/free-resources
    Hess, K. (2012a). Using a research-based learning progression schema in the design of performance-based assessment tasks and interpretation of student progress. Invited presentation for Roundtable Discussion on Performance Assessment at the April 2012 AERA Annual Meeting, Vancouver, BC.
    Hess, K. (2012b). What is the role of common assessments in local assessment systems? (White paper developed for WY school districts). Underhill, VT: Educational Research in Action.
    Hess, K. (2013a). Linking research with practice: A local assessment toolkit to guide school leaders. Underhill, VT: Educational Research in Action.
    Hess, K. (2013b). Text complexity toolkit: A video workshop with Dr. Karin Hess [Video]. Underhill, VT: Educational Research in Action. Retrieved from http://www.karin-hess.com/free-resources
    Hess, K. (2014). Unit planning tools using the LPF: Strand 7. Underhill, VT: Educational Research in Action, LLC.
    Hess, K. (2015). Linking research and rigor: A video workshop with Dr. Karin Hess [Video]. Underhill, VT: Educational Research in Action. Retrieved from http://www.karin-hess.com/free-resources
    Hess, K. (2016). Student work to DIE for: A video workshop with Dr. Karin Hess [Video]. Underhill, VT: Educational Research in Action. Retrieved from http://www.karin-hess.com/free-resources
    Hess, K. (2017). Piloting new assessments using a cognitive labs approach: A video workshop with Dr. Karin Hess [Video]. Underhill, VT: Educational Research in Action. Retrieved from http://www.karin-hess.com/free-resources
    Hess, K., & Biggam, S. (2004). A discussion of “increasing text complexity” Grades K–HS (White paper published by NH, RI, and VT Departments of Education as part of the New England Common Assessment Program (NECAP) Grade Level Expectations for Reading). Dover, NH: Center for Assessment.
    Hess, K., Burdge, M., & Clayton, J. (2011). Challenges to developing alternate assessments. In M. Russell (Ed.), Assessing students in the margins: Challenges, strategies, and techniques. Charlotte, NC: Information Age Publishing.
    Hess, K., Carlock, D., Jones, B., & Walkup, J. (2009). What exactly do “fewer, clearer, and higher standards” really look like in the classroom? Using a cognitive rigor matrix to analyze curriculum, plan lessons, and implement assessments. Retrieved from http://www.karin-hess.com/free-resources
    Hess, K., Darling-Hammond, L., Abedi, J., Thurlow, M., Hiebert, E. H., Ducummun, C. E., et al. (2012). Content specifications for the summative assessment of the Common Core State Standards for English Language Arts and literacy in history/social studies, science, and technical subjects. Santa Cruz: University of California Santa Cruz Silicon Valley Extension, SMARTER Balanced Assessment Consortium.
    Hess, K., & Gong, B. (2014). Ready for college and career? Achieving the Common Core Standards and beyond through deeper, student-centered learning. Quincy, MA: Nellie Mae Education Foundation. Retrieved from http://www.karin-hess.com/free-resources
    Hess, K., & Gong, B. (2015). An alignment study methodology for examining the content of high-quality summative assessments of college and career readiness in English language arts/literacy and mathematics (Unpublished white paper). Dover, NH: Center for Assessment.
    Hess, K., & Hervey, S. (2010). Tools for examining text complexity (White paper). Dover, NH: Center for Assessment.
    Hess, K., Kurizaki, V., & Holt, L. (2009). Reflections on tools and strategies used in the Hawai’i progress maps project: Lessons learned from learning progressions. Final Report, Tri-State Enhanced Assessment Grant. Minneapolis: University of Minnesota, National Center on Educational Outcomes (NCEO). Retrieved from https://nceo.umn.edu/docs/tristateeag/022_HI%201.pdf
    Hess, K., McDivitt, P., & Fincher, M. (2008). Who are the 2% students and how do we design test items and assessments that provide greater access for them? Results from a pilot study with Georgia students. Atlanta, GA: Tri-State Enhanced Assessment Grant. Retrieved from http://www.nciea.org/publications/CCSSO_KHPMMF08.pdf
    Hiebert, E. H. (2012). Readability and the Common Core’s Staircase of Text Complexity (Text Matters 1.3) and The Text Complexity Multi-Index (Text Matters 1.2). Retrieved from http://textproject.org/professional-development/text-matters
    Hiebert, E. H. (2013). Supporting students’ movement up the staircase of text complexity. Reading Teacher, 66(6), 459–467.
    Hillocks, G. (2011). Teaching argument writing Grades 6–12. Portsmouth, NH: Heinemann.
    International Society for the Scholarship of Teaching & Learning/ISSTL. (2013). Studying and designing for transfer: What is transfer? [Video]. Retrieved from http://blogs.elon.edu/issotl13/studying-and-designing-for-transfer
    Jabot, M., & Hess, K. (2010). Learning progressions 101 (Modules Addressing Special Education and Teacher Education [MAST]). Greenville, NC: East Carolina University. Retrieved from http://mast.ecu.edu/modules/lp
    Kagan, S. (1992). Cooperative learning. San Juan Capistrano, CA: Resources for Teachers.
    Keeley, P. (2008). Science formative assessment: 75 practical strategies for linking assessment, instruction, and learning. Thousand Oaks, CA: Corwin.
    Keeley, P., Eberle, F., & Farrin, L. (2005). Uncovering student ideas in science: Vol. 1. 25 formative assessment probes. Arlington, VA: NSTA Press.
    Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development (Vol. 1). Englewood Cliffs, NJ: Prentice-Hall.
    Maine Department of Education. (2000). Measured measures: Technical considerations for developing a local assessment system. Augusta, ME: Maine Department of Education, Maine Comprehensive Assessment System Technical Advisory Committee.
    Maloney, A., & Confrey, J. (2013, January 24–26). A learning trajectory framework for the mathematics common core: Turnonccmath for interpretation, instructional planning, and collaboration. Presentation at the 17th Annual Conference of the Association of Mathematics Teacher Educators, in Orlando, FL. Retrieved from https://www.fi.ncsu.edu/wp-content/uploads/2013/05/A-Learning-Trajectory-Framework-presentation.pdf
    Maloney, A. P., Confrey, J., & Nguyen, K. H. (Eds.). (2014). Learning over time: Learning trajectories in mathematics education. Charlotte, NC: Information Age Publishing.
    Marston, E. (2005). The lost people of Mesa Verde (NECAP support materials with permission of Highlights for Children). Retrieved from http://www.narragansett.k12.ri.us/Resources/NECAP%20support/gle_support/Reading/end7/the_lost_people.htm
    Marzano, R. J. (2012). Teaching argument. Educational Leadership, 70, 80–81.
    Masters, G., & Forster, M. (1996). Progress maps (Part of the Assessment Resource Kit, pp. 1–58). Melbourne: The Australian Council for Educational Research.
    McCarthy, B. (1987). The 4MAT System: Teaching to learning styles with right/left mode techniques. Barrington, IL: Excel.
    McKenna, M., & Stahl, S. (2003). Assessment for reading instruction. New York, NY: Guilford Press.
    McLeod, S. A. (2013). Kolb—Learning styles. Retrieved from https://www.simplypsychology.org/learning-kolb.html
    McTighe, J., & Wiggins, G. (1999, 2004). Understanding by design professional development workbook. Alexandria, VA: ASCD.
    Moore, C., Garst, L., & Marzano, R. (2015). Creating and using learning targets and performance scales: How teachers make better instructional decisions. West Palm Beach, FL: Learning Sciences International.
    Moss, C., & Brookhart, S. (2012). Learning targets: Helping students aim for understanding in today’s lesson. Alexandria, VA: ASCD.
    National Center and State Collaborative. (2013). Learning progressions frameworks [Video]. Retrieved from https://www.youtube.com/watch?v=ss8fE1dBkE4&t=24s
    National Center and State Collaborative/NCSC. (2015, December). NCSC’s content model for grade-aligned instruction and assessment: “The same curriculum for all students.” Retrieved from http://www.ncscpartners.org/Media/Default/PDFs/Resources/NCSCBrief7.pdf
    National Center for Learning Disabilities. (2005). Executive functioning fact sheet. Retrieved from www.ldonline.org/article/24880
    National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010a). Common Core State Standards for English language arts & literacy in history/social studies, science, and technical subjects: Appendix A. Washington, DC: Authors.
    National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010b). Common Core State Standards for English language arts & literacy in history/social studies, science, and technical subjects. Washington, DC: Authors.
    National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010c). Common Core State Standards for mathematics. Washington, DC: Authors.
    National Research Council. (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century. Washington, DC: The National Academies Press.
    National Research Council, Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
    Newmann, F., King, M., & Carmichael, D. (2007). Authentic instruction and assessment: Common standards for rigor and relevance in teaching academic subjects. Des Moines: Iowa Department of Education. Retrieved from https://www.centerforaiw.com/phases-of-aiw
    Nottingham, J. A., Nottingham, J., & Renton, M. (2017). Challenging learning through dialogue: Strategies to engage your students and develop their language of learning. Thousand Oaks, CA: Corwin.
    OGAP. (2008, January 18). The Vermont Mathematics Partnership, U.S. Department of Education (Award Number S366A020002) & National Science Foundation (Award Number EHR-0227057). Retrieved from http://www.ogapmath.com
    O’Keefe, P. A. (2014, September 5). Liking work really matters. NY Times. Retrieved from https://www.nytimes.com/2014/09/07/opinion/sunday/go-with-the-flow.html?_r=3
    Pellegrino, J. W. (2002). Understanding how students learn and inferring what they know: Implications for the design of curriculum, instruction and assessment. In M. J. Smith (Ed.), NSF K–12 mathematics and science curriculum and implementation centers conference proceedings (pp. 76–92). Washington, DC: National Science Foundation and American Geological Institute.
    Perkins, D., & Salomon, G. (1988, September). Teaching for transfer. Educational Leadership, 46(1), 2232.
    Pinnell, G. S., & Fountas, I. (2007). The continuum of literacy learning Grades K–8: Behaviors and understandings to notice, teach, and support. Portsmouth, NH: Heinemann.
    Porter, A., & Smithson, J. (2001). Defining, developing, and using curriculum indicators: CPRE Research Report Series RR-048. Philadelphia: University of Pennsylvania, Consortium for Policy Research in Education.
    Roediger, H. L., & Marsh, E. J. (2005, September). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(5), 1155–1159.
    Rose, C., Minton, L., & Arline, C. (2007). Uncovering student thinking in mathematics: 25 formative assessment probes. Thousand Oaks, CA: Corwin.
    Schmidt, W. H., Wang, H. A., & McKnight, C. C. (2005). Curriculum coherence: An examination of U.S. mathematics and science content standards from an international perspective. Journal of Curriculum Studies, 37(5), 525–529.
    Shin, N., Stevens, S., Short, H., & Krajcik, J. (2009, June). Learning progressions to support coherence in instructional material, instruction, and assessment design. Paper presented at the Learning Progression in Science (LeaPS) Conference, Iowa City, IA.
    Shwartz, Y., Weizman, A., Fortus, D., Krajcik, J., & Reiser, B. J. (2008). The IQWST experience: Using coherence as a design principle for a middle school science curriculum. Elementary School Journal, 109(2), 199–219.
    Slavin, R. (1991). Synthesis of research on cooperative learning. Educational Leadership, 48(5), 71–82.
    Smarter Balanced Assessment Consortium. (2017). SBAC Mathematics Task Specifications. Retrieved from http://www.smarterbalanced.org
    Sousa, D. A. (2015). Brain-friendly assessments: What they are and how to use them. West Palm Beach, FL: Learning Sciences International.
    Stiggins, R. (1997). Student-involved classroom assessment ( ed.). Upper Saddle River, NJ: Prentice-Hall.
    Stiggins, R. (2017). The perfect assessment system. Alexandria, VA: ASCD.
    Sturgis, C., & Patrick, S. (2010, November). When success is the only option: Designing competency-based pathways for next generation learning. Quincy, MA: Nellie Mae Education Foundation.
    Teachers College Reading Writing Project. (2013). Learning progression to support self-assessment and writing about themes in literature: Small group [Video]. Retrieved from https://www.youtube.com/watch?v=8grZFus5OCo
    Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large scale assessments (Synthesis Report 44). Minneapolis: University of Minnesota, National Center on Educational Outcomes. Retrieved from http://education.umn.edu/NCEO/OnlinePubs/Synthesis44.html
    Tucker, C. (2015, February). Thesis statement throwdown! Retrieved from http://catlintucker.com/2015/02/thesis-statement-throwdown
    Understanding Language/Stanford Center for Assessment, Learning, & Equity. (2016, June). Evaluating item quality in large-scale assessments: Phase I report of the study of state assessment systems. Stanford, CA: Author.
    Vacca, R. T., & Vacca, J. A. (1989). Content area reading ( ed.). New York, NY: HarperCollins.
    Vermont Agency of Education. (n.d.). What is proficiency-based learning? Retrieved from http://education.vermont.gov/student-learning/proficiency-based-learning
    Vygotsky, L. S. (1978). Mind and society: The development of higher mental processes. Cambridge, MA: Harvard University Press.
    Walsh, J. A., & Sattes, B. D. (2015). Questioning for classroom discussion. Alexandria, VA: ASCD.
    Walsh, J. A., & Sattes, B. D. (2017). Quality questioning (2nd ed.). Thousand Oaks, CA: Corwin.
    Webb, N. (1997). Criteria for alignment of expectations and assessments on mathematics and science education [Research Monograph Number 6]. Washington, DC: CCSSO.
    Webb, N. (2002, March 28). Depth-of-Knowledge levels for four content areas (White paper shared via personal email).
    Webb, N. (2005). Web alignment tool (WAT): Training manual. Madison: University of Wisconsin and Council of Chief State School Officers.
    Wiggins, A. (2017). The best class you never taught: How spider web discussion can turn students into learning leaders. Alexandria, VA: ASCD.
    Wiggins, G. (2006). Healthier testing made easy: The idea of authentic assessment. Retrieved from https://www.edutopia.org/authentic-assessment-grant-wiggins
    Wiggins, G., & McTighe, J. (1999). The understanding by design handbook. Alexandria, VA: ASCD.
    Wiggins, G., & McTighe, J. (2005). Understanding by design (expanded 2nd ed.). Alexandria, VA: ASCD.
    Wiggins, G., & McTighe, J. (2012). The understanding by design guide to advanced concepts in creating and reviewing units. Alexandria, VA: ASCD.
    Wiliam, D. (2015). Designing great hinge questions. Educational Leadership, 73(1), 40–44.
    Wiliam, D., & Leahy, S. (2015). Embedding formative assessment: Practical techniques for K–12 classrooms. West Palm Beach, FL: Learning Sciences International.
    Willingham, D. T. (2009). Why don’t students like school? A cognitive scientist answers questions about how the mind works and what it means for the classroom. San Francisco, CA: Wiley.
    Wilson, M., & Bertenthal, M. (Eds.). (2005). Systems for state science assessment. Board on Testing and Assessment, Center for Education, National Research Council of the National Academies. Washington, DC: National Academies Press.
    Wisconsin Department of Public Instruction. (2016). Wisconsin’s Strategic Assessment Systems Foundational Charts (revised). Madison, WI: Author. Retrieved from https://dpi.wi.gov/sites/default/files/imce/strategic-assessment/Strategic_Assessment%20CHARTS.pdf
