Educational Assessment: Tests and Measurements in the Age of Accountability


Robert J. Wright

  • Dedication

    This book is affectionately dedicated to my wife, Jeanne. Her intellect and energy are a constant inspiration to me. Jeanne is my ultimate editor, tireless helper, and best friend.




    Commission and Nisus

    My goal in writing this book is to provide students with an understandable and useful interpretation of the critical information related to educational measurement. There has never been a time when it has been more important for educators to have an understanding of educational assessment and measurement. Educational accountability has taken root in our nation, and our political leaders are all looking for “scientific” documentation of our successes. It is no longer prudent or even possible for educators to ignore this national Zeitgeist.

    Our political leaders have decided that the high-stakes testing of public school children is the best way to provide this documentation. For that reason, educational testing now provides one unifying theme in the lives of all American educators. Today the testing industry in this country has literally become a billion-dollar business. No longer is it just the students who can “flunk” a test. In today's schools, educational measurements are all too often the driving force behind curriculum decisions, hiring practices, salary and benefit packages, and the politics of school boards. The annual release of average test scores to the news media has changed how schools operate and what goes on in the classrooms.

    Game Plan

    During the thirty years that I taught educational testing to graduate students, I was never completely comfortable with the textbooks that were available. It seemed that every year I adopted a different textbook, and every year I was never completely satisfied with my choice. It was my impression that the books were either too pedantic and designed to make students dislike this exciting field, or they were vacuous and devoid of the central core of knowledge required to develop the next generation of educational specialists and leaders.

    This new book is the outcome of my quest for a readable and highly engaging textbook that does not compromise the core principles of measurement. In that regard, there are chapters covering traditional topics such as reliability, standard error of measurement, validity, classroom test construction, performance assessments, standardized achievement tests, item analysis, and the application of Item Characteristic Curves (ICC) for Differential Item Functioning (DIF).

    Beyond these traditional concerns, this book is grounded in the real world of public schools and students. It is not designed for psychology or sociology courses but is targeted to meet the needs of educators and future educational leaders. This school-based focus was accomplished in several ways. As a case in point, a school focus pervades the numerous examples that describe the implications of measurement decisions on the lives of students and teachers. As another, more than 90% of the 860 references cited in this book are from either the educational literature or education-focused agencies.

    Heretofore, textbooks in educational measurement have lost students in scientific discussions of the arcane and complex principles of psychometrics. Unlike what has gone before, this text provides an engaging, insightful, and highly readable introduction to the inner workings of educational measurement. Traditional topics are presented in approachable and understandable ways. This book employs an issue-oriented approach to the analysis and interpretation of complex measurement concerns, most of which are being debated in public forums today. As an example, this book examines issues such as the score gap, high-stakes tests and the dropout crisis, and the problem of grade retention versus social promotion. Intriguing real-life examples are introduced in each chapter, selected to demonstrate how the technical measurement principles actually affect those who are involved. The approach I used to write this textbook involved drawing from my public school experiences and years as a teacher-educator and then amalgamating my experiences into a narrative presentation designed to explain the science of educational measurement. This provides the text with a true school-based focus.

    The grounding of this work in the real world of public education is evident from the book's emphasis on matters such as high-stakes testing. The narrative also presents the position that the justification for much that has occurred in the name of educational accountability was originally based on erroneous readings and interpretations of international academic achievement assessments. The point is also made that during the 1980s the concern for American education led inevitably to the Improving America's Schools Act (1994) and then to the No Child Left Behind Act (NCLB, 2002).

    In this new book, the testing mandates of the NCLB Act are presented in their historical context. In this historical interpretation, they are seen as a continuation of European social science traditions begun during the Victorian era. This approach to measurement-based accountability as prescribed by the NCLB mandates follows a psychometric model that was first proposed in 1923 by Lewis M. Terman. Today's high-stakes assessments are an extension of the American belief in an industrial model for management, which requires an “input-output” evaluation. This approach focuses on observable outputs and products and pays little attention to the processes or the initial condition of raw materials.

    Another description presented in the text elaborates on the link between the No Child Left Behind Act and the so-called miracle of Texas. Additionally, the NCLB Act is discussed in the context of the goals of the original Elementary and Secondary Education Act (1965) and that legislation's mandate for a longitudinal educational evaluation, “The Nation's Report Card” (NAEP).

    The book includes a lengthy discussion of ethical and unethical approaches used to improve scores on the mandated assessments. It also addresses the relationship between factors such as the child's personological characteristics, familial structures, teacher background, instructional approaches, and the leadership style of the principal and the outcome scores on mandated assessments.

    This text also addresses the needs and problems incurred by students with significant learning problems who must take the mandated assessment tests. The difficulty of reconciling special education programs with the tenets of the No Child Left Behind law is explored in depth. The steps in the identification of significant learning problems and interventions that are called for as part of the special education entitlements are also described in some detail. Related issues such as the testing of English Language Learners and the NCLB mandate for the testing of four-year-old inner-city children are also included.

    Administrative concerns linked to educational assessment are not ignored. I have included a chapter on school evaluation that includes value-added assessments and longitudinal data management. The book examines the traditional approaches used to integrate both formative and summative assessments into a “systematic educational program evaluation.” The chapter on evaluation also presents examples illustrating the design of an evaluation section for an application for external funding or subvention.

    “Hot button” problems for school leaders such as the relationship between high-stakes testing and grade retention, the dropout problem, and even “academic redshirting” are part of this text. Also discussed are the concerns held by many high school administrators regarding “academic press” and the striving of students to be one of the “deserving ten” or even the one with the best GPA who is selected to give the valedictory speech. In this context, the text provides an examination of the grading policies associated with gifted students, AP classes, and even the report card options for children with disabilities.

    Finally, the application of educational technology in testing and measurement is explored, and the future trends for the applications of educational technology are noted and discussed. A section is included describing the use of computer adaptive testing (CAT) and its potential impact on the achievement score gap with students who are part of ethnic minorities. This description includes a discussion of the problems associated with item development and test security. Other technological applications for measurement are also explored, including classroom clickers and real-time formative evaluations, online parent-teacher communications, and online grade books.

    Helpful Apparatus

    I have included a number of learning aids throughout this textbook. They have been incorporated in an effort to improve the quality and depth of learning that occurs for the reader.

    • Section descriptions running a page or two lead off each conceptual segment of the book. These section descriptors provide an overview and framework for the next few chapters.
    • A page or two at the start of every chapter provides a statement of “Issues and Themes.” This section provides the reader with insight into the perspective that I have taken in researching and writing the chapter.
    • The learning goals for each chapter following the Issues and Themes section are presented in a sequential list. Taken together these three elements provide the reader with a set of advance organizers designed to provide him or her with an awareness of what is being learned and a structure for understanding how to interpret the new material.
    • Each chapter includes several sections labeled “Case in Point.” Each Case in Point provides a real-life example or application of the material presented in the text. This is included as a motivational component.
    • Another motivational device in every chapter is a cartoon addressing an aspect of the chapter's narrative. These cartoons were drawn by some of America's leading cartoon humorists, including those of the New Yorker.
    • A total of 118 URLs are included in the chapters of the book. These Internet resources provide students with access to an expanded library of information associated with the material in the book.
    • Each chapter provides a list of discussion questions designed to initiate classroom discussions and motivate students to employ higher-order cognitive processes when considering the material herein.
    • A detailed glossary of terminology including over 600 technical terms and laws is also part of this book.
    • Over 860 references, including another 250 URLs, are included as support material for the book.

    A Word to Instructors

    Educational tests and measurements constitute a topic that is now central to the preparation of educational leaders and specialists. It is also a germane topic for all classroom educators working in our public schools. For that reason I advocate using this book with classes that are composed of graduate and upper-division students in education and teacher preparation. Students who would gain the most from this book are first-year graduate students and sophisticated undergraduate students.

    I also encourage you to review and consider the “Teaching on Point” components included on the Instructor's CD. They provide teaching and discussion ideas that are keyed to the real-world examples, known as “Case in Point,” in each chapter (as mentioned above). That Instructor's CD also provides a multiple choice test (type-A format) for each chapter. I have also provided discussion questions for each chapter, which can spark classroom or seminar interactions and promote analytical and evaluative thinking.

    Finally, I encourage you to contact me with your ideas and thoughts about this text. I am also open to interacting with students during one of your classes using Internet video technology or by conference call. E-mail me to explore this further. (Include the word textbook as part of the subject line.)

    Congratulations for electing to teach this course, and best wishes for a successful semester.

    Robert J. Wright, Ph.D., Professor, Center for Education, Widener University

    A Word to the Student

    I accept the fact that very few students who enroll in collegiate and graduate courses in educational tests and measurements do so voluntarily. Yet, the study of educational measurement can be an engaging and truly empowering experience for teachers and school leaders.

    From my perspective, there are four things about any college class or graduate seminar that are central to its success. These include the students who take the course, the instructor who provides the class, the content of the discipline being studied, and the books and resources that are available to the students.

    During the intersessions of my university I have heard colleagues comment on how much easier it is to be a professor when the students are away from campus. However, to be a professor is to profess to others. Students are the reason colleges exist, and students are the life force inspiring each of us who enter the lecture halls and classrooms. The only things a college instructor asks for are students who have an open mind to the discipline and who are truly willing to learn.

    I took my first class in educational measurement as a graduate student in 1968. In that summer course, I was inspired by a good teacher. That experience eventually led me to earn my doctorate in educational psychology. Good teaching is always the key to success for every course.

    Beyond the ability of an instructor to inspire and motivate students, the next most important component of a good learning experience is the topic itself. Educational testing and measurement is now the central focus of ongoing public policy debates and is also an ever-present concern for all public school educators. It is the “tests” that have become the bane of so many teachers and administrators. The continuing presence of mandated assessment programs in the schools is not debatable. There can be no argument that they will be with us for the foreseeable future. Having a solid foundation in the knowledge about educational measurement is therefore a matter of survival for those who earn their living in the front offices and classrooms of our schools. The topics that a course in educational tests and measurement presents are seen far beyond the lecture hall: These issues and concerns are seen throughout our popular culture, in our mass media, and even in the value of our real estate. Today it is axiomatic that no prudent educator can take the topic of educational tests and measurement lightly.

    I have written this book with these concerns in mind. The text provides grounding in all of the aspects of measurement that a public school educator must have as background and goes on to provide insight into the inner workings of the state agencies involved in creating high-stakes educational measurements. I have worked to write this book in a highly readable and engaging style. By reading and studying the material herein, you will gain the ability to maximize the test outcomes for your students and be better equipped to advocate for improved approaches to the measurement of learning.

    To the Profession

    A diligent and documented search has been made to assure that the copyright holders of all material included in this work have been contacted and have approved use here. Also, all cited reference materials have been checked for accuracy. The URLs listed in this book have been carefully examined for accuracy as well. If the reader finds that errors have been made in some of these efforts, please contact me.

    Supplemental Materials

    Additional ancillary materials further support and enhance the learning goals of Educational Assessment: Tests and Measurements in the Age of Accountability. These ancillary materials include the following:

    Instructor's Resources CD

    This CD offers the instructor a variety of resources that supplement the book material, including PowerPoint lecture slides, Web resources, “Teaching on Point” ideas to accompany the “Case in Point” discussions in each chapter, ideas for class discussions and projects, and sample syllabi for both quarter and semester courses. The CD also includes Brownstone's Diploma Test Bank software so that instructors can create, customize, and deliver tests. The Test Bank consists of 20 multiple choice questions, 10–15 true/false questions, 10 short answer questions, and 5 essay questions for each chapter. Answers and page references are provided for each question.

    Web-Based Student Study Site

    This Web-based student study site provides a variety of additional resources to enhance students' understanding of the book content and to take their learning one step further. The site includes comprehensive study materials such as chapter objectives, flash cards, and practice tests. The site also includes the following special features: Web resources, “Learning From Journal Articles,” and “Considerations on Point” discussions to accompany each chapter's “Cases in Point.”


    No project of this magnitude can be completed by one person working in isolation. This textbook would not have been possible without the encouragement, editorial assistance, intelligence and creativity, and freely given help of my wife and partner, Jeanne.

    In addition I must provide a word of thanks to Ms. Molly Wolf, Education Collection Librarian at Widener University's Wolfgram Memorial Library; Gloria Floyd, my secretary; the librarians and assistants at the Bonita Springs, Florida, Library; and Drs. Richard and Ann St. John, my technology support team, critics, and friends.

    Kudos are also due for the team at Sage Publications, including my editor, Dr. Diane McDaniel, whose belief in this project and support of my efforts have been invaluable. Also, I owe a special debt of gratitude to Ms. Ashley Plummer, the editorial assistant working with this manuscript. It was Ashley who coordinated this project, bringing together the many reviews and permissions that are part of this book. I also wish to express my gratitude to the many editorial reviewers who worked to make this the best possible textbook. This group of faculty reviewers includes the following:

    • Morris I. Beers, State University of New York College at Brockport
    • Tyrone Bynoe, University of the Cumberlands
    • Ollie Daniels, Barry University
    • Delisa K. Dismukes, Jacksonville State University
    • Holmes Finch, Ball State University
    • Sheryl R. Glausier, Southeastern Louisiana University
    • Renée Í. Jefferson, The Citadel: The Military College of South Carolina
    • Kae Keister, Wilmington College
    • James Pelech, Benedictine University
    • Kaye Pepper, University of Mississippi
    • Thomas R. Scheira, Buffalo State College
    • Thomas J. Sheeran, Niagara University
    • John Shimkanin, California University of Pennsylvania
    • Ellen Bennett Steiner, University of Denver
    • John J. Venn, University of North Florida
    • Colleen Willard-Holt, Penn State-Harrisburg
  • Glossary

    • a priori test assessment Conducted prior to using a new test. It provides a method of assuring quality and the validity of the measure.
    • ability grouping Placement of children into groups based on results from standardized tests of mental ability, aptitudes, and achievement.
    • academic redshirting Describes when parents of a young child elect not to enroll that child in school until he or she is a year older than the normal cohort of kindergarten children. This is done to provide the child with a developmental advantage over his or her peers.
    • accommodations Modifications in the testing or evaluation environment that are made to compensate for a child's disabling condition, which make the test more fair and inferences made on the basis of its scores more accurate.
    • accomplishment ratio A quantitative system for determining how efficacious the learning process is for a child. It is a statement combining the child's cognitive ability and achievement level into a single score. High scores show efficient learning, and low scores are indicative of underachievement.
    • accountability Being held to account for both the expenditure of educational funds and for the achievement outcomes for students.
    • achievement test A test measuring an individual's knowledge of specific facts and his or her proficiency in completing cognitive processes such as problem solving.
    • adequate yearly progress (AYP) AYP specifies the proportion of a school's students who achieve at a proficient or better level each year, as well as the attendance rates for the mandated tests, and the eventual graduation rates for the schools.
    • advanced placement A special advanced high school curriculum for as many as 27 different subject areas and a testing program that makes it possible for colleges to award collegiate credit for completing the AP course and passing the examination.
    • affirmative action A plan designed to increase the representation of traditionally underrepresented minorities in a program or activity.
    • alternate form reliability The correlation that can be obtained when two different but well-matched versions of a measure are administered to the same subjects.
    • alternative answers Alternatives are the possible answers to multiple choice questions and are composed of one correct answer (keyed response) and several wrong choices (distracters).
    • alternative assessments All extended-answer supply-type test items including a wide variety of assessment approaches that do not involve having the student select or choose between answer options (e.g., performance tasks, demonstrations, projects, and exhibitions).
    • alternative tracks Alternative routes for people to become teachers; initiated by the states to help meet the need for certified teachers.
    • American College Testing program (ACT) The American College Testing program is an alternative college admissions and guidance instrument used by one million students a year.
    • analytical assessment An approach to evaluation that uses a multifaceted rubric to examine a number of different aspects of the student product being assessed.
    • analytical intelligence One of the three types of intelligence that are part of Sternberg's proposed triarchic model of mental ability.
    • analytical rubric When a rubric assesses multiple aspects or dimensions of a performance by employing several ordinal scales elaborated as separate scoring rubrics. These can also be summed into a total overall score.
    • anchor items Test items used every year that serve to set the difficulty level of newly developed test items for subsequent editions of the assessment. Most high-stakes tests consist of up to 33% anchor items.
    • anecdotal observation Factual reporting of detailed behavior(s) as witnessed.
    • anecdotal record Ongoing linear report of an occurrence or incident. A written description of observations, devoid of judgments or evaluation.
    • answer The response made by a student to a query posed by the teacher.
    • Apgar In 1952, Dr. Virginia Apgar used her name as an acronym for the five subtests of her new measure of the status of newborn babies. It represented: A = appearance, P = pulse, G = grimace, A = activity, R = respiration. Today, the name Apgar is applied to an observational scale regarding the following factors: heart rate, respiratory effort, muscle tone, reflex response, and color.
    • appropriateness Component of validity referring to the application for which the test is used. A particular test may be valid in one arena but inappropriate when used for another measurement task.
    • aptitude measures Assessment of the capacity for learning.
    • Army Alpha & Army Beta The first group-administered, paper-and-pencil tests of general intelligence administered to Army recruits beginning in 1917.
    • artificial intelligence The ability of a computer to perform those activities normally thought to require human intelligence.
    • assembly-line grading A name applied to the large-scale grading approach adopted by several universities to evaluate thousands of undergraduate essays by using graduate assistants and distributing the grading task over the Internet.
    • assessment The measurement of one or more variables related to the current condition, ability, status, or knowledge of an individual.
    • attention deficit disorder (ADD) Developmentally inappropriate lack of attention and focus on tasks and activities.
    • attention deficit/hyperactivity disorder (AD/HD) Disorder of a child's ability to attend to and focus on learning that may appear prior to school age and be manifested by impulsivity; high, randomly directed activity; and inattention to tasks and directions.
    • authentic assessment Performance assessment task based on a “real world” problem or task that is based on issues encountered in the world beyond the school.
    • basic interpersonal communication skill (BICS) The everyday level of language used in social settings. The language used by children on the playground and in the lunchroom or in sports.
    • bell curve Also known as the Gaussian normal curve. This symmetric curve represents the distribution of error around the average of a large set of independent observations.
    • benchmark skills in reading Operational descriptions and examples that highlight qualitative differences between achievement levels and serve as marker points on a rubric.
    • benchmarks A point of reference based on a standard for learning, including illustrative examples of successful completion or achievement.
    • best linear unbiased estimators (BLUE) Regression analysis that weights variables so as to create the best linear unbiased estimators for minimizing the error variance of prediction. The analysis assumes the core regression assumptions (Gauss-Markov) are met.
    • best linear unbiased predictors (BLUP) Random variables (those without fixed properties) used in regression analysis to estimate the outcome for another random variable. The matrix of predictors (i.e., estimators) is referred to as the BLUP.
    • binomial distribution A distribution of a large number of independent outcomes from a series of binomial occurrences.
    • Bloom's taxonomy Taxonomy for categorizing the level of abstraction required to answer questions that commonly occur in educational settings. A useful structure for categorizing test questions according to the cognitive skill level required to answer them correctly.
    • bluffing Guessing an answer for an extended supply-type question on an essay test, made in the hope that the person grading the question will be lenient.
    • Buckley Amendment The Family Education Rights and Privacy Act is often known by the name of the senator who was instrumental in its passage, James L. Buckley. This law provides rules for educational record keeping, psychological testing, and parental rights.
    • CAVD The four factors of E. L. Thorndike's model for mental ability: completion, arithmetic, vocabulary, and directions.
    • Carnegie Unit A unit of instruction that requires 120 clock hours to present. This time may occur in units of 45 minutes for 180 school days. Block schedules can provide a Carnegie unit of instruction in time blocks of 90 minutes for 90 days.
    • centered A term used by the Educational Testing Service to describe the process of establishing a new normative reference group of subjects for use when scoring individuals’ tests. The process is sometimes referred to as “norming the test.”
    • central tendency A measure of the “average” performance of individuals, determined as the arithmetical center of the data.
    • chance level The likelihood that a test taker will be able to guess the correct answer to a select-type question (e.g., multiple choice question). This is linked to the number of answer alternatives.
    • child study movement This era began in the 19th century when the methods associated with modern science were applied to the systematic study of human infancy and childhood.
    • class rank The absolute position, or rank, of a student compared to his or her peers in terms of total grade point average.
    • classical measurement theory The theoretical model explaining the relationship between the obtained score of a subject on a test and the amounts of measurement error and true score (a pure score, free of any error).
    • coefficient Kappa The coefficient of reliability used with criterion-referenced tests. Scores range from a minimum of 0.00 to a maximum value of 1.00.
    • cognitive ability A synonym for traditionally defined mental ability that emphasizes verbal, spatial, and mathematical reasoning ability.
    • cognitive academic language proficiency (CALP) The level of language that is required for formal academic learning in content-based subject fields.
    • cohort A group of individuals having a statistical factor (age or background) in common.
    • College Board A not-for-profit corporation originally chartered as the College Entrance Examination Board in 1901 to create and score examinations for students applying for college admission.
    • Committee on Secondary School Studies A committee of the National Education Association existing from 1892 to 1893 that established the high school as a four-year experience involving 24 year-long classes. It is also known as the Committee of 10.
    • common school Name given to what are now known as public schools by Horace Mann. Supplanted in the American lexicon by public school around 1900.
    • commonality The proportion of a variable's variance that is shared with a factor. The amount of common variance between the two.
    • composite score A combined score from several subtests into a single score.
    • computer adaptive testing Testing that employs software that can estimate the ability and background knowledge of each test taker and guide the appropriate selection of test items from a computerized item bank.
    • computer grading of essays An application of artificial intelligence whereby computer software is used to evaluate and even grade the answers of students on extended supply-type (essay) questions.
    • concurrent strategies Metacognitive processes used to create understanding of written language (e.g., reading, using context clues, personal identification, and imagery).
    • concurrent validity The correlation between a known test, outcome, or measure and the scores on another measure of the same dimension.
    • confidence interval A parameter linked to a normal curve of probability. It provides two points representing the limits within which obtained scores would be expected to fall if they were randomly drawn from a normal population.
    • connectionism A principle of learning holding that the acquisition of new behaviors is a function of their outcomes and that more efficient connections between learned behaviors occur as a function of the satisfaction achieved by their performance and repetition.
    • construct validity A validation technique used with variables of hypothetical traits or abilities lacking an operational (observable) definition. This involves demonstrating both the legitimacy of the variable and the measure of it.
    • constructed items Synonym for supply-type items, including questions requiring extended answers such as those in essay format.
    • constructed-response formats Testing formats that include completion, short answer items, essays, and compositions.
    • content validity The fidelity of the test items to the topic that was taught and/or the goals of the curriculum area being measured.
    • convergent production The mental process of bringing together diverse and sundry material needed to find the solution for a problem.
    • core curriculum The high school curriculum recommended by the Committee of 10, including a distribution of courses that involve both classical and modern languages, mathematics, sciences, geography, government, and history.
    • correction for guessing Deducting points from students’ scores equal to the number of items answered wrong divided by the number of distracters. Any item omitted by the test taker is not included in the process.
    • correlation A measure of the degree to which two variables are related.
    • correlation coefficient Any of a series of mathematical ratios (e.g., Pearson Product Moment Correlation Coefficient) that demonstrate the amount of variance shared by two variables. The numerical value is on a range of ±1.00 as the maximum and 0.00 as the minimum.
    • covariance The amount of common variance shared by two different measures. It is reported as a coefficient of correlation.
    • covariance analysis A statistical analysis of the difference between groups after a statistical correction for a priori group differences.
    • creative intelligence One of the three types of intelligence that make up the triarchic model for intelligence proposed by Robert Sternberg.
    • creativity The ability to use the imagination to organize ideas and make, design, or write unusual and highly productive products.
    • criterion-referenced test A test designed to measure how well an individual has learned a specific skill or acquired specified knowledge. The reference is absolute and not dependent on a comparison to other test takers.
    • criterion variable A variable that is to be predicted in a multiple-correlation (multiple-regression) equation by two or more predictor variables.
    • critical thinking The skillful and disciplined use of the intellect in problem solving and/or creative production by going beyond the given and what is previously known and mentally exploring new dimensions or aspects of the task or problem.
    • Cronbach coefficient alpha A statistical estimation of test reliability or consistency that can be used with scaled and/or binary measures of a single dimension. This reliability is equivalent to the average of all possible permutations of split halves that can be made with a set of test item data from a sample of subjects.
    • cross-sectional model of evaluation The collection and analysis of data from different age groups to evaluate multiple levels of an educational system.
    • cross-validation An expression of both the statistical method and the outcome from that method. It is employed to determine the level of precision that a test or measure has when used to predict a future outcome. See predictive validity.
    • crystallized factor of ability That part of mental ability that is the product of what is learned, experienced, or absorbed from the culture.
    • curriculum-based assessment The combination of curriculum-based measurements with standardized achievement tests into a single assessment.
    • curriculum-based measurements Assessments conducted to identify problematic areas for the child by probing elements of the curriculum. These brief probes are conducted over time to provide a picture of the trend in the child's learning.
    • curriculum map A publicly available document that captures the scope and sequence (the school's master plan) of the learning objectives, activities, and assessments in each subject and in each grade.
    • curriculum probe Brief tests (5 to 10 min.) that are administered on a regular basis and are part of a curriculum-based measurement.
    • cut scores The raw score on a standards-based test that denotes a break between two ordinal levels of success (e.g., proficient vs. highly proficient).
    • DANTES Program A program that awards college credit to former soldiers on the basis of military training and experience.
    • decontextualized Assessment that does not involve tasks related to the perceived needs and/or interests of the individual. Most select-type questions are of this type, while authentic assessments are not.
    • deviation IQ scores A standard score used to express intelligence-test performance, usually with a mean of 100 and a standard deviation of 15.
    • diagnostic tests Tests with the goal of identifying the learning problems experienced by an individual. The differentiation among learning problems occurs by using measures exhibiting a high degree of negative skew.
    • differential item functioning This analytical method is used to test the appropriateness of individual test items for identified subgroups of the population taking the test (e.g., ethnic minorities).
    • disaggregated scores Descriptive statistics from mandated testing programs reported by population subgroups.
    • disaggregated subgroup mean The mean score for an identified subgroup of those people who were tested.
    • discrepancy The difference between a child's tested level of ability and his or her achievement. Used to assess for a learning disability.
    • discrimination level Statistic that describes how well individual items can identify test takers who do well on the test and those who perform badly. The range of possible discrimination levels is from a perfect +1.0 to the minimum of −1.0.
    • dissemination The distribution of results from an evaluation to all audiences and stakeholders. The parameters of the dissemination should be agreed upon prior to initiating the evaluation.
    • distracter analysis A step in item analysis involving an examination of the pattern of student selection of the various distracters from a multiple choice question.
    • distracters The wrong choice alternatives for multiple choice and other select-format test questions.
    • distribution An array of scores from the measurement of a variable aligned from low to high that can depict the frequency of occurrence for each of the possible scores.
    • divergent thinking Mental process of creating numerous productive and useful ideas and products starting from a single stimulus.
    • Dynamic Indicators of Basic Early Literacy Skills, 6th ed. (DIBELS) Test of mastery of early reading skills designed for use with children between kindergarten and third grade. Best employed as a classroom measure that monitors the progress of early readers rather than for identification of disabled readers.
    • EDGAR Acronym for the Education Department General Administrative Regulations.
    • educational standards Specified levels of accomplishment to be achieved by learners.
    • Educational Testing Service (ETS) The not-for-profit corporation that was chartered in 1948 to develop, score, and transcribe examinations used by educational programs, including the College Entrance Examination Board.
    • effectiveness score Score from a value-added assessment of a teacher that provides an expression of his or her efficaciousness.
    • Elementary and Secondary Education Act (ESEA) Federal law passed in 1965 (P.L. 89–10). The law became the central education initiative of the administration of President Lyndon Johnson.
    • English-language learners (ELL) Children for whom English is not the native language and who are in the process of learning to achieve English at a cognitive academic level of proficiency.
    • entitlement decision Decision involving the provision of services or other assistance needed to “level the playing field” for children who have special needs.
    • equal protection Statement included in the 14th Amendment to the U.S. Constitution that has been used to argue for the inclusion of special education students in all aspects of public school programs.
    • error of prediction The difference between the predicted outcome and what was actually observed.
    • essay item A test question requiring the student to supply an answer of a paragraph or more.
    • essay item discrimination level Statistic that describes how well an extended-answer supply-type question (essay) differentiates students who do well on the measure from those who do poorly. The range of values is from a perfect +1.0 to a minimum of −1.0.
    • ethnographic research Research that collects data by the direct, real-time observation of subjects in their natural setting. The researcher may or may not participate as a member of the group being observed.
    • eugenics movement Movement based on the assumption that child growth and development was genetically driven and could be improved by preventing the mentally defective from being parents. The American Eugenics Society flourished during the first third of the 20th century.
    • evaluation A process designed and facilitated to assess the strengths and weaknesses of programs, policies, personnel, products, and organizations.
    • evaluation standards Guidelines and appropriate procedures for educational evaluations first delineated by Daniel Stufflebeam in 1981.
    • executive function Cognitive process that controls and manages other mental processes and operations.
    • factor analysis Analytical procedure involved in estimating a common factor(s) that links the correlations among measured variables.
    • fair test A measure that is not affected by a priori differences between identifiable groups of test takers. Therefore, background and socioeconomic advantage have little impact on the scores from a fair test.
    • fairness Fairness implies that all children who are being tested have had an equal opportunity to have learned the material.
    • fidelity The accuracy of a test or measure for assessing a curriculum area, characteristic, or psychological trait. This is one of the primary components of test validity.
    • First International Mathematics Study (FIMS) The First International Mathematics Study was begun in 1964. One of the first of the international educational-outcomes-comparison studies.
    • First International Science Study (FISS) First International Mathematics Study compared the mathematics skills for students of 10 nations at the 4th-, 8th-, and 11th-grade levels.
    • fixed intelligence (g factor) model Based on the assumption that each individual's parameter for cognitive ability is set at conception and is composed of a central single factor of mental ability.
    • fluid intelligence An inherited type of intelligence, needed to solve nonverbal problems, that is independent of what is learned and is not linked to experiences.
    • Flynn trend Assumption that cognitively demanding experiences serve as multipliers of the environmental effect on human ability. The increasing complexities of the Western world require different types of cognitive skills, a requirement that enhances the average cognitive ability with each generation. This effect is sometimes described as the generational effect.
    • formative evaluations Ongoing monitoring of the learners (e.g., integrating questions) during the teaching process. Measures or assessments that inform the ongoing instructional process.
    • free or reduced-cost lunch Subsidized and free lunches for children of families that are too poor to pay for them. In 2006–2007, a child of a single parent earning less than $24,421 per year would qualify.
    • Gantt chart Two-axis chart simultaneously depicting the timeline for a project and the tasks to be completed along the timeline. This chart can also identify the key personnel responsible for each of the tasks. Named for an American engineer, Henry Laurence Gantt.
    • Gaussian normal curve Also known as the bell curve. The distribution of error around the average of a large set of independent observations.
    • general factor Factor analysis may identify a latent factor that accounts for almost all of the variance across the measures being analyzed. See single-factor models of intelligence.
    • gifted Educators consider gifted children to have special talents and high levels of academic aptitude (e.g., IQ ≥ 130).
    • grade equivalent score An ordinal scoring system that is purported to indicate the grade level of students whose average score on an achievement test equals the score earned by a particular test taker.
    • grade inflation A longitudinal trend for an upward shift in the average grades awarded.
    • grade point An ordinal number replacing a letter grade, usually from 4 = A to 0 = F. The points awarded can also be weighted to account for the level of difficulty of the course.
    • grade point average (GPA) The average obtained by dividing the total number of ordinal grade points earned by the total number of credits taken.
    • grade retention Grade retention occurs when a child is required to remain in the same grade for another school year while his or her peers move ahead to the next grade level.
    • grades The assignment by the teacher of a value that is indicative of the quality and completeness of a student's work.
    • grading by local norms Classroom teachers can convert scores from an achievement test into letter grades by assuming the distribution has the properties of a normal curve. This is known as “curving” the grades.
    • Gratz et al. v. Bollinger et al. Affirmative action case concerning an admissions policy that awarded 20 bonus points to members of targeted minority groups. The ruling struck down this approach to affirmative admissions for minorities.
    • Great Society Part of a goal statement made by President Johnson during the State of the Union on January 4, 1965, which outlined a war on poverty.
    • Grutter v. Bollinger et al. Case involving law school admission policy that considered “soft variables” when accepting candidates for admission. This case made “holistic” admissions a possible approach to improving the diversity of membership in organizations.
    • guessing level The possibility of randomly guessing the correct answer for a select-type question.
    • halo effect The tendency for test graders to have their scoring influenced by prior knowledge of the students taking the test.
    • Harvard scholarships In 1933 the new president of Harvard University, James Bryant Conant, had his admissions dean, Henry Chauncey, develop a national scholarship program for Harvard. To do this, Chauncey selected the SAT, first developed at Princeton University, as the required measure for all students applying for the Harvard scholarships.
    • Head Start Program started by President Johnson in 1965 that was originally a summer program for children of poverty. It is now part of the Department of Health and Human Services, providing educational, nutritional, and developmental assistance to 925,000 preschool children from impoverished families a year.
    • Henderson mixed-model equation (HMME) Regression equation containing predictor variables, some of which have fixed levels while others have random variables. The equation contains both main effects (fixed and random) and interaction estimates of a random outcome effect.
    • hereditarian Belief that heredity plays a central role in all aspects of human nature and growth, including character traits, personality, and mental ability.
    • heteroscedastic Mathematical term for the condition that results when there is an inconsistent correlational relationship between two variables at the various levels of one of the variables. The result can be a pear-shaped scatterplot distribution between the two measures.
    • high-consensus items Test items designed to assess an area of knowledge where experts are in consensus agreement with the conceptual base being measured (e.g., language arts, basic arithmetic, and elementary reading).
    • high-stakes assessments Tests whose scores have significant consequences in the life of an individual.
    • High Objective Uniform State Standard of Evaluation (HOUSSE) Part of the No Child Left Behind Act designed to assure that all teachers are highly qualified.
    • highly qualified Description of teachers who have met exacting preparation standards as set by their state. The certification standards of the states must be first approved by the U.S. Department of Education.
    • histogram A representation of a frequency distribution having rectangular bars of different lengths. The height of each bar represents the score frequency on an ordinal variable.
    • history Alternate explanation for findings linked to the background environment and any ongoing unrelated activities that could cause changes.
    • holistic approach to admissions An admissions system approved at the outcome of Grutter v. Bollinger et al., whereby the entire application file of a prospective student must be read and a multidimensional decision made as to admissions.
    • holistic assessment A “big picture” approach to evaluation that provides one ordinal score for the whole performance.
    • holistic rubric Scoring rubric that provides one “big picture” score for a performance assessment. It is used for evaluation by testing companies and with state assessments involving essay-writing tasks.
    • holistic scoring Scores reported as a single, comprehensive ordinal that represents multiple dimensions of a performance task, such as an essay. Holistic scores are assigned based on performance standards presented on a scoring rubric.
    • honor roll List of top achieving students based on grade point average.
    • honor society Academic organization open to the best achieving students in secondary schools and colleges.
    • honors track Sequence of advanced courses reserved for top achieving students in middle schools and high schools.
    • Hopwood et al. v. The State of Texas The result of a federal lawsuit in 1996 that struck down an affirmative admissions system that had been used with the Law School of the University of Texas.
    • Improving America's Schools Act President Clinton's legislative centerpiece for education, signed into law in 1994 as P.L. 103–382. This was the first step toward a federal requirement for standards-based testing programs.
    • Individual Family Service Plan (IFSP) Family-oriented plan that addresses the abilities and limitations of the child and provides a plan of action to remediate those areas of developmental delay.
    • Individualized Educational Plan (IEP) Plan required by law (IDEIA) for all disabled children that includes goals, services, accommodations, and description of how progress toward the goals will be measured. Required for every disabled child attending public school.
    • Individuals With Disabilities Education Improvement Act (IDEIA) The 2004 reauthorization of the Education for All Handicapped Children Act (P.L. 94–142 [1975]), written to be more compatible with the mandates of the No Child Left Behind Act.
    • informal screening First step in the process of special education identification, usually carried out by teachers and counselors, involving observation, a review of records, collection of work samples, and parent conferencing.
    • Instructional Support Team (IST) A committee made up of the child's parents and all school personnel responsible for the child being referred for intervention. Formed to share information, address educational problems the child is having, and map out strategies for the teacher and parent.
    • instrumentation Changes that will occur with the measuring technique and/or measuring instruments over time and by so doing pose an alternative explanation for the outcome.
    • intelligence test A test that provides a score estimating an individual's general mental ability by sampling performance on cognitive tasks.
    • internal consistency A reliability estimation based on one administration of a measure involving the intercorrelations of the individual items of the measure.
    • internal validity The linkage of the research question being asked and the research methods employed to answer the question. High internal validity implies that there are no viable alternative explanations for the outcome when the research methods are used.
    • International Adult Literacy Survey (IALS) Survey sponsored by the National Institute for Literacy of the U.S. Department of Education involving adults between the ages of 16 and 65 from 22 nations. First results were presented in 1998.
    • interpersonal An alternative explanation to the findings of an evaluation study that is caused by the personal needs and personality of participants (e.g., competition, need for approval, anxiety).
    • inter-rater reliability Statistical statement of the degree to which two or more raters (evaluators) agree when rating the same items.
    • interval data An ordered data set with each unit of measurement for the variable being equal and the distribution of scores scaled by reference to a standard score system.
    • IQ score Abbreviation for intelligence quotient. A score representing mental ability reported in a standardized form where the average is 100. Subscores of verbal IQ and mathematical IQ may be provided.
    • item analysis An analytical study of item quality before or after the test has been administered. Relates to test reliability as it makes it possible to build a bank of high-quality test items for use in future years.
    • item characteristic curve (ICC) This curve presents data from a cumulative distribution depicting the performance of test takers on a test item, showing the relationship between a latent characteristic (e.g., cognitive ability of the test takers) and success in answering the item.
    • item difficulty index The proportion of test takers who answered a test question correctly.
    • item discrimination index A value (max. = +1.00, min. = −1.00) describing the degree to which student success on an item matches the student level of success on the whole test. See discrimination level.
    • item response theory Both a theory and a statistical method used to link one or more latent traits of test takers (e.g., cognitive ability) to the probability that a test item will be answered correctly. It can be depicted using an item characteristic curve (ICC).
    • keyed answer The correct choice among the various alternative answers provided on a multiple choice question.
    • Kuder-Richardson-20 The statistical measure of reliability for tests scored using a binary (right-wrong) system. It is equivalent to a compilation of all possible permutations of split-half consistency (reliability) calculations.
    • Lake Wobegon effect Observation that most of the nation's school districts reported being “above average” on nationally normed achievement tests. Name taken from Garrison Keillor's mythical town of Lake Wobegon, where “all the children are above average.”
    • latent component Hidden factors or elements that are necessary to fully conceptualize the meaning of what is evident from an evaluation. These can be derived by logic, clinical analysis, and/or by empirical statistical methods (e.g., the NFL believes that IQ is a latent component of being a good pro football player and tests the IQ of all rookies).
    • latent semantic analysis (LSA) Software written using a sophisticated form of artificial intelligence to grade essays and even recommend an overall letter grade.
    • level of precision The exactness of data. There are four broad categories used to designate the precision of data: Nominal, Ordinal, Interval, and Ratio.
    • level of proficiency An ordinal score describing the degree of a learner's achievement of the educational standards.
    • Likert scales Opinion-measuring scales composed of one or several simple declarative sentences about the topic being measured, each paired with response options (usually 5 or 7 in number) expressing a degree of agreement with the sentence.
    • Limited English Proficient (LEP) Description used to identify children who have insufficient skill with the English language to succeed in an English-only classroom. Synonym is English Language Learner (ELL).
    • linear responsibility chart Graphic display connecting the tasks to be performed with the personnel responsible for their completion.
    • longitudinal data Data that are collected from a group of subjects over a long timeframe. Multiple tests or observations are made on the same group of subjects and analyzed for trends.
    • magic pencil A variation on “curving” where a histogram is created of all the students in terms of either average score or total points. The teacher then subjectively decides where to draw lines through the distribution to define the various letter grades.
    • magnet schools Open-enrollment schools offering students from a wide geographic area various advanced education opportunities. Originally their goal was to voluntarily reduce de facto racial segregation in urban communities.
    • mandated assessments Tests required by state regulation or law.
    • matching questions Select-type question that requires test takers to match a short list of stimulus words or terms with the words or terms on a second list.
    • matrix sampling Technique used to reduce testing time, involving the development of a complete set of test items to cover a topic and then sampling from that total collection to create several smaller tests. These are then given to different students. No student-to-student comparisons can be made.
    • maturation Normal ontogenetic processes that can produce changes in subjects over time, and by so doing be misread as an effect or outcome of the program being evaluated.
    • mean A measure of central tendency referred to as an arithmetic average of a set of scores.
    • measurement Procedure used to determine and document a student's current status on a variable.
    • median The score equal to the center of a rank ordered data set. The 50th percentile.
    • mental age A measure of the difficulty level of problems that a person can solve expressed as a comparison to a reference sample of subjects of a particular age.
    • merit pay Extra compensation above and beyond the common pay scale in the form of a bonus awarded to teachers viewed as being highly effective.
    • merit scholars A scoring category from the PSAT indicating very high scores across the three tests of the battery (math, verbal, and writing). Awards for top scoring students are made by the National Merit Scholarship Corporation each year to over 10,000 students.
    • meritocracy The organization of a system in which rewards are provided to those shown through competition to be deserving of merit. Meritocracies are designed to reward talent, competence, and effort, not connections and social standing.
    • millennial generation Cohort including the 75 million Americans who were born between 1977 and 1998.
    • minimum competency test Assessment model in wide use during the 1970s and 1980s designed to assure that high school graduates could demonstrate a defined minimum level of knowledge and/or skills.
    • miracle of Texas A term used to describe the apparent improvement in the graduation rate and achievement scores for the children of Texas following state educational reforms, which later became part of the No Child Left Behind Act. Investigative reporters have expressed doubt about the reality of the “miracle.”
    • mobility Extent to which children and their families move to different homes. The average American family moves once every five years.
    • mode The most frequently occurring score or outcome in a distribution of scores.
    • modified standards-based approach A method of grading by which the teacher starts with a grading scale that is fixed and divided into ordinal marks, which are presented as letters.
    • mortality Term used by Campbell and Stanley (1963), better expressed as mobility, to describe families moving into and out of communities.
    • multiple choice questions One of several types of select-type questions, providing a stimulus question or statement and several alternative answers from which to choose.
    • multiple correlation The correlation between a variable and a weighted linear combination of predictor variables. This mathematical method is frequently referred to as multiple regression.
    • multiple-factor models of intelligence These mathematical models for human intelligence assume that there exists a finite number of intellectual abilities or factors that can be independently identified and measured.
    • multiple-intelligences model (MI) Howard Gardner's model that focuses on eight unique human intelligences or evolutionary products that make interaction with the environment possible.
    • National Assessment of Educational Progress (NAEP) The NAEP is the only nationally representative test given continually since 1969.
    • National Association for the Education of Young Children (NAEYC) The professional association of teachers, professors, and researchers involved with the welfare and education of young children.
    • national certification The National Board for Professional Teaching Standards was conceived by Albert Shanker of the AFT and funded by the Carnegie Corporation in 1986. It provides an independent source of teacher-quality evaluations.
    • Nation's Report Card The National Assessment of Educational Progress is designed to provide a national picture of public schools and to present data on the status of American education. State-by-state comparisons were first published in 1994.
    • naturalistic methodology Evaluation method employing ethnographic data collection and analysis in an effort to capture the culture of the program or school being scrutinized.
    • No Child Left Behind A revamping of the Elementary and Secondary Education Act in 2002 (P.L. 107–110). The goal of the law was to “close the achievement gap with accountability, flexibility, and choice, so that no child is left behind.”
    • nominal data Data from a variable that are expressed as a series of names or descriptors that are devoid of a mathematical measurement scale.
    • non-content factor Report card grade factors that go beyond the achievement of academic content and may include student motivation, effort level, cooperation, neatness, and self-control.
    • norm-based scoring system The assumption of a Gaussian normal curve underlies this grading technique, commonly referred to as “curving.”
    • norm referenced The interpretation of the meaning of a test score is based on how a sample of other people performs on the same measure. An individual's score is then expressed in terms of a relative standing as compared with others.
    • norm-referenced test Test that reports scores to test takers in terms of their relative performance compared to a reference sample of subjects who took the test previously.
    • normal curve equivalent (NCE) A standardized transformation of normally distributed raw score data having a distribution with a mean of 50 and a standard deviation of 21.06.
    • normative group A distribution of measurement scores from a group of subjects with known characteristics (age, sex, grade in school, etc.) to whom a person's performance may be compared.
    • norms Tables or other devices used to convert an individual's test score to a norm-referenced test score.
    • novelty The likelihood that changes in the environment per se, not the treatment effect, produce what appears to be an effect or outcome with the variable being measured.
    • objective test A test that is designed to minimize all subjective aspects to scoring. Usually a select format test such as a multiple choice test.
    • ogive A graph of a frequency distribution curve in which the frequencies are cumulative.
    • on-line report cards Internet-based system for reporting student academic progress interfaced with a comprehensive system for teacher-parent online communication.
    • out of level testing Testing in which all of the items are very difficult, so that the test can differentiate among the best students. Used when identifying those with the highest level of academic ability.
    • parameter A constant linked to a statistical value that is determined by an arbitrarily selected position on a statistical probability curve.
    • Pearson Correlation Coefficient A statistical expression of the degree to which two variables co-vary. The minimum correlation is 0 and the maximum is ±1.
    • percentile Part out of 100. A 100th part of any group or set of data or objects.
    • performance-based assessments Assessments that measure both the skills and knowledge acquired by students and require students to demonstrate critical-thinking and complex problem-solving skills.
    • performance level Student outcomes presented on a four- or five-point scale ranging from the performance level “below proficient” to the highest performance level, “advanced proficiency.”
    • performance test A measure involving the learner in creative tasks or problems requiring the application of learned skills and acquired knowledge.
    • personological factors Variables that are part of the psychological state of the individual and as such are considered attributes that resist experimental modification (e.g., mental ability, hand dominance, and reaction time).
    • pervasive developmental disability Psychiatric diagnostic classification for children involving stereotyped behavior, interests, and activities. Often paired with low IQ and poor communication skills.
    • phenomenological strategies Methodology in which data are derived from the eidetic recollections and mental reflection by subjects.
    • phonemic awareness Having knowledge of the link between small units of speech and the corresponding sounds represented by the letters of the alphabet.
    • phonics The connection between the sounds that result from the blending of phonemes (letters) together into word units. Also called the sound-spelling correspondence.
    • physical disability Loss of the ability to move and/or locomote owing to paralysis, tonic spasms, or severe pain.
    • placement decision A data-based decision for placing a student into the appropriate level for instruction by assessing the learner's level of previous achievement.
    • point-biserial correlation Correlation coefficient that can be used to express the relationship between how well students perform on a single test item (correct vs. wrong) and scores on the entire test. A method for determining the quality of an individual test item.
    • point system A variation on the modified-standards approach for grading in which everything (e.g., homework, tests, projects) that goes on in the classroom is assigned a point value (including possible “extra credit points”).
    • polytomous model The expansion of the Rasch model for studying the response pattern of subjects on test items in which there are several possible answers to each question, and the ability of the subject is used as a defining variable.
    • portfolios A compendium of data sources documenting the growth of a student's skills and knowledge over time in an area.
    • positivistic paradigm Evaluation data derived by use of objective instruments and structured questioning. All data collected are verifiable and assumed to be real.
    • practical intelligence One of three mental abilities that make up Sternberg's proposed triarchic model of mental ability.
    • prediction equation A correlational equation used to predict future outcomes based on the careful examination of previous relationships.
    • predictive validity The ability of a test or measure to estimate how well a person will perform a task. High predictive validity implies excellent predictions while low predictive validity indicates there is much error in the predictions.
    • prereading strategies Strategies that support reading comprehension and include asking pre-questions about the title of the passage to be read, thinking about the meaning of cues and prompts, doing an overview of the passage, and examining the passage's graphics.
    • Primary Mental Abilities (PMA) L. L. Thurstone's Primary Mental Abilities model of intelligence, which is composed of seven unique factors: word fluency, verbal comprehension, spatial visualization, number facility, associative memory, reasoning, and perceptual speed.
    • principal component A form of statistical analysis that builds a core structure or factor from a series of measures that subjects have taken. The first statistical model of mental ability proposed by Charles Spearman (g factor) used principal component analysis.
    • process portfolio A portfolio of student work that includes periodic statements by the student that are self-reflections on the learning process as perceived by the student.
    • Programme for International Student Assessment (PISA) Comparison program sponsored by the Organization for Economic Co-operation and Development that is administered to 10th grade students every 3 years.
    • progress reports Term for teacher-parent communication replaced in education jargon by the term report card.
    • prompt The target statement about which students must write essays on a writing test.
    • psychoeducational assessment A multifaceted assessment that involves clinical observations of the child, interviews with parents, and a full range of specialized tests designed to give a full picture of the child.
    • pull-out program The removal of special students from the regular class schedule to provide them with a specialized educational offering. Commonly done with children enrolled in a school's program for the gifted.
    • qualitative research Research method assuming multiple, dynamic realities that can only be understood in context.
    • quantitative research paradigm Philosophy of research assuming the existence of a stable reality that is scientifically measurable and in which findings are generalizable.
    • randomly selected Selection of the members or elements to be included in a sample in such a way that each element of the original population has an equal chance of inclusion.
    • range The distance (as measured in score points) between the highest and lowest scores on a distribution. Found by simple subtraction of the lowest score from the highest.
    • Rasch model One of several statistical models developed by Georg Rasch (1901–1980) for the analysis of test item functioning. It yields outcomes similar to those from item response theory (IRT). In many presentations the two terms are used together.
    • rater drift The tendency for the evaluation scores assigned by a rater to become higher after evaluating many performance assessments.
    • ratio data A mathematical method for scaling data that employs real numbers of equal unit size. It includes the value 0 and can also include negative numbers.
    • raw score The number of questions answered correctly on a test or assessment. This number may be corrected for guessing, but no other statistical transformation will have occurred with raw score data.
    • readiness Mentally and/or physically prepared for a learning experience. Usually refers to readiness to enter school.
    • recentered A new normative group was employed in 1996 to provide the scoring basis for the SAT I. This revision was the first since the test was standardized in 1941 and reflected the greater diversity of the population of students applying to college in the 1990s.
    • receptive language The comprehension of spoken language.
    • referral The process of initiating a request for the further evaluation of a child in school. Can be initiated by teachers, administrators, and educational specialists.
    • regression toward the mean Effect describing the statistical tendency for extremely low and extremely high scores to become closer to the group mean when the subjects are retested.
    • reliability A statement of the stability and/or consistency of the test scores from the administration of a measure.
    • residual scores With test data, residual scores are the product of a subtraction of one score from a second test score from the same subjects. Residual scores have very low reliability.
    • responses The reaction of the student to a question or structured stimuli presented by the teacher as part of an assessment. See answer.
    • restructuring Four years of low student scores on the NCLB-mandated tests triggers a series of changes including transferring members of the teaching staff and possible firing of school administrators, or even the closing of the school.
    • rubric An ordinal sequence of qualitative ranks with definitions and examples for each level that can be used to evaluate performance assessments.
    • sample of convenience Nonrandom sample of subjects chosen because the researcher had easy access for collecting data from them.
    • sampling inadequacy Sampling inadequacy occurs when the items of a measure do not represent an appropriate collection of what is being measured. It can also be used to refer to a violation of a statistical assumption needed to create subsets within a large set of data.
    • SAT II Newest edition of the test published by the College Entrance Examination Board and developed, scored, and transcribed by the Educational Testing Service.
    • scatterplots Two-dimensional (linear coordinate) presentation of the simultaneous outcome for two variables measured on one group of subjects.
    • scholastic aptitude Indicates how much a child can be expected to learn. This is usually measured using a test of cognitive ability.
    • school psychologist Educational professional who is certified by a state education department to provide for the mental health of school students.
    • school report card One mandate of the No Child Left Behind Act requires that each school create and publish a report card showing the average achievement levels, attendance, and graduation rate for students.
    • score gap Significant mean differences found between groups when scores are disaggregated.
    • scoring guide Used to improve the scoring reliability of an essay test. It consists of the points that must be covered and the relative value assigned for each of them on the answers to an essay question. See analytical rubric.
    • scoring rubric Written criteria, organized as an ordinal scale, used for evaluating the quality of extended-response (essay) questions and performance tasks.
    • Section 504 Section of the Rehabilitation Act of 1973 requiring that public schools provide the support services needed to make public education available to all children with disabilities.
    • select-type questions Objective test questions that require the test taker to respond to a prompt (question or stem) by selecting which of several answers is correct (e.g., true-false format, multiple choice, and matching).
    • selected-response format Testing formats that include true-false, matching, and various permutations of the multiple choice question.
    • selection Average score differences between groups that may occur when the subjects selected and assigned to each group are not equivalent prior to the study.
    • selection ratio The ratio between the number of applicants and the available admissions positions. This ratio can be used to set the difficulty level for a measure that will be employed to select those to be admitted.
    • sensitivity review An ethical step taken by test publishers prior to the publication of a new measure or evaluation. The step involves using a panel of experts representing various identifiable groups and subdivisions within the general population to evaluate individual test items for possible item fairness or bias problems.
    • showcase portfolio A portfolio of only the very best products produced by the student.
    • single-factor models of intelligence A mathematical model of human intelligence based on a central core factor of ability known as the “g,” or general, factor. Charles Spearman used mathematical analysis on the scores from many measures of mental ability to identify one unitary factor of ability.
    • skew Asymmetry in a distribution of scores. Distribution is said to be negatively skewed when most scores fall on the high end of the distribution and positively skewed when most scores fall on the low end of the distribution.
    • Spearman Correlation Coefficient A method for expressing the amount of concordance shared by two ordinal measures on a group of subjects.
    • special education Educational programs designed to meet the specific learning needs of children with disabilities. Programs are mandated under the Individuals With Disabilities Education Improvement Act of 2004 (P.L. 108–446).
    • special-needs children Children who have been identified as having a specific disability in one or more of the following areas: cognitive, physical, and sensory.
    • stability The reliability that demonstrates that the scores obtained on a measure by a group of subjects will be correlated with a retest using the same measure and subjects at a later time.
    • stakeholders People within an organization who perceive themselves as possibly being affected by a study or evaluation of that organization.
    • standard deviation The square root of variance. The square root of the average of the squared deviations of individual scores from the mean score.
    • standard error of measurement A statistic used to estimate the probable range within which a person's true score on a test falls. The standard deviation of scores obtained around an individual's true score.
    • standard score A derived test score expressed in deviation units indicating how far the score is from the mean score of the group.
    • standardized scoring Scoring method involving turning every test and other measure into a standard deviation score (z-score). Each z-score is then multiplied by the appropriate weighting factor before it is added to develop a grade for each student.
    • standardized test Factor analysis may identify a latent factor that accounts for almost all of the variance across the measures being analyzed. See single factor models of intelligence.
    • standards-based assessments High-stakes tests with close links to both the curriculum and the approved standards for learning. These tests are scored showing the extent to which the student reached the standard for achievement.
    • standards-based grades An evaluation of student performance based on a measure of an approved standard for learning.
    • standards gap The different rigor required by the achievement standards set for the children of different schools.
    • Stanford-Binet Scales Highly reliable test constructed as a measure of the human intellect as defined by the Cattell-Horn-Carroll (CHC) model of mental ability.
    • stanine A score system that divides the area under the Gaussian normal curve into nine parts. Each of the seven center parts is one-half of a standard deviation in width. From standard nine (stanine).
    • STARS Nebraska's School-based, Teacher-led Assessment and Reporting System, which combines local assessments with commercially available instruments.
    • statistical bias This is evident when there is a systematic under- or overestimation of a select group of subjects on a test or other measurement.
    • stem The stimulus statement or question that frames the task on select-type questions, including multiple choice questions.
    • stereotype threat The extra stress felt by minority students facing high-stakes standardized tests. This stress is related to the dominant stereotypical expectation held for those students.
    • structure of intellect model (SI) J. P. Guilford's model based on the identification of three universal dimensions for all mental abilities: operations, contents, and products. On each of these dimensions are a number of subcategories that, when taken together, make it possible to identify 180 unique mental abilities.
    • student response pads A new approach to formative assessment using an interactive computer system, “classroom clickers,” to give teachers real-time formative assessment data during instruction.
    • summary assessment (test) A test or examination taken by students following instruction used to provide a statement of how much was achieved by individual students after they were taught.
    • summative data analysis The statistical analysis and summary of evaluation outcome data. This analysis focuses on the outcomes of the project being evaluated.
    • supply-type questions Test questions requiring the test taker to provide the answer to a question (e.g., completion questions, fill-in questions, and essay [extended-response format] questions).
    • table of specifications Two-dimensional blueprint for a test. Dimensions are the content and level of cognition the test items will require. This approach assures the measure will have content validity.
    • tacit knowledge The link between triarchic intelligence and real-world success as proposed by Robert Sternberg.
    • Teach for America program Liberal arts graduates from highly selective colleges and universities attend five weeks of summer classes in preparation for having their own classroom in the fall. They must contract to work a minimum of two years in the rural or inner-city school to which they are assigned.
    • teaching “for the test” Teaching the knowledge base and cognitive skills that make up the state's learning standards.
    • teaching “to the test” The direct instruction of students in preparation for a high-stakes test by teaching specific items from released test files.
    • test battery A series of subtests that make up a larger measurement device. For example, the Aviation Selection Test Battery is composed of five separate subtests: Math Skills Test, Reading Skills Test, Mechanical Comprehension Test, Spatial Apperception Test, and Aviation and Nautical Information Test.
    • test bias This occurs when the test scores from an individual or identifiable group consistently under- or overestimate the capability or knowledge of that group or individual.
    • test-retest reliability The correlation between the scores from two sequential administrations of a test to one group of subjects, providing an estimation of test stability.
    • Third International Mathematics and Science Study (TIMSS) A parallel set of science and mathematics achievement surveys published beginning in 1995.
    • three-factor model of intelligence Robert Sternberg and E. L. Thorndike's model based on the identification of the three factors: practical intelligence, analytical intelligence, and creative intelligence.
    • triarchic model of intelligence Sternberg's conceptualization of mental ability as being composed of three elements: analytical, creative, and practical intelligences.
    • Troops to Teachers program (TTT) Fast-track program for providing teacher certification to returning military personnel. The TTT program was begun under the Clinton administration and expanded by President Bush under the NCLB Act in 2002.
    • true-false question Select-type item that provides a stimulus consisting of a statement and asks whether that statement is true or false.
    • true score An observed score plus or minus some amount of error.
    • University of California Regents v. Bakke The first challenge to affirmative action in admissions. This case struck down the use of racial set-asides in the admission process for a medical school.
    • upward drift The tendency for the anchor items of a test to appear to be less difficult on the subsequent annual test administrations. This results in ever-higher levels being set for the cut scores.
    • validity A statement of both the appropriateness of the test and its components and of the veracity of the test scores and their interpretations.
    • validity threat of testing When the subjects of an evaluation are tested sequentially, one after another, the test materials and/or test administrator may change over time and practice, thus influencing the outcome.
    • value added An assessment approach that focuses on the change that has occurred for each individual over a period of time. Seen as a method of evaluating growth and change within individuals.
    • value-added evaluation A data-driven longitudinal evaluation of schools, educators, and students by charting the growth of individuals in the educational context of the school.
    • variable A factor, conceptual entity, characteristic, or attribute that is likely to vary between individuals and/or within individuals over time.
    • variance The average of the squared deviations of individual scores from the mean (the sum of the squared deviations divided by the number of scores).
    • war on poverty Sobriquet used by the administration of President Johnson to describe his domestic policy initiatives. One of these initiatives was the ESEA (1965).
    • Wechsler Intelligence Scale for Children (WISC) The most widely used intelligence tests in schools, having 11 subtests to measure ability along two major factors: verbal IQ and performance IQ.
    • Wechsler Preschool and Primary Scales of Intelligence, 3rd ed. (WPPSI) Highly reliable intelligence test battery used to assess the developing cognition of young children.
    • weighted combinations Combining marks and test grades by multiplying each by a weight that is representative of each evaluation's relative importance. This combination can then be used to produce a progress report grade.
    • weighted grades The number of grade points awarded for each course is differentiated according to the difficulty level of the course. Thus, Advanced Placement courses earn more grade points than do basic courses.
    • wrap-around assessments Assessments designed by local school systems in Nebraska to assess the standards not measured by the commercial achievement tests selected under the STARS system.
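
    Several of the statistical entries above (range, variance, standard deviation, standard score, normal curve equivalent, and stanine) reduce to short arithmetic. The following is a minimal sketch in Python, not a procedure taken from this book. It assumes that variance is divided by the number of scores, as in the glossary wording, that the scores are not all identical, and that a stanine can be assigned by placing the z-score into half-standard-deviation bands centered on the mean. The function names and the sample scores are made up for illustration.

```python
import math


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


def variance(scores):
    """Average of the squared deviations from the mean (divides by N)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)


def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))


def z_score(score, scores):
    """Standard score: distance from the group mean in standard deviation units.

    Assumes the scores are not all equal (nonzero standard deviation).
    """
    mean = sum(scores) / len(scores)
    return (score - mean) / standard_deviation(scores)


def nce(z):
    """Normal curve equivalent: mean of 50, standard deviation of 21.06."""
    return 50 + 21.06 * z


def stanine(z):
    """Stanine band for a z-score.

    Assumption: band 5 is centered on the mean, each of the seven center
    bands is half a standard deviation wide, and bands 1 and 9 are open-ended.
    """
    return max(1, min(9, math.floor(2 * z + 5.5)))


# Made-up raw scores for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for raw in (2, 5, 9):
    z = z_score(raw, scores)
    print(raw, round(z, 2), round(nce(z), 2), stanine(z))
```

    Published norm tables convert raw scores through a normative group rather than through the mean and standard deviation of a single class, so a sketch like this illustrates the definitions only.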


    Abedi, J., Hofstetter, C. H., & Lord, C. (2004). Assessment accommodations for English language learners: Implications for policy-based empirical research. Review of Educational Research, 74(1), 1–28.
    Accountability Division. (2004). Arizona's instrument to measure standards [Special Education Report]. Phoenix, AZ: Arizona Department of Education. Retrieved on January 19, 2005, from
    Achieve, Inc. (2005, February). Rising to the challenge: Are high school graduates prepared for college and work? Washington, DC: Peter D. Hart Research Associates. Retrieved February 7, 2005, from
    ACT (1997). ACT assessment: Technical manual. Iowa City: Author.
    ACT (2004). Are high school grades inflated? College Readiness: Issues in College Readiness. Retrieved April 13, 2005, from
    Adams, G. R., & Ryan, B. A. (2000, June). A longitudinal analysis of family relationships and children's school achievement in one and two parent families. Quebec: Human Resources Development.
    Ainsworth, L., & Viegut, D. (2006). Common formative assessments: How to connect standards-based instruction and assessment. Thousand Oaks, CA: Corwin Press.
    Alexander, K. L., Entwisle, D. R., & Olsen, L. S. (2007). Lasting consequences of the summer learning gap. American Sociological Review, 72(2), 167–180.
    Algozzine, B., Eaves, R. C., Mann, L., & Vance, H. R. (1988). Slosson Full-Range Intelligence Test: Normative/Technical Manual. Los Angeles: Western Psychological Corporation.
    Alvarado, M. (2006, December 10). Swimming upstream in the mainstream. Record [North Jersey]. Retrieved February 27, 2007, from
    American Association for Colleges of Teacher Education. (1999). Teacher education pipeline IV: Schools, colleges, and departments of education. Washington, DC: Author.
    American Educational Research Association (AERA) (2004a). Closing the gap: High achievement for students of color. Research Points: Essential Information for Educational Policy, 2(3). Author.
    American Educational Research Association (AERA) (2004b). English language learners: Boosting academic achievement. Research Points: Essential Information for Educational Policy, 2(1). Author.
    American Evaluation Association. (2005). About us. Retrieved August 22, 2005, from
    American National Standards Institute. (1995). The program evaluation standards. Washington, DC: Author.
    American Psychiatric Association (APA). (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
    Americans With Disabilities Act of 1990 (P.L. 101–336, 1990), 42 U.S.C.A. § 12101 et seq.
    Amrein, A. L., & Berliner, D. C. (2002). The impact of high-stakes tests on student academic performance: An analysis on NAEP results in states with high-stakes tests and ACT, SAT and AP test results in states with high school graduation exams. Tempe, AZ: Educational Policy Studies Laboratory. Retrieved January 14, 2004, from
    Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. New York: Longman.
    Anton, H., Kolman, B., & Averbach, B. (1988). Mathematics with applications for the management, life, and social sciences. New York: Harcourt Brace Jovanovich.
    Apgar, V. (1953). A proposal of a new method of evaluation of the newborn infant. Current Researches in Anesthesia & Analgesia, 32, 261–267.
    Aratani, L. (2005, August 25). Group seeks to end gifted education. Washington Post. Retrieved August 27, 2005, from
    Archer, J. (2005, April 13). R.I. downplays tests as route to diploma. Education Week. Retrieved April 13, 2005, from
    Arenson, K. W. (2004, April 18). Is it grade inflation, or are students just smarter? New York Times, p. WK2.
    Arenson, K. W. (2005, March 22). Faculty panel at Cal faults way to pick Merit scholars. New York Times, p. A14.
    Armbruster, B. B., Lehr, F., & Osborn, J. (2003). Put reading first: The research building blocks for teaching children to read. Washington, DC: National Reading Panel, Office of Educational Research and Improvement. Retrieved July 2, 2005, from
    Aronauer, R. (2005, September 30). Princeton's war on grade inflation drops the number of As. Chronicle of Higher Education, 52(6), p. 47.
    Ashcraft, M. H. (1995). Cognitive psychology and simple arithmetic: A review and summary of new directions. Mathematical Cognition, 1(1), 3–34.
    Aspey, S., & Colby, C. (2005, May 10). Spellings announces new special education guidelines, details workable, “common-sense” policy to help states implement No Child Left Behind [Press release]. Washington, DC: U.S. Department of Education. Retrieved November 28, 2006, from
    Associated Press (2005, July 25). Migrant workers, children savor summer school. Retrieved July 26, 2005, from
    Atherton, J. S. (2005). Learning and teaching: Bloom's taxonomy [Electronic version]. Retrieved November 26, 2006, from
    Ayers, K. (2005, December 14). Foreign classes leave their mark: Districts face hard issue of how to translate immigrants' transcripts. Dallas Morning News. Retrieved November 26, 2006, from
    Babu, S., & Mendro, R. (2003, April). Teacher accountability: HLM-based teacher effectiveness indices in the investigation of teacher effects on student achievement in a state assessment program. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
    Bailey, P. (2007). Students get prizes for taking science test. Miami Herald. Retrieved January 29, 2007, from
    Baker, B. D., & Cooper, B. S. (2005). Do principals with stronger academic backgrounds hire better teachers? Policy implications for improving high-poverty schools. Educational Administration Quarterly, 41(3), 449–479.
    Baker, D. P., Fabrega, R., Galindo, C., & Mishook, J. (2004). Instructional time and national achievement: Cross-national evidence. Prospects, 34(3), 311–334.
    Baldi, S., Perie, M., Skidmore, D., Greenberg, E., Hahn, C., & Nelson, D. (2001). What democracy means to ninth graders: U.S. results from the international IEA Civic Education Study. (Statistical Analysis Report, NCES 2001–096). Washington, DC: National Center for Educational Statistics.
    Baldwin, D., & Wylie, E. (2004). Comprehensive testing program (4th ed.). New York: Educational Records Bureau.
    Ballantyne, P. F. (2002). Psychology, society, and ability testing (1859–2002): Transformative alternatives to mental Darwinism and interactionism. Toronto, Canada: York University. Retrieved January 25, 2005, from
    Barkley, R. A. (1998). Attention-deficit hyperactivity disorder: A handbook for diagnosis and treatment (2nd ed.). New York: Guilford.
    Barton, P. E. (2004). Why does the gap persist? Educational Leadership, 62(3), 9–13.
    Basken, P. (2006, March 29). States have more schools falling behind. Washington Post. Retrieved March 29, 2006, from
    Baum, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models (Policy Information Perspective, Document 6463). Princeton, NJ: ETS.
    Bello, M. (2007, August 28). Later school starts gain popularity. USA Today. Retrieved September 1, 2007, from
    Belluck, P. (2006, February 5). And for perfect attendance, Johnny gets … a car. New York Times. Retrieved September 15, 2007, from
    Ben-Shakhar, G., & Sinai, Y. (1991). Gender differences in multiple-choice tests: The role of differential guessing tendencies. Journal of Educational Measurement, 28(1), 23–35.
    Bennett, C. I., McWhorter, L. M., & Kuykendall, J. A. (2006). Will I ever teach? Latino and African American students' perspectives on PRAXIS I. American Educational Research Journal, 43(3), 531–575.
    Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1992). Differential Aptitude Tests, Fifth Edition. San Antonio, TX: Psychological Corporation, Harcourt Assessment.
    Bennett, S., & Kalish, N. (2006, June 19). No more teachers, lots of books. New York Times. Retrieved June 20, 2006, from
    Bennett, W. J. (1992). The devaluing of America: The fight for our culture and our children. New York: Summit.
    Benton, J. (2006, September 2). Cheating: It's in the numbers. Dallas Morning News. Retrieved September 7, 2006, from
    Berger, J. (2007, February 7). More help for the struggling, less for the gifted. New York Times, p. A17.
    Bergstrom, B. (1998). [Review of the Revised PSB health occupations aptitude examination]. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 845–847). Lincoln, NE: Buros Institute on Mental Measurements.
    Berliner, D. C. (1997, February). Manufacturing a crisis in education [Keynote address]. Eastern Educational Research Association, Hilton Head Island, SC.
    Berliner, D. C. (2004, March/April). If the underlying premise for No Child Left Behind is false, how can that act solve our problems? (Occasional Research Paper # 6). The Iowa Academy of Education. Retrieved June 23, 2005, from
    Berliner, D. C. (2005). The near impossibility of testing for teacher quality. Journal of Teacher Education, 56(3), 205–213.
    Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud, and the attack on America's public schools. Cambridge, MA: Perseus Books.
    Berliner, D. C., & Nichols, S. L. (2007, March 14). High-stakes testing is putting the nation at risk. Education Week, 26(27), pp. 48, 36.
    Beserik, D. L. (2000). Teacher beliefs and practices as they relate to the implementation of the Pennsylvania System of School Assessment. Unpublished doctoral dissertation, University of Pittsburgh.
    Betebenner, D. W., & Sang, Y. (2007, April). Reference growth charts for educational outcomes. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Bhatt, S. (2005, January 3). Schools struggle to reduce high teacher turnover. Seattle Times. Retrieved January 29, 2006, from
    Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with state's content standards: Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21–29.
    Birk, L. (2005, May/June). Grade inflation: What's really behind all those A's? Harvard Education Letter. Retrieved June 5, 2005, from
    Birnbaum, R. (Producer), & Robbins, B. (Director). (2004). Perfect score [Motion picture]. United States: Paramount Pictures.
    Black, S. (2006). The right size school [Electronic version]. American School Board Journal, 193(4). Retrieved March 23, 2007, from
    Blackwell, J. (1996). On brave old Army team: The cheating scandal that rocked the nation: West Point, 1951. New York: Reed Business Information.
    Bloom, B. S., et al. (Eds.) (1956). Taxonomy of educational objectives: The classification of educational goals: Handbook I, cognitive domain. New York: David McKay & Co.
    Bloor, E. (2004). Story time. Orlando, FL: Harcourt Children's Books.
    Bluebello, L. (2003). Prediction of student performance on the Pennsylvania System of School Assessment at the fifth grade. Unpublished doctoral dissertation, Widener University.
    Boe, E. E., & Shin, S. (2005). Is the United States really losing the international horse race in academic achievement? Phi Delta Kappan, 86(9), 688–695.
    Boeree, C. G. (1999–2000). Wilhelm Wundt and William James. Unpublished manuscript, Shippensburg University of Pennsylvania. Retrieved January 26, 2005, from
    Booher-Jennings, J. (2005). “Below the bubble”: Educational triage and the Texas accountability system. American Educational Research Journal, 42(2), 231–268.
    Borja, R. R. (2007, February 21). Nebraska tangles with U.S. over testing. Education Week, 26(24), p. 34.
    Borman, G. (2007, May). Multiyear summer school. [Research summary]. Madison, WI: Wisconsin Center for Educational Research. Retrieved May 7, 2007, from
    Borman, G. D., Slavin, R. E., Cheung, A. C. K., Chamberlain, A. M., Madden, N. A., & Chambers, B. (2007). Final reading outcomes of the national randomized field trial of Success for All. American Educational Research Journal, 44(3), 701–731.
    Borsuk, A. J. (2006, June 1). State getting off easy on No Child law, report says [Electronic version]. Milwaukee Journal Sentinel. Retrieved June 2, 2006, from
    Borzekowski, D. L. G., & Robinson, T. N. (2005). The remote, the mouse, and the No. 2 pencil. Archives of Pediatrics and Adolescent Medicine, 159(7), 607–613.
    Boudett, K. P., Murnane, R. J., City, E., & Moody, L. (2005). Teaching educators how to use student assessment data to improve instruction. Phi Delta Kappan, 86(9), 700–706.
    Bowler, M. (2004, July 22). Special-education students struggle to pass state exams. Baltimore Sun [Electronic version]. Retrieved July 29, 2004, from,0,7929776.story?c
    Boyd, B. (2007, April). NCLB: A business perspective. Paper presented during the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Bradley, R. H. (1985). [Test review of the Gesell school readiness test]. From J. V. Mitchell Jr. (Ed.), The ninth mental measurements yearbook [Electronic version]. Retrieved January 29, 2005, from the Buros Institute's Test Reviews Online Web site:
    Braun, H., & Wainer, H. (2007). Value-added modeling. Handbook of statistics, 26 (pp. 867–892). Amsterdam, Netherlands: Elsevier.
    Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models [Policy information perspective]. Princeton, NJ: ETS. Retrieved August 24, 2007, from
    Bridgeman, B. (1980). Generality of a “fast” or “slow” test-taking style across a variety of cognitive tasks. Journal of Educational Measurement, 17(3), 211–217.
    Bridgeman, B., & Wendler, C. (2004). Characteristics of minority students who excel on the SAT and in the classroom (Policy Information Report). Princeton, NJ: Educational Testing Service.
    Brock, K. C. (2006, March 16). Are they ready to go? Some experts believe starting school late can give kids an edge. Southern Illinoisan. Retrieved March 24, 2006, from
    Brookhart, S. M. (1991). Grading practices and validity. Educational Measurement: Issues and Practice, 10(1), 35–36.
    Brookhart, S. M. (1994). Teacher's grading: Practice and theory. Applied Measurement, 7, 279–301.
    Brookhart, S. M. (2004). Grading. New York: Pearson Education.
    Brookhart, S. M. (2005). The quality of local district assessments used in Nebraska's School-based, Teacher-led Assessment and Reporting System (STARS). Educational Measurement: Issues and Practice, 24(2), 14–21.
    Brown v. Board of Education, 347 U.S. 483 (1954).
    Brown, D., & Resseger, J. (2005, November 28). Ten moral concerns in the implementation of the No Child Left Behind Act [A statement of the National Council of Churches Committee on Public Education and Literacy]. New York: National Council of Churches USA. Retrieved March 30, 2006, from
    Brown, S. R., Claudet, J. G., & Olivarez, A. (2002). Investigating organizational dimensions of middle school curricular leadership: Linkages to school effectiveness. Research in Middle Level Education [Electronic version]. Retrieved July 22, 2005, from
    Brown, T. E. (2001). Brown attention-deficit disorder scales for children and adolescents. San Antonio, TX: Harcourt Assessment Division, Psychological Corporation.
    Brown, W. (1910). Some experiment results in the correlation of mental abilities. British Journal of Psychology, 3, 296–322.
    Brunsman, B. A. (2005). [Test review of the DIBELS: Dynamic indicators of early literacy skills, sixth edition]. From R. A. Spies & B. S. Plake (Eds.), The sixteenth mental measurements yearbook [Electronic version]. Retrieved July 2, 2005, from the Buros Institute's Test Reviews Online Web site:
    Buchanan, J. (2002). Ignoble motivation: Arguments of opponents to the grading system. Unpublished manuscript, SUNY New Paltz. Retrieved June 26, 2005, from
    Buhs, E. S., Ladd, G. W., & Herald, S. L. (2006). Peer exclusion and victimization: Processes that mediate the relation between peer group rejection and children's classroom engagement and achievement. Journal of Educational Psychology, 98(1), 1–13.
    Bukowiecki, E. M. (2007). Teaching children how to read. Kappa Delta Pi Record, 43(2), 58–65.
    Bunim-Murray Productions (Producer). (2005). The Scholar [Television series]. New York: NBC-Universal.
    Buoye, A. J. (2004). Capitalizing on the extra curriculum: Participation, peer influence, and academic achievement. Unpublished doctoral dissertation, Notre Dame University.
    Burke, K. (2005). How to assess authentic learning. Thousand Oaks, CA: Corwin Press.
    Burney, M. (2006, December 20). New Jersey flags 40 schools for test scores. Philadelphia Inquirer. Retrieved December 20, 2006, from
    Burns, M. K., MacQuarrie, L. L., & Campbell, D. T. (1998). The difference between curriculum-based assessment and curriculum-based measurement: A focus on purpose and result [Electronic version]. NASP Communiqué, 27(6). Retrieved June 28, 2004, from
    Burson, K. C., & Wright, R. J. (2003, February). What do state assessments really measure? Paper presented at the annual meeting of the Eastern Educational Research Association, Hilton Head, SC.
    Burt, C. (1955). The evidence for the concept of intelligence. British Journal of Educational Psychology, 25, 159–177.
    Burt, C. (1958). The inheritance of ability. American Psychologist, 13(3), 1–15.
    Bush, G. W. (2003, September 9). President Bush discusses the “No Child Left Behind Act” in Florida [Press release]. Retrieved June 5, 2005, from
    Camilli, G., & Monfils, L. F. (2004). Test scores and equity. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test: Standards, assessment, and educational reform (pp. 143–157). Mahwah, NJ: Lawrence Erlbaum Associates.
    Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching (pp. 171–246). Chicago: Rand McNally.
    Canivez, G. L., & Konold, T. R. (2001). Assessing differential prediction bias in the developing cognitive abilities test across gender, race/ethnicity, and socioeconomic groups. Educational and Psychological Measurement, 61(1), 159–171.
    Cantwell, J. G. (2007, February). The principal's role in statewide testing: A national, state, and local school district perspective. Paper presented at the annual meeting of the Eastern Educational Research Association, Clearwater, FL.
    Carbonaro, W. J., & Gamoran, A. (2002). The production of achievement inequality in high school English. American Educational Research Journal, 39(4), 801–827.
    Carbonaro, W. J., & Gamoran, A. (2005). The effect of high-quality instruction on reading outcomes [Electronic version]. Research Brief, Association for Supervision and Curriculum Development, 3(4). Retrieved July 20, 2005, from
    Carlson, J. E. (1998). [Review of the Accounting-aptitude test]. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 14–17). Lincoln, NE: Buros Institute of Mental Measurements.
    Carlson, J. E. (2007, April). Vertical scaling issues. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Carnegie Forum on Education and the Economy's Task Force on Teaching as a Profession. (1986). A nation prepared: Teachers for the 21st century. New York: Carnegie Corporation of New York. (ERIC Document Number ED268120)
    Carpenter, D. C., & Malcolm, K. K. (2001). [Review of the Oral and written language scales written expression]. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 864–868). Lincoln, NE: Buros Institute of Mental Measurements.
    Carroll, J. B. (1963). A model for school learning. Teachers College Record, 64, 723–733.
    Cary, K. (2004). The funding gap 2004: Many states still shortchange low-income and minority students. Washington, DC: Education Trust. Retrieved October 16, 2006, from
    Case, B. J. (2004, January). It's about time: Stanford Achievement Test Series, Tenth Edition [Assessment Report]. San Antonio, TX: Harcourt Assessment. (Originally published April 2003)
    Cassady, J. C. (2001). The stability of undergraduate student's cognitive test anxiety levels [Electronic version]. Practical Assessment, Research & Evaluation, 7(20). Retrieved October 24, 2006, from
    Casserly, M. (2004, March). Beating the odds IV: A city-by-city analysis of student performance and achievement gap on state assessments, results from 2002–2003 school year. Council of the Great City Schools. Retrieved May 30, 2005, from
    Castelli, R. A. (1994). Critical thinking instruction in liberal arts curricula. Unpublished doctoral dissertation, Widener University.
    Catania, C. A. (1999). Thorndike's legacy: Learning, selection, and the law of effect. Journal of the Experimental Analysis of Behavior, 72(3), 425–428.
    Cattell, R. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54(1), 1–22.
    Cavanagh, A., & Robelen, E. W. (2004). Bush backs requiring NAEP in 12th grade. Education Week, 23(31), pp. 32, 34.
    Cavanaugh, S. (2005, June 8). NCTM elaborates on position on the use of calculators in classrooms. Education Week, 24(9), p. 9.
    Cavanaugh, S. (2006, November 15). Technology helps teachers home in on student needs. Education Week, 26(12), pp. 10–11.
    Cavanaugh, S. (2007, February 21). “Math anxiety” confuses the equation for students. Education Week, 26(24), p. 12.
    Cawthon, S. W. (2007). Hidden benefits and unintended consequences of the No Child Left Behind policies for students who are deaf or hard of hearing. American Educational Research Journal, 44(3), 460–492.
    Cech, S. J. (2007, August 15). 10-state pilot preparing teachers to develop tests. Education Week, 26(45), p. 10.
    Ceperley, P. E., & Reel, K. (1997). The impetus for the Tennessee value-added accountability system. In J. Millman (Ed.), Grading teachers, grading schools (pp. 133–136). Thousand Oaks, CA: Corwin Press.
    Chan, J. C. K., McDermott, K. B., & Roediger III, H. L. (2006). Retrieval-induced facilitation: Initially non-tested material can benefit from prior testing of related material. Journal of Experimental Psychology: General, 135(4), 553–571.
    Chang, K. D. (2005, June 20). Attention-deficit/hyperactivity disorder. E-Medicine. Retrieved June 30, 2005, from
    Cheng, Y., & Chang, H. (2007, April). Two item selection routes for cognitive diagnostic CAT. Paper presented during the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Chiles, N. (1997, September 7). Wealth helps: A wealth of expectations. [Electronic version]. Newark Star Ledger. Retrieved August 21, 2004, from
    Christensen, D. D. (2001, December). Building state assessment from the classroom up. School Administrator [Electronic version]. Retrieved August 4, 2005, from
    Chubb, J. E., & Moe, T. M. (1992). Educational choice: Why it is needed and how it will work. In C. E. Finn, Jr. & T. Rebarber (Eds.), Education reform in the 90s (pp. 36–52). New York: Macmillan.
    Civil Rights Act (P.L. 88–352, § Sec. 402, 1964).
    Cizek, G. J. (2003). [Test review of the Woodcock-Johnson® III]. From B. S. Plake & J. C. Impara (Eds.), The fifteenth mental measurements yearbook [Electronic version]. Retrieved March 1, 2005, from the Buros Institute's Test Reviews Online Web site:
    Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31–50.
    Cizek, G. J., Johnson, R. L., & Mazzie, D. (2005). [Review of the Terra nova test]. In The sixteenth mental measurements yearbook [Electronic version]. Retrieved November 8, 2006, from the Buros Institute's Test Reviews Online Web site:
    Clabaugh, G. K., & Rozycki, E. G. (1990). Understanding schools: The foundations of education. New York: Harper & Row.
    Clark, L. (2005). Gifted and growing. Educational Leadership, 63(3), 56–60.
    Clarridge, P. B., & Whitaker, E. M. (1997). Rolling the elephant over: How to effect large-scale change in the reporting process. Portsmouth, NH: Heinemann.
    Clements, A. (2004). The report card. New York: Books for Young Readers, Simon & Schuster.
    Cloud/Thornburg, J. (2004, September 20). Saving the smart kids. Time. Retrieved September 23, 2004, from,8816,1101040927-699423,00.html
    Clymer, J. B., & Wiliam, D. (2006/2007). Improving the way we grade science. Educational Leadership, 64(4), 36–42.
    Coelen, S., & Berger, J. (2006). First steps: An evaluation of the success of Connecticut students beyond high school [Fact sheet]. Nellie Mae Education Foundation, Quincy, MA. Retrieved March 29, 2006, from
    Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
    Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
    Cohen, K. (2006, July 7). Why colleges should thank private admissions counselors. Chronicle of Higher Education, 52(44), p. B20.
    Cohen, M. (2007, April). Aligned expectations? A closer look at college admissions and placement tests [Project Report]. Washington, DC: Achieve.
    Cohen-Vogel, L., & Smith, T. M. (2007). Qualifications and assignments of alternatively certified teachers: Testing the core assumptions. American Educational Research Journal, 44(3), 732–753.
    Cohn, D., & Bahrampour, T. (2006, May 10). Of U.S. children under 5, nearly half are minorities. Washington Post. Retrieved May 11, 2006, from
    Colangelo, N., & Davis, G. A. (1997). Handbook of gifted education (2nd ed.). Needham Heights, MA: Allyn and Bacon.
    Colby, S. A., & Smith, T. W. (2007, April). Quality teaching and student learning: A validation study of National Board Certified teachers. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Cole, N. S., & Moss, P. A. (1989). Bias in test use. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 201–219). New York: Macmillan Publishing Co.
    Coleman, J. S. (1972). The evaluation of equality of educational opportunity. In F. Mosteller & D. P. Moynihan (Eds.), On equality of educational opportunity (pp. 146–167). New York: Random House.
    College Board. (2005). Advanced Placement report to the nation, 2005. New York: Author. Retrieved March 19, 2006, from
    Collins, A., & Dana, T. M. (1993). Using portfolios with middle grade students. Middle School Journal, 25(2), 14–19.
    Colom, R., Lluis-Font, J. M., & Andrés-Pueyo, A. (2005). The generational intelligence gains are caused by decreasing variance in the lower half of the distribution: Supporting evidence for the nutritional hypothesis. Intelligence, 33(1), 83–91.
    Committee on Quality Improvement, Subcommittee on Attention-Deficit/Hyperactivity Disorder. (2000). Clinical practice guidelines: Diagnosis and evaluation of the child with attention-deficit/hyperactivity disorder. Pediatrics, 105(5), 1158–1170.
    Committee on Quality Improvement and Subcommittee on Attention-Deficit/Hyperactivity Disorder. (2001). Clinical practice guideline: Treatment of the school-aged child with attention-deficit/hyperactivity disorder. Pediatrics, 108(4), 1033–1044.
    Communications Technology Amendment. (P.L. 105–220 [Title 29, U.S.C.], 794d § Sec. 508, 1998).
    Conners, K. (1997/2000). Conners' rating scales-revised. Toronto, Canada: Multi-Health Systems.
    Connolly, A. J. (1998). Key math-revised. Circle Pines, MN: American Guidance Service Inc.
    Cooper, H. (2001). The battle over homework: An administrator's guide to setting sound and effective policies (2nd ed.). Thousand Oaks, CA: Corwin Press.
    Cooper, H. (2003, May). Summer learning loss: The problem and some solutions. Champaign, IL: University of Illinois, ERIC Clearinghouse on Elementary and Early Childhood Education. Retrieved July 18, 2003, from
    Cortese, A., & von Zastrow, C. (2006, January 18). Closing the staffing gap. Education Week, 25(19), p. 34.
    Creamer, B. (2007, March 1). Teacher's workday averages 15.5 hours. Honolulu Advertiser. Retrieved March 17, 2007, from
    Creighton, T. B. (2007). Schools and data: The educator's guide for using data to improve decision making (2nd ed.). Thousand Oaks, CA: Corwin Press.
    Crocker, L. (2003). Teaching for the test: Validity, fairness, and moral action [2003 NCME Presidential Address]. Educational Measurement: Issues and Practice, 22(3), 5–11.
    Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
    Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
    Cronin, J., Kingsbury, G. G., McCall, M., & Bowe, B. (2005, April). The impact of the No Child Left Behind Act on student achievement and growth, 2005 Edition. Northwest Evaluation Association. Retrieved June 23, 2005, from
    Crouse, J. (2006, March 27). Students rewarded for sacrificing Saturdays for FCAT training. Lakeland Ledger. Retrieved April 2, 2006, from
    Currie, R. A. (1994). Predicting public school student achievement in a five county region in Ohio: The development of a prediction equation. Doctoral dissertation, The University of Akron. (ProQuest document ID 747020331)
    Damarin, F. (1985). [Test review of Creativity Assessment Packet]. From J. V. Mitchell Jr. (Ed.), The ninth mental measurements yearbook [Electronic version]. Retrieved February 14, 2005, from the Buros Institute's Test Reviews Online Web site:
    Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence [Electronic version]. Education Policy Analysis Archives, 8(1). Retrieved July 17, 2005, from
    Darling-Hammond, L. (2004). From “separate but equal” to “No Child Left Behind”: The collision of new standards and old inequalities. In D. Meier & G. Wood (Eds.), Many children left behind: How the No Child Left Behind Act is damaging our children and our schools (pp. 3–32). Boston: Beacon Press.
    Darling-Hammond, L. (2005, April). Teacher characteristics and student achievement in the Houston Independent School District. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
    Darling-Hammond, L., Holtzman, D. J., Gatlin, S. J., & Heilig, J. V. (2005, April). Does teacher preparation matter? Evidence about teacher certification, Teach for America, and teacher effectiveness (Working Paper). Palo Alto, CA: School of Education, Stanford University.
    Darwin, C. R. (1859). Origin of species by means of natural selection. London: J. Murray. Reprinted (1976) New York: Random House.
    Darwin, C. R. (1877). A biographical sketch of an infant. Mind: A Quarterly Review of Psychology and Philosophy, 2(7), 285–294.
    Datar, A. (2003). The impact of changes in kindergarten entrance age policies on children's academic achievement and the child care needs of families (Report RGSD-177, 2003). Santa Monica, CA: Pardee Rand Graduate School, Rand Corporation.
    Davis, G. (2004, July 2). Today, even B students getting squeezed out. Chronicle of Higher Education: Chronicle Review. Retrieved April 21, 2005, from
    Dawson, D. A. (1991). Family structure and children's health and well-being: Data from the National Health Interview Survey on Child Health. Journal of Marriage and the Family, 53, 573–584.
    de Vise, D. (2007, March 4). A concentrated approach to exams. Washington Post. Retrieved March 8, 2007, from
    Debra P. v. Turlington, 644 F. 2d 397, 402–403 (5th Cir. 1981).
    Decker, P. T., Mayer, D. P., & Glazerman, S. (2004, June 9). The effects of Teach for America on students: Findings from a national evaluation [Technical report]. Princeton, NJ: Mathematica Policy Research Co. Retrieved July 8, 2005, from
    Dee, T. S. (2005). A teacher like me: Does race, ethnicity, or gender matter? American Economic Review, 95(2), 158–165.
    deFur, S. H. (2003). [Test review of the Test of early reading ability, third edition]. From B. S. Plake & J. C. Impara (Eds.), The fifteenth mental measurements yearbook [Electronic version]. Retrieved July 2, 2005, from the Buros Institute's Test Reviews Online Web site:
    DeGregory, L. (2005, February 10). For sick kids, FCAT's just one more exam. St. Petersburg Times. Retrieved March 20, 2006, from
    DeLacy, M. (2004, June 23). The “no child” law's biggest victims? Education Week, p. 40.
    DeMars, C. E. (2000). Test stakes and item format interactions. Applied Measurement in Education, 13(1), 55–77.
    Dessoff, A. (2007). Certifying AP courses. District Administration, 43(4), 54–59.
    DeStefano, L. (2001). [Test review of the Otis-Lennon school ability test, seventh edition]. From B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook [Electronic version]. Retrieved March 3, 2005, from the Buros Institute's Test Reviews Online Web site
    Dickens, W. T., & Flynn, J. (2006, October). Black Americans reduce the racial IQ gap: Evidence from standardized samples [Electronic version]. Psychological Science. Retrieved December 7, 2006, from
    Dillon, N. (2007). Crossing the line: School districts are getting tough with parents who hop boundaries to enroll students. American School Board Journal. Retrieved January 14, 2007, from
    Dillon, S. (2006, March 26). Schools cut back subjects to push reading and math. New York Times, pp. 1, 22.
    Dillon, S. (2007, February 7). Advanced Placement Tests are leaving some behind. New York Times, p. A17.
    Dillon, S. (2007, June 18). Long reviled, merit pay gains among teachers. New York Times, 156(53979), pp. A1, 14.
    DiMartino, J. (2007, April 25). Accountability or mastery? The assessment trade-off that could change the landscape of reform. Education Week, 26(34), pp. 44, 36.
    DiMartino, J., & Castaneda, A. (2007, April). Assessing applied skills. Educational Leadership, 64(7), 38–42.
    Dizon, N. Z., Feller, B., & Bass, F. (2006, April 18). States omitting minorities' test scores. [Associated Press]. Retrieved April 21, 2006, from
    Dodd, A., & Morris, M. (2005, May 6). Gwinnett teacher who refuses to alter grade is fired. Atlanta Journal-Constitution. Retrieved November 5, 2006, from
    Doherty, K. M. (2004, August 11). Assessment [Education Issues A-Z]. Education Week. Retrieved August 22, 2004, from
    Doll, B. J. (2003). [Test review of the Wechsler individual achievement test, second edition]. From B. S. Plake & J. C. Impara (Eds.), The fifteenth mental measurements yearbook [Electronic version]. Retrieved July 6, 2005, from the Buros Institute's Test Reviews Online Web site:
    Donnelly, A. M. (1999). Self-questioning: A comparative analysis of what teachers and students report about the use of this reading comprehension strategy. Unpublished doctoral dissertation, Widener University.
    Donsky, R. (2005, May 15). When teachers cheat. Atlanta Journal-Constitution. Retrieved May 17, 2005, from
    Dorans, N. J. (2002). The recentering of SAT scales and its effects on score distributions and score interpretations (College Board Research Report No. 2002–11, ETS RR-02–04). New York: College Entrance Examination Board.
    Dorans, N. J., & Zeller, K. (2004). Using score equity assessment to evaluate the equatability of the hardest half of a test to the total test. Retrieved March 14, 2005, from Educational Testing Service Web site:
    Dorr-Bremme, D. W., & Herman, J. L. (1986). Assessing student achievement: A profile of classroom practices (CSE Monograph in Evaluation No. 11). Los Angeles: University of California, Center for Evaluation. (ERIC Document Reproduction Service No. ED 338 691)
    Dougherty, C., Mellor, L., & Shuling, J. (2006). The relationship between advanced placement and college graduation. Austin, TX: The National Center for Educational Accountability, University of Texas.
    Duckworth, A. L., & Seligman, M. E. P. (2006). Self-discipline gives girls the edge: Gender in self-discipline, grades, and achievement test scores. Journal of Educational Psychology, 98(1), 198–208.
    Duff, W. (1767, 1964). An essay on original genius and its various modes of exertion in philosophy and the fine arts, particularly in poetry. Gainesville, FL: Scholar's Facsimiles and Reprints.
    Duncan, D. (2005). Clickers in the classroom: How to enhance science teaching using classroom response systems. San Francisco: Addison Wesley/Pearson.
    Easley, M. (2003, March). Governor Mike Easley's teacher working conditions initiative: Preliminary report of findings from a statewide survey of educators. Retrieved July 11, 2005, from
    Easterbrook, G. (2004, October). College admissions 2004, Who needs Harvard? Atlantic. Retrieved March 23, 2006, from
    Edmonston, B., Lee, S. M., & Passel, J. S. (2001). Intermariage, immigration, et statistiques raciales aux Etats-Unis. Critique Internationale, 12, 30–38.
    Educate America Act (P.L. 103–227, 1994).
    Education for All Handicapped Children's Act (P.L. 94–142, 1975 [S. 6]).
    Education of the Handicapped Act Amendments (P.L. 99–457, 1986), 20 U.S.C.A. § 1400 et seq.
    Education Trust-West. (2005). California's hidden teacher spending gap: How state and district budgeting practices shortchange poor and minority students and their schools (A Special Report). Oakland, CA: Education Trust-West, the James Irvine Foundation, and the Bill and Melinda Gates Foundation. Retrieved October 16, 2006, from
    Eklöf, H. (2007, April). Gender differences in test-taking motivation on low-stakes tests: A Swedish TIMSS 2003 example. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Elementary and Secondary Education Act [ESEA] of 1965 (P.L. No. 89–10, § Sec. 201, 1965).
    Elementary and Secondary Education Act [ESEA] of 1994, Improving America's Schools Act (P.L. No. 103–382, 1994).
    Elementary and Secondary Education Act [ESEA], Title I Program Directive (1998). Retrieved June 20, 2005, from
    Elias, M. J., & Schwab, Y. (2004, October 20). What about parental involvement in parenting? Education Week, 24(8), p. 39.
    Elliott, E. J. (1995, March 15, 16). Professional benchmarks on PRAXIS tests: An application to NCATE accreditation [Workshop Material]. Washington, DC: National Council for the Accreditation of Teacher Education.
    Elvin, C. (2003). Test item analysis using Microsoft Excel spreadsheet program. Unpublished manuscript, Tokyo Women's Medical University. Retrieved March 16, 2007, from
    Engle, S. (2002, January 28). College freshmen more politically liberal than in the past, UCLA survey reveals [Press release]. Los Angeles: Higher Education Research Institute. Retrieved June 25, 2005, from
    Engle, T. L. (1945). Psychology: Principles and applications. Yonkers-on-Hudson, NY: World Book.
    English, F. W. (2000). Deciding what to teach and test: Developing, aligning, and auditing the curriculum, Millennium Edition. Thousand Oaks, CA: Corwin Press.
    Erikson, E. H. (1968). Identity: Youth and crisis. New York: W. W. Norton.
    Eshel, N. (2004, April 29). Effects of grade proposal debated. Daily Princetonian. Retrieved July 16, 2004, from
    Espenshade, T. J., & Chung, C. Y. (2005). The opportunity cost of admission preferences at elite universities. Social Science Quarterly, 86(2), 293–305.
    Eurydice. (2004). Integrating immigrant children into schools in Europe. Brussels, Belgium: Author. Retrieved November 18, 2006, from
    Evans, L. D. (1990). A conceptual overview of the regression discrepancy model for evaluating severe discrepancy between IQ and achievement scores. Journal of Learning Disabilities, 23, 406–412.
    Evers, W. M., & Walberg, H. J. (Eds.). (2004). Testing student learning: Evaluating teaching effectiveness. Palo Alto, CA: Hoover Institution Press.
    Ewing, M. (2006). The AP program and student outcomes: A summary of research. Research Notes, November. New York: College Board.
    Fager, J. (Producer). (2004, January 7). 60 Minutes [Television broadcast]. New York: CBS.
    Fair Test. (2007). No Child Left Behind [Web page]. Cambridge, MA: Fair Test. Retrieved March 19, 2007, from
    Fallone, G., Acebo, C., Seifer, R., & Carskadon, M. A. (2005). Experimental restriction of sleep opportunity in children: Effects on teacher ratings. Sleep, 28(12), 1279–1285.
    Family Educational Rights and Privacy Act (Buckley Amendment) [20 USC § 1232g; 34 CFR Part 99 (1974)].
    Feistritzer, E. C. (1999, May 13). Teacher quality and alternative certification programs. Testimony of Emily C. Feistritzer before the U.S. House of Representatives Committee on Education and the Workforce. Retrieved February 9, 2005, from
    Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: American Council on Education and Macmillan Publishing.
    Feller, B. (2006, May 9). Rising number of schools face penalties. Boston Globe. Retrieved May 11, 2006, from
    Ferguson, R. F. (2002). What doesn't meet the eye: Understanding and addressing racial disparities in high-achieving suburban schools. Cambridge, MA: John F. Kennedy School of Government. Retrieved February 8, 2005, from
    Fetterman, D. M. (1988, November). Qualitative approaches to evaluating education. Educational Researcher, 17–23.
    Figlio, D. N. (2003, November). Testing, crime, and punishment. Gainesville: University of Florida. Retrieved November 20, 2006, from
    Finchler, J. (2000). Testing Miss Malarkey. New York: Walker & Co.
    Finder, A. (2006, March 5). Schools avoid class ranking, vexing colleges. New York Times. Retrieved March 5, 2006, from
    Finley, M. T. (1995). Critical thinking skills as articulated in the instructional practices, objectives, and examination items of higher level secondary school courses. Unpublished doctoral dissertation, Widener University.
    Finn, C. E., Jr. (1991). We must take charge: Our schools and our future. New York: The Free Press.
    Finn, J. D., Pannozzo, G. M., & Achilles, C. M. (2003). The “whys” of class size: Student behavior in small classes. Review of Educational Research, 73(3), 321–368.
    Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher motivation? Lessons from Chicago's low-performing schools. American Educational Research Journal, 44(3), 594–629.
    Firestone, W. A., Monfils, L. F., Hayes, M., Polovsky, T., Martinez, M. C., & Hicks, J. E. (2004). The principal, test preparation, and educational reform. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test: Standards, assessment, and educational reform (pp. 91–112). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
    Firestone, W. A., Monfils, L. F., Schorr, R. Y., Hicks, J. E., & Martinez, M. C. (2004). Pressure and support. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test: Standards, assessment, and educational reform (pp. 63–69). Mahwah, NJ: Lawrence Erlbaum Associates.
    Fisk, E. B. (1988, February 17). Standardized test scores: Voodoo statistics? New York Times, p. B9.
    Flanagan, J. D., Shaycroft, J., Gorham, M., Orr, W., Goldberg, D., & Goldberg, I. (1962). Design for a study of American youth. Boston: Houghton Mifflin.
    Fleischman, H. L., & Williams, L. (1996). An introduction to program evaluation for classroom teachers. Arlington, VA: Development Associates. Retrieved August 22, 2005, from
    Flood, P. H. (2004). It's test day, Tiger Turcotte. Minneapolis: Carolrhoda.
    Flores, B. B., & Clark, E. R. (2005). The centurion: Standards and high-stakes testing as gatekeepers for bilingual teacher candidates in the new century. In A. Valenzuela (Ed.), Leaving children behind: How “Texas-style” accountability fails Latino youth (pp. 225–248). Albany, NY: State University of New York Press.
    Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.
    Fortier, J. (1993). The Wisconsin Road Test as an empirical example of a large-scale, high-stakes, authentic performance assessment [Electronic version]. OnWEAC, Newsletter of the Wisconsin Education Association Council. Madison, WI: Wisconsin Department of Public Instruction. Retrieved September 21, 2007, from
    Fouratt, S., & Owen, C. (2004). CTP 4, Comprehensive testing program 4 [Technical Report]. Princeton, NJ: Educational Testing Service.
    Frahm, R. A. (2006, April). Who's really fit to teach? Hartford Courant. Retrieved April 5, 2006, from
    Frahm, R. A. (2006, November). Classroom discrepancy: Districts that face toughest challenges often hire least experienced teachers. Hartford Courant. Retrieved November 3, 2006, from
    Frankenberg, E. (2006). The segregation of American teachers. Cambridge, MA: The Civil Rights Project, Harvard University.
    Frary, R. B., Ross, L. H., & Weber, L. J. (1993, Fall). Testing and grading practices and options of secondary teachers of academic subjects: Implications for instruction in measurement. Educational Measurement: Issues and Practice, 12, 23–26.
    Freedle, R. O. (2002). Correcting the SAT's ethnic and social-class bias: A method for reestimating SAT scores [Electronic version]. Harvard Educational Review, 72(3), 1–43. Retrieved June 21, 2005, from
    Freedle, R. O., & Kostin, I. (1997). Predicting black and white differential functioning in verbal analogy performance. Intelligence, 24, 417–444.
    Fuchs, L. S., & Fuchs, D. (2002). Mathematical problem-solving profiles of students with mathematics disabilities with and without comorbid reading disabilities. Journal of Learning Disabilities, 35(6), 563–573.
    Futernick, K. (2007). A possible dream: Retaining California teachers so all students learn. Sacramento: The Center for the Future of Teaching and Learning, California State University, Office of the Chancellor.
    Gadbury-Amyot, C. C., Kin, J., Mills, G. E., Noble, E., & Overman, P. R. (2003). Validity and reliability of portfolio assessments of competency in a baccalaureate dental hygiene program. Journal of Dental Education, 67(9), 991–1002.
    Gaetano, C. (2006, August 31). General education teachers face special education realities. East Brunswick Sentinel. Retrieved February 27, 2007, from
    Gaffney, J. S., & Zaimi, E. (2003, November 14). Grade retention and special education: A call for a transparent system of accountability. Paper presented at the Conference of the Teacher Education Division, Biloxi, MS.
    Gage, N. L. (Ed.). (1967). Handbook of research on teaching. A project of the American Educational Research Association. Chicago: Rand McNally.
    Galton, F. (1883/1919). Inquiries into human faculty and its development. London: J. M. Dent & Sons.
    Galton, F. (2001). Hereditary genius: An inquiry into its laws and consequences. Honolulu, HI: University of Hawaii Press. (Original work published 1869)
    Gardner, D. (2007). Confronting the achievement gap. Phi Delta Kappan, 88(7), 542–546.
    Gardner, H. (1999). Intelligence reframed: Multiple intelligences for the 21st century. New York: Basic Books.
    Gardner, H. (2005, September 14). Beyond the herd mentality: The minds that we truly need in the future. Education Week, 25(3), p. 44.
    Garnaut, J. (2007, May 21). Best teachers get top marks from study. Sydney Morning Herald (Australia). Retrieved May 21, 2007, from
    Gayler, K., & Kober, N. (2004). Pay now or pay later: The hidden costs of high school exit exams. Washington, DC: Center on Educational Policy.
    Georgia Association of School Psychologists (2003). The use of high-stakes testing [Position statement]. Stone Mountain, GA: Author. Retrieved July 7, 2005, from
    Gershberg, A. I., & Hamilton, D. (2007, February 5). Bush's double standard on race in schools. Christian Science Monitor. Retrieved February 5, 2007, from
    Gesell, A. L., & Thompson, H. (1929). Learning and growth in identical infant twins: An experimental study by the methodology of co-twin control. Journal of Genetic Psychology, Monographs, 6, 1–124.
    Gewertz, C. (2007, January 10). Remediation for exit-exam failure proves daunting. Education Week, 26(18), p. 10.
    Gilbert, A. (2005). Teachers leave grading up to the computer. CNET Networks. Retrieved May 1, 2005, from
    Gill, B., Zimmer, R., Christman, J., & Blanc, S. (2007). Student achievement in privately managed and district-managed schools in Philadelphia since the state takeover (Research Brief RB-9239-ANF/WPF/SDP). Santa Monica, CA: Rand Corporation. Retrieved February 2, 2007, from
    Glaser, R. (1963). Instructional technology and the measurement of learning outcomes: Some questions. American Psychologist, 18, 519–521.
    Glickman, J., & Babyak, S. (2006). The toolbox revisited: Paths to degree completion from high school through college. Retrieved March 1, 2006, from
    Glod, M. (2006, October 27). Closing the gap, child by child. Washington Post. Retrieved March 10, 2007, from
    Glod, M. (2007, February 1). Virginia is urged to obey “No Child” on reading test. Washington Post. Retrieved February 2, 2007, from
    Goddard, T. (2005, February 9). AIMS testing and special education [Opinion paper]. Office of the Attorney General, State of Arizona, 105–002 (R04–037). Retrieved September 21, 2007, from
    Goldhaber, D. D. (2006, April). Everyone's doing it, but what does teacher testing tell us about teacher effectiveness? Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
    Goldhaber, D. D., & Anthony, E. (2004, March). Teacher quality: Can it be assessed? Paper presented during the annual meeting of the American Education Finance Association, Salt Lake City, UT.
    Goleman, D. (2006, September). The socially intelligent leader. Educational Leadership, 64(1), 76–81.
    Gonzalez, E. J., & Kennedy, A. (2007, April). Comparing three models to obtain scores for PIRLS. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Good, R. H., Kaminski, R. A., Moats, L. C., Laimon, D., Smith, S., & Dill, S. (2002/2003). DIBELS: Dynamic indicators of basic early literacy skills, sixth edition. Longmont, CO: Sopris West.
    Good, R. H., Simmons, D. C., & Kame'enui, E. J. (2001). The importance of decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high-stakes outcomes. Scientific Studies of Reading, 5(3), 257–288.
    Good, T. L., & Brophy, J. (1995). Contemporary educational psychology (5th ed.). New York: Longman.
    Goode, E. (2002, March 12). The uneasy fit of the precocious and the average child. New York Times, pp. 1, 2.
    Goodkin, S., & Gold, D. G. (2007, August 27). The gifted children left behind. Washington Post. Retrieved August 27, 2007, from
    Goodnough, A. (2002, September 21). Teachers dig deeper to fill gap in supplies. New York Times. Retrieved November 19, 2006, from
    Gootman, E. (2006, October 19). Those preschoolers are looking older. New York Times, p. A24.
    Gordon, E. E. (1995). Musical aptitude profile [1995 Revision]. Chicago: GIA Publications.
    Gould, S. J. (1996). The mismeasure of man. New York: W. W. Norton.
    Graham, S., & Malcolm, K. K. (2001). [Review of the Oral and written language scales listening comprehension and oral expression]. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 860–864). Lincoln, NE: Buros Institute of Mental Measurements.
    Graham, T. (2003). [Test review of the STAR early literacy (r)]. From B. S. Plake & J. C. Impara (Eds.), The fifteenth mental measurements yearbook [Electronic version]. Retrieved July 2, 2005, from the Buros Institute's Test Reviews Online Web site:
    Graue, E., Hatch, K., Rao, K., & Oen, D. (2007). The wisdom of class-size reduction. American Educational Research Journal, 44(3), 670–700.
    Grazer, B. (Producer), & McCulloch, B. (Director). (2003). Stealing Harvard [Motion picture]. United States: Imagine Entertainment.
    Green, E. (2007, August 16). Student backlash brews against untimed tests. New York Sun. Retrieved August 18, 2007, from
    Greene, A. H., & Melton, G. D. (2007, August 16). Teaching with the test, not to the test. Education Week, 26(45), p. 30.
    Gronlund, N. E. (1974). Determining accountability for classroom instruction [A title in the Current Topics in Classroom Instruction series]. New York: Macmillan.
    Grossman, K. N. (2005, April 26). No early dismissals for underperforming CPS tutors. Chicago Sun Times. Retrieved June 23, 2005, from
    Guba, E. G., & Lincoln, Y. S. (1988). Do inquiry paradigms imply inquiry methodologies? In D. M. Fetterman (Ed.), Qualitative approaches to evaluation in education: The silent scientific revolution. New York: Praeger.
    Guernsey, L. (2005). None of the above: The real world adds ambiguity to math and science questions. New York Times, pp. 4A, A18–19.
    Guilford, J. P. (1946). New standards for test evaluation. Educational and Psychological Measurement, 6, 427–439.
    Guilford, J. P. (1975). Characteristics of creativity (Report No. 1370.152). Springfield, IL: State of Illinois, Office of the Superintendent of Public Instruction, Department for Exceptional Children.
    Guilford, J. P. (1988). Some changes in the structure-of-intellect model. Educational and Psychological Measurement, 48(1), 1–4.
    Gulliksen, H. (1950). Theory of mental tests. New York: John Wiley & Sons.
    Gupta, R. (2005, December 27). Music teachers' group pitches test to balance scales. Palm Beach Post. Retrieved January 3, 2006, from
    Guskey, T. R. (2002). How's my kid doing? A parent's guide to grades, marks, and report cards. San Francisco: Jossey-Bass.
    Guskey, T. R. (2005, November). Mapping the road to proficiency. Educational Leadership, 63(3), 32–38.
    Guskey, T. R. (2007). Multiple sources of evidence: An analysis of stakeholders' perceptions of various indicators of student learning. Educational Measurement: Issues and Practice, 26(1), 19–27.
    Guy, S. (2007, February 28). Sometimes a bright idea just clicks. Chicago Sun Times. Retrieved February 28, 2007, from,CST-FIN-ec0128.articleprint
    Hacker, H. K., & Parks, S. (2005, February 16). Some states getting tough on cheating. Dallas Morning News. Retrieved February 16, 2005, from
    Hacker, H. K., & Stutz, T. (2006, June 12). Incentive pay enters classroom. Dallas Morning News. Retrieved June 13, 2006, from
    Haft, S., Witt, P. J., & Thomas, T. (Producers), & Weir, P. (Director) (1989). Dead poets society [Motion picture]. United States: Buena Vista.
    Hall, D. (2005, June). Getting honest about grad rates: How states play the numbers and students lose. The Education Trust. Retrieved June 24, 2005, from
    Ham, B. D. (2003). The effects of divorce on the academic achievement of high school seniors. Journal of Divorce & Remarriage, 38(3), 167–185.
    Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational measurement (pp. 147–200). New York: Macmillan.
    Hambleton, R. K. (2000). Emergence of item analysis modeling in instrument development and data analysis. Medical Care, 38(9) (Supplement II), 60–65.
    Hambleton, R. K. (2004, June 24–25). Traditional and modern approaches to outcomes measurement. Paper presented at the 2004 conference of the National Institute of Cancer and the Drug Information Association. Bethesda, MD.
    Hammill, D. D. (1998). Detroit test of learning aptitude (4th ed.). Austin, TX: Pro-Ed, Pearson Measurement.
    Hamre, B. K., & Pianta, R. C. (2005). Can instructional and emotional support in the first-grade classroom make a difference for children at risk for school failure? Child Development, 76(5), 949–967.
    Hancox, R. J., Milne, B. J., & Poulton, R. (2005). Association of television viewing during childhood with poor educational achievement. Archives of Pediatrics and Adolescent Medicine, 159(7), 614–618.
    Haney, W. (2000). The myth of the Texas miracle in education [Electronic version]. Education Policy Analysis Archives, 8(41). Retrieved January 24, 2005, from
    Haney, W. M. (2006, September). Evidence on education under NCLB (and how Florida boosted NAEP scores and reduced the race gap). Paper presented at the Hechinger Institute's “Broad Seminar for K-12 Reporters.” Teachers College, Columbia University, New York.
    Hanushek, E. A., & Rivkin, S. G. (2006, October). School quality and the Black-White achievement gap (NBER Working Paper No. W12651). Palo Alto, CA: Hoover Institution, Stanford University. Retrieved February 24, 2007, from
    Harcourt Assessment (2001). Metropolitan achievement test, eighth edition. San Antonio, TX: Author.
    Harcourt Assessment (2003). Stanford achievement test, tenth edition. San Antonio, TX: Author.
    Harcourt Brace Educational Measurement. (1996). Stanford diagnostic mathematics test, fourth edition. San Antonio, TX: Author.
    Harman, A. E. (2001). National board for professional teaching standards' national teacher certification. Washington, DC: Eric Clearinghouse on Teaching and Teacher Education. (ERIC Document No. ED460126)
    Harris, D. N., & Sass, T. R. (2007, April). Teacher training, teacher quality and student achievement. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Harrison, S. (2005, June 28). Midyear promotions: Half flunk FCAT again. Miami Herald. Retrieved June 29, 2005, from
    Hart, B., & Risley, T. R. (1995). Meaningful differences in the everyday experience of young American children. Baltimore: Paul H. Brookes.
    Hart, B., & Risley, T. R. (1999). The social world of children learning to talk. Baltimore: Paul H. Brookes.
    Haynes, W. O., & Shapiro, D. A. (1995). [Review of the Communications abilities diagnostic test]. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 214–219). Lincoln, NE: Buros Institute of Mental Measurements.
    Heck, R. H., Larson, T. J., & Marcoulides, G. A. (1990). Instructional leadership and school achievement: Validation of a causal model. Educational Administration Quarterly, 26(2), 94–125.
    Hehir, T. (2007). Confronting ableism. Educational Leadership, 64(5), 9–14.
    Herman, E. (2002, Winter). The paradoxical rationalization of modern adoption: Social and economic aspects of adoption. Journal of Social History. Retrieved January 29, 2005, from
    Herman, E. (2005, June 22). The adoption history project: Arnold Gesell. Department of History, University of Oregon. Retrieved August 13, 2005, from
    Herman, J. L., & Baker, E. L. (2005, November). Making benchmark testing work. Educational Leadership, 63(3), 48–54.
    Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New York: Simon & Schuster.
    Hershberg, T., & Lea-Kruger, B. (2007, April 11). Not performance pay alone: Teacher incentives must be matched by systemwide change. Education Week, 26(32), pp. 48, 35.
    Hershberg, T., Simon, V. A., & Lea-Kruger, B. (2004, February). Measuring what matters: How value-added assessment can be used to drive learning gains [Electronic version]. American School Board Journal, 191(2). Retrieved August 24, 2005, from
    Hess, A. K. (2001). [Review of the Conners' rating scales-revised]. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 331–337). Lincoln, NE: Buros Institute of Mental Measurements.
    Higgins, L. (2007, April 29). Schools hope data can boost scores: It can also help spot flaws in curriculum. Detroit Free Press. Retrieved May 2, 2007, from
    Higgins, L. T., & Zheng, M. (2002). An introduction to Chinese psychology: Its historical roots until the present day. Journal of Psychology, 136(2), 225–239.
    Hill, R. K., & DePascale, C. A. (2003). Reliability of No Child Left Behind accountability designs. Portsmouth, NH: National Center for the Improvement of Educational Assessments.
    Hinds, D. A., Stuve, L. L., Nilsen, G. B., Halperin, E., Eskin, E., Ballinger, et al. (2005). Whole-genome patterns of common DNA variation in three human populations. Science, 307, 1072–1079.
    Hirsch, E. D., Jr. (2006, April 26). Reading-comprehension skills? What are they really? Education Week, 25(33), p. 52.
    Hoff, D. J. (2007, June 8). State tests show gains since NCLB. Education Week, 26(39), pp. 1, 20.
    Hoff, D. J., & Manzo, K. K. (2007, March 14). Bush claims about NCLB questioned. Education Week, 26(21), pp. 1, 26, 27.
    Holbrook, R. G. (2003). Impact of selected noncurricular variables on regular education student achievement as measured by the 2001–2002 reading and mathematics PSSA scores. Unpublished doctoral dissertation, Widener University.
    Holbrook, R. G., & Wright, R. J. (2004, February). Non-curricular factors related to success on high-stakes tests. Paper presented at the annual meeting of the Eastern Educational Research Association, Clearwater, FL.
    Honawar, V. (2007, April). Alternative-certification programs multiply. Education Week, 26(33), p. 16.
    Honawar, V. (2007, January). Bonuses for NBPTS-certified teachers at risk in South Carolina. Education Week, 28(20), pp. 5, 20.
    Hopkins, K. D. (1998). Educational and psychological measurement and evaluation (8th ed.). Boston: Allyn & Bacon.
    Houston teachers asked to give back bonuses. (2007, March 10). Dallas Morning News. Retrieved March 12, 2007, from
    Hresko, W. P., Reid, K. D., & Hammill, D. D. (1999). Test of early language development, third edition. Austin, TX: Pro-Ed.
    Hu, W. (2007, July 6). Schools move toward following students' yearly progress on tests. New York Times, p. C10.
    Huntsinger, C. (1999, April). Does K-5 homework mean higher test scores? American Teacher. Retrieved October 16, 2005, from
    IDEA (1986). Individuals With Disabilities Education Act Amendment: Preschool and Infant/Toddler Programs, (P.L. 99–457, 1986, now part c).
    IDEA (1997). Individuals With Disabilities Education Act (P.L. 105–17, 1997 [20 U.S.C. 1401 et seq.]).
    IDEIA (2004). Individuals With Disabilities Educational Improvement Act of 2004 (118 Stat. 2647, H.R. 1350, 108th Congress, No. 446).
    IDEIA 2004 resources (2004). Ed.gov technical assistance and dissemination network. Office of Special Education Programs. Retrieved June 16, 2005, from
    Illinois Association of Directors of Title I. (2006). 2006 accountability workbook changes. Springfield: Author. Retrieved March 20, 2007, from,31
    Impara, J. C. (1996). Assessment skills of counselors, principals, and teachers. ERIC Digest. Retrieved July 15, 2005, from
    Improving America's Schools Act (P.L. 103–382, 1994).
    Institute for Education Sciences (2006, October 18). State profiles: The Nation's Report Card. Washington, DC: Institute for Education Sciences, National Center for Educational Statistics, U.S. Department of Education. Retrieved November 22, 2006, from
    Irvin, A. (2003). The influence of selected educational and teacher demographic variables on fifth grade reading and writing scores for the 2000 and 2001 Delaware Student Testing Program. Unpublished doctoral dissertation, Widener University.
    Jacobson, L. (2006, September 27). Teacher-pay incentives popular but unproven. Education Week, 26(5), pp. 1, 20.
    Jacobson, L. (2007, April 4). Study casts doubt on value of “highly qualified” status. Education Week, 26(31), p. 13.
    James, F. (2004, December 5). Response to intervention in Individuals With Disabilities Education Act (IDEA), 2004. Newark, DE: International Reading Association. Retrieved November 28, 2006, from http://www.reading.org/downloads/resources/IDEA_RTI_report.pdf
    Jehlen, A. (2007, April). Testing: How the sausage is made. NEA Today. Retrieved April 15, 2007, from
    Jennings, J., & Rentner, D. S. (2006). Ten big effects of the No Child Left Behind Act on public schools. Washington, DC: Center on Educational Policy. Retrieved November 9, 2006, from
    Jennings, K. E. (2003). Test review of the Brown attention-deficit disorder scales for children and adolescents. From B. S.Plake & J. C.Impara (Eds.), The fifteenth mental measurements yearbook [Electronic version]. Retrieved June 29, 2005, from the Buros Institute's Test Reviews Online Web site:
    Jensen, A. R. (1999). The g factor: The science of mental ability [Electronic edition]. Psycoloquy, 10(2). Retrieved February 15, 2005, from
    Jimerson, L. (2006, September 5). The Hobbit effect: Why small works in public schools. Arlington, VA: The Rural School and Community Trust. Retrieved November 20, 2006, from
    Jimerson, S. R. (2001a). Meta-analysis of grade retention research: Implications for practice in the 21st century. School Psychology Review, 30, 313–330.
    Jimerson, S. R. (2001b). Synthesis of grade retention research: Looking backward and moving forward. California School Psychologist, 6, 47–59.
    Jitendra, A. K., Griffin, C. C., Haria, P., Leh, J., Adams, A., & Kaduvettoor, A. (2007). A comparison of single and multiple strategy instruction on third-grade students' mathematical problem solving. Journal of Educational Psychology, 99(1), 115–127.
    Johnson, D. J., Thurlow, M., Cosio, A., & Bremer, C. (2005). Diploma options for students with disabilities [Electronic version]. Information Brief, 4(1). Retrieved June 26, 2005, from
    Johnson, D. R., & Thurlow, M. L. (2003). A national study on graduation requirements and diploma options for youth with disabilities (NCEO Technical Report No. 36). Minneapolis: University of Minnesota, National Center on Educational Outcomes. Retrieved June 26, 2005, from
    Johnson, E. B., & Johnson, A. V. (1990). Communications abilities diagnostic test. Chicago: Riverside.
    Johnson, K. A. (2000). Merit pay for teachers: A meritorious concept or not? Unpublished manuscript, Center for Education, Widener University.
    Johnson, L. B. (1966). Public papers of the presidents of the United States, Book 1, 1965. Washington, DC: U.S. Government Printing Office.
    Johnson, S. M., Birkeland, S. E., & Peske, H. G. (2005, September). A difficult balance: Incentives & quality control in alternative certification programs. Cambridge, MA: Project on the Next Generation of Teachers, Harvard Graduate School of Education.
    Joint Committee on Testing Practices. (2005). Code of fair testing practices in education. Educational Measurement: Issues and Practice. 24(1), 23–27.
    Joireman, J., & Abbott, M. (2004). Structural equation models assessing relationships among student activities, ethnicity, poverty, parent's education, and academic achievement (Technical Report # 6). Seattle, WA: Washington School Research Center. Retrieved August 21, 2005, from
    Joseph, R. (2000). Neuropsychiatry, neuropsychology, and clinical neuroscience. San Diego, CA: Academic Press. Retrieved March 27, 2007, from
    Kaase, K., & Dulaney, C. (2005, May). The impact of mobility on educational achievement: A review of the literature (E & R Report No. 4.39). Research Watch, Wake County Public Schools, NC. Retrieved November 29, 2006, from
    Kalish, R. A. (1958). An experimental evaluation of the open book examination. Journal of Educational Psychology, 49(4), 200–204.
    Kame'enui, E. J. (2002, March 5). The teaching of reading: Beyond vulgar dichotomies to the science of causality. The White House Conference on Preparing Tomorrow's Teachers. Retrieved April 3, 2006, from
    Kamphaus, R. W., & Frick, P. J. (2002). Clinical assessment of child and adolescent personality and behavior. Boston: Allyn and Bacon.
    Kanada, M., Kreiman, C., & Nichols, P. D. (2007, April). Effects of scoring environment on rater reliability, score validity and generalizability: A comparison of standup local scoring, online distributed scoring, and online local scoring. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Kane, T. J., Rockoff, J. E., & Staiger, D. O. (2006, April). What does certification tell us about teacher effectiveness? Evidence from New York City (Working Paper 12155). Cambridge, MA: National Bureau of Economic Research.
    Kane, T. J., & Staiger, D. O. (2002). Volatility in school test scores: Implications for test-based accountability systems. In D. Ravitch (Ed.), Brookings papers on education policy 2002 [Based on BEEP Conference on Accountability and Its Consequences for Students: Are Children Hurt or Helped by Standards-Based Reforms? May 15–16, 2001]. Washington, DC: The Brookings Institution.
    Kannapel, P. J., & Clements, S. K. (2005, February). Inside the black box of high-performing high-poverty schools. Lexington, KY: Prichard Committee for Academic Excellence. Retrieved July 19, 2005, from
    Kaplan, J. (Ed.). (1992). Familiar quotations [John Bartlett] (17th ed., p. 772). Boston: Little, Brown and Company.
    Karantonis, A., & Sireci, S. C. (2006). The bookmark standard-setting method: A literature review. Educational Measurement: Issues and Practice, 25(1), 4–12.
    Karweit, N. (1984). Extending the school year and day. Eugene, OR: Eric Clearinghouse on Educational Management. Retrieved July 18, 2005, from
    Kasindorf, M., & El Nasser, H. (2001, March 12). Impact of census's race data debated. USA Today. Retrieved February 18, 2005, from
    Kaznowski, K. (2004). Slow learners: Are educators leaving them behind? NASSP Bulletin, 88. Retrieved March 26, 2007, from
    Keiger, D. (2000, April). What brilliant kids are hungering for. Johns Hopkins. Retrieved May 10, 2005, from
    Keller, B. (2004, May 19). Schools employing online tests to screen prospects. Education Week, pp. 1, 22.
    Keller, B. (2004, November 17). Pennsylvania outlines teacher-test alternatives. Education Week, 24(12), p. 18.
    Keller, B. (2007, August 15). The National Board: Challenged by success? Education Week, 26(45), pp. 1, 16.
    Keller, H. R. (2001). [Review of the Early childhood attention-deficit disorders evaluation scale]. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 442–446). Lincoln, NE: Buros Institute of Mental Measurements.
    Kellow, J. T., & Willson, V. L. (2001). Consequences of (mis)use of the Texas Assessment of Academic Skills (TAAS) for high-stakes decisions: Comment on Haney and the Texas miracle in education [Electronic version]. Practical Assessment, Research & Evaluation, 7(24). Retrieved January 24, 2005, from
    Kelly, C., & Finnigan, K. (2003). Organizational context colors teacher expectancy. Educational Administration Quarterly, 39(5), 603–634.
    Kelly, T. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17–24.
    Kelly, T. L. (1947). Fundamentals of statistics. Cambridge, MA: Harvard University Press.
    Kelly, S., & Monczunski, L. (2007). Overcoming the volatility in school-level gain scores: A new approach to identifying value added with cross-sectional data. Educational Researcher, 36(5), 279–287.
    Kingsbury, G. G., & Wollack, J. A. (2001). [Review of the Key math-revised]. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 637–641). Lincoln, NE: Buros Institute of Mental Measurements.
    Kinzie, S. (2007, May 1). At first they flirt, then colleges crush: Rejection rough on students and schools. Washington Post. Retrieved May 3, 2007, from
    Klein, A. (2007, February 5). Researchers see college benefits for students who took AP courses. Education Week, 26(22), p. 7.
    Knight, H. (2005, December 12). Offering incentives boosts attendance and test scores. San Francisco Chronicle. Retrieved December 15, 2005, from
    Kobrin, J. L., Deng, H., & Shaw, E. J. (2007). Does quality count? The relationship between length of response and scores on the SAT Essay. Journal of Applied Testing Technology, 8(1), 1–15. Retrieved March 2, 2007, from
    Kohn, A. (2002, November 8). The dangerous myth of grade inflation. Chronicle of Higher Education: Chronicle Review. Retrieved July 16, 2004, from
    Kohn, A. (2006). The homework myth: Why our kids get too much of a bad thing. Cambridge, MA: Da Capo Press.
    Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13(5), 5–16.
    Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1964). Taxonomy of educational objectives: Handbook II: Affective domain. New York: David McKay Co.
    Kristoback, J., & Wright, R. J. (2001, February). The success of test preparation on the scores from the fifth grade level Pennsylvania System of School Assessment (PSSA). Paper presented at the annual meeting of the Eastern Educational Research Association, Hilton Head, SC.
    Kroeze, D. J. (2007, April). Is a high-performing district's performance high enough for NCLB? Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Krueger, A. B. (2002). Understanding the magnitude and effect of class size on student achievement. In L. Mishel & R. Rothstein (Eds.), The class size debate (pp. 7–35). Washington, DC: Economic Policy Institute.
    Krueger, A. B. (2003). Economic considerations and class size. Economic Journal, 113, F34–F63.
    Krueger, A. B., & Whitmore, D. M. (2001). Would smaller classes help close the Black-White gap? (Working Paper #451). Princeton, NJ: Princeton University Industrial Relations Section.
    Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151–161.
    Kulik, J. A., & Kulik, C. L. C. (1992). Meta-analysis findings on grouping programs. Gifted Child Quarterly, 36(2), 73–77.
    Kusimo, P. A. (1999). Rural African Americans and education: The legacy of the Brown decision. Clearing House on Rural Education and Small Schools (ERIC Document Reproduction Service No. ED425050). Retrieved October 17, 2006, from
    Laczko-Kerr, I., & Berliner, D. C. (2002). The effectiveness of “Teach for America” and other under-certified teachers on student academic achievement: A case of harmful public policy [Electronic version]. Education Policy Archives, 10(37). Retrieved July 9, 2005, from
    Laitsch, D. (2006). Heterogeneous grouping in advanced mathematics classes. American Educational Research Journal, 43(1), 105–136.
    Laitsch, D. (2007). Educator community and elementary student performance [Research brief]. Association for Supervision and Curriculum Development. Retrieved February 27, 2007, from
    Landgraf, K. M. (2005). Testing: Snapshots should not lead to snap judgments [Issue paper]. Retrieved July 15, 2005, from
    Lane, S. (2004). Validity of high-stakes assessment: Are students engaged in complex thinking? Educational Measurement: Issues and Practice, 23(3), 6–14.
    Lankford, H. S., Loeb, S., & Wyckoff, J. (2002). Teacher sorting and the plight of urban schools: A descriptive analysis. Educational Evaluation and Policy Analysis, 24(1), 37–62.
    Law, J. G., Jr. (2001). [Test review of the Scales for diagnosing attention-deficit-hyperactivity disorder]. In B. S. Plake & J. C. Impara (Eds.), The fifteenth mental measurements yearbook [Electronic version]. Retrieved June 29, 2005, from the Buros Institute's Test Reviews Online Web site:
    Law, L. (Producer), & Méndez, R. (Director). (1988). Stand and deliver [Motion picture]. United States: Warner Studios.
    Lawler, P. (1993). A longitudinal study of women's career choices: Twenty-five years later. Pre-convention workshop from the annual meeting of the American Association of University Women. Reprinted later in Gender issues in the classroom and on campus: Focus on the twenty-first century (pp. 187–192). Washington, D.C.: American Association of University Women.
    Leach, J. M., Scarborough, H. S., & Rescorla, L. (2003). Late-emerging reading disabilities. Journal of Educational Psychology, 95(2), 211–234.
    Leahy, S., Lyon, C., Thompson, M., & Wiliam, D. (2005, November). Assessment minute, day by day. Educational Leadership, 63(3), 19–24.
    Lederman, D. (2006). Krist views efforts to clamp down on senioritis with skepticism. Faculty in the news. Stanford, CA: School of Education, Stanford University. Retrieved November 18, 2006, from
    Lee, J. (2006). Tracking achievement gaps and assessing the impact of NCLB on the gaps: An in-depth look into national and state reading and math outcome trends. Cambridge, MA: The Civil Rights Project at Harvard University.
    Lee, J., & Fox, J. (2007, April). Minority students at risk for low and high performance: A comparison of NAEP achievement gaps to special and gifted education placement gaps. A paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Lehmann, I. J., Nagy, P., & Poteat, M. G. (1998). [Review of the Stanford diagnostic mathematics test, fourth edition]. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 930–938). Lincoln, NE: Buros Institute of Mental Measurements.
    Leischer, J. (2005, January 5). State standards fail to meet NCLB challenge [Research report]. Thomas B. Fordham Foundation. Retrieved January 6, 2005, from
    Lemann, N. (1999). The big test. New York: Farrar, Straus and Giroux.
    Leslie, M. (2000, July/August). The vexing legacy of Lewis Terman. Stanford. Retrieved January 30, 2005, from
    Levitt, S. D., & Dubner, S. J. (2005). Freakonomics. New York: HarperCollins.
    Lewin, T. (2007, June 8). States found to vary widely on education. New York Times, p. A20.
    Lichtenstein, R. (2002). Learning disabilities criteria: Recommendations for change in IDEA reauthorization [Electronic version]. NASP Communiqué, 30(6). Retrieved July 15, 2004, from
    Lieberman, M. (2000). Merit pay can't provide the incentives for improvement [Weekly column]. Washington, DC: Education Policy Institute. Retrieved July 15, 2005, from
    Lieberman, N. (2004). Admissions. New York: Time Warner Bookmark.
    Lin, W. V. (2001). Parenting beliefs regarding young children perceived as having or not having inattention and/or hyperactivity-impulsivity behaviors. Unpublished doctoral dissertation, University of South Dakota, 2001. ProQuest publication number AAT 3007070.
    Linn, R. (2003). Accountability: Responsibility and reasonable expectations [AERA presidential address]. Educational Researcher, 31(7), 3–13.
    Linn, R. L. (2007, April-a). Approaches to educational accountability. Paper presented during the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Linn, R. L. (2007, April-b). Needed modifications of NCLB. Paper presented during the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Linn, R., & Gronlund, N. (2000). Measurement and assessment in teaching. San Francisco: Prentice Hall.
    Linn, R. L., & Haug, C. (2002, April). Stability of school building accountability scores and gains [Center for the Study of Evaluation Report 561]. Los Angeles, CA: Center for the Study of Evaluation, Graduate School of Education & Information Sciences, University of California, Los Angeles.
    Linn, R. L., & Miller, M. D. (2005). Measurement and assessment in teaching (9th ed.). Upper Saddle River, NJ: Pearson, Merrill, Prentice Hall.
    Lipka, S. (2007, June 12). Elite company. Chronicle of Higher Education, 53(42), pp. A31–35.
    Liu, J., Allspach, J. R., Feigenbaum, M., Oh, H., & Burton, N. (2004). A study of fatigue effects from the New SAT [College Board Research Report No. ETS RR-04–46]. New York: College Entrance Examination Board.
    Lizama, J. A. (2004, October 5). Is the tide turning on class rank? Richmond Times Dispatch. Retrieved October 7, 2004, from
    Lloyd, C. (2005, March 27). How much is a school worth? Parents add test scores into home-purchase equations. San Francisco Chronicle. Retrieved July 15, 2005, from
    Lockwood, J. R., McCaffrey, D. F., Hamilton, L. S., Stecher, B., Vi-Nhuan, L., & Martinez, J. F. (2007). The sensitivity of value-added teacher effect estimates to different mathematics achievement tests. Journal of Educational Measurement, 44(1), 47–67.
    Logerfo, L. (2006). Climb every mountain. Education Next. Stanford, CA: Hoover Institution, Leland Stanford Junior University.
    Lonergan, D. (2006, March 27). Dover-Sherborn High School student handbook. Dover, MA. Retrieved April 13, 2006, from
    Long, J. S. (1997). Regression models for categorical and limited dependent variables. Advanced Quantitative Techniques: Volume 7 of the social science series. Thousand Oaks, CA: Sage.
    Longstaffe, J. A., & Bradfield, J. W. B. (2005, May). A review of factors influencing the dissemination of the London Agreed Protocol for Teaching (LAPT): A confidence based marketing system [Unpublished policy statement]. University College, London. Retrieved November 26, 2006, from
    Lord, F. M. (1952). The relationship of the reliability of multiple-choice to the distribution of item difficulties. Psychometrika, 17(2), 181–194.
    Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
    Lou, L. (2007, March 28). District considers “grade bump” incentives for students who do well on state tests. San Diego Union Tribune. Retrieved June 26, 2007, from
    Lubienski, C., & Lubienski, S. T. (2006, January). Charter, private, public schools and academic achievement: New evidence from NAEP mathematics data. New York: National Center for the Study of Privatization in Education, Teachers College, Columbia University. Retrieved April 6, 2006, from
    Lunz, M. E., & Bashook, P. G. (2007, April). The impact of examiner communication ability on oral examination outcomes. Paper presented during the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Lyon, R. L. (1998, April 28). Overview of reading and literacy initiatives of the Child Development and Behavior branch of the National Institute of Child Health and Human Development, National Institutes of Health [Report to the House of Representatives, Committee on Labor and Human Resources]. Washington, DC. Retrieved July 1, 2005, from
    Maeroff, G. I. (1992). Reform comes home: Policies to encourage parental involvement in children's education. In C. E. Finn & T. Rebarber (Eds.), Education reform in the 90s (pp. 175–194). New York: Macmillan.
    Malcolm, K. K., & Schafer, W. D. (2005). [Review of the Comprehensive testing program 4]. In R. A. Spies & B. S. Plake (Eds.), The sixteenth mental measurements yearbook [Electronic version]. Retrieved November 8, 2006, from the Buros Institute's Test Reviews Online Web site:
    Manning, M. L. (2000). Child-centered middle schools. A position paper: Association for Childhood Education International. Childhood Education, 76(3), 154–159. (ERIC Journal No. EJ602130)
    Manzo, K. K. (2005, March 16). Social studies losing out to reading, math. Education Week, 24(27), pp. 1, 16.
    Marchant, G. J., & Paulson, S. E. (2005, January 21). The relationship of high school graduation exams to graduation rates and SAT scores [Electronic version]. Education Policy Analysis Archives, 13(6). Retrieved February 10, 2005, from
    Margolis, H., & Free, J. (2001). The consultant's corner: Computerized IEP programs: A guide for educational consultants. Journal of Educational and Psychological Consultation, 12(2), 171–178.
    Marion, S. F., & Pellegrino, J. W. (2006). A validity framework for evaluating the technical quality of alternative assessments. Educational Measurement: Issues and Practice, 25(4), 47–57.
    Marion, S. F., & Sheinker, A. (1999). Issues and consequences for state-level minimum competency testing programs (Wyoming Report 1). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved February 5, 2005, from
    Markow, D., & Martin, S. (2005). The MetLife survey of the American teacher: Transitions and the role of supportive relationships. New York: Harris Interactive.
    Marrs, J. (2001). Rule by secrecy. New York: HarperCollins.
    Marsh, H. W., Trautwein, U., Lüdtke, O., & Baumert, J. (2007). The big-fish-little-pond effect: Persistent negative effects of selective high schools on self-concept after graduation. American Educational Research Journal, 44(3), 631–669.
    Martin, M. O., Mullis, I. V. S., & Chrostowski, S. J. (2004). TIMSS 2003 technical report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College. Retrieved September 6, 2005, from
    Maruti, S. S., Feskanich, D., Colditz, G. A., Frazier, A. L., Sampson, L. A., Michels, K. B., et al. (2005). Adult recall of adolescent diet: Reproducibility and comparison with maternal reporting. American Journal of Epidemiology, 161(1), 89–97.
    Marvel, J., Lyter, D. M., Peltola, P., Strizek, G. A., & Morton, B. A. (2007, January). Teacher attrition and mobility: Results from the 2004–2005 teacher follow-up survey [NCES 2007–307]. Washington, DC: National Center for Educational Statistics, U.S. Department of Education.
    Marzano, R. J. (2000). Transforming classroom grading. Alexandria, VA: Association for Supervision and Curriculum Development.
    Marzano, R. J., Pickering, D. J., & Pollock, J. E. (2001). Classroom instruction that works: Research-based strategies for increasing student achievement. Alexandria, VA: Association for Supervision and Curriculum Development.
    Massachusetts Department of Education. (2005, April). Grade retention in Massachusetts public schools: 2003–2004. Malden, MA: Author.
    Mathews, J. (1988). Escalante: The best teacher in America. New York: Holt.
    Mathews, J. (2005, June 14). Where some give credit, others say it's not due. Washington Post, p. A14.
    Maxwell, L. (2006, September 6). Massachusetts schools experiment with extra time. Education Week, 26(2), pp. 30, 33.
    Maxwell, L. A. (2007, February 14). The other gap. Education Week, 26(23), pp. 25, 27–29.
    Mayer, R. E. (1999). Fifty years of creativity research. In R. J. Sternberg (Ed.), Handbook of creativity (pp. 449–460). Cambridge, UK: Cambridge University Press.
    McCarney, S. B., & Johnson, N. (1995). Early childhood attention-deficit disorder evaluation scale (ECADDES). Columbia, MO: Hawthorne Educational Services.
    McClure, C. T. (2007, August). Ability grouping and acceleration in gifted education. District Administration, 43(8), 24–25.
    McGill-Franzen, A., & Allington, R. (2006, June). Contamination of current accountability systems. Phi Delta Kappan, 87(10), 762–766.
    McGrew, K. S. (2003, November 28). Cattell-Horn-Carroll definition project. Institute of Applied Psychometrics. Retrieved March 7, 2005, from
    McNeil, S. (2004). The 1970s: Influences on this period. A hypertext history of instructional design. Retrieved January 8, 2005, from
    McTighe, J., & O'Connor, K. (2005, November). Seven practices for effective learning. Educational Leadership, 63(3), 10–17.
    Mednick, S. A. (1962). The associative basis of the creative process. Psychological Review, 69, 220–232.
    Meek, C. (2006). From the inside out: A look at the testing of special education students. Phi Delta Kappan, 88(4), 293–297.
    Meeker, R. J., & Weile, D. M. (1971). A school for the cities. Education and Urban Society, 3, 129–243.
    Mehring, T. (1995). Report card options for students with disabilities in general education. In T. Azwell & E. Schmar (Eds.), Report card on report cards: Alternatives to consider. Portsmouth, NH: Heinemann.
    Méndez, T. (2005, February 15). Changing school with the season. Christian Science Monitor. Retrieved February 16, 2005, from
    Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan.
    Meyer, C. A. (1992, May). What's the difference between authentic and performance assessment? Educational Leadership, 49(8), 39–42.
    Meyer, J. P. (2006, August 31). ItemQual 0.9.2 [Free software]. Harrisonburg, VA: Center for Assessment and Research Studies, James Madison University. Retrieved March 16, 2007, from
    Miller, B. J., Sundre, D. L., Setzer, C., & Zeng, X. (2007, April). Content validity: A comparison of two methods. Paper presented during the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Miller, D. (1995). [Review of the Kaufman brief intelligence test]. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 533–536). Lincoln, NE: Buros Institute of Mental Measurements.
    Miller, G. E. (2004). Analyzing the minority gap in achievement scores: Issues for states and federal government. Educational Measurement: Issues and Practice, 22(3), 30–36.
    Miller, T. (2003). Essay assessment with latent semantic analysis. Journal of Educational Computing Research, 29(4), 495–512.
    Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and ability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 335–366). New York: Macmillan.
    Millman, J., & Schalock, H. D. (1997). Beginnings and introduction. In J. Millman (Ed.), Grading teachers, grading schools. Thousand Oaks, CA: Corwin Press.
    Mindish, J. M. (2003). Predictions of grade 11 PSSA mathematics and reading scores. Unpublished doctoral dissertation, Widener University.
    Mitchell, T. (2005, August 19). Realistic FCAT practice is about to become available. Florida Times-Union. Retrieved August 22, 2005, from
    Moore, A. S. (2004, August 1). Trouble in the ranks: The dog-eat-dog race of elite colleges has high schools reconsidering class rank. New York Times, pp. 10–11.
    Morreale, S. P., & Suen, H. K. (2001). [Review of the Test of early language development, third edition]. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 1239–1242). Lincoln, NE: Buros Institute of Mental Measurements.
    Moskal, B. M. (2007). Scoring rubrics: What, when and how? [Electronic version]. Practical Assessment, Research & Evaluation, 7(3). Retrieved April 23, 2007, from
    Murphy, G., & Likert, R. (1938). Public opinion and the individual: A psychological study of student attitudes on public questions with a retest five years later. New York: Harper Books.
    National Assessment of Educational Progress. (2004). The nation's report card. Washington, DC: National Center for Education Statistics. Retrieved July 7, 2005, from
    National Assessment of Educational Progress Authorization Act (P.L. 103–33, 1993).
    National Association for Gifted Children. (2005). Why we should advocate for gifted and talented children [Position statement]. Washington, DC: Author. Retrieved November 5, 2006, from
    National Association for the Assessment of Young Children. (1987). Standardized testing of young children 3 through 8 years of age [Position statement]. Washington, DC: Author.
    National Association for the Education of Young Children. (2004). Where we stand, NAEYC and NAECS/SDE on curriculum, assessment, and program evaluation. Retrieved March 23, 2007, from
    National Association of Early Childhood Specialists in State Departments of Education. (2000). Assessment of young children. Retrieved June 20, 2005, from
    National Association of School Psychologists. (2003). Position statement on student grade retention and social promotion. Retrieved January 13, 2005, from
    National Association of School Psychologists. (2005). Position statement on ability grouping and tracking. Retrieved October 16, 2006, from
    National Board for Professional Teaching Standards. (1988) [First published in 1986]. National board for professional teaching standards. Washington, DC: ERIC Clearinghouse on Teacher Education. (ERIC Document No. ED304444, ERIC Digest # 88–6)
    National Center for Educational Statistics. (2004). Trends in international mathematics and science study. Washington, DC: Author. Retrieved on November 2, 2006, from
    National Center for Educational Statistics. (2007). America's high school graduates: Results from the 2005 NAEP High School Transcript Study [Commissioner's remarks]. Retrieved February 23, 2007, from
    National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: Superintendent of Documents, U.S. Government Printing Office.
    National Council of Teachers of Mathematics. (1991). Professional standards for teaching mathematics. Reston, VA: Author.
    National Council of Teachers of Mathematics. (2003, October). The use of technology in the learning of mathematics [NCTM position paper]. Reston, VA: Author. Retrieved July 21, 2005, from
    National Council of Teachers of Mathematics. (2006). Computation, calculators, and common sense [Position statement]. Reston, VA: Author. Retrieved April 3, 2006, from
    National Education Association. (2003). Status of the American public school teacher 2000–2001. Washington, DC: Author.
    National Education Association. (2004, February). Does the NCLB provide good choices for students in low-performing schools? Cambridge, MA: Harvard Civil Rights Project. Retrieved June 25, 2007, from
    National Education Association. (2007). Professional pay. Washington, D.C.: National Education Association, Author. Retrieved September 7, 2007, from
    National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific literature on reading and its implications for reading instruction: Reports of the subgroups (NIH Publication No. 00–4754). Washington, DC: U.S. Government Printing Office.
    National Merit Scholarship Program. (2004). Retrieved February 3, 2005, from
    National Partnership for Teaching At-Risk Schools. (2005). Qualified teachers for at-risk schools: A national imperative [Initial report]. Washington, DC: Author.
    National Reading Panel. (2000, April). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (National Institute of Health Publication No. 00–4769). Rockville, MD: National Institute of Health, National Institute of Child Health and Human Development.
    National Register of Historic Places. (n.d.). Welcome to the National Register of Historic Places. Retrieved April 3, 2006, from
    New Jersey School Boards Association. (2003). Grading students. School Leader (Policy update). Trenton, NJ: Author. Retrieved May 31, 2005, from
    New York State Education Department. (1987). History of Regents Examinations 1865 to 1987. Retrieved March 16, 2007, from
    New York United Federation of Teachers. (2004, July 13). “New” new report card lauded. Education World: The Educators Best Friend. Retrieved July 13, 2004, from
    Nichols, S. L., Glass, G. V., & Berliner, D. C. (2005, September). High-stakes testing and student achievement: Problems for the No Child Left Behind Act (Report EPSL-0509–105-EPRU). Tempe, AZ: Educational Policy Studies Laboratory, Arizona State University.
    Niguidula, D. (2005, November). Documenting learning with digital portfolios. Educational Leadership, 63(3), 44–47.
    Nitko, A. J. (1996). Educational assessment of students. Englewood Cliffs, NJ: Merrill, Prentice Hall.
    No Child Left Behind Act (ESEA) (P.L. 107–110, 2002).
    Noble, J. P., Davenport, M., & Sawyer, R. (2001, April). Relationships between noncognitive characteristics, high school course work and grades, and performance on a college admission test. Paper presented during the annual meeting of the American Educational Research Association, Seattle, WA.
    Noble, J. P., Roberts, W. L., & Sawyer, R. L. (2006, April). Student achievement, behavior, perceptions, and other factors affecting ACT scores [ACT Research Report Series, October 2006]. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Noguera, P. A. (2003). How race identity affects school performance. Cambridge, MA: Harvard Education Letter.
    North Central Regional Educational Laboratory. (1993). Integrating community services for young children and their families. Retrieved June 5, 2005, from
    Northwest Regional Educational Laboratory. (1989). North Carolina end of grade testing program booklet. Retrieved April 29, 2005, from
    Nye, B., Hedges, L. V., & Konstantopoulos, S. (2000). The effects of small classes on academic achievement: The results of the Tennessee class size experiment. American Educational Research Journal, 37(1), 123–151.
    Obey-Porter Act (Elementary and Secondary Education Act [ESEA]), P.L. 105–78, 1998 § Title X, Sec. C.
    O'Connor, J. J., & Robertson, E. F. (2003). Karl Pearson. Archive document, School of Mathematics and Statistics, University of St. Andrews, Scotland. Retrieved April 16, 2005, from
    O'Donovan, E. (2007). Making individualized education programs manageable for parents. District Administration, 43(7), 69.
    Oh, H., & Sathy, V. (2007, April). Construct comparability and continuity in the SAT. Paper presented during the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    Ohio State Board of Education. (2006, April 10). Report on the first year assessment of kindergarten readiness, literacy. Retrieved May 10, 2006, from
    Olson, L. (2003). Standards and tests: Keeping them aligned. Research Points, 1(1), 1–4.
    Olson, L. (2004, December 1). NCLB law bestows bounty on test industry. Education Week, pp. 1, 18, 19.
    Olson, L. (2006, August 30). Number of graduation exams required by states levels off. Education Week, 28(1), p. 28.
    Olson, L. (2006, December 6). U.S. urged to rethink NCLB “tools.” Education Week, 26(14), pp. 1, 19.
    Organization for Economic Co-Operation and Development. (2005, June). Teachers matter: Attracting, developing and retaining effective teachers. Paris, France: Author. Retrieved July 16, 2005, from
    Organization for Economic Co-Operation and Development (2006, May 15). OECD education systems leave many immigrant children floundering, report shows. Brussels, Belgium: Directorate of Education, OECD. Retrieved November 18, 2006, from,2744,en_2649_201185_36701777_l_l_l_l,00.html
    Ornstein, A. C., & Levine, D. U. (1989). Foundations of education (4th ed.). Boston: Houghton Mifflin.
    Otis, A. (1925). Statistical method in educational measurement. New York: World Book.
    Owens, A., & Sunderman, G. L. (2006, October). School accountability under NCLB: Aid or obstacle for measuring racial equity? [Policy Brief]. Cambridge, MA: The Civil Rights Project, Harvard University. Retrieved November 28, 2006, from
    Packer, J. (2006, January 19). More schools are failing NCLB law's “adequate yearly progress” requirements: Emerging trends under the law's annual rating system. Washington, DC: National Education Association. Retrieved April 5, 2007, from
    Pakkala, T. (2006, May 31). Does it pay to reward students for success? Gainesville Sun. Retrieved June 5, 2006, from
    Partin, R. L. (2005). Classroom teacher's survival guide: Practical strategies, management techniques, and reproducibles for new and experienced teachers (2nd ed.). San Francisco: Jossey-Bass.
    Pascopella, A. (2007). Inside the law: Cheating on NCLB tests? Maybe. District Administration, 43(1), 20.
    Pascopella, A. (2007). The dropout crisis. District Administration, 43(1), 30–36, 38.
    PASE v. Hannon, 506 F. Supp. 831 (Northern District of Illinois, 1980).
    Patrick, K., & Eichel, L. (2006, June 25). Education tests: Who's minding the scores? Philadelphia Inquirer. Retrieved June 26, 2006, from
    Paulson, A. (2005, May 23). Need a tutor? Call India. Christian Science Monitor. Retrieved March 30, 2006, from
    Pellegrino, J. (2007). Should NAEP performance standards be used for setting standards for state assessments? Phi Delta Kappan, 88(7), 539–541.
    Perez, M., & Ines, Y. (2004). Validation of the Spanish version of the Behavior assessment system for children: Parent rating scale for children (6–11) in Puerto Rico. Unpublished doctoral dissertation, Temple University.
    Peske, H. G., & Haycock, K. (2006, June). Teaching inequality: How poor and minority students are shortchanged on teacher quality [A Report and Recommendation by the Education Trust]. Washington, DC: The Education Trust. Retrieved June 18, 2006, from
    Phelps, R. (2005). The source of Lake Wobegon. Third Education Group Review, 1(2). Retrieved October 13, 2006, from
    Philadelphia Architects and Buildings Project. (2003). David Foy Combined Secondary and Primary School. Philadelphia: Author. Retrieved July 21, 2005, from
    Phillips, G. W. (2007). Expressing international educational achievement in terms of U.S. performance standards: Linking NAEP achievement levels to TIMSS. Washington, DC: American Institutes for Research.
    Phillips, S. E. (2005, June). Legal corner: Reconciling IDEA and NCLB. NCME Newsletter, 13(2), 2–3.
    Piaget, J. (1930). The child's conception of physical causality (M. Gabain, Trans.). New York: Harcourt Brace.
    Piaget, J. (1964). The child's conception of number (C. Gattegno & F. M. Hodgson, Trans.). London: Routledge & Paul. (Original work published in Switzerland in 1941; first English translation in 1952)
    Pianta, R. C. (2007, March). Opportunities to learn in America's elementary classrooms. Science, 315, 1795–1796.
    Picard, C. J. (2004, August 18). [Press release]. Office of the State Secretary of Education. Baton Rouge, LA. Retrieved August 21, 2005, from
    Plessen, K. J., Bansal, R., Zhu, R., Amat, J., Quackenbush, G. A., Martin, L., et al. (2006, July). Hippocampus and amygdala morphology in attention-deficit/hyperactivity disorder. Archives of General Psychiatry, 63(7), 795–807.
    Pope, J. (2006, May 12). Student fatigue may explain drop in SATs [Associated Press]. Boston Globe. Retrieved May 13, 2006, from
    Popham, W. J. (1973). Found: A practical procedure to appraise teacher achievement in the classroom. In A. C. Ornstein (Ed.), Accountability for teachers and school administrator (pp. 25–27). Belmont, CA: Fearon.
    Popham, W. J. (1990). Modern educational measurement: A practitioner's perspective (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
    Popham, W. J. (1999). Classroom assessment: What teachers need to know (2nd ed.). Boston: Allyn and Bacon.
    Popham, W. J. (2000, December). The mismeasurement of educational quality. School Administrator [Electronic version]. Retrieved September 21, 2004, from
    Popham, W. J. (2005, May). All about accountability/NAEP: Gold standard or fool's gold? Educational Leadership, 62(8), 79–81.
    Popham, W. J. (2006, April). Branded by a test. Educational Leadership, 63(7), 86–87.
    Popielarski, J. (1998). Characteristics, background, and teaching methodologies of advanced placement U.S. history teachers. Unpublished doctoral dissertation, Widener University.
    Posner, D. (2004, June). What's wrong with teaching to the test? Phi Delta Kappan, 85(10), 749–751.
    Powers, D. E., Burstein, J. C., Chodorow, M., Fowles, M. E., & Kukich, K. (2001, March). Stumping E-Rater: Challenging the validity of automated essay scoring (Report No. 98–08bP). Princeton, NJ: Educational Testing Service.
    Pressey, S. L., & Pressey, L. C. (1923). Introduction to the use of standard tests: A brief manual in the use of tests of both ability and achievement in the school subjects. Yonkers, NY: World Book.
    Psychological Corporation. (2001). Wechsler individual achievement test-second edition. San Antonio, TX: Author.
    Public Education Network. (2005). Open to the public: Speaking out on “No Child Left Behind.” Retrieved June 23, 2005, from
    Rabb, T. K. (2005). Crisis in history: A statement. National Council for History Education. Retrieved June 22, 2005, from
    Rachmil, M. (Producer), & Rosman, M. (Director). (2004). A Cinderella story [Motion picture]. United States: Warner Bros.
    Rado, D., & Dell'Angela, T. (2005, June 14). Reading test may get easier to pass. Chicago Tribune [Electronic version]. Retrieved June 16, 2005, from,l,7l63850.story?
    Rahman, T. (2007). Mapping 2005 state proficiency standards onto the NAEP scales [Research and development report]. Washington, DC: National Center for Educational Statistics, U.S. Department of Education.
    Rakes, G. C. (2005–2006). The effect of open book testing on student performance in online learning environments [Research by Project RITE grant]. Martin, TN: University of Tennessee. Retrieved October 28, 2006, from
    Ramos, I. (1996, April). Gender differences in risk-taking behavior and their relationship to SAT-mathematics performance. Find Articles, Look Smart. Retrieved October 26, 2006, from
    Rasch, G. (1960/1980). An individualistic approach to item analysis. In P. F. Lazarsfeld & N. W. Henry (Eds.), Readings in mathematical social science (pp. 89–107). Chicago: Science Research Associates.
    Ravitch, D. (1995). National standards in American education: A citizen's guide. Washington, DC: The Brookings Institution Press.
    Ravitch, D. (2000). Left back: A century of battles over school reform. New York: Touchstone Books of Simon & Schuster.
    Raymond, M., & Fletcher, S. H. (2002, August). Teach for America [Research report]. Palo Alto, CA: CREDO Group, the Hoover Institution. Retrieved July 10, 2005, from
    Reardon, S. F., & Galindo, C. (2002). Do high-stakes tests affect students' decisions to drop out of school? Evidence from NELS (Working Paper 03–01). State College, PA: Population Research Institute, The Pennsylvania State University.
    Reckase, M. D. (1995). Portfolio assessment: A theoretical estimate of score reliability. Educational Measurement: Issues and Practice, 14(1), 12–14, 21.
    Reed, J. B. (2004, October 29). Smart kids may be the ones left behind. News Press, pp. B1–B2.
    Rees, N. S. (2003, October 20). No Child Left Behind's education choice provisions: Are states and school districts giving parents the information they need? Testimony of Nina S. Rees before the U.S. House of Representatives. Retrieved June 24, 2005, from
    Reeves, P. (2005, February 12). Internet tutors from India aid U.S. kids with math. NPR [Radio broadcast]. Retrieved June 23, 2005, from
    Rehabilitation Act, 1973 (P.L. 93–112 [87 Stat.355] § 504).
    Reid, K. (2006, October 30). Parents, teachers confused over new report cards. Stockton Record. Retrieved November 5, 2006, from
    Reid, K. D., Hresko, W. P., & Hammill, D. D. (1981/2001). Test of early reading ability, third edition (TERA-3). Austin, TX: Pro-Ed.
    Renchler, R. (1992). Student motivation, school culture, and academic achievement: What school leaders can do. ERIC/CEM Trends and Issues Series, Number 7 (ERIC Document No. EA 023 593).
    Renzulli, J. S., & Park, S. (2002). Giftedness and high school dropouts: Personal, family, and school-related factors (RM02168). Storrs, CT: The National Research Center on the Gifted and Talented, University of Connecticut. Retrieved November 5, 2006, from
    Resnick, L. B. (Ed.). (2004). Teachers matter: Evidence from value-added assessments. Research Points: Essential Information for Educational Policy, 2(4), 1–4.
    Rex, S. L. (2003). Reading strategies as a predictor of student scores on the Pennsylvania System of School Assessment Reading Exam at the eleventh-grade level. Unpublished doctoral dissertation, Widener University.
    Reynolds, C., & Kamphaus, R. (2004). Behavior assessment system for children, second edition. Circle Pines, MN: American Guidance Service, Pearson Education.
    Richman, S. (2001, April). Parent power: Why national standards won't improve education (Policy Analysis No. 396). Washington, DC: The Cato Institute. Retrieved June 26, 2005, from
    Ritchie, S. (2004). Horace Mann. Dictionary of Unitarian & Universalist Biography. Retrieved June 21, 2005, from
    Rivers-Sanders, J. C. (1999). The impact of teacher effect on student math competency achievement. Doctoral dissertation, University of Tennessee. ProQuest document ID 730840811.
    Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3–13.
    Roediger, III, H. L., & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1(3), 181–208.
    Rogers, J., Holme, J. J., & Silver, D. (2005). More questions than answers: CAHSEE results, opportunity to learn, and the class of 2006. Los Angeles: UCLA Institute for Democracy, Education, and Access. Retrieved August 25, 2005, from
    Roid, G. H. (2003). Stanford-Binet intelligence scales, fifth edition. Itasca, IL: Riverside.
    Rojstaczer, S. (2003). Grade inflation at American colleges and universities. Retrieved June 5, 2005, from
    Romanowski, M. K. (2004). Student obsession with grades and achievement. Kappa Delta Pi Record, 40(4), 149–151.
    Roschewski, P., Isernhagen, J., & Dappen, L. (2006). Nebraska STARS: Achieving results. Phi Delta Kappan, 87(6), 433–437.
    Rose, L. C., & Gallup, A. C. (2005, September). Phi Delta Kappa/Gallup Poll of the public's attitudes toward the public schools. Phi Delta Kappan, 87(1), 41–57.
    Roseberry-McKibbin, C., & Brice, A. (2005). Acquiring English as a second language: What's normal, what's not. Rockville, MD: American Speech-Language-Hearing Association. Retrieved March 30, 2006, from
    Rosen, C. L. (1985). [Test review of the Creativity assessment packet]. From J. V. Mitchell Jr. (Ed.), The ninth mental measurements yearbook [Electronic version]. Retrieved February 14, 2005, from the Buros Institute's Test Reviews Online Web site:
    Rosenberg, S. (2005, November 5). Teachers chip in as budgets shrink. Boston Globe. Retrieved September 16, 2006, from
    Ross, A. (2007, April). The effects of constructivist teaching approaches on middle school student's algebraic procedural and conceptual understanding. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Ross, R. P. (1992). Accuracy in analysis of discrepancy scores: A nationwide study of school psychologists. School Psychology Review, 21(3), 480–493.
    Rotherham, A. J. (2006). Making the cut: How states set passing scores on standardized tests. Washington, DC: Education Sector. Retrieved October 25, 2005, from
    Rothstein, R., & Jacobsen, R. (2006). The goals of education. Phi Delta Kappan, 88(4), 264–272.
    Rothstein, R., Jacobsen, R., & Wilder, T. (2006, November 29). “Proficiency for all” is an oxymoron. Education Week, 26(13), pp. 44, 32.
    Rowling, J. K. (2003). Harry Potter and the order of the phoenix. New York: Arthur A. Levine Books of Scholastic Press.
    Rudin, S., & Gale, D. (Producers), & Kasdan, J. (Director). (2002). Orange County [Motion picture]. United States: Paramount Pictures.
    Rumberger, R. W., Gándara, P., & Merino, B. (2006, Winter). Where California's English learners attend school and why it matters. University of California Linguistic Minority Research Institute Newsletter, 15(2), 1–3.
    Rumberger, R. W., Larson, K. A., Ream, R. K., & Palardy, G. J. (1999). The educational consequences of mobility for California students and schools (PACE Research Series 99–2). Berkeley, CA: Policy Analysis for California Education, University of California, and Stanford University.
    Rural School and Community Trust. (2007, March). Nebraska STARS provides assessment can inspire loyalty [Electronic version]. Rural Policy Matters, 9(3). Retrieved March 21, 2007, from{F777430F-87F2-4186-B6A9-9FD3AE73191F}&notc=1&c=t
    Russell, J., & LaCoste-Caputo, J. (2006, December 4). More kids repeating kindergarten. San Antonio Express-News. Retrieved December 6, 2006, from
    Ryman, A. (2005, January 6). Parents can check children's grades, homework online. Arizona Republic. Retrieved January 7, 2005, from
    Ryser, G., & McConnell, K. (2002). Scales for diagnosing attention-deficit/hyperactivity disorder. Austin, TX: Pro-Ed.
    Sacchetti, M. (2004, December 9). Report cards remake the grade. Boston Globe. Retrieved November 5, 2006, from
    Sacchetti, M. (2005, December 19). Advanced classes see dip in diversity. Program prepares for exam schools. Boston Globe. Retrieved December 21, 2005, from
    Sadker, D., & Zittleman, K. (2004). Test anxiety: Are students failing tests or are tests failing students? Phi Delta Kappan, 85(10), 740–744, 751.
    Salvia, J., Ysseldyke, J. E., & Bolt, S. (2007). Assessment in special and inclusive education (10th ed.). New York: Houghton Mifflin.
    Samuels, C. A. (2007, April 11). States seen renewing focus on education of gifted. Education Week, 26(32), pp. 20, 23.
    Sanbonmatsu, L., Kling, J. R., Duncan, G. J., & Brooks-Gunn, J. (2005). Neighborhoods and academic achievement: Results from the moving to opportunity experiment. Cambridge, MA: National Bureau of Economic Research. Retrieved March 31, 2006, from
    Sanders, L. (n.d.). Accountability mechanisms and processes: “Value added”: Telling the truth about schools' performance. London: National Foundation for Educational Research in England and Wales. Retrieved April 22, 2006, from
    Sanders, W. L. (1998). Value-added assessment. School Administrator [Electronic version]. Retrieved August 21, 2005, from
    Sanders, W. L., Ashton, J. J., & Wright, S. P. (2005). Comparison of the effects of NBPTS certified teachers with other teachers on the rate of student academic progress (Final Report). Washington, DC: National Board for Professional Teaching Standards.
    Sanders, W. L., & Horn, S. P. (1998). Research findings from the Tennessee value-added assessment system (TVAAS) database: Implications for educational evaluation and research. Journal of Personnel Evaluation in Education, 12(3), 247–256.
    Sanders, W. L., & Rivers, J. C. (1996). Cumulative and residual effects of teachers on future student academic achievement [Research progress report]. Knoxville, TN: Value Added Assessment Center, University of Tennessee.
    Sanders, W. L., Saxton, A. M., & Horn, S. P. (1997). The Tennessee value-added assessment system, a quantitative, outcomes-based approach to educational measurement. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? (pp. 137–162). Thousand Oaks, CA: Corwin Press.
    Sandholtz, J. H., Ogawa, R. T., & Scribner, S. P. (2004). Standards gaps: Unintended consequences of local standards-based reform. Teachers College Record, 106(6), 1177–1202.
    Sattler, J. M., & Dumont, R. (2004). Assessment of children: WISC-IV and WPPSI-III supplement. La Mesa, CA: Jerome M. Sattler.
    Saulny, S. (2005, December 27-b). In middle class, signs of anxiety on school efforts. New York Times. Retrieved December 28, 2005, from
    Saunders, D. L. (2007, March 4). Higher grades, lower scores. San Francisco Chronicle. Retrieved March 8, 2007, from
    Sax, G. (1989). Principles of educational and psychological measurement and evaluation (3rd ed.). Belmont, CA: Wadsworth.
    Sax, G. (1997). Principles of educational and psychological measurement and evaluation (4th ed.). Belmont, CA: Wadsworth.
    Scarborough, A. A., Spiker, D., Mallik, S., Hebbeler, K. M., Bailey, D. B., & Simeonsson, R. J. (2004). A national look at children and families entering early intervention. Exceptional Children, 70(4), 469–483.
    Schafer, W. D., Gagné, P., & Lissitz, R. W. (2005). Resistance to confounding style and content in scoring constructed-response items. Educational Measurement: Issues and Practice, 24(2), 22–28.
    Schemo, D. J. (2007, March 26). Failing schools see a solution in longer day. New York Times, pp. 1, 18.
    Schmidt, P. (2007, February 23-b). Regent's diversity vote means trouble for U. of Wisconsin. Chronicle of Higher Education, 53(25), pp. A17–A18.
    Schooler, C., Mesfin, S. M., & Oates, G. (1999). The continuing effects of substantively complex work on the intellectual functioning of older workers. Psychology and Aging, 14(3), 483–506.
    Schultz, E. M. (2006). Commentary: A response to Reckase's conceptual framework and examples for evaluating standard setting methods. Educational Measurement: Issues and Practice, 25(3), 4–13.
    Schworm, P. (2004, July 12). MCAS detour proves tough. Boston Globe. Retrieved August 13, 2004, from
    Schworm, P. (2005, April 3). War on words: In class, grammar rears its ugly head. Boston Globe. Retrieved April 5, 2005, from
    Scott, C. (2007). Now what? Lessons from Michigan about restructuring schools and next steps under NCLB. Washington, DC: Center on Education Policy. Retrieved March 28, 2007, from
    Scriven, M. (1997). Student ratings offer useful input to teacher evaluations, ERIC Digest. Retrieved April 4, 2006, from
    Semrud-Clickeman, M. (2003). Phonological processing, automaticity, auditory processing and memory in slow learners and children with reading disabilities. Unpublished doctoral dissertation, University of Texas. ProQuest document identification number 765200031.
    Shamberg, M., & Cantillon, E. (Producers), & Holland, S. S. (Director). (1989). How I got into college [Motion picture]. United States: Twentieth Century Fox.
    Shanahan, T. (2005). [Test review of the DIBELS: Dynamic indicators of early literacy skills, sixth edition]. From R. A. Spies & B. S. Plake (Eds.), The sixteenth mental measurements yearbook [Electronic version]. Retrieved July 2, 2005, from the Buros Institute's Test Reviews Online Web site:
    Shankar, J. (2005, August 29). Tutoring U.S. math students adds new twist to Indian outsourcing saga. Middle East Times. Retrieved September 1, 2005, from
    Shapira, I. (2006, November 21). Those who pass classes but fail tests cry foul. Washington Post. Retrieved November 21, 2006, from
    Shaw, P., Lerch, J., Greenstein, D., Sharp, W., Clasen, L., Evans, A., et al. (2006, May). Longitudinal mapping of cortical thickness and clinical outcome in children and adolescents with attention-deficit/hyperactivity disorder. Archives of General Psychiatry, 63(5), 540–549.
    Shaw, S. R., & Gouwens, D. A. (2002). Chasing and catching slow learners in changing times [Electronic version]. NASP Communiqué, 31(4). Retrieved January 13, 2005, from
    Shaywitz, B. A., Shaywitz, S. E., Pugh, K. R., Mencl, W. E., Fulbright, R. K., Skudlarski, P., et al. (2002). Disruption of posterior brain systems for reading in children with developmental dyslexia. Biological Psychiatry, 52(2), 101–110.
    Shaywitz, S. E., & Shaywitz, B. A. (2005). Dyslexia (specific reading disability). Biological Psychiatry, 57, 1301–1309.
    Shaywitz, S. E., & Shaywitz, B. A. (2007). What neuroscience really tells us about reading instruction. Educational Leadership, 64(5), 74–76.
    Shedd, J. (2003). The history of the student credit hour. Unpublished manuscript, Office of Institutional Research and Planning, University of Maryland. Retrieved March 19, 2007, from
    Shepard, L. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 1–14.
    Shields, R. N. (2000). Writing strategies as predictors of student scores on the Pennsylvania System of School Assessment Writing Test. Unpublished dissertation, Widener University.
    Shurkin, J. N. (1992). Terman's kids: The groundbreaking study of how the gifted grow up. Boston: Little, Brown.
    Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.
    Silverlake, A. C. (1999). Comprehending test manuals: A guide and workbook. Los Angeles: Pyrczak.
    Simon, R. (2004, May 19). Nebraska Assessment Letter #3. [Letter to Commissioner From the U.S. Department of Education]. Retrieved September 22, 2007, from
    Sireci, S. G. (2004, April). How psychometricians can help reduce the achievement gap: Or can they? Paper presented at a symposium, The Achievement Gap: Test Bias or School Structures? During the annual meeting of the National Association of Test Directors, San Diego, CA. Retrieved July 3, 2005, from
    Sireci, S. G., & Parker, P. (2007). Validity on trial: Psychometric and legal conceptualizations of validity. Educational Measurement: Issues and Practice, 25(3), 27–34.
    Sizer, T. R. (2004). Preamble: A reminder for Americans. In D. Meier & G. Wood (Eds.), Many children left behind: How the No Child Left Behind Act is damaging our children and our schools (pp. xvii–xxii). Boston: Beacon Press.
    Skinner, R. A. (2005, January 6). State of the states. Education Week. Retrieved June 18, 2005, from
    Slavin, R. E. (1994). Educational psychology: Theory and practice. Boston: Allyn & Bacon.
    Smallwood, S. (2005, January 14). Faculty group censures Benedict College again, this time over “A for effort” policy. Chronicle of Higher Education. Retrieved January 14, 2005, from
    Smart Heilshorn, K. (2003). Calculator usage as a predictor of student success on the Pennsylvania System of School Assessment grade eight mathematics test. Unpublished doctoral dissertation, Widener University.
    Smith, B. (1998). It's about time: Opportunities to learn. Chicago: Consortium on Chicago School Research at the University of Chicago. Retrieved July 19, 2004, from
    Smith, D. (2004, November 28). Homework disparity points to education gap. Kansas City Star. Retrieved November 30, 2004, from
    Smith, J. K. (2001). [Review of the Detroit tests of learning aptitude, fourth edition]. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 382–386). Lincoln, NE: Buros Institute on Mental Measurements.
    Snipes, J., Williams, A., Horwitz, A., Soga, K., & Casserly, M. (2007). Beating the odds: A city-by-city analysis of student performance and achievement gaps on state assessments. Washington, DC: Council of the Great City Schools. Retrieved September 21, 2007, from
    Solochek, J. (2006, October 20). A test beyond the norm. St. Petersburg Times. Retrieved October 24, 2006, from
    Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201–293.
    Spearman, C. (1939). Determination of factors. British Journal of Psychology, 30, 78–83.
    Spellings, M. (2006, April 6). Secretary Spellings' prepared testimony before the House Committee on Education and the Workforce [Press release]. U.S. Department of Education. Retrieved April 27, 2006, from
    Spellings, M. (2006, July 27). Building partnerships to help English language learners. Washington, DC: U.S. Department of Education. Retrieved November 13, 2006, from
    Steadman, S. C., & Simmons, J. S. (2007). The cost of mentoring non-university-certified teachers: Who pays the price? Phi Delta Kappan, 88(5), 364–367.
    Steele, C. M. (1997). A threat in the air: How stereotypes shape intellectual identity and performance. American Psychologist, 52, 613–629.
    Steele, C. M. (1999, August). Thin ice: “Stereotype threat” and black college students. Atlantic, XX, 44–54.
    Steinberg, J. (2002). The gatekeepers: Inside the admissions process of a premier college. New York: Penguin.
    Steinberg, L. (1996). Beyond the classroom: Why school reform has failed and what parents need to do. New York: Touchstone.
    Stephens, S. (2006, May 7). Exam proves what teachers know. Cleveland Plain Dealer, p. B1.
    Stern, W. L. (1928). Intelligenz der kinder und jugendlichen und die methoden ihrer Untersuchung. Leipzig, Germany: Verlag von Johann Ambrosius Barth.
    Sternberg, R. J. (1999). Intelligence as developing expertise. Contemporary Educational Psychology, 24, 359–375.
    Sternberg, R. J. (2007, July). Finding students who are wise, practical, and creative. Chronicle of Higher Education, 53(44), pp. B11–12.
    Sternberg, R. J., Grigorenko, E. L., & Kidd, K. K. (2005). Intelligence, race, and genetics. American Psychologist, 60(1), 46–59.
    Sternberg, R. J., Wagner, R. K., Williams, W. M., & Horvath, J. A. (1995). Testing common sense. American Psychologist, 50(11), 912–927.
    Stevens, N. (1993). Perceived mentor outcomes from the mentoring experience in a formal teacher induction program. Unpublished doctoral dissertation, Widener University.
    Stevenson, J., et al. (2007, September 6). Food additives and hyperactive behaviour in 3-year-old and 8/9 year-old children in the community: A randomized, double-blinded, placebo-controlled trial. Lancet. Retrieved September 8, 2007, from
    Stiggins, R. J. (2002). Assessment crisis: The absence of assessment for learning. Phi Delta Kappan, 83(10), 758–765.
    Stiggins, R. J. (2004). New assessment beliefs for a new school mission. Phi Delta Kappan, 86(1), 22–27.
    Stiggins, R. J., & Bridgeford, N. J. (1985). The ecology of classroom assessment. Journal of Educational Measurement, 22(4), 271–286.
    Stiggins, R. J., Frisbie, D. A., & Griswold, P. A. (1989). Inside high school grading practices: Building a research agenda. Educational Measurement: Issues and Practice, 8(2), 5–14.
    Stoneberg, B. D., Jr. (2004, March). A study of gender-based and ethnic-based differential item functioning (DIF) in the spring 2003 Idaho Standards Achievement Tests applying the simultaneous bias test (SIBTEST) and the Mantel-Haenszel Chi Square Test. Unpublished manuscript, University of Maryland, Measurement, Statistics, and Evaluation Department. Retrieved November 8, 2006, from
    Stoskopf, A. (1999). The forgotten history of eugenics [Electronic version]. Rethinking Schools, 13(3). Retrieved March 22, 2007, from
    Strauss, V. (2006, March 21). Putting parents in their place: Outside class. Washington Post. Retrieved March 22, 2006, from
    Stufflebeam, D. L. (1981). A brief introduction to standards for evaluations of educational programs, projects, and materials. Evaluation News, 2(2), 141–145.
    Summa cum lawsuit (2004, September/October). Legal Affairs. Retrieved November 5, 2006, from
    Sunderman, G. L., & Kim, J. (2005, May 5). Teacher quality: Equalizing educational opportunities and outcomes. Cambridge, MA: The Civil Rights Project, Harvard University. Retrieved July 12, 2005, from
    Sunderman, G. L., Kim, J. S., & Orfield, G. (2005). NCLB meets school realities: Lessons from the field. Thousand Oaks, CA: Corwin Press.
    Suter, W. N. (2006). Introduction to educational research: A critical thinking approach. Thousand Oaks, CA: Sage.
    Suzuki, T., Swuz, D. (Producers), & Sheetz, C. (Director). (2001). Recess: School's out [Motion picture]. United States: The Walt Disney Company.
    Talbot, M. (2005, June 6). Best in class. New Yorker, 81(16), 38–43.
    Tatsuoka, M. M., & Lohnes, P. R. (1988). Multivariate analysis: Techniques for educational and psychological research. New York: Macmillan.
    Tellez, K., & Waxman, H. (n.d.). Effective community programs for English language learners (Draft report). Santa Cruz, CA: University of Santa Cruz. Retrieved November 26, 2006, from
    Terman, L. M. (1916). Stanford revision and extension of the Binet-Simon scale. Retrieved February 1, 2005, from
    Testing, assessment, and evaluation to improve learning in our schools (1990). Oversight hearing before the Subcommittee on Elementary, Secondary, and Vocational Education of the Committee on Education and Labor, U.S. House of Representatives, 101st Congress, Second Session.
    Texas Assessment of Skills and Knowledge (n.d.). Student Assessment Division. Retrieved April 5, 2006, from
    Thernstrom, A., & Thernstrom, S. (2003). No excuses: Closing the racial gap in learning. New York: Simon and Schuster.
    Thiers, N. (2005). Supporting new teachers [Electronic version]. Educational Leadership, 62(8). Retrieved December 11, 2005, from
    Thomas, D., & Bainbridge, W. (1997, January). Grade inflation: The current fraud. Effective School Research. Retrieved July 16, 2004, from
    Thorndike, E. L., Cobb, M. V., & Bergman, E. O. (1927). The measurement of intelligence. New York: Teachers College Press.
    Thurlow, M. L., & Bolt, S. (2001). Empirical support for accommodations most often allowed in state policy (Synthesis Report 41). Minneapolis, MN: University of Minnesota, National Center for Educational Outcomes. Retrieved November 17, 2006, from
    Thurlow, M. L., Lazarus, S., Thompson, S., & Robey, J. (2002). 2001 state policies on assessment participation and accommodations (Synthesis Report 43). Minneapolis: National Center of Educational Outcomes, University of Minnesota.
    Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
    Thurstone, L. L. (1938). Primary mental abilities (Psychometric Monograph No. 1). Chicago: University of Chicago Press.
    Thurstone, L. L. (1947). Multiple-factor analysis: A development and expansion of the vectors of the mind. Chicago: University of Chicago Press.
    Tierney, J. (2004, November 21). When every child is good enough. New York Times. Retrieved November 22, 2004, from
    Toch, T. (2006). Margins of error: The education testing industry in the No Child Left Behind era (Education Sector Report). Washington, DC: Education Sector.
    Tonn, J. L. (2007, February 7). Houston in uproar over teacher's bonuses. Education Week, 26(22), pp. 5, 13.
    Torrance, E. P. (1966). Torrance tests of creativity. Princeton, NJ: Personnel Press.
    Torrance, E. P., Goff, K., & Satterfield, N. B. (1998). Multicultural mentoring of the gifted and talented. Waco, TX: Prufrock Press.
    Torrance, E. P., & Myers, R. E. (1970). Creative learning and teaching. New York: Harper Collins.
    Torrance, E. P., Safter, T. H., & Ball, O. E. (1992). Torrance tests of creative thinking streamlined scoring guide: Figural A and B. Bensenville, IL: Scholastic Testing Service.
    Torres, S., Santos, J., Peck, L., & Cortes, L. (2004). Minority teacher recruitment, development, and retention. Providence, RI: The Education Alliance at Brown University.
    Tracey, C. A., Sunderman, G. L., & Orfield, G. (2005, June). Changing NCLB district accountability standards: Implications for racial equality. Cambridge, MA: The Civil Rights Project, Harvard University. Retrieved July 14, 2005, from
    Trauwein, I. W., Lüdke, U. O., & Baumert, J. (2007). The big-fish-little-pond effect: Persistent negative effects of selective high schools on self-concept after graduation. American Educational Research Journal, 44(3), 631–669.
    Treffinger, D. J. (1985). [Test review of the Torrance tests of creative thinking]. From J. V. Mitchell Jr. (Ed.), The ninth mental measurements yearbook [Electronic version]. Retrieved February 14, 2005, from the Buros Institute's Test Reviews Online Web site:
    Trotter, A. (2007, August 29). Poll finds rise in unfavorable views of NCLB. Education Week, 27(1), p. 10.
    Tucker, A. (2004). The New York Regents math test problems [Report of the New York Regents Math Panel]. State University of New York, Stony Brook. Retrieved April 5, 2005, from
    Tuckman, B. W. (2003). The effect of learning and motivation strategies training on college students' achievement. Journal of College Student Development, 4, 430–437.
    Twarog, M. A. (1999). Experimental study of individual versus blanket-type homework assignments in elementary school mathematics with a computerized component. Unpublished doctoral dissertation, Widener University.
    University College London. (2007, March 23). Finding math hard? Blame your right parietal lobe. Science Daily. London: Author. Retrieved March 29, 2007, from
    U.S. Department of Education. (2005). Assistance to states for the education of children with disabilities. Federal Register, 70(118), 35782.
    U.S. Department of Education, Office of Elementary and Secondary Education (2006). Teacher incentive fund. Federal Register, 71(83), 25580–25581.
    U.S. Department of Education, National Center for Education Statistics. (2004). Third International Mathematics and Science Study (TIMSS). Washington, DC: U.S. Government Printing Office.
    U.S. Department of Education, Office of the Secretary of Education (2007). Decision letters on each state's final assessment system under No Child Left Behind (NCLB). Retrieved March 16, 2007, from
    U.S. Military Entrance Processing Command. (1968, 1992). Armed service vocational aptitude battery [Forms 18/19]. North Chicago, IL: Author.
    Vait, A. (Producer), & Painter, M. (Director). (2004). Admissions [Motion picture]. United States: Hart Sharp Video.
    Van Moorlehem, T. (1998). Home sales, custody fights hinge on exam. Detroit Free Press. Retrieved July 15, 2005, from
    Vasluski, T., McKenzie, S., & Mulvenon, S. W. (2005). Examining predictors of college remediation: The effects of high school grade inflation. Fayetteville, AR: University of Arkansas. Retrieved April 13, 2005, from
    Vedder, R. K. (2000). Can teachers own their own schools? New strategies for educational excellence. Oakland, CA: The Independent Institute.
    Vernon, P. E. (1961). The structure of human abilities (revised ed.). London: Methuen & Co.
    Viadero, D. (2006, November 29). Potential of global tests seen as unrealized. Education Week, 26(13), pp. 1, 14, 15.
    Viadero, D. (2007, January 10). Study links merit pay to slightly higher student scores [Electronic version]. Education Week, 26(20). Retrieved January 26, 2007, from
    Viadero, D. (2007, January 24). “What Works” reviewers find no learning edge for leading math texts. Education Week, 28(20), pp. 1, 21.
    Viadero, D. (2007, June 13). Evidence thin on student gains from NCLB tutoring. Education Week, 26(41), p. 7.
    Voltaire (1947). Candide, or optimism. (J. Butt, Trans.). New York: Penguin Books. (Original work published in French in 1759)
    Vu, P. (2006, June 23). States do not narrow teacher equity gap. Philadelphia: Pew Charitable Trusts. Retrieved June 24, 2006, from
    Walkovic, C. E. (2003). Reading strategies as predictors of school scores on the Pennsylvania System of School Assessment Reading Test. Unpublished doctoral dissertation, Widener University.
    Wallach, M. A., & Kogan, N. (1965). Modes of thinking in young children: A study of the creativity-intelligence distinction. New York: Holt, Rinehart & Winston.
    Wallis, C. (2007, January 12). Is the Autism epidemic a myth? Time. Retrieved January 16, 2007, from,88l6,1576829,00.html
    Wang, J., & Lin, E. (2005). Comparative studies on U.S. and Chinese mathematics learning and the implications for standards-based mathematics teaching reform. Educational Researcher, 34(5), 3–13.
    Wang, S., Young, M. J., Brooks, T. E., & Jiao, H. (2007, April). A comparison of computer-automated and human scoring methods for a statewide writing assessment. Paper presented during the annual meeting of the American Educational Research Association, Chicago, IL.
    Wang, X. B. (2007). Investigating the effects of increased SAT Reasoning Test length and time on performance of regular SAT examinees [College Board Research Report No. 2006–9]. New York: The College Board.
    Wasley, P. (2006, March 10). A new way to grade. Chronicle of Higher Education, 52(27), pp. A6, A8–9.
    Wasley, P. (2007, February 23). College Board reports more takers, and higher scores, for Advanced Placement tests. Chronicle of Higher Education, 53(25), p. 18.
    Wasta, M. J. (2006). No Child Left Behind: The death of special education. Phi Delta Kappan, 88(4), 298–299.
    Watch for curves (2007, March 16). Chronicle of Higher Education, 53(28), p. A6.
    Way, W. D. (2006, September). Precision and volatility in school accountability systems (Research Report RR-06–26). Princeton, NJ: Educational Testing Service. Retrieved November 10, 2006, from
    Webb, N. L. (2002, April). An analysis of the alignment between mathematics standards and assessments for three states. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
    Webb, N. L. (2005, April). Issues related to judging the alignment of curriculum standards and assessments. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
    Webb, N. L. (2007, September). Aligning assessments and standards. Newsletter of the Wisconsin Center for Education Research. Retrieved September 5, 2007, from
    Weber, C. (Producer), & Rosman, M. (Director). (2004). A Cinderella Story [Motion picture]. United States: Warner Bros.
    Wells, F. (2001). Zip codes shouldn't determine our students' future [Electronic version]. California Educator, 5(8). Retrieved July 8, 2005, from
    Wells, J. A. (2002). A correlational study of grade five studies and PSSA testing of reading in Pennsylvania. Unpublished doctoral dissertation, Widener University.
    Wessmann v. Gittens, 160 F.3d 790 (1st Cir. 1998).
    Wheelan, S. A., & Kesselring, J. (2005). Link between faculty group development and elementary student performance on standardized tests. Journal of Educational Research, 98(6), 323–330.
    Williams, F. E. (1986). The cognitive-affective intervention model for enriching gifted programs. In J. S. Renzulli (Ed.), Systems and models for developing programs for the gifted and talented. Mansfield Center, CT: Creative Learning Press.
    Williams, J. (2007, May 10). State board approves mandatory exit examination for special education. San Francisco Chronicle. Retrieved May 12, 2007, from
    Williams, R. (2006). The power of normalized word vectors for automatically grading essays. Issues in Informing Science and Information Technology, 3, 721–728.
    Wilms, W. W., & Chapleau, R. R. (1999, November 3). The illusion of paying teachers for student performance. Education Week, 19(10), pp. 34, 48.
    Winerip, M. (2003, September 24). On front lines, casualties are tied to new U.S. law. New York Times, p. B9.
    Winerip, M. (2004, May 28). On education: The changes unwelcome, a model teacher moves on. New York Times, p. B7.
    Witziers, B., Bosker, R. J., & Kruger, M. L. (2003). Educational leadership and student achievement: The elusive search for an association. Education Administration Quarterly, 39(3), 398–425.
    Wolf, R. M. (1990). Evaluation in education. Foundations of competency assessment and program review (3rd ed.). New York: Praeger.
    Wolverton, B. (2006, February 16). NCAA panel proposes changes in eligibility requirements to combat academic fraud at unregulated high schools. Chronicle of Higher Education. Retrieved February 16, 2006, from
    Woodcock, R., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III—Tests of achievement. Itasca, IL: Riverside.
    Woodruff, D. J., & Ziomek, R. L. (2004). High school grade inflation from 1991 to 2003 (ACT Research Report Series, 2004–4). Iowa City, IA: American College Testing Program. Retrieved June 5, 2005, from
    Woods, M. (2007, March 4). Hey kids, let's make an FCAT deal. Florida Times-Union. Retrieved March 21, 2007, from
    Word, E., Johnston, J., Bain, H., Fulton, D. B., Boyd-Zaharias, J., Lintz, N., et al. (1990). Student/teacher achievement ratio (STAR): Tennessee's K-3 class size study. Nashville: Tennessee Department of Education.
    Wright, R. J. (1975). The affective and cognitive consequences of open education on middle-class elementary school students. American Educational Research Journal, 12(4), 449–468.
    Wright, R. J. (2006, February). Teachability and proficiency on the National Assessment of Educational Progress. Paper presented at the annual meeting of the Eastern Educational Research Association, Hilton Head Island, SC.
    Wright, R. J., & Lesisko, L. J. (2007, February). The preparation of technology leadership for the schools. Paper presented at the annual meeting of the Eastern Educational Research Association, Clearwater, FL.
    Wybranski, N. A. M. (1996). An efficacy study: The influence of early intervention on the subsequent school placements of children with Down syndrome. Doctoral dissertation, Widener University. ProQuest number AAT9701172.
    Yell, M. L., Drasgow, E., & Lowrey, E. (2003). No Child Left Behind: Analysis and implications for special education [PowerPoint presentation]. Columbia, SC: Center for Autism, University of South Carolina. Retrieved March 19, 2007, from
    Yi, Q., Zhang, J., & Chang, H. (2006). Assessing CAT test security severity. Applied Psychological Measurement, 30(1), 62–63.
    Zabala, D., & Minnici, A. (2007). It's different now. How exit exams are affecting teaching and learning in Jackson and Austin. Washington, DC: Center on Education Policy. Retrieved March 15, 2007, from
    Zackon, J. F. (1999). A study of the Pennsylvania System of School Assessment: The predictive value of selected characteristics of Pennsylvania school districts for student performance on the PSSA. Unpublished doctoral dissertation, Widener University.
    Zehr, M. A. (2007a). Missouri seeks to aid ELLs now overlooked: Those with disabilities. Education Week, 26(34), p. 7.
    Zehr, M. A. (2007b). States adopt new tests for English-learners. Education Week, 28(20), pp. 26, 31.
    Zimmerman, B. J. (1989). A social cognitive view of self-regulated academic learning. Journal of Educational Psychology, 81(3), 329–339.
    Zirkel, P. (2004, April 28). No child left average? We should drop the gamesmanship of eliminating class rank. Education Week, 23(33), p. 36.
    Zirkel, P. (2007, March 28). Grade inflation: High school's skeleton in the closet. Education Week, 26(29), pp. 40, 32.
    Zweigenhaft, R. L. (1993). Prep schools and public school graduates of Harvard: A longitudinal study of the accumulated social and cultural capital. Journal of Higher Education, 64(2), 211–225.
    Zwick, R., & Schlemer, L. (2004). SAT validity for linguistic minorities at the University of California Santa Barbara. Educational Measurement: Issues and Practice, 23(1), 6–16.

    About the Author

    Robert J. Wright. After five years in public education, first as a science teacher (chemistry certification), then as a secondary school guidance counselor, I returned to graduate school (Temple University) and completed my Ph.D. in educational psychology. Later, I completed postdoctoral work in clinical assessment and school psychology at Lehigh University. Through these studies and practice, I have achieved state certification as a teacher of general science and chemistry. I am also a licensed guidance counselor and a licensed psychologist in Pennsylvania.

    During my 34 years in higher education I have taught educational measurement, statistics and research, counselor education, and educational psychology. I spent 14 years serving as associate dean and director of teacher education within Widener University's School of Human Service Professions. I have also been a consultant for the Pennsylvania Department of Education in the development of a teacher certification examination and a reader for the SAT II writing test.

    As a faculty member I have chaired 112 doctoral dissertations in education, presented scores of research-oriented papers at national meetings, and published numerous articles and several monographs. I have also consulted with several learned and professional societies in medicine as a psychometric specialist with their resident-in-training examination programs.
