
Performance Standards: Constructed Response Item Formats

Introduction

When standard-setting methods were initially developed, most assessments consisted of selected-response or multiple-choice items. These methods most often focused on expert judgements about the probable item-level performance of examinees. For example, the Angoff (1971) standard-setting method asks panellists to estimate the probability that a randomly selected, hypothetical 'minimally competent candidate (MCC)' would be able to answer items from the test correctly. The Nedelsky (1954) method focuses the panellists' judgements on the alternatives comprising multiple-choice questions, asking panellists to identify those alternatives that the MCC would be able to eliminate as incorrect. The probability of an MCC getting the item correct is calculated as a function of the number of remaining options. Such methods do not transfer readily to constructed-response items, because there are no fixed response options on which to base the judgements.
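The Nedelsky calculation can be sketched briefly. Under the usual assumption that the MCC guesses at random among the options he or she cannot eliminate, the item-level probability is simply one over the number of remaining options (the function name here is illustrative, not from the method's original description):

```python
def nedelsky_item_probability(num_options: int, num_eliminated: int) -> float:
    """Probability that the MCC answers a multiple-choice item correctly,
    assuming a random guess among the options not eliminated as incorrect."""
    remaining = num_options - num_eliminated
    if remaining < 1:
        raise ValueError("at least one option must remain after elimination")
    return 1.0 / remaining

# A 4-option item where panellists judge the MCC can eliminate 2 distractors:
# the MCC guesses between the 2 remaining options, so the probability is 0.5.
```

Summing these item probabilities across the test yields the Nedelsky cutscore.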

Constructed-Response Questions

Currently, many assessments contain open-ended questions, whether in the form of written essays, oral responses, portfolios, observations by scorers of performance with real or simulated patients, or structured patient management protocols. An important consideration when setting cutscores with constructed-response assessments is the total number of constructed-response questions that comprise the assessment package and the complexity of these questions. In some assessments, the number of constructed-response questions is fairly small (5–10), and in others it is much higher (15–20 or more).

The magnitude and complexity of the total assessment has implications for the utility of some of the standard-setting approaches used with constructed-response assessments. If the total number of questions and the complexity of the responses are somewhat limited, procedures that seek a holistic decision about the overall performance of the candidates can be used. When the number of questions is high, it becomes more difficult for panellists to make a holistic judgement about overall performance. In such cases, strategies need to be employed that use the information on the individual questions to set an overall performance standard. One such approach is to set individual performance standards on the separate questions and then to aggregate these performance standards question-by-question to obtain the cutscore on the full test.

Question-By-Question Methods

Several approaches use this question-by-question (sometimes referred to as exercise-by-exercise) approach. A prevalent strategy employed with constructed-response questions uses an analytic review of the probable performance of a typical MCC. In many of these applications, the scoring guidelines identify positive points for specific responses. In addition, negative points can be accrued through making anticipated mistakes. Through an analysis of the anticipated performance of the MCC, combining positive and negative points, the expected score for the MCC is obtained for the question. An aggregation of these question-level expected scores across all the questions in the test serves as the cutscore for the test.
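The analytic strategy just described can be sketched as follows, assuming a simple additive model in which each question's expected MCC score is the sum of anticipated positive points minus anticipated deductions (the function names and data layout are illustrative):

```python
def expected_mcc_score(positive_points, negative_points):
    """Expected MCC score on one question: anticipated credit earned
    for specific responses, minus points lost to anticipated mistakes."""
    return sum(positive_points) - sum(negative_points)

def analytic_cutscore(questions):
    """Aggregate the question-level expected scores across all questions
    to obtain the cutscore for the full test.

    `questions` is a list of (positive_points, negative_points) pairs,
    one pair per constructed-response question."""
    return sum(expected_mcc_score(pos, neg) for pos, neg in questions)

# Two questions: one where the MCC earns 2 + 1 points but loses 1 to an
# anticipated error, and one where the MCC earns 3 points but loses 0.5.
# The test cutscore is (3 - 1) + (3 - 0.5) = 4.5.
```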

Hambleton and Plake (1995) used an extended Angoff approach in which panellists estimated, for five questions scored on a 1–4 scale, the anticipated score of the MCC. Next, panellists were asked to weight each of the five questions, where the weights represent the relative importance of that question to the overall purpose of the assessment. The product of each question's weight and the anticipated MCC score on that question was aggregated across the five questions to form an overall weighted minimum passing score. This approach attempts to focus the final cutscore not only on the anticipated performance of the MCC on the individual questions, but also to take into account the total makeup of the examination in a more holistic sense. Through their weights, panellists can identify more important questions to receive relatively higher emphasis in the final pass/fail decision.
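The weighted aggregation in the extended Angoff approach reduces to a weighted sum. A minimal sketch, assuming one panel-supplied weight per question (the function name and the particular weights in the example are illustrative, not taken from Hambleton and Plake's study):

```python
def weighted_minimum_passing_score(anticipated_scores, weights):
    """Weighted minimum passing score in the style of Hambleton and
    Plake (1995): the sum over questions of the question's weight times
    the anticipated MCC score on that question."""
    if len(anticipated_scores) != len(weights):
        raise ValueError("one weight per question is required")
    return sum(w * s for s, w in zip(anticipated_scores, weights))

# Five questions scored 1-4, with weights reflecting each question's
# relative importance to the purpose of the assessment:
# 3(0.3) + 2(0.1) + 4(0.3) + 3(0.2) + 2(0.1) = 3.1
```

In practice such judgements would be collected from each panellist and then averaged across the panel to produce the operational cutscore.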

...
