Parallel Forms Reliability

Parallel forms reliability (sometimes termed alternate forms reliability) is one of the four primary classifications of psychometric reliability, along with test–retest reliability, inter-rater (or inter-scorer, inter-observer) reliability, and internal consistency reliability. These categories differ primarily in the sources of non-trait or non-true score variability involved, and secondarily in the number of tests and occasions required for reliability estimation. This entry discusses the construction of parallel forms and the assessment, benefits, and costs of parallel forms reliability.

Construction of Parallel Forms

The creation of parallel forms begins with the generation of a large pool of items representing a single content domain or universe. At minimum, the size of this item pool should be more than twice the desired or planned size of a single test form, but the item pool should also be large enough to establish that the content domain is well represented. Parallel test forms are generated by the selection of two item sets from the single universe or content domain. The nature of this selection procedure can vary considerably in specificity depending on setting, from random selection of items drawn from the homogeneous item pool, to paired selection of items matched on properties such as difficulty. The resulting forms will typically have similar overall means and variances.
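
The two selection procedures described above, random selection and paired selection matched on difficulty, can be sketched in code. This is only an illustrative sketch, not a prescribed procedure: the item representation (a dict with hypothetical "id" and "difficulty" keys), the function names, and the rule of pairing items adjacent in difficulty are all assumptions made for the example.

```python
import random


def random_forms(item_pool, form_size, seed=0):
    """Draw two non-overlapping forms at random from a single item pool."""
    # Hard minimum needed to fill two distinct forms; in practice the pool
    # should be larger still so the content domain is well represented.
    if len(item_pool) < 2 * form_size:
        raise ValueError("item pool is too small for two distinct forms")
    rng = random.Random(seed)
    drawn = rng.sample(item_pool, 2 * form_size)
    return drawn[:form_size], drawn[form_size:]


def matched_forms(item_pool, form_size, seed=0):
    """Build two forms from pairs of items with similar difficulty.

    Assumed matching rule: sample the items, sort them by difficulty,
    pair neighbours, and assign one member of each pair to each form.
    """
    if len(item_pool) < 2 * form_size:
        raise ValueError("item pool is too small for two distinct forms")
    rng = random.Random(seed)
    drawn = rng.sample(item_pool, 2 * form_size)
    ranked = sorted(drawn, key=lambda item: item["difficulty"])
    form_a, form_b = [], []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within each matched pair
        form_a.append(pair[0])
        form_b.append(pair[1])
    return form_a, form_b


def mean_difficulty(form):
    """Average difficulty of the items on a form."""
    return sum(item["difficulty"] for item in form) / len(form)


# Hypothetical pool of 50 items with evenly spread difficulty values.
pool = [{"id": i, "difficulty": i / 50} for i in range(50)]
form_a, form_b = matched_forms(pool, form_size=10)
print(mean_difficulty(form_a), mean_difficulty(form_b))
```

Matching on difficulty constrains the two forms to have similar mean difficulty by construction, whereas random selection achieves this only on average.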

A subtle distinction can be drawn between the concept of parallel forms as it is popularly used and the formal psychometric notion of parallel tests. In the context of classical test theory, parallel tests have identical latent true scores, independent errors, and identical error variances. In practice, two forms of a test considered to be parallel do not meet the formal standards of being parallel tests but often approach such standards depending on procedures used in the item assignment process.
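
The formal conditions for parallel tests can be written out in standard classical test theory notation. The symbols below (X for observed scores, T for the true score, E for error) are not taken from this entry and are introduced only for illustration; the additional condition that errors are uncorrelated with true scores is the usual classical test theory assumption.

```latex
\begin{align*}
X_1 &= T + E_1, \qquad X_2 = T + E_2
  && \text{(identical true scores)} \\
\operatorname{Cov}(E_1, E_2) &= 0, \qquad \operatorname{Cov}(T, E_j) = 0
  && \text{(independent errors)} \\
\operatorname{Var}(E_1) &= \operatorname{Var}(E_2) = \sigma_E^2
  && \text{(identical error variances)} \\[4pt]
\rho_{X_1 X_2}
  &= \frac{\operatorname{Cov}(X_1, X_2)}
          {\sqrt{\operatorname{Var}(X_1)\,\operatorname{Var}(X_2)}}
   = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\end{align*}
```

Under these conditions, the correlation between the two tests equals the proportion of observed score variance that is true score variance, which is why the correlation between forms can serve as a reliability estimate.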

Parallel forms represent two distinct item sets drawn from a content domain. Multiple test forms that contain non-distinct sets (i.e., overlapping items) are useful in many situations. In educational settings, multiple test forms can be created by scrambling items, which ideally discourages cheating via the copying of nearby test answers. In this case, the two forms are actually the same form in terms of content, and the appropriate reliability assessment would be test–retest. Identical items may also be placed on two test forms to facilitate a comparison of their characteristics under differing conditions (e.g., near start vs. end of test).

Assessment

Parallel forms reliability is assessed by sequentially administering both test forms to the same sample of respondents. The Pearson correlation between scores on the two test forms is the estimate of parallel forms reliability. Although the administration procedures for parallel forms reliability differ from test–retest reliability only in the existence of a second test form, this has important implications for the magnitude of the reliability coefficient. In the estimation of parallel forms reliability, error variance is composed not only of transient error occurring between Time 1 and Time 2 but also of variance due to differing item content across forms. Thus, test–retest reliability tends to be greater in magnitude than parallel forms reliability. A negligible difference between the two estimates would suggest minimal contribution of varying item content to error variance and strong evidence supporting the use of the forms.
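
The estimate described above is an ordinary Pearson correlation computed over respondents who took both forms. A minimal sketch follows; the function name and the score values are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y):
        raise ValueError("each respondent needs a score on both forms")
    n = len(x)
    if n < 2:
        raise ValueError("at least two respondents are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores on a form must not all be identical")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total scores for the same six respondents on each form.
form_a_scores = [12, 15, 9, 20, 17, 11]
form_b_scores = [13, 14, 10, 19, 18, 9]

reliability = pearson_r(form_a_scores, form_b_scores)
print(f"Parallel forms reliability estimate: {reliability:.3f}")
```

Applying the same function to two administrations of a single form would give a test–retest estimate, which allows the two coefficients to be compared in the way the paragraph above describes.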

...
