
Split-Half Reliability

Measurement is fundamental to almost all forms of research and applied science. To conduct quantitative research, scientists must measure at least one variable. For example, researchers studying the effect of social rejection on self-esteem must measure participants’ self-esteem in some way. Similarly, to apply scientific knowledge, practitioners often rely heavily on measurement. For example, school psychologists measure children's academic and cognitive aptitudes to place them in appropriate classes and to identify potential academic difficulties. Given the importance of measurement, researchers and practitioners must evaluate the quality of the measurement tools that they use. Reliability is a key facet of measurement quality, and split-half reliability is a method of estimating the reliability of a measurement instrument.

Reliability

Briefly stated, reliability reflects the precision of scores obtained from a measurement instrument—how closely participants’ scores on the instrument correspond to their real characteristics. Unfortunately, many factors can interfere with measurement in any scientific domain, some of which are unsystematic sources of measurement error. Such factors artificially inflate some participants’ scores and deflate others’ scores in a random, or unsystematic, way. In behavioral research, these factors can include guessing, poorly written items, fatigue, misreading test items, and temporary mood states.

Consider, for example, a participant in a study involving a measure of trait self-esteem (i.e., the degree to which a person sees himself or herself in a generally positive way). Imagine that the participant actually has a high level of trait self-esteem, generally having a positive view of himself or herself. Unfortunately, one or two of the self-esteem questionnaire's items are worded in a confusing manner (e.g., “I rarely feel as if I don't have low self-esteem”). Such items can elicit confused responses that do not reflect accurately the person's truly high level of self-esteem, thereby introducing error and imprecision into the measurement process. As an index of measurement precision, reliability reflects the degree to which test scores are free of unsystematic measurement error.
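One common way to make this idea concrete is the classical test theory model, which the text above does not state explicitly and which is assumed here purely for illustration. The symbols are assumptions of this sketch: X is an observed score, T is the respondent's true score, E is unsystematic error, and \rho_{XX'} denotes reliability.

```latex
% Observed score = true score + unsystematic error, with error unrelated to the true score
X = T + E, \qquad \operatorname{Cov}(T, E) = 0

% Under that assumption, the observed-score variance splits into two parts
\sigma_X^2 = \sigma_T^2 + \sigma_E^2

% Reliability: the share of observed-score variance that is not error
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Under this sketch, reliability approaches 1 as error variance shrinks toward zero and approaches 0 as error accounts for nearly all of the variation in observed scores.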

Reliability cannot be known directly, so it must be estimated. Much as a person's self-esteem is not directly observable and must be estimated from his or her test scores, reliability is not directly observable and must be estimated from a set of test scores. As a fundamental facet of reliability, measurement error cannot be known in reality—researchers cannot truly know the degree to which a respondent's scores are affected by fatigue, confusing wording, mood states, or any of the many factors potentially affecting test scores. Consequently, reliability must be estimated from the scores obtained on the measurement instrument itself. Split-half reliability is one of many approaches to estimating the reliability of scores on a measurement instrument.

Computing and Interpreting Split-Half Reliability

The split-half method of estimating reliability is most directly applicable to instruments that have multiple items. Indeed, many instruments in behavioral research are tests, questionnaires, inventories, or surveys that include two or more items.

Table 1 Split-Half Reliability Example Data

Note: SD = standard deviation.

Consider the hypothetical set of responses in Table 1. Imagine that a researcher wishes to estimate the reliability of a four-item test of trait self-esteem, in which each item presents a statement relevant to self-esteem (e.g., “I often feel that I am a good person”). People respond to each item using a seven-point scale indicating their level of agreement with the statement (e.g., 1 = strongly disagree, 4 = neutral, and 7 = strongly agree); thus, larger numbers reflect greater self-esteem. People's responses are summed to create a total score indicating their level of trait self-esteem. Of course, many good tests include negatively keyed items, for which an endorsement or agreement reflects a low level of the characteristic being measured (e.g., “I rarely feel like I'm a good person”). Such items must be reverse scored before scoring the scale and evaluating reliability. As shown in Table 1, Person 2 has the highest level of self-esteem and Person 4 has the lowest. Being aware that scores on the instrument might be affected by measurement error, the researcher estimates the reliability of these scores.
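The scoring steps described above, followed by one common form of split-half estimate, might be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the response values are invented and are not the Table 1 data, the choice of which item is negatively keyed is hypothetical, and the odd/even split with a Spearman-Brown correction is one widely used variant rather than necessarily the procedure this entry goes on to describe.

```python
from math import sqrt


def reverse_score(response, low=1, high=7):
    """Flip a response on a low..high agreement scale (7 -> 1, 4 -> 4, 1 -> 7)."""
    return low + high - response


def key_responses(responses, negatively_keyed=()):
    """Reverse score the items whose 0-based positions are in negatively_keyed."""
    return [reverse_score(r) if i in negatively_keyed else r
            for i, r in enumerate(responses)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(keyed, half_a=(0, 2), half_b=(1, 3)):
    """Correlate the two half-test sums, then apply the Spearman-Brown correction.

    The default halves (items 1 and 3 versus items 2 and 4) are an assumed
    odd/even split; other splits can give different estimates.
    """
    sums_a = [sum(person[i] for i in half_a) for person in keyed]
    sums_b = [sum(person[i] for i in half_b) for person in keyed]
    return spearman_brown(pearson(sums_a, sums_b))


# Invented responses for five people on four items (NOT the Table 1 values).
raw = [
    [6, 2, 5, 6],
    [7, 1, 7, 6],
    [4, 4, 3, 5],
    [2, 6, 3, 1],
    [5, 3, 6, 4],
]
NEGATIVELY_KEYED = {1}  # hypothetical: the second item is negatively keyed

keyed = [key_responses(person, NEGATIVELY_KEYED) for person in raw]
totals = [sum(person) for person in keyed]  # total self-esteem score per person
reliability = split_half_reliability(keyed)

print(totals)  # [23, 27, 16, 8, 20]
print(round(reliability, 3))
```

Reverse scoring happens before both the totals and the half-test sums are computed, matching the order of operations in the paragraph above. With so few respondents the estimate is unstable, so the printed value should be read only as a demonstration of the arithmetic.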

...
