Differential Item Functioning

Item bias represents a threat to the validity of test scores in many different disciplines. An item is considered to be biased if the item unfairly favors one group over another. More specifically, an item is considered to be biased if two conditions are met. First, performance on the item is influenced by sources other than differences on the construct of interest that are deemed to be detrimental to one group. Second, this extraneous influence results in differential performance across identifiable subgroups of examinees.

The term bias is used in various contexts, both statistical and social. From a statistical point of view, an item is said to be biased if the expected test or item scores are not the same for subjects from different subpopulations, given the same level of the trait measured by the instrument of interest. Thus, bias is not simply a difference between the means of item scores for subjects from different subpopulations. Group mean differences on an item could simply indicate differences in ability on the construct the item is measuring. In order to show the presence of bias, one must show that groups continue to differ in their performance on an item or test even after their ability levels are controlled for. From a social point of view, an item is said to be biased if this difference is evaluated as being harmful to one group more than other groups.
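The statistical definition can be written compactly. The notation here is an assumption introduced for illustration and does not come from the entry: Y_i is the score on item i, \theta is the level of the trait, and G is group membership. The first line states the absence of statistical bias. The second line is a group mean difference, which can hold even when the first line holds, because the groups may simply differ in their distributions of \theta.

```latex
% Assumed notation: Y_i = score on item i, \theta = trait level, G = group.
\begin{align*}
\text{no statistical bias:}\quad
  & E[Y_i \mid \theta, G = g] = E[Y_i \mid \theta]
    \quad \text{for all } g \text{ and all } \theta, \\
\text{group mean difference:}\quad
  & E[Y_i \mid G = g_1] \neq E[Y_i \mid G = g_2].
\end{align*}
```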

Figure 1 An Illustration of Gender Effect

In most psychometric research, there is an interest in detecting bias at the item level. One application of this is in test development: items that show bias can be reformulated or removed from the instrument. By considering bias at only the test level, one faces the real possibility of missing bias for a particular item. Furthermore, by considering bias at the item level, it is possible to see whether certain items are biased against certain subpopulations.

One characteristic of bias is differential item functioning (DIF), in which examinees from different groups have differing probabilities of success on an item after being matched on the ability of interest. DIF is a necessary but insufficient condition for item bias. If an item is biased, then DIF is present. However, the presence of DIF does not imply item bias in and of itself.
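As a rough sketch of what "matched on the ability of interest" can look like with data, the function below groups examinees into strata by a matching score and averages the within-stratum difference in proportion correct between two groups. This is a simple stratified comparison chosen for illustration, not a procedure described in this entry. The function name, the record layout, the choice of focal-group counts as weights, and the toy data are all assumptions.

```python
from collections import defaultdict


def matched_difference(records, reference, focal):
    """Weighted average, over matching-score strata, of
    P(correct | stratum, reference) - P(correct | stratum, focal).

    records: iterable of (group, matching_score, correct), correct in {0, 1}.
    Strata that lack either group are skipped. Weights are focal-group
    counts per stratum (an illustrative choice, not the only option).
    """
    # For each stratum, keep [number correct, number of examinees] per group.
    counts = defaultdict(lambda: {reference: [0, 0], focal: [0, 0]})
    for group, score, correct in records:
        if group in (reference, focal):
            cell = counts[score][group]
            cell[0] += correct
            cell[1] += 1

    weighted_sum, total_weight = 0.0, 0
    for cell in counts.values():
        (ref_right, ref_n), (foc_right, foc_n) = cell[reference], cell[focal]
        if ref_n == 0 or foc_n == 0:
            continue  # no matched comparison is possible in this stratum
        weighted_sum += foc_n * (ref_right / ref_n - foc_right / foc_n)
        total_weight += foc_n

    if total_weight == 0:
        raise ValueError("no stratum contains both groups")
    return weighted_sum / total_weight


# Hypothetical toy data: (group, matching score, item correct).
records = [
    ("men", 1, 0), ("men", 1, 1), ("women", 1, 0), ("women", 1, 1),
    ("men", 2, 1), ("men", 2, 1), ("women", 2, 1), ("women", 2, 0),
]
gap = matched_difference(records, reference="men", focal="women")
print(f"matched difference in proportion correct: {gap:.2f}")
```

A value near zero suggests that matched examinees perform similarly on the item. A value far from zero points to DIF, which, as noted above, does not by itself establish bias.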

Figure 2 An Illustration of No Gender Effect Controlling for Latent Trait

An illustration of DIF is given in Figures 1 through 3. In this example, suppose there are two groups of subjects (e.g., men and women) that have different probabilities of answering a dichotomous item i correctly, as illustrated in Figure 1. A heavier weight signifies a higher probability of getting the item correct. In Figure 1, men have a higher probability of getting this particular item correct.

Because this item is an indicator of some latent trait, the difference between the two groups is possibly attributable to that trait. Therefore, controlling for the latent trait (the matching criterion) should remove the relationship between gender and the item score. If this is the case, the item is measurement invariant across the groups. This is illustrated in Figure 2.
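The situation in Figures 1 and 2 can be reproduced with a small numerical sketch. Everything specific here is an assumption made for illustration: the logistic form of the item response function, the three discrete trait levels, the group trait distributions, and the size of the group shift used for contrast. With no shift, the two groups differ in their overall probability of a correct response only because their trait distributions differ, and the gap disappears at every fixed trait level. With a shift, a gap remains after conditioning on the trait, which is the pattern DIF refers to.

```python
import math


def p_correct(theta, difficulty=0.0, group_shift=0.0):
    """P(correct) under an assumed logistic item response function."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty + group_shift)))


# Hypothetical discrete trait distributions, P(theta | group).
TRAIT_DIST = {
    "men": {-1.0: 0.2, 0.0: 0.3, 1.0: 0.5},
    "women": {-1.0: 0.5, 0.0: 0.3, 1.0: 0.2},
}
THETAS = (-1.0, 0.0, 1.0)


def marginal_p(group, shift_by_group):
    """P(correct | group), averaged over that group's trait distribution."""
    shift = shift_by_group.get(group, 0.0)
    return sum(weight * p_correct(theta, group_shift=shift)
               for theta, weight in TRAIT_DIST[group].items())


def conditional_gap(theta, shift_by_group):
    """P(correct | theta, men) - P(correct | theta, women)."""
    men = p_correct(theta, group_shift=shift_by_group.get("men", 0.0))
    women = p_correct(theta, group_shift=shift_by_group.get("women", 0.0))
    return men - women


no_dif = {}               # same item response function in both groups
with_dif = {"men": 0.8}   # hypothetical advantage for men at equal theta

marginal_gap_no_dif = marginal_p("men", no_dif) - marginal_p("women", no_dif)
matched_gaps_no_dif = [conditional_gap(t, no_dif) for t in THETAS]
matched_gaps_with_dif = [conditional_gap(t, with_dif) for t in THETAS]

print(f"overall gap, invariant item: {marginal_gap_no_dif:.3f}")
print("gaps at fixed theta, invariant item:",
      [round(g, 3) for g in matched_gaps_no_dif])
print("gaps at fixed theta, shifted item:",
      [round(g, 3) for g in matched_gaps_with_dif])
```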

...
