
Matching

The term matching refers to the procedure of finding, for a given sample unit, the other units in the sample that are closest to it in terms of observable characteristics. The selected units are usually referred to as matches, and after this procedure is repeated for all units (or a subgroup of them), the resulting subsample is called the matched sample. This idea is typically implemented across subgroups of a given sample; that is, for each unit in one subgroup, matches are found among the units of another subgroup. A matching procedure requires defining a notion of distance, selecting the number of matches to be found, and deciding whether a unit may be used more than once as a match. In applications, matching is commonly used as a preliminary step: a matched sample, that is, a sample of observations that are similar in terms of observed characteristics, is constructed first, and some statistical procedure is then computed with this subsample. Typically, the term matching estimator refers to the case in which the statistical procedure of interest is a point estimator, such as the sample mean. The idea of matching is usually employed in the context of observational studies, in which it is assumed that selection into treatment, if present, is based on observable characteristics. More generally, under appropriate assumptions, matching may be used as a way of reducing variability in estimation, combining databases from different sources, dealing with missing data, and designing sampling strategies, among other possibilities. Finally, in the econometrics literature, the term matching is sometimes used more broadly to refer to a class of estimators that exploit the idea of selection on observables in the context of program evaluation. This entry focuses on the implementation of matching and on the associated statistical inference procedures.
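The three choices named above (a notion of distance, the number of matches, and whether units may be reused) can be made concrete with a minimal sketch. It is only an illustration under stated assumptions, not the procedure of any particular study: the Euclidean distance, the greedy processing order, the function names, and the toy data are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two covariate vectors (an assumed choice of distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(targets, pool, n_matches=1, replace=True, distance=euclidean):
    """For each unit in `targets`, return the indices of its closest units in `pool`.

    Greedy nearest-neighbor matching: targets are processed in order, so
    without replacement the result depends on that order.
    """
    available = set(range(len(pool)))
    matches = []
    for x in targets:
        # Rank the still-available pool units by distance to x; ties are broken by index.
        ranked = sorted(available, key=lambda j: (distance(x, pool[j]), j))
        chosen = ranked[:n_matches]
        matches.append(chosen)
        if not replace:
            available -= set(chosen)  # a matched unit cannot be reused
    return matches


# Hypothetical toy data: (age, years of schooling) for each unit.
treated = [(30.0, 12.0), (32.0, 12.0)]
controls = [(29.0, 12.0), (50.0, 16.0), (31.0, 11.0), (44.0, 15.0)]
control_outcomes = [10.0, 20.0, 12.0, 18.0]

with_replacement = match(treated, controls, n_matches=2, replace=True)
without_replacement = match(treated, controls, n_matches=2, replace=False)

# A simple matching estimator: the sample mean of outcomes in the matched sample.
matched_outcomes = [control_outcomes[j] for m in with_replacement for j in m]
matched_mean = sum(matched_outcomes) / len(matched_outcomes)
```

In this toy example both treated units share the same two nearest controls when reuse is allowed, whereas without replacement the second treated unit has to fall back on more distant controls. This is the usual trade-off between match quality and the number of distinct units in the matched sample.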

Description and Implementation

A natural way of describing matching formally is in the context of the classical potential outcomes model. To describe this model, suppose that a random sample of size n is available from a large population, which is represented by the collection of random variables (Yi, Ti, Xi), i = 1, 2, …, n, where Ti ∊ {0,1},

Yi = Ti Y1i + (1 − Ti) Y0i,

and Xi represents a (possibly high-dimensional) vector of observed characteristics. This model aims to capture the idea that while the set of characteristics Xi is observed for all units, only one of the two random variables (Y0i, Y1i) is observed for each unit, depending on the value of Ti. The underlying random variables Y0i and Y1i are usually referred to as potential outcomes because they represent the two potential states for each unit. For example, this model is routinely used in the program evaluation literature, where Ti represents treatment status and Y0i and Y1i represent outcomes without and with treatment, respectively. In most applications the goal is to conduct statistical inference on some characteristic of the distribution of the potential outcomes, such as the mean or quantiles. However, using the available sample directly for inference may lead to substantial estimation bias whenever units have selected into one of the two possible groups (Ti = 0 or Ti = 1). As a consequence, researchers often assume that the selection process, if present, is based on observable characteristics. This idea is formalized by the so-called conditional independence assumption: conditionally on Xi, the random variables (Y0i, Y1i) are independent of Ti. In other words, under this assumption, units having the same observable characteristics Xi are assigned to each of the two groups (Ti = 0 or Ti = 1) independently of their potential gains, captured by (Y0i, Y1i). Thus, this assumption imposes random treatment assignment conditional on Xi. This model also assumes some form of overlap or common support: For some

c > 0, c ≤ P(Ti = 1 | Xi = x) ≤ 1 − c for every value x of the observed characteristics. In words, this additional assumption ensures that, if the sample size is large enough, there will be observations in both groups having a common value of the observed characteristics. The function
p(x) = P(Ti = 1 | Xi = x)
is known as the propensity score and plays an important role in the literature. Finally, it is important to note that for many applications of interest, the model described above employs stronger assumptions than needed. For simplicity, however, the following discussion does not address these distinctions.
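The roles of conditional independence, overlap, and the propensity score can be illustrated with a small simulation. This is a hedged sketch and not part of the original entry: the data-generating process, the binary covariate, the constant treatment effect, the threshold c, and all function names are assumptions chosen for illustration. Selection into treatment depends only on the observed characteristic, so comparing the two groups directly is biased while comparing units with the same characteristic is not.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed data-generating process; every number here is hypothetical.
TRUE_EFFECT = 2.0
PROPENSITY = {0: 0.2, 1: 0.8}  # p(x) = P(T = 1 | X = x), bounded away from 0 and 1


def simulate(n):
    """Draw n units (y, t, x); only one potential outcome is observed per unit."""
    sample = []
    for _ in range(n):
        x = random.randint(0, 1)                 # binary observed characteristic
        y0 = 3.0 * x + random.gauss(0.0, 1.0)    # potential outcome without treatment
        y1 = y0 + TRUE_EFFECT                    # potential outcome with treatment
        t = 1 if random.random() < PROPENSITY[x] else 0  # selection depends on x only
        y = t * y1 + (1 - t) * y0                # observed outcome
        sample.append((y, t, x))
    return sample


def mean(values):
    return sum(values) / len(values)


def naive_difference(sample):
    """Treated mean minus control mean, ignoring x."""
    treated = [y for y, t, _ in sample if t == 1]
    control = [y for y, t, _ in sample if t == 0]
    return mean(treated) - mean(control)


def estimated_propensity(sample):
    """Share of treated units within each value of x."""
    return {x: mean([t for _, t, xi in sample if xi == x]) for x in (0, 1)}


def stratified_difference(sample):
    """Average the within-x treated-control differences, weighted by the share of each x."""
    total = 0.0
    for x in (0, 1):
        stratum = [(y, t) for y, t, xi in sample if xi == x]
        treated = [y for y, t in stratum if t == 1]
        control = [y for y, t in stratum if t == 0]
        total += (len(stratum) / len(sample)) * (mean(treated) - mean(control))
    return total


sample = simulate(20000)
p_hat = estimated_propensity(sample)
c = 0.1  # hypothetical overlap threshold
overlap_holds = all(c <= p <= 1 - c for p in p_hat.values())
naive = naive_difference(sample)
adjusted = stratified_difference(sample)
```

Under this assumed design the naive difference converges to about 3.8 rather than 2, because treated units are more likely to have x = 1, which also raises the outcome without treatment. The stratified difference compares units with the same x and should land close to the assumed effect of 2. With a continuous or high-dimensional Xi, exact strata are no longer available, which is where matching on a distance or on the propensity score comes in.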

...
