In social science research, the idea of spurious correlation is taken to mean roughly that when two variables correlate, it is not because one is a direct cause of the other but rather because they are brought about by a third variable. This situation presents a major interpretative challenge to social science researchers, a challenge that is heightened by the difficulty of disentangling the various concepts associated with the idea of spurious correlation.

Correlation and Causation

Drawing appropriate causal inferences from correlational data is difficult and fraught with pitfalls. One basic lesson social scientists learn in their undergraduate statistics education is that correlation does not imply causation. The adage is best read as saying that correlation alone does not imply causation. A correlation between two variables X and Y is not sufficient for inferring the particular causal relationship “X causes Y” because a number of alternative causal interpretations must first be ruled out. For example, Y may be the cause of X; X and Y may both be produced by a third variable, Z; or X and a third variable, Z, may jointly produce Y.
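The common-cause alternative can be illustrated with a toy simulation. This is a minimal sketch, not a method taken from the entry: it assumes linear effects with Gaussian noise, and the coefficients, sample size, seed, and the helper name pearson are all illustrative choices. X and Y never influence each other, yet they correlate because both depend on Z.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
n = 5000                # illustrative sample size

# Assumed data-generating process: Z is the common cause.
# X and Y each depend on Z plus independent noise, never on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
print(f"corr(X, Y) = {r_xy:.2f}")
```

Under these assumptions the population correlation between X and Y is 0.5, so the sample value should land near that figure even though neither variable causes the other.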

The statistical practice in the social sciences that is designed to facilitate causal inferences is governed by a popular theory of causation known as the regularity theory. This theory maintains that a causal relation is a regularity between different events. More specifically, a relationship between two variables X and Y can properly count as causal only when three conditions obtain: (a) X precedes Y in time; (b) X and Y covary; and (c) no additional factors enter into, and confound, the X-Y relationship.

The third condition requires a check for what social scientists have come to call nonspuriousness. A relationship between X and Y is said to be nonspurious when X is a direct cause of Y (or Y is a direct cause of X). In practice, the relationship is judged nonspurious when we have grounds for thinking that no third variable, Z, enters into and confounds the X-Y relationship. To this end, researchers typically seek to establish that there is neither a common cause of X and Y nor a cause intervening between X and Y.
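One common way to run such a check, offered here as a sketch rather than as the procedure the entry has in mind, is the first-order partial correlation of X and Y with Z held constant. The simulated data, the linear-Gaussian assumptions, and the helper names pearson and partial_corr are hypothetical. When Z fully accounts for the X-Y association, the partial correlation falls to about zero.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)  # fixed seed for repeatability
n = 5000

# Assumed process: Z drives both X and Y; there is no X -> Y effect.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(f"corr(X, Y)     = {r_xy:.2f}")          # clearly positive
print(f"corr(X, Y | Z) = {r_xy_given_z:.2f}")  # near zero
```

A vanishing partial correlation is compatible with Z being a common cause and with Z intervening between X and Y, so the statistic alone does not tell the two structures apart; that distinction rests on substantive assumptions such as temporal order.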

Senses of Spurious Correlation

The term spurious correlation is ambiguous in the methodological literature. It was introduced by Karl Pearson at the end of the 19th century to describe the situation in which a correlation is found to exist between two ratios or indices even though the original values are random observations on uncorrelated variables. Although this initial sense of a spurious correlation remains a live issue for some social science researchers, it has largely given way to a quite different sense. In the 1950s, Herbert Simon redeployed the term to refer to a situation where, in a system of three variables, a misleading correlation between two variables is produced through the operation of the third, causal variable. H. M. Blalock's extension of Simon's idea into a testing procedure for more complex multivariate models has seen this sense of a spurious correlation come to dominate in the social sciences. As a consequence, the social sciences have taken the problem of spuriousness to be equivalent to checking for the existence of third variables.
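Pearson's original sense can be illustrated with a short simulation. This is a sketch under stated assumptions, not a reproduction of Pearson's own calculation: X, Y, and W are drawn independently from a uniform distribution on [5, 15] (an arbitrary choice that keeps the denominator positive), and the helper name pearson is hypothetical. The raw variables are uncorrelated, but the ratios X/W and Y/W correlate because they share a denominator.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(2)  # fixed seed for repeatability
n = 5000

# Three mutually independent variables (assumed uniform on [5, 15]).
x = [rng.uniform(5, 15) for _ in range(n)]
y = [rng.uniform(5, 15) for _ in range(n)]
w = [rng.uniform(5, 15) for _ in range(n)]

# Ratios formed with the same denominator W.
x_over_w = [xi / wi for xi, wi in zip(x, w)]
y_over_w = [yi / wi for yi, wi in zip(y, w)]

r_raw = pearson(x, y)
r_ratio = pearson(x_over_w, y_over_w)
print(f"corr(X, Y)     = {r_raw:.2f}")    # near zero
print(f"corr(X/W, Y/W) = {r_ratio:.2f}")  # clearly positive
```

Here the correlation between the ratios arises from the arithmetic of the shared denominator rather than from any third causal variable, which is what separates Pearson's sense of the term from Simon's.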

...
