Exploratory spatial data analysis (ESDA) is an approach to the analysis of spatial data employing a number of techniques, many of which are graphical or interactive. It aims to uncover patterns in the data without rigorously specified statistical models. For geographical information, the graphical techniques employed often involve the use of interactive maps linked to other kinds of statistical data displays or graphical techniques other than maps that convey information about the spatial arrangement of data and how this relates to other attributes.

In 20th-century statistics, one of the major areas of development was statistical inference. This is a formal approach to data analysis in which a probabilistic model is put forward for a given data set and the data are then used either (a) to estimate some parameter of the model or (b) to test a hypothesis (typically that some parameter is equal to zero).

This approach to data analysis has had a far-reaching influence in a number of disciplines, including the analysis of geographical data. Underpinning it is the probabilistic model mentioned above: a mathematical expression stating the probability distribution of each observation. To understand the role of ESDA, one has to ask how the probabilistic model is arrived at. In some cases, there may be a clear theoretical direction, but this is not always true. When it is not, exploratory data analysis (EDA) takes on an important role as an initial procedure to be carried out prior to the specification of a data model. The aim of EDA is therefore to describe and depict a set of data, and the aim of exploratory spatial data analysis is to do this with a set of spatial data.

In EDA generally, there are a number of key tasks to perform:

  • Assess the validity of the data, and identify any dubious records
  • Identify any outlying data items
  • Identify general trends in the data

The first two tasks are linked: Outlying observations may arise from errors in either automated or manual data recording. However, an outlier is not always a mistake; it may be a genuine but highly unusual observation. An exploratory analysis can unearth unusual observations, but it is the task of the analyst to decide whether each one is an error or a true outlier.
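As a minimal sketch of how unusual observations might be flagged for inspection, the Python snippet below applies Tukey's fences (values more than 1.5 times the interquartile range beyond the quartiles). This rule is one common convention and is an assumption here, not a method the text prescribes; the function name `tukey_outliers` and the readings are hypothetical. The code only flags candidates, and deciding whether a flagged value is an error or a true outlier remains the analyst's job.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings: 98.0 could be a recording error or a genuine extreme.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 98.0, 12.2]
flagged = tukey_outliers(readings)
print(flagged)  # [98.0]
```

The multiplier `k` controls how aggressive the flagging is; a larger value flags fewer points.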

The third task, that of identifying trends, is more directly linked to model calibration and hypothesis testing. By plotting data (e.g., in a scatterplot), it is often possible to generate suggestions for the kinds of mathematical forms that may be used to model the data. For example, in Figure 1, it seems likely that a linear relationship (plus an error term) exists between the variables labeled Deviation From Mean Date and Advancement. It is also clear that a small number of points do not adhere to this trend. Thus, a simple scatterplot is an exploratory tool that can identify both trends and outliers in the data. Identifying outliers matters because the excessive influence of one or more unusual observations can “throw” significance tests and model calibrations. Thus, an EDA might suggest that more robust calibration techniques are needed when more formal approaches are used.
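The effect of a single unusual observation on calibration can be illustrated with a small sketch. The Python code below fits a line by ordinary least squares and, for comparison, by the Theil-Sen estimator (the median of all pairwise slopes). Theil-Sen is chosen here only as one example of a robust technique; the text does not name a specific method. The data are synthetic and are not the data behind Figure 1, and the function names `ols` and `theil_sen` are hypothetical.

```python
import statistics
from itertools import combinations

def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def theil_sen(x, y):
    """Robust fit: slope is the median of the slopes between all pairs of points."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = statistics.median(slopes)
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept

# Synthetic data: y = 2x exactly, except for one corrupted observation.
x = list(range(10))
y = [2.0 * xi for xi in x]
y[9] = 60.0  # the single outlying point

ols_slope, ols_intercept = ols(x, y)
robust_slope, robust_intercept = theil_sen(x, y)
print(f"OLS slope: {ols_slope:.2f}, Theil-Sen slope: {robust_slope:.2f}")
```

On these synthetic data the robust fit recovers the slope of 2 shared by the nine well-behaved points, while the least squares slope is pulled well above it by the one outlying point.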

...
