Skip to main content icon/video/no-internet

Stripped of its numerous complexities, data indexing is the high-tech version of the tedious indexing process familiar to any 20th-century textbook author and results in the same benefits for the reader. The major difference is that computers are doing both the indexing and the accessing of the indexes. Just as with a textbook index, the basic goal behind data indexing is to prevent the reader (in this case, the computer) from having to search and sift through an entire data set—a prohibitively slow process—to find only the specific information it needs. More formally, this allows the reader random access to indexed elements, which is much more efficient than sequential access. Data indexes, crucially important data structures in computer science, make possible a vast number of technologies, particularly those dealing with the enormous information repositories so prevalent in the modern age. Without them, everything from a Google search to the dynamic display of geographic information would be exponentially slower, easily to the point of total impracticality. Two types of data indexing are prevalent in geography. The first is “nonspatial indexing,” and the second is “spatial indexing.” Nonspatial indexing is applied throughout computer science and related fields, while spatial indexing is designed specifically to work with n-dimensional data, where n ≥2 (in geography, we usually find that n = 2 or 3).

Nonspatial Data Indexing

As noted above, nonspatial indexing appears in the implementation of a very large percentage of computer applications. While there are many types of nonspatial indexes, they all allow a program to query a data set or database for specific information much faster than would be possible if the computer had to look through the entire data set or database for that information. The type of index included in a particular application is usually carefully chosen based on certain key properties, such as the additional data storage costs (just like a book's index, data indexes take up additional space), the type of data that will be indexed, and, of course, the speed of lookups. Sometimes the indexing process and the data index type can be very simple. For example, the file with the “.shx” extension that is part of the shapefile specification is purely a rudimentary index file that makes it easier for any program reading the shapefile to find the data related to a specific point, line, or polygon. It is essentially made up of a set of pointers to the beginning of data records in the .shp file.

Figure 1 A diagram of a simplified B-tree index

None
Source: Author

Many times, however, more complex data index approaches are needed. One important type of nontrivial data index is the B-tree index. Invented in 1972, B-trees and their variants are still vital in modern systems. For example, B-trees are the default nonspatial indexing method in the very popular MySQL database management system. While the details of B-tree implementations are outside the scope of this text, the basic idea of their structure is illustrated in Figure 1.

The shaded entries indicate which “nodes” need to be examined to find the record number “18” (and data about record 18). A “sequential scan” of the “data set” would have had to look at 18 different entries (assuming the data had been presorted), while the B-tree reduces that number to 3.

...

  • Loading...
locked icon

Sign in to access this content

Get a 30 day FREE TRIAL

  • Watch videos from a variety of sources bringing classroom topics to life
  • Read modern, diverse business cases
  • Explore hundreds of books and reference titles

Sage Recommends

We found other relevant content for you on other Sage platforms.

Loading