Self-organizing maps (SOMs) are not cartographic maps; rather, they are mathematical transformations of data onto a predefined two-dimensional (2D) structure that reveal data clusters based on similarity. Conceptually, the basic idea of SOM corresponds well to Tobler's first law of geography, but SOMs consider distances in a space of multiple attributes rather than a space of geographic coordinates. Methodologically, SOM computation relates to the suite of multidimensional scaling (MDS) methods (e.g., principal components analysis [PCA], factor analysis, K nearest-neighbor method, and many others) that attempt to project multidimensional variables onto lower-dimensional (often 2D or 3D) spaces, so that humans can discern meaningful patterns through visualization and analysis. More specifically, Teuvo Kohonen, the inventor of SOM, suggested that SOM is an ordered structure of “local PCA,” in that the nonlinearity of the data vectors means that no single common set of principal components can adequately explain all data vectors. Instead, an SOM presents a 2D grouping of data vectors, and the first and second principal components vary from one group to another over the SOM.

Compared with many MDS methods, the SOM method is fundamentally a nonparametric regression approach with neural network learning algorithms. SOM handles each data item as a vector of values in an ordered array of variables. An SOM is built by a sequence of iterative processes:

  • Determining the topological structure of neighborhoods
  • Determining the first set of reference vectors (usually random or based on known structures in the data)
  • Calculating the distances of individual data vectors to the reference vectors and determining the first grouping
  • Calculating the next generation of reference vectors based on a mathematical definition of “generalized median” vectors from the defined neighborhood of the previous grouping
  • Reiterating the steps until the reference vectors converge into a predefined structure
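
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kohonen's reference implementation: the function name `train_batch_som` and its parameters are hypothetical, the grid is rectangular rather than hexagonal for brevity, the arithmetic mean stands in for the "generalized median," distance is Euclidean, and a fixed number of passes replaces a true convergence test.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_batch_som(data, rows, cols, iterations=20, seed=0):
    """Illustrative batch-style SOM on a rectangular grid (assumed layout)."""
    rng = random.Random(seed)
    # Step 1: fix the topology; each unit is identified by its grid position.
    units = [(r, c) for r in range(rows) for c in range(cols)]
    # Step 2: initial reference vectors, here drawn at random from the data.
    refs = {u: list(rng.choice(data)) for u in units}
    start_radius = max(rows, cols) / 2.0

    for t in range(iterations):
        # Assumed linear shrinkage of the neighborhood, never below one cell.
        radius = max(1.0, start_radius * (1 - t / iterations))

        # Step 3: group each data vector with its closest reference vector.
        groups = {u: [] for u in units}
        for vec in data:
            best = min(units, key=lambda u: euclidean(vec, refs[u]))
            groups[best].append(vec)

        # Step 4: recompute each reference vector from all data grouped
        # within its grid neighborhood (mean used as a stand-in here).
        new_refs = {}
        for u in units:
            pooled = [v for w in units if euclidean(u, w) <= radius
                      for v in groups[w]]
            if pooled:
                dim = len(pooled[0])
                new_refs[u] = [sum(v[i] for v in pooled) / len(pooled)
                               for i in range(dim)]
            else:
                new_refs[u] = refs[u]  # no data nearby: keep the old vector
        # Step 5: repeat with the updated reference vectors.
        refs = new_refs
    return refs
```

Calling `train_batch_som(data, 2, 2)` on a small list of numeric vectors returns a dictionary mapping each grid position to its reference vector; nearby grid positions end up with similar vectors because their neighborhoods pool overlapping data.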

Neighborhoods of hexagons are preferred over squares because a hexagon has the same distance from its center to each of its sides. As an SOM is built through iterative learning processes in the determination of reference vectors and grouping, the size of the neighborhood and the number of iterations can influence the learning factor and degree of convergence in the final map. Kohonen recommended that the neighborhood size start at half the diameter of the network (i.e., the defined size of the SOM, analogous to spatial extent) and shrink over the course of the computation to perhaps a single cell unit by the end. Furthermore, according to Kohonen's rule of thumb, the number of iterations should be at least 500 times the number of network units (i.e., the defined size of the SOM cell unit, analogous to resolution) to ensure good statistical accuracy; a typical number of iterations used in building an SOM is 100,000.
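
The two rules of thumb above reduce to simple arithmetic, sketched below. The function names are hypothetical, and the linear decay is an assumption: the text says only that the neighborhood shrinks over time, not what shape the schedule takes.

```python
def min_iterations(n_units, factor=500):
    """Rule of thumb from the text: at least 500 iterations per network unit."""
    return factor * n_units


def neighborhood_radius(step, total_steps, network_diameter, final_radius=1.0):
    """Radius at a given step, starting at half the network diameter.

    Linear decay to `final_radius` (one cell unit) is an assumed schedule.
    """
    start = network_diameter / 2.0
    frac = step / max(1, total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return start + (final_radius - start) * frac
```

For a hypothetical map of 200 units, `min_iterations(200)` gives 100,000, which matches the typical figure quoted above. With a network diameter of 20 cells, the radius would start at 10 and end at 1.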

How to measure distance is central to determining similarity among data items and, therefore, SOM outcomes. There are several ways to measure distance: Euclidean distance (simple linear distance between two points), Hamming distance (the number of steps to make two binary lists identical), Levenshtein distance (the total number of required steps, counting replacements, insertions, and deletions, to make two strings identical), and simple binary distance (0 or 1) are commonly used distance metrics. Because the distance measures will be computed in a vector space with variables that may have distinct value domains, standardization of variables is critical to prevent variables with high values from overpowering other variables and dominating the computation of distance. Often, variables are standardized based on statistical parameters (such as means and standard deviations) or by other means appropriate for the data at hand.
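
The distance measures and the standardization step described above can be illustrated as follows. This is a sketch using only the Python standard library; the function names are illustrative, and the z-score form (subtract the mean, divide by the population standard deviation) is one assumed choice among the standardization options the text mentions.

```python
import math
import statistics


def euclidean(a, b):
    """Simple linear distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Number of positions at which two equal-length lists differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(s, t):
    """Minimum number of replacements, insertions, and deletions
    needed to turn string s into string t (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # replacement or match
        prev = curr
    return prev[-1]


def standardize(rows):
    """Z-score each variable (column) so no variable dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 to avoid division by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
```

Without standardization, a variable measured in the hundreds would swamp one measured in single digits in the Euclidean distance; after `standardize`, both contribute on a comparable scale.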

...
