
Support Vector Machines

Support vector machines (SVMs) are machine learning models that share some similarities with neural networks and logistic regression models for classification tasks. Examples of such tasks arise naturally in clinical settings, whenever one is given a set of data descriptors (e.g., lab results, clinical findings, imaging data, genetic information) and wants to predict health status or medical outcome given such data. In the simplest case, which is discussed in this entry, there are only two possible outcomes to predict. In a clinical context, these two can, for example, correspond to healthy and diseased patient states, respectively.

Traditionally, logistic regression and artificial neural network models have been the tools of choice for solving classification tasks as outlined above. In the past 10 years, SVMs have increasingly been used in data-intensive machine learning scenarios in clinical contexts, for example, as decision aids for the classification of mass spectrometry data or imaging data.

The following assumptions and notational conventions are used here: An n-element data set D = {xi, ti} contains m-dimensional data points (cases) xi that serve as inputs to the SVM, which classifies these cases into the associated class labels ti ∈ {−1, +1} (outcomes or outputs of the SVM). The data points xi are mathematical representations of the clinically relevant information that is to be used in the classification task. In a similar vein, the class labels ti are abstractions of the two classes that should be predicted by the model. Because they are m-dimensional, the data points will sometimes be referred to as vectors. The pattern classification task is to find a model (in this case, the SVM) and associated parameter settings that are able to predict class labels, given the data points, while making the fewest mistakes. This means that, when given a new data point x* from the same distribution as D, the model output should be the correct class label of x* as often as possible. For most data sets, it will not be possible to reduce the average error to 0. A model that makes few mistakes on unseen data is said to generalize well.
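The notation above can be made concrete with a small sketch. The data set, labels, and stand-in classifier below are made up for illustration (the classifier is not an SVM); the point is only the shape of the data, D = {xi, ti} with ti ∈ {−1, +1}, and how a misclassification rate is computed.

```python
# Toy illustration of the notation: an n-element data set D of
# m-dimensional points x_i with class labels t_i in {-1, +1}.
# All values here are invented for the example.
D = [
    ([1.2, 0.7], +1),
    ([0.3, 0.4], -1),
    ([1.5, 1.1], +1),
    ([0.2, 0.9], -1),
]

def error_rate(predict, data):
    """Fraction of cases whose predicted label differs from t_i."""
    mistakes = sum(1 for x, t in data if predict(x) != t)
    return mistakes / len(data)

# Hypothetical stand-in classifier (not an SVM): predicts +1 when the
# first component of x exceeds 1, and -1 otherwise.
predict = lambda x: +1 if x[0] > 1 else -1

print(error_rate(predict, D))  # 0.0 on this tiny set
```

In practice, the error rate that matters is the one measured on data points not used to fit the model, which is what the notion of generalization refers to.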

The following presentations are mathematical in nature because SVMs are based on geometrical concepts. Nevertheless, it is hoped that the essence of SVMs (the “what”) can be grasped without having to understand all the mathematical details (the “how”).

Optimal Separating Hyperplanes

A hyperplane is the extension of the concept of a straight line (in 2D) or a plane (in 3D) to more than three dimensions. A hyperplane, H, defined as the set of all points x that satisfy the equation w · x + b = 0, partitions its enclosing space into three parts: (1) the points directly on the plane, (2) the points for which w · x + b > 0, and (3) the points for which w · x + b < 0. Here, w · x denotes the dot product of the two m-dimensional vectors w and x, that is, the result of multiplying all components of x with the corresponding components of w and adding up the results. The two parameters w and b encode the position of the hyperplane in its enclosing space: w encodes the orientation and b the distance to the origin. Together, these two parameters uniquely determine where H lies. A hyperplane H can thus be used as a classification model: All the points on one side of H belong to one class; all the points on the other side belong to the other class. A data set that can be classified in this sense by a hyperplane is called linearly separable.
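A hyperplane classifier of this kind is short enough to write out directly. The sketch below uses invented values for w, b, and the sample points; it shows only the mechanics of the dot product and of assigning a class by the sign of w · x + b.

```python
# Minimal sketch of classification by a hyperplane w . x + b = 0.
# The hyperplane and the test points are made up for illustration.

def dot(w, x):
    """Dot product of two m-dimensional vectors: sum of
    componentwise products."""
    return sum(wi * xi for wi, xi in zip(w, x))

def classify(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane
    the point x lies on."""
    return +1 if dot(w, x) + b > 0 else -1

# Hyperplane in 2D: the line x1 + x2 - 1 = 0.
w, b = [1.0, 1.0], -1.0

print(classify(w, b, [2.0, 2.0]))  # above the line -> +1
print(classify(w, b, [0.0, 0.0]))  # below the line -> -1
```

A data set is linearly separable exactly when some choice of w and b makes this rule label every point correctly; the SVM training procedure discussed next is a principled way of choosing among such hyperplanes.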

...
