Support Vector Machines — Non-Separable Data

An example of a dataset and non-linear Gaussian SVM model with soft margin and low cost of error (author unknown).

When dealing with non-separable data, another approach is to introduce a cost \xi_i associated with making a classification mistake at \mathbf{x}_i (see below).

Example of a non-separable dataset with margin and costs (J. Watson, Support Vector Machine Tutorial).

The exact cost is specified by a so-called loss function.

  • The Heaviside loss function assigns the same penalty to any instance that falls on the wrong side of the SVM decision boundary.
  • The hinge loss function only penalizes instances that fall on the wrong side of their margin, and the penalty increases with the distance from that margin.
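Using the common conventions (writing m = y\,T(\mathbf{x}) for the margin value, Heaviside loss as the indicator of m < 0, and hinge loss as \max(0, 1-m)), the two penalties can be compared with a short sketch; the sample margin values below are purely illustrative:

```python
import numpy as np

def heaviside_loss(margin):
    # penalty of 1 for any point on the wrong side of the decision boundary
    return (margin < 0).astype(float)

def hinge_loss(margin):
    # zero beyond the correct margin; grows linearly with the violation
    return np.maximum(0.0, 1.0 - margin)

# margin = y * T(x): negative => misclassified, in (0, 1) => inside the margin
margins = np.array([-2.0, -0.5, 0.0, 0.5, 1.0, 2.0])
print(heaviside_loss(margins))  # [1. 1. 0. 0. 0. 0.]
print(hinge_loss(margins))      # [3.  1.5 1.  0.5 0.  0. ]
```

Note that the hinge loss already charges a (smaller) penalty for correctly classified points that sit inside the margin, which is what lets the optimization trade margin width against individual mistakes.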

Loss functions for SVMs (Fawcett & Provost's Data Science for Business).

In both cases, the goal is the same: maximize the margin while minimizing the total cost.


Practically speaking, this has the simple effect of putting an upper bound on the weights in the soft-margin dual formulation: we aim to solve

    \[\arg\min_{\alpha} \left\{\frac{1}{2}\sum_{i,j=1}^n\alpha_i\alpha_jy_iy_jK(\mathbf{x}_i,\mathbf{x}_j)-\sum_{i=1}^n\alpha_i:\mbox{ subject to } \sum_{i=1}^n\alpha_iy_i=0\mbox{ and }0\leq\alpha_i\leq C\right\}\]
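On a toy dataset this QP can be solved directly, for instance with SciPy's SLSQP optimizer; the six points, the linear kernel, and the 10^{-6} threshold for "non-zero" weights below are illustrative choices, not part of the formulation itself:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny illustrative dataset: two classes of three points each in the plane
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [3.0, 3.0], [3.5, 2.5], [2.5, 3.5]])
y = np.array([-1, -1, -1, 1, 1, 1], dtype=float)

C = 1.0                             # cost parameter: upper bound on the weights
K = X @ X.T                         # linear kernel matrix K(x_i, x_j) = <x_i, x_j>
Q = (y[:, None] * y[None, :]) * K   # y_i y_j K(x_i, x_j)

def objective(alpha):
    # (1/2) sum_ij alpha_i alpha_j y_i y_j K_ij - sum_i alpha_i
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bounds = [(0.0, C)] * len(y)                          # 0 <= alpha_i <= C

res = minimize(objective, np.zeros(len(y)), bounds=bounds,
               constraints=constraints, method="SLSQP")
alpha = res.x
support = np.flatnonzero(alpha > 1e-6)  # support vectors carry non-zero weight
print("support vectors:", support, "weights:", np.round(alpha[support], 3))
```

The equality constraint forces at least one support vector per class, and the box constraint 0 ≤ α_i ≤ C is exactly the upper bound on the weights described above.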

to get a set of support vectors \{\mathbf{x}_{i_k}\} with non-zero weights \{\alpha_{i_k}\} and a decision function

    \[T(\mathbf{x};\alpha)=\sum_{k=1}^M\alpha_{i_k}y_{i_k}K(\mathbf{x}_{i_k},\mathbf{x})+\beta_0,\]

which is scaled as in the previous posts, and which satisfies

    \[y_{i_k}T(\mathbf{x}_{i_k};\alpha)\geq 1-\xi_{i_k}\]

for each support vector.

This approach introduces a new cost parameter C, which must be specified before solving the QP: a high cost means that few support vectors may violate the soft margin (i.e. few mistakes are allowed), while a low cost means that more mistakes are tolerated and the soft margin may contain a large number of support vectors (see below for an example).
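One quick way to see this effect is to fit scikit-learn's SVC (whose C parameter plays exactly this role) with a Gaussian kernel on synthetic overlapping blobs and count the support vectors at a low and a high cost; the data and the particular values of C below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two heavily overlapping Gaussian blobs, so some mistakes are unavoidable
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(1.5, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

counts = {}
for C in (0.1, 100.0):
    model = SVC(kernel="rbf", C=C).fit(X, y)       # Gaussian (RBF) kernel
    counts[C] = int(model.n_support_.sum())        # support vectors per class, summed
    print(f"C={C}: {counts[C]} support vectors")
```

With the low cost, many points end up inside the wide soft margin and become support vectors; raising the cost shrinks the margin and the support-vector count with it.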

An example of a dataset and non-linear Gaussian SVM model with soft margin and low cost of error (author unknown).


Support Vector Machines - Posts
  1. Overview
  2. Classification
  3. Kernel Transformations [previous]
  4. Non-Separable Data [current]
  5. Further Considerations [next]
  6. Examples
  7. References