When dealing with **non-separable data**, another approach is to introduce a cost associated with making a classification mistake (see below).

The exact cost is specified by a so-called **loss function**:

- the **Heaviside** loss function penalizes any instance appearing on the wrong side of the SVM decision curve;
- the **hinge** loss function penalizes only instances appearing on the wrong side of their margin boundary, and the penalty increases with the distance.
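The two losses can be sketched in a few lines of numpy (the function names and sample values are illustrative, not from the posts); `f` stands for the value of the decision function at each instance:

```python
import numpy as np

def heaviside_loss(y, f):
    """0-1 loss: penalize any instance on the wrong side of the decision curve."""
    return (y * f < 0).astype(float)

def hinge_loss(y, f):
    """Hinge loss: zero for instances safely beyond their margin (y*f >= 1),
    growing linearly with the distance otherwise."""
    return np.maximum(0.0, 1.0 - y * f)

y = np.array([1, 1, -1, -1])         # true labels
f = np.array([2.0, 0.5, 0.5, -2.0])  # decision-function values

print(heaviside_loss(y, f))  # only the one misclassified point is penalized
print(hinge_loss(y, f))      # points inside the margin are penalized as well
```

Note how the second point (correctly classified, but inside its margin) incurs a hinge penalty of 0.5 while its Heaviside loss is zero.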

In both cases, the goal is simple: we attempt to maximize the margin while minimizing the total cost.

Practically speaking, this has the simple effect of putting an upper bound on the weights in the **soft-margin dual formulation**: we aim to solve

$$\max_{\boldsymbol{\alpha}}\;\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j),\qquad \text{subject to } 0\le\alpha_i\le C\ \text{ and }\ \sum_{i=1}^{n}\alpha_i y_i=0,$$

to get a set of support vectors with non-zero weights $\alpha_i$ and a decision function

$$f(\mathbf{x})=\sum_{i\in\mathrm{SV}}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x})+b,$$

which is scaled as in the previous posts, and which satisfies

$$y_i\,f(\mathbf{x}_i)=1-\xi_i,\qquad \xi_i\ge 0,$$

for each support vector $\mathbf{x}_i$, where the slack $\xi_i$ measures by how much that vector violates its margin.
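As a sanity check, the decision function can be rebuilt explicitly from the support vectors and their weights. The sketch below uses scikit-learn (not the original posts' code); in its `SVC`, the `dual_coef_` attribute already stores the products `alpha_i * y_i` for each support vector:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two overlapping Gaussian blobs in 2-D (made-up illustrative data).
X = np.vstack([rng.normal(-1, 1.0, size=(50, 2)),
               rng.normal(+1, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Rebuild f(x) = sum_i alpha_i * y_i * <x_i, x> + b from the support vectors;
# for a linear kernel this collapses to a single weight vector.
w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
f_manual = X @ w + clf.intercept_[0]

print(np.allclose(f_manual, clf.decision_function(X)))
```

The sign of `f_manual` then gives the predicted class for each instance.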

This approach introduces a new cost parameter $C$, which must be specified prior to solving the QP: a high cost means that few support vectors may violate the soft margin (i.e. few mistakes are allowed), while a low cost means that more mistakes are allowed and the soft margin may contain a large number of support vectors (see below for an example).
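The effect of the cost parameter can be sketched with scikit-learn, whose `SVC` exposes exactly this parameter as `C` (the data and the particular values tried here are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two heavily overlapping Gaussian blobs: the classes are not separable.
X = np.vstack([rng.normal(-1, 1.5, size=(100, 2)),
               rng.normal(+1, 1.5, size=(100, 2))])
y = np.array([-1] * 100 + [+1] * 100)

# Count support vectors for a range of costs.
counts = {C: int(SVC(kernel="rbf", C=C).fit(X, y).n_support_.sum())
          for C in (0.01, 1.0, 100.0)}

for C, n in counts.items():
    print(f"C={C:>6}: {n} support vectors")
```

As expected, the low-cost model keeps many margin violators (almost every point ends up a support vector), while raising the cost shrinks the set of support vectors.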

##### Support Vector Machines – Posts

- Overview
- Classification
- Kernel Transformations [previous]
- Non-Separable Data [current]
- Further Considerations [next]
- Examples
- References