When dealing with non-separable data, another approach is to introduce a cost associated with making a classification mistake (see below).
The exact cost is specified by a so-called loss function.
- The Heaviside loss function penalizes any instance appearing on the wrong side of the SVM curve, by a constant amount.
- The hinge loss function only penalizes instances that fail to clear the margin on the correct side, and the penalty increases linearly with the distance (see the sketch after this list).
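For illustration, here is a minimal sketch of the two losses in Python, written in terms of the signed margin $m_i = y_i f(x_i)$; the function names are mine, not from the original posts:

```python
import numpy as np

def heaviside_loss(m):
    """0-1 loss: a flat penalty for any instance on the wrong side (m < 0)."""
    return (m < 0).astype(float)

def hinge_loss(m):
    """Hinge loss: zero once safely past the margin (m >= 1), linear otherwise."""
    return np.maximum(0.0, 1.0 - m)
```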
In both cases, the goal is the same: we attempt to maximize the margin while minimizing the total cost.
Practically speaking, this has the simple effect of putting an upper bound on the weights in the soft-margin dual formulation: we aim to solve

$$\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C.$$
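To make this concrete, here is one way the QP could be set up and solved with the cvxopt package, assuming a linear kernel; `soft_margin_dual` is a hypothetical helper name, and this is a sketch rather than the code behind these posts:

```python
import numpy as np
from cvxopt import matrix, solvers

def soft_margin_dual(X, y, C=1.0):
    """Solve the soft-margin dual QP for a linear kernel; returns the weights alpha."""
    n = X.shape[0]
    y = y.astype(float)
    K = X @ X.T                                       # Gram matrix of dot products
    P = matrix(np.outer(y, y) * K)                    # quadratic term: y_i y_j (x_i . x_j)
    q = matrix(-np.ones(n))                           # flip sign: maximization -> minimization
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # encode 0 <= alpha_i ...
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))  # ... <= C (the new upper bound)
    A = matrix(y.reshape(1, -1))                      # equality: sum_i alpha_i y_i = 0
    b = matrix(0.0)
    solvers.options["show_progress"] = False
    return np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
```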
Solving this QP yields a set of support vectors with non-zero weights $\alpha_i$ and a decision function

$$f(x) = \operatorname{sign}\left( \sum_{i \in SV} \alpha_i y_i \, x_i \cdot x + b \right),$$

which is scaled as in the previous posts, and which satisfies

$$y_i \left( \sum_{j \in SV} \alpha_j y_j \, x_j \cdot x_i + b \right) = 1$$

for each support vector on the margin (those with $0 < \alpha_i < C$).
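Continuing the sketch, the support vectors and the scaled decision function might be recovered from the solved weights as follows; the tolerance `tol` and the averaging of $b$ over the margin support vectors are my own choices:

```python
def decision_function(X, y, alpha, C=1.0, tol=1e-6):
    """Recover f(x) = sign(w . x + b) from the dual weights (linear kernel)."""
    sv = alpha > tol                                  # support vectors: non-zero weights
    w = (alpha[sv] * y[sv]) @ X[sv]                   # primal weights built from the SVs
    on_margin = sv & (alpha < C - tol)                # unbounded SVs lie exactly on the margin
    b = np.mean(y[on_margin] - X[on_margin] @ w)      # fix b so that y_i f(x_i) = 1 there
    return lambda x: np.sign(x @ w + b)
```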
This approach introduces a new cost parameter $C$ which must be specified prior to solving the QP: a high cost means that few support vectors may violate the soft margin (i.e. few mistakes are allowed), while a low cost means that more mistakes are tolerated and the soft margin may contain a large number of support vectors (see below for an example).
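As a rough illustration of this trade-off, the snippet below reuses the hypothetical `soft_margin_dual` from above on synthetic data of my own; on overlapping classes like these, the support-vector count should shrink as $C$ grows:

```python
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),        # two overlapping Gaussian blobs
               rng.normal(+1.0, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    alpha = soft_margin_dual(X, y, C)
    print(f"C = {C}: {np.count_nonzero(alpha > 1e-6)} support vectors")
```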