The best **separating strip** may not be linear, especially when the underlying data structure is too complex.

The solution is “simple”: we devise a transformation $\varphi$ from the initial feature space to a higher-dimensional space, and we fit a linear SVM to the transformed data (see Watson’s example above).

We thus solve

$$\max_{\alpha}\; \sum_{i=1}^{N}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j\, y_i y_j\, \varphi(\mathbf{x}_i)\cdot \varphi(\mathbf{x}_j), \quad \text{subject to } \alpha_i \ge 0 \text{ and } \sum_{i=1}^{N}\alpha_i y_i = 0,$$

to get a set of support vectors with non-zero weights $\alpha_i$ and a decision function

$$h(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i \in SV}\alpha_i y_i\, \varphi(\mathbf{x}_i)\cdot \varphi(\mathbf{x}) + b\right),$$

which is scaled as in the previous post. Class assignment for a new observation $\mathbf{x}$ proceeds exactly as in the case of the simple linear kernel, which is given by the identity transformation $\varphi(\mathbf{x}) = \mathbf{x}$.
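The idea of transforming the features and then training a linear SVM can be sketched as follows. This is a minimal illustration, assuming NumPy and scikit-learn; the concentric toy data and the particular mapping $\varphi$ (appending the squared radius as a third feature) are our own choices, not taken from the post.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Concentric toy data: the class depends on the radius, so no linear
# strip separates the two classes in the original (x1, x2) plane.
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

def phi(X):
    """Illustrative transformation: append the squared radius as a
    third feature, which makes the classes linearly separable."""
    return np.column_stack([X, (X ** 2).sum(axis=1)])

# A plain linear SVM trained in the transformed space.
clf = LinearSVC(C=10.0, max_iter=10_000).fit(phi(X), y)
print(clf.score(phi(X), y))  # near-perfect accuracy on this toy set
```

In the original two-dimensional space the same linear SVM would fail badly; the transformation does all the work here.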

Not every transformation is allowable, unfortunately: in order for the associated QP to remain solvable, the Gram matrix of **inner products** $\varphi(\mathbf{x}_i)\cdot \varphi(\mathbf{x}_j)$ must be positive semi-definite. Standard families of transformations do satisfy this condition, however.
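The positive semi-definiteness requirement can be checked numerically: build the Gram matrix of pairwise kernel values and confirm that its eigenvalues are non-negative. A small sketch, assuming NumPy and using the Gaussian kernel (with a $\gamma$ value chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))

# Gram matrix for the Gaussian kernel: K_ij = exp(-gamma * ||x_i - x_j||^2)
gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# The matrix is positive semi-definite when all eigenvalues are
# non-negative (up to numerical round-off).
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)  # True
```

The same check would fail for an arbitrary symmetric function of two points, which is exactly why not every candidate "similarity measure" qualifies as a kernel.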

How does one choose an appropriate $\varphi$? Prior knowledge of the data structure, including hitting on the right level of complexity, comes in handy. Would you have been able to recognize the “right” transformation for the data presented in the image above? What would be an appropriate transformation for the dataset below?

Obviously, selecting the perfect $\varphi$ is not always easy, or even feasible. Thankfully, there are **kernel** functions $K(\mathbf{x}, \mathbf{y})$ that generalize the notion of the inner product and that subsume the need to specify a transformation $\varphi$ explicitly, at the cost of providing values for a few parameters.

The most common kernels include:

- **simple linear**: $K(\mathbf{x}, \mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$;
- **polynomial**: $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + r)^{d}$, where $r \ge 0$ (typically 1) and $d \in \mathbb{N}$;
- **Gaussian** (or radial-basis function): $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\gamma \lVert \mathbf{x} - \mathbf{y} \rVert^{2}\right)$, where $\gamma$ is positive, and
- **sigmoidal** (or perceptron): $K(\mathbf{x}, \mathbf{y}) = \tanh\left(\kappa\, \mathbf{x} \cdot \mathbf{y} + c\right)$, for allowable combinations of $\kappa$ and $c$.
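These kernels are simple enough to compute directly. A sketch, assuming NumPy; the default parameter values ($r$, $d$, $\gamma$, $\kappa$, $c$) are illustrative choices, not prescriptions:

```python
import numpy as np

def linear(x, y):
    return x @ y

def polynomial(x, y, r=1.0, d=3):
    return (x @ y + r) ** d

def gaussian(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoidal(x, y, kappa=0.01, c=-1.0):
    return np.tanh(kappa * (x @ y) + c)

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(linear(x, y))      # 1*0.5 + 2*(-1) = -1.5
print(polynomial(x, y))  # (-1.5 + 1)^3 = -0.125
print(gaussian(x, y))    # exp(-0.5 * (0.25 + 9)) ≈ 0.0098
```

Each of these returns the inner product of the two points in some (possibly implicit, possibly infinite-dimensional) transformed space, without ever constructing that space.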

In this case, we solve

$$\max_{\alpha}\; \sum_{i=1}^{N}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j\, y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j), \quad \text{subject to } \alpha_i \ge 0 \text{ and } \sum_{i=1}^{N}\alpha_i y_i = 0,$$

to get a set of support vectors with non-zero weights $\alpha_i$ and a decision function

$$h(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i \in SV}\alpha_i y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right),$$

which is scaled as in the previous post.
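In practice the kernelized problem is rarely solved by hand. A minimal sketch using scikit-learn's `SVC` (an assumption on our part; the post does not name a library), reusing the concentric toy data from earlier and the Gaussian kernel with an illustrative $\gamma$:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

# The kernel trick: SVC evaluates K(x_i, x_j) directly, never
# materializing the transformed feature space.
clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

# Support vectors are the training points with non-zero weights alpha_i;
# the decision function is the signed quantity inside sgn(...).
print(len(clf.support_))                   # number of support vectors
print(clf.decision_function(X[:3]).shape)  # (3,)
```

Note that only the support vectors enter the decision function, so prediction cost scales with their number rather than with the full training set.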

##### Support Vector Machines – Posts

- Overview
- Classification [previous]
- Kernel Transformations [current]
- Non-Separable Data [next]
- Further Considerations
- Examples
- References