Margin lines for SVM

I’ve been thinking about writing a blog post on what I have been doing these days. I wondered: is there anything I can write about? Not just anything I learned, but something I can contribute to the public? Then I realized that a few days ago I searched the web for a problem and couldn’t find the correct answer right away, so I had to solve it myself. The problem is how to find the margin lines from the decision boundary obtained from a linear SVM (Support Vector Machine) classifier.

This is not a hard problem to solve, but it requires a little understanding of how SVM works. First, I will go through the basic formulation, and then derive the margin lines for the two-feature case.

If we have two continuous features, x_0 and x_1, and a binary class, y, SVM finds the decision boundary using the following formulation. For a training set of N points
[(x_0^{(i)}, x_1^{(i)}) and y^{(i)} for i=1\sim N, where y^{(i)} = -1 \mbox{ or } 1], we try to maximize M subject to |{\bf w}|\equiv\sqrt{w_0^2+w_1^2}=1 and y^{(i)}(w_0x_0^{(i)}+w_1x_1^{(i)}+b) \ge M for \forall i. Here M is interpreted as the geometrical margin, the shortest distance from the decision boundary to a data point in the training set. The parameter space we are exploring is actually two-dimensional (the direction of (w_0, w_1), and the intercept term b) because (w_0, w_1) is assumed to be a unit vector.

This optimization problem is not easy to solve in general. Hence we change the problem by setting the constraint on (w_0, w_1) to |{\bf w}|=1/M instead. Now the length of (w_0, w_1) is not fixed; rather, it changes as the direction of (w_0, w_1) and b change, depending on the given data set. The parameter space we are exploring is the same as before; only the constraint on the length of (w_0, w_1) has changed. Now the formulation becomes simpler: we need to maximize 1/|{\bf w}| subject to y^{(i)}(w_0x_0^{(i)}+w_1x_1^{(i)}+b) \ge 1 for \forall i. Maximizing 1/|{\bf w}| is the same as minimizing \frac{1}{2}|{\bf w}|^2, which puts the problem in a form that is much easier to handle. We will not go into the details of how this optimization problem is solved, but we will keep in mind that |{\bf w}|=1/M, where M is the geometrical margin.
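To summarize, the two equivalent formulations are the following (the second uses |{\bf w}|=1/M):

\max_{{\bf w},b} M \quad \mbox{subject to} \quad |{\bf w}|=1 \mbox{ and } y^{(i)}(w_0x_0^{(i)}+w_1x_1^{(i)}+b) \ge M \mbox{ for } \forall i

\min_{{\bf w},b} \frac{1}{2}|{\bf w}|^2 \quad \mbox{subject to} \quad y^{(i)}(w_0x_0^{(i)}+w_1x_1^{(i)}+b) \ge 1 \mbox{ for } \forall i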

Let’s say we are using the Python library sklearn.svm.SVC. If we fit the training data, the package gives us the optimal parameter values after solving the above problem: coef_[0][0] as w_0, coef_[0][1] as w_1, and intercept_[0] as b. The decision boundary can be found easily from these parameters: w_0 x_0 + w_1 x_1 + b = 0, which is x_1 = -(w_0/w_1) x_0 - (b/w_1). We know the margin is 1/|{\bf w}|, but how much should we add to or subtract from the intercept term of the decision boundary to get a margin line? If we call that value \Delta, it can be found using trigonometry (see the figure).
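For concreteness, here is a minimal, self-contained sketch of getting these parameters out of a fit (the toy data and variable names here are mine, just for illustration):

from sklearn.svm import SVC
import numpy as np

# toy training set: two continuous features, binary labels in {-1, 1}
X = np.array([[0.0, 0.5], [1.0, 1.0], [2.0, 0.0],
              [3.0, 3.5], [4.0, 3.0], [5.0, 4.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1000.0)  # large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]               # (w_0, w_1)
b = clf.intercept_[0]          # b
M = 1 / np.hypot(w[0], w[1])   # geometrical margin, M = 1/|w|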

[Figure: decision boundary and a margin line, showing the perpendicular distance M, the vertical offset \Delta between their intercepts, and the angle \theta between the boundary and the vertical axis]

\sin\theta = \frac{M}{\Delta} = \frac{|w_1|}{|{\bf w}|} = M|w_1|

Here \theta is the angle between the decision boundary and the vertical (x_1) axis. Then \Delta is just 1/w_1 (it can be either positive or negative, but it doesn’t matter because we need both signs anyway).
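As a quick numerical sanity check (continuing from the fitted clf in the sketch above, so this check is my own addition): take a point obtained by shifting the boundary vertically by \Delta = 1/w_1; its perpendicular distance to the boundary should come out as exactly M.

delta = 1 / w[1]
x0 = 0.0                                       # any x_0 works
x1 = -(w[0] / w[1]) * x0 - (b / w[1]) + delta  # boundary shifted vertically by delta
dist = abs(w[0] * x0 + w[1] * x1 + b) / np.hypot(w[0], w[1])
print(np.isclose(dist, M))                     # True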

A part of the Python code for plotting these lines then looks like this (w is coef_[0], b is intercept_[0], and xx is an array of x_0 values):

yy = -(w[0] / w[1]) * xx - (b / w[1])  # decision boundary
yy_margin1 = yy + 1 / w[1]             # margin line shifted by +Delta
yy_margin2 = yy - 1 / w[1]             # margin line shifted by -Delta
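To actually draw them, here is a minimal sketch continuing from the fitted clf above (the plotting choices are mine, not from any particular sklearn example):

import matplotlib.pyplot as plt

xx = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100)  # range of x_0 values
yy = -(w[0] / w[1]) * xx - (b / w[1])
yy_margin1 = yy + 1 / w[1]
yy_margin2 = yy - 1 / w[1]

plt.scatter(X[:, 0], X[:, 1], c=y)  # training points colored by class
plt.plot(xx, yy, 'k-')              # decision boundary
plt.plot(xx, yy_margin1, 'k--')     # margin lines
plt.plot(xx, yy_margin2, 'k--')
plt.show()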

I know this is not a hard problem, but it can be handy when we want to plot margin lines. (I found that, at the time of writing, some example codes at sklearn.svm.SVC did not draw the margin lines correctly: one only works for separable cases, while the other uses M*(w_0/w_1) for \Delta, which is a little off.)

PS: All blog posts I have written so far (actually I haven’t written any for more than five years) are in Korean. So this will be my first blog post in English.