$$\max_{w,b} \frac{2}{||w||_2} \quad \text{s.t.}\quad y^{(i)}(w^Tx^{(i)}+b)\geq 1,\ i=1,2,\dots,m$$
This optimization problem is equivalent to:
$$\min_{w,b} \frac{||w||_2^2}{2} \quad \text{s.t.}\quad y^{(i)}(w^Tx^{(i)}+b)\geq 1,\ i=1,2,\dots,m$$
Rewriting it in standard constrained form:
$$J(w) = \frac{||w||_2^2}{2} \quad \text{s.t.}\quad 1-y^{(i)}(w^Tx^{(i)}+b)\leq 0,\ i=1,2,\dots,m$$
Applying the method of Lagrange multipliers:
$$L(w,b,\beta) = \frac{||w||_2^2}{2}+\sum_{i=1}^m \beta_i \left[1-y^{(i)}(w^Tx^{(i)}+b)\right],\quad \beta_i \geq 0$$
The optimal solution is that of the primal problem:
$$\min_{w,b} \max_{\beta \geq 0} L(w,b,\beta)$$
Since the problem is convex with affine constraints, strong duality holds and we may instead solve the dual problem:
$$\max_{\beta \geq 0} \min_{w,b} L(w,b,\beta)$$
For the inner minimization, set the partial derivatives of $L$ with respect to $w$ and $b$ to zero:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w - \sum_{i=1}^m \beta_i y^{(i)}x^{(i)} = 0 \Rightarrow w = \sum_{i=1}^m \beta_i y^{(i)}x^{(i)}$$

$$\frac{\partial L}{\partial b} = 0 \Rightarrow -\sum_{i=1}^m \beta_i y^{(i)} = 0 \Rightarrow \sum_{i=1}^m \beta_i y^{(i)} = 0$$
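As a sanity check (a minimal numpy sketch with made-up data), we can verify numerically that the gradient of $L$ with respect to $w$ vanishes at $w = \sum_i \beta_i y^{(i)} x^{(i)}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3
X = rng.normal(size=(m, d))           # made-up samples x^(i)
y = rng.choice([-1.0, 1.0], size=m)   # labels y^(i)
beta = rng.uniform(0.1, 1.0, size=m)  # arbitrary multipliers beta_i >= 0
b = 0.7

def lagrangian(w):
    # L(w, b, beta) = ||w||^2/2 + sum_i beta_i [1 - y_i (w^T x_i + b)]
    return 0.5 * w @ w + np.sum(beta * (1 - y * (X @ w + b)))

w_star = (beta * y) @ X  # stationarity: w = sum_i beta_i y_i x_i

# Central finite differences of dL/dw at w_star (L is quadratic in w,
# so the central difference is exact up to rounding)
eps = 1e-6
grad = np.array([
    (lagrangian(w_star + eps * e) - lagrangian(w_star - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print(np.allclose(grad, 0.0, atol=1e-6))  # True: gradient vanishes at w_star
```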
Substituting these back, we now solve for $\beta$:
$$\begin{aligned}
L(\beta) &= \frac{||w||_2^2}{2}+\sum_{i=1}^m \beta_i \left[1-y^{(i)}(w^Tx^{(i)}+b)\right]\\
&= \frac{||w||_2^2}{2}-\sum_{i=1}^m \beta_i \left[y^{(i)}(w^Tx^{(i)}+b)-1\right]\\
&= \frac{1}{2}w^Tw - \sum_{i=1}^m \beta_i y^{(i)}w^Tx^{(i)} - \sum_{i=1}^m \beta_i y^{(i)}b + \sum_{i=1}^m \beta_i\\
&= \frac{1}{2}w^T\sum_{i=1}^m \beta_i y^{(i)}x^{(i)} - w^T\sum_{i=1}^m \beta_i y^{(i)}x^{(i)} - b\sum_{i=1}^m \beta_i y^{(i)} + \sum_{i=1}^m \beta_i\\
&= -\frac{1}{2}w^T\sum_{i=1}^m \beta_i y^{(i)}x^{(i)} + \sum_{i=1}^m \beta_i \qquad \left(\text{using } \sum_{i=1}^m \beta_i y^{(i)} = 0\right)\\
&= -\frac{1}{2}\Big(\sum_{i=1}^m \beta_i y^{(i)}x^{(i)}\Big)^T \sum_{j=1}^m \beta_j y^{(j)}x^{(j)} + \sum_{i=1}^m \beta_i\\
&= \sum_{i=1}^m \beta_i - \frac{1}{2}\sum_{i=1}^m \sum_{j=1}^m \beta_i \beta_j y^{(i)}y^{(j)}x^{(i)T}x^{(j)}
\end{aligned}$$
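To double-check the algebra, a small numpy sketch (made-up data) can confirm that when the two stationarity conditions hold, $L(w,b,\beta)$ equals the final dual expression $\sum_i \beta_i - \frac{1}{2}\sum_{i,j}\beta_i\beta_j y^{(i)}y^{(j)} x^{(i)T}x^{(j)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 6, 2
X = rng.normal(size=(m, d))
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Choose beta > 0 satisfying sum_i beta_i y_i = 0 by balancing the two classes
beta = rng.uniform(0.5, 1.5, size=m)
beta[3:] *= beta[:3].sum() / beta[3:].sum()

w = (beta * y) @ X   # stationarity condition for w
b = 2.3              # arbitrary: b drops out because sum_i beta_i y_i = 0

lagrangian = 0.5 * w @ w + np.sum(beta * (1 - y * (X @ w + b)))
K = X @ X.T          # Gram matrix of inner products x_i^T x_j
dual = beta.sum() - 0.5 * (beta * y) @ K @ (beta * y)

print(np.isclose(lagrangian, dual))  # True
```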
To solve for the $\beta$ values we use the Sequential Minimal Optimization (SMO) algorithm.
Paper: *Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines* (J. Platt, 1998).
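SMO itself is beyond the scope of this note, but for intuition, here is a toy sketch (not SMO: simple projected gradient ascent on the same dual objective, with a hand-made two-point dataset) that recovers the optimal multipliers:

```python
import numpy as np

# Toy separable data: one point per class
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

K = X @ X.T                          # Gram matrix
Q = (y[:, None] * y[None, :]) * K    # Q_ij = y_i y_j x_i^T x_j

beta = np.zeros(2)
lr = 0.05
for _ in range(2000):
    grad = 1.0 - Q @ beta                 # gradient of sum_i beta_i - 1/2 beta^T Q beta
    beta += lr * grad                     # ascent step
    beta -= (beta @ y) / (y @ y) * y      # project onto sum_i beta_i y_i = 0
    beta = np.maximum(beta, 0.0)          # project onto beta_i >= 0

print(beta)  # both multipliers converge to 0.25 for this dataset
```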
Finding $w^*$:
$$w^* = \sum_{i=1}^m \beta_i^* y^{(i)}x^{(i)}$$
Finding $b^*$: pick any support vector $(x^s, y^s)$, for which the margin constraint holds with equality:
$$y^s(w^{*T}x^s+b^*) = y^s\Big(\sum_{i=1}^m \beta_i^* y^{(i)}x^{(i)T}x^s+b^*\Big) = 1 \Rightarrow b^* = y^s - \sum_{i=1}^m \beta_i^* y^{(i)}x^{(i)T}x^s$$
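Continuing the toy two-point example (assumed data, with the optimal multipliers $\beta^* = (0.25, 0.25)$ that maximize its dual), the two formulas above recover $w^*$ and $b^*$ directly:

```python
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])  # x^(1), x^(2)
y = np.array([1.0, -1.0])
beta_star = np.array([0.25, 0.25])        # optimal dual variables for this data

# w* = sum_i beta_i* y^(i) x^(i)
w_star = (beta_star * y) @ X

# b* = y^s - sum_i beta_i* y^(i) x^(i)T x^s, using support vector (x^(1), y^(1))
xs, ys = X[0], y[0]
b_star = ys - (beta_star * y) @ (X @ xs)

print(w_star, b_star)  # [0.5 0.5] 0.0
# Both points sit exactly on the margin: y^(i)(w*^T x^(i) + b*) = 1
print(y * (X @ w_star + b_star))  # [1. 1.]
```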
Here $(x^s, y^s)$ is a known support vector, i.e., a sample lying exactly on the margin.
Common kernel functions:
Polynomial (homogeneous): $k(x,x') = (x\cdot x')^d$
Polynomial (inhomogeneous): $k(x,x') = (x\cdot x'+1)^d$
Radial basis function: $k(x,x') = \exp(-\gamma ||x-x'||^2),\ \gamma >0$
Gaussian radial basis function: $k(x,x') = \exp\left(-\frac{||x-x'||^2}{2\sigma^2}\right)$
Sigmoid: $k(x,x') = \tanh(\kappa\, x\cdot x'+c),\ \kappa>0 \text{ and } c<0$
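A quick numpy sketch evaluating these kernels on two sample vectors (made-up values; $\gamma$, $\sigma$, $\kappa$, $c$, and the polynomial degree $d$ are arbitrary choices here):

```python
import numpy as np

x  = np.array([1.0, 2.0])
xp = np.array([0.5, -1.0])

def poly_homog(x, xp, d=2):            return (x @ xp) ** d
def poly_inhomog(x, xp, d=2):          return (x @ xp + 1) ** d
def rbf(x, xp, gamma=0.5):             return np.exp(-gamma * np.sum((x - xp) ** 2))
def gaussian_rbf(x, xp, sigma=1.0):    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))
def sigmoid(x, xp, kappa=1.0, c=-1.0): return np.tanh(kappa * (x @ xp) + c)

print(poly_homog(x, xp))    # (x . xp)^2 = (-1.5)^2 = 2.25
print(poly_inhomog(x, xp))  # (-1.5 + 1)^2 = 0.25
print(rbf(x, xp))           # in (0, 1]; small here since ||x - xp||^2 = 9.25
```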