Training dataset:
$T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$
When $y_i=+1$, $x_i$ is called a positive example; when $y_i=-1$, $x_i$ is called a negative example. $(x_i,y_i)$ is called a sample point.
Linearly separable support vector machine (hard-margin SVM): given a linearly separable training dataset, maximizing the margin, or equivalently solving the corresponding convex quadratic programming problem, yields the separating hyperplane
$w^* \cdot x + b^* = 0$
together with the classification decision function
$f(x)=\mathrm{sign}(w^* \cdot x + b^*)$
This is called the linearly separable support vector machine. Here $w^*$ and $b^*$ are the model parameters: $w^*$ is called the weight vector and $b^*$ the bias.
The functional margin of the hyperplane $(w,b)$ with respect to the sample point $(x_i,y_i)$ is
$\gamma_i^* = y_i(w \cdot x_i + b)$
The functional margin of the hyperplane $(w,b)$ with respect to the training set $T$ is
$\gamma^* = \min_{i=1,2,\dots,N} \gamma_i^*$
that is, the minimum of the functional margins over all sample points $(x_i,y_i)$ in $T$.
The geometric margin of the hyperplane $(w,b)$ with respect to the sample point $(x_i,y_i)$ is
$\gamma_i = y_i\left(\dfrac{w \cdot x_i + b}{\|w\|}\right)$
The geometric margin of the hyperplane $(w,b)$ with respect to the training set $T$ is
$\gamma = \min_{i=1,2,\dots,N} \gamma_i$
that is, the minimum of the geometric margins over all sample points $(x_i,y_i)$ in $T$.
Relation between the functional and geometric margins:
$\gamma_i = \gamma_i^* / \|w\|$
$\gamma = \gamma^* / \|w\|$
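As a quick numeric illustration of the two margins and their relation (the hyperplane and sample points below are made-up values, not from the notes):

```python
import numpy as np

# Hypothetical separating hyperplane w·x + b = 0
w = np.array([2.0, 1.0])
b = -2.0

# Toy samples (x_i, y_i)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, -1.0]])
y = np.array([1, 1, -1])

# Functional margin of each sample: gamma_i^* = y_i (w·x_i + b)
func_margins = y * (X @ w + b)

# Geometric margin: gamma_i = gamma_i^* / ||w||
geo_margins = func_margins / np.linalg.norm(w)

print('functional margins:', func_margins)
print('geometric margins: ', geo_margins)
print('dataset margin (min):', geo_margins.min())
```

A positive margin means the point is on the correct side of the hyperplane; the dataset margin is the smallest per-point value, matching the `min` in the definitions above.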


(Hard-margin) support vectors: the sample points in the training set closest to the separating hyperplane, i.e. the points for which the constraint holds with equality:
$y_i(w \cdot x_i + b) - 1 = 0$
For positive points with $y_i=+1$, the support vectors lie on the hyperplane
$H_1: w \cdot x + b = 1$
For negative points with $y_i=-1$, the support vectors lie on the hyperplane
$H_2: w \cdot x + b = -1$
$H_1$ and $H_2$ are called the margin boundaries.
The distance between $H_1$ and $H_2$ is called the margin, and
$|H_1 H_2| = \dfrac{1}{\|w\|} + \dfrac{1}{\|w\|} = \dfrac{2}{\|w\|}$
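These relations can be checked numerically: a linear SVC trained with a very large C approximates the hard-margin machine, so the learned $w$, $b$ should give a margin of $2/\|w\|$ with the support vectors sitting on $H_1$ and $H_2$. A sketch on a small made-up separable dataset:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data: two positives, one negative
X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1])

# A very large C approximates the hard-margin SVM
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

# Margin width |H1 H2| = 2 / ||w||
width = 2.0 / np.linalg.norm(w)
print('w =', w, 'b =', b, 'margin width =', width)

# Support vectors satisfy y_i(w·x_i + b) = 1 (up to solver tolerance)
sv_margins = y[clf.support_] * (X[clf.support_] @ w + b)
print('support-vector functional margins:', sv_margins)
```

On this dataset the optimum is $w=(1/2,1/2)$, $b=-2$, so the margin width is $2\sqrt{2}$, and the functional margins of the support vectors come out (numerically) equal to 1.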
The resulting optimization problem is solved with the method of Lagrange multipliers.
Linear support vector machine (soft-margin SVM): given a training dataset that is not linearly separable, solving the corresponding convex quadratic programming problem yields the separating hyperplane
$w^* \cdot x + b^* = 0$
together with the classification decision function
$f(x)=\mathrm{sign}(w^* \cdot x + b^*)$
This is called the linear support vector machine.
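In the soft-margin formulation the penalty parameter C controls the trade-off between a wide margin and tolerated training errors: smaller C allows more margin violations. A minimal sketch on an overlapping two-class toy set (the data generation here is an assumption for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian blobs: not perfectly linearly separable
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # More support vectors at small C: many points violate the margin
    print('C=%-6g  #support vectors=%d  train acc=%.2f'
          % (C, len(clf.support_), clf.score(X, y)))
```

As C grows the machine approaches the hard-margin behavior: fewer support vectors and higher training accuracy, at the cost of being more sensitive to outliers.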

Implementing SVC with sklearn: call signatures and parameter notes
(1) LinearSVC implements the linear classification support vector machine
from sklearn.svm import LinearSVC
LinearSVC(penalty='l2',loss='squared_hinge',dual=True,tol=0.0001,C=1.0,multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None,verbose=0,random_state=None, max_iter=1000)
Parameters:
Attributes:
Methods:
(2) SVC implements the nonlinear classification support vector machine
from sklearn.svm import SVC
SVC(C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0,shrinking=True,probability=False, tol=0.001, cache_size=200, class_weight=None,verbose=False,max_iter=-1, decision_function_shape='ovr', random_state=None)
Parameters:
Attributes:
Methods:
(3) LinearSVR implements linear support vector regression
from sklearn.svm import LinearSVR
LinearSVR(epsilon=0.0, tol=0.0001, C=1.0, loss='epsilon_insensitive',fit_intercept=True,intercept_scaling=1.0, dual=True, verbose=0,random_state=None, max_iter=1000)
Parameters:
Attributes:
Methods:
(4) SVR implements nonlinear support vector regression
from sklearn.svm import SVR
SVR(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, C=1.0,epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
Parameters:
Attributes:
Methods:
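The regression classes follow the same fit/score interface as the classifiers; a minimal SVR sketch on synthetic one-dimensional data (the sine target and noise level are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
# Noisy samples of a sine curve on [0, 5]
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon sets the width of the insensitive tube around the fit;
# points inside the tube incur no loss
reg = SVR(kernel='rbf', C=10.0, epsilon=0.1)
reg.fit(X, y)
print('R^2 on training data: %.3f' % reg.score(X, y))
```

With an RBF kernel the regressor tracks the sine curve closely; widening `epsilon` yields a sparser model with fewer support vectors.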
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, model_selection, svm

def load_data_classfication():
    # Load the iris dataset and return a stratified train/test split
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    return model_selection.train_test_split(X, y, test_size=0.2,
                                            random_state=0, stratify=y)

def test_SVC_linear(*data):
    # Linear-kernel SVC: report the learned coefficients and test accuracy
    X_train, X_test, y_train, y_test = data
    cls = svm.SVC(kernel='linear')
    cls.fit(X_train, y_train)
    print('Coefficients: %s, intercept: %s' % (cls.coef_, cls.intercept_))
    print('Score: %.2f' % cls.score(X_test, y_test))

def test_SVC_poly(*data):
    # Polynomial-kernel SVC: sweep degree, gamma, and coef0 (r) in turn
    X_train, X_test, y_train, y_test = data
    fig = plt.figure()
    # Effect of the polynomial degree
    degrees = range(1, 20)
    train_scores = []
    test_scores = []
    for degree in degrees:
        cls = svm.SVC(kernel='poly', degree=degree)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax = fig.add_subplot(1, 3, 1)
    ax.plot(degrees, train_scores, label='Training score', marker='+')
    ax.plot(degrees, test_scores, label='Testing score', marker='o')
    ax.set_title('SVC_poly_degree')
    ax.set_xlabel('p')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.legend(loc='best', framealpha=0.5)
    # Effect of gamma with the degree fixed at 3
    gammas = range(1, 20)
    train_scores = []
    test_scores = []
    for gamma in gammas:
        cls = svm.SVC(kernel='poly', gamma=gamma, degree=3)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax = fig.add_subplot(1, 3, 2)
    ax.plot(gammas, train_scores, label='Training score', marker='+')
    ax.plot(gammas, test_scores, label='Testing score', marker='o')
    ax.set_title('SVC_poly_gamma')
    ax.set_xlabel(r'$\gamma$')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.legend(loc='best', framealpha=0.5)
    # Effect of coef0 (r) with gamma and degree fixed
    rs = range(0, 20)
    train_scores = []
    test_scores = []
    for r in rs:
        cls = svm.SVC(kernel='poly', gamma=10, degree=3, coef0=r)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax = fig.add_subplot(1, 3, 3)
    ax.plot(rs, train_scores, label='Training score', marker='+')
    ax.plot(rs, test_scores, label='Testing score', marker='o')
    ax.set_title('SVC_poly_r')
    ax.set_xlabel(r'r')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.legend(loc='best', framealpha=0.5)
    plt.show()

def test_SVC_rbf(*data):
    # RBF-kernel SVC: sweep gamma
    X_train, X_test, y_train, y_test = data
    gammas = range(1, 20)
    train_scores = []
    test_scores = []
    for gamma in gammas:
        cls = svm.SVC(kernel='rbf', gamma=gamma)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(gammas, train_scores, label='Training score', marker='+')
    ax.plot(gammas, test_scores, label='Testing score', marker='o')
    ax.set_title('SVC_rbf')
    ax.set_xlabel(r'$\gamma$')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.legend(loc='best', framealpha=0.5)
    plt.show()

def test_SVC_sigmoid(*data):
    # Sigmoid-kernel SVC: sweep gamma and coef0 (r) in turn
    X_train, X_test, y_train, y_test = data
    fig = plt.figure()
    # Effect of gamma with coef0 fixed at 0
    gammas = np.logspace(-2, 1)
    train_scores = []
    test_scores = []
    for gamma in gammas:
        cls = svm.SVC(kernel='sigmoid', gamma=gamma, coef0=0)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax = fig.add_subplot(1, 2, 1)
    ax.plot(gammas, train_scores, label='Training score', marker='+')
    ax.plot(gammas, test_scores, label='Testing score', marker='o')
    ax.set_title('SVC_sigmoid_gamma')
    ax.set_xscale('log')
    ax.set_xlabel(r'$\gamma$')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.legend(loc='best', framealpha=0.5)
    # Effect of coef0 (r) with gamma fixed
    rs = np.linspace(0, 5)
    train_scores = []
    test_scores = []
    for r in rs:
        cls = svm.SVC(kernel='sigmoid', coef0=r, gamma=0.01)
        cls.fit(X_train, y_train)
        train_scores.append(cls.score(X_train, y_train))
        test_scores.append(cls.score(X_test, y_test))
    ax = fig.add_subplot(1, 2, 2)
    ax.plot(rs, train_scores, label='Training score', marker='+')
    ax.plot(rs, test_scores, label='Testing score', marker='o')
    ax.set_title('SVC_sigmoid_r')
    ax.set_xlabel(r'r')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.legend(loc='best', framealpha=0.5)
    plt.show()

if __name__ == "__main__":
    X_train, X_test, y_train, y_test = load_data_classfication()
    test_SVC_linear(X_train, X_test, y_train, y_test)
    test_SVC_poly(X_train, X_test, y_train, y_test)
    test_SVC_rbf(X_train, X_test, y_train, y_test)
    test_SVC_sigmoid(X_train, X_test, y_train, y_test)