Introduction
Hard Margin Classification
Strictly impose that all instances be off the street and on the correct side.
Two main issues: it only works if the data is linearly separable, and it is very sensitive to outliers.
Soft Margin Classification
Allow some outliers to appear inside the street. The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations.
The C hyperparameter controls the width of the street: a smaller C gives a wider street but more margin violations; a larger C gives fewer margin violations but a narrower street.
Three scikit-learn APIs implement this: LinearSVC, SVC(kernel="linear"), and SGDClassifier(loss="hinge").
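As a minimal sketch of soft margin classification, assuming scikit-learn is installed: a LinearSVC (C=1) trained on two iris features to detect Iris virginica. The feature choice and C value here follow common practice for this dataset and are illustrative, not prescriptive.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]                    # petal length, petal width
y = (iris.target == 2).astype(np.float64)   # 1.0 if Iris virginica, else 0.0

# Scaling first matters: SVMs are sensitive to feature scales.
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, loss="hinge"))
svm_clf.fit(X, y)

pred = svm_clf.predict([[5.5, 1.7]])  # a large petal: classified as virginica
```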
Polynomial Kernel
When facing nonlinear data, the first approach is to add polynomial features so that the data becomes linearly separable. But the polynomial degree needs care: if it is too high, the number of features explodes and training becomes very slow; if it is too low, the model cannot handle complex datasets.
The kernel trick can produce the same result as adding many polynomial features, without actually having to add them: SVC(kernel="poly", degree=3, coef0=1, C=5). Here degree is the polynomial degree; coef0 controls how much the model is influenced by high-degree polynomials versus low-degree polynomials; C controls the width of the street (the larger C is, the stricter and narrower the street).
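A short sketch of the polynomial kernel, assuming scikit-learn; the moons dataset and noise level are arbitrary choices to get a nonlinear problem:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A 2-D dataset that is not linearly separable
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# kernel="poly" simulates degree-3 polynomial features without creating them
poly_kernel_svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5),
)
poly_kernel_svm_clf.fit(X, y)
acc = poly_kernel_svm_clf.score(X, y)  # training accuracy
```

If the model underfits, increasing degree (or coef0) is the usual first move; if it overfits, reduce them.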
Gaussian RBF Kernel
Another way to handle nonlinear data is adding similarity features, a change of coordinates: choose a landmark, then map each point to its similarity to that landmark (via a similarity function). The Gaussian RBF is exactly such a formula, transforming a point x relative to a landmark ℓ:
$Gaussian\;RBF:\quad \phi_{\gamma}(\mathbf{x}, \ell) = e^{-\gamma \lVert \mathbf{x} - \ell \rVert^2}$
This can cleverly map the original coordinate x_1 into a new coordinate system (x_2, x_3) in which the data becomes linearly separable.
The simplest approach is to create a landmark at the location of each and every instance, transforming the nonlinear dataset X(m, n) into a linearly separable X(m, m).
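A NumPy-only sketch of this landmark-at-every-instance transformation (the 1-D toy data and gamma value are made up for illustration):

```python
import numpy as np

def rbf_features(X, landmarks, gamma=0.3):
    # phi(x, l) = exp(-gamma * ||x - l||^2) for every (instance, landmark) pair
    sq_dists = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

# 9 instances with a single feature: X has shape (m, n) = (9, 1)
X = np.array([[-4.0], [-3.0], [-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])

# Landmark at every instance: the transformed set has shape (m, m) = (9, 9)
F = rbf_features(X, landmarks=X)
```

Each instance's similarity to itself is 1 (distance 0), and similarity decays toward 0 as the distance to a landmark grows.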
SVC(kernel="rbf", gamma=5, C=0.001)
Hyperparameter 1: gamma (γ). As a factor in the exponent, it controls how regular the decision boundary is: the larger gamma is, the faster the exponential decays, the narrower the bell-shaped curve, and the tighter the fit (lower bias, higher variance).
Hyperparameter 2: C. As above, it controls the width of the street. A larger C means a narrower street, lower bias, and higher variance;
a smaller C means a wider street, higher bias, and lower variance.
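The gamma effect can be seen empirically. A hedged sketch, assuming scikit-learn; the two (gamma, C) settings are arbitrary points chosen to contrast a smooth boundary against a tight one:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Larger gamma -> narrower bell curves -> boundary hugs the training data
scores = {}
for gamma, C in [(0.1, 1000), (5, 1000)]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma, C=C))
    clf.fit(X, y)
    scores[gamma] = clf.score(X, y)  # training accuracy
```

The high-gamma model fits the training set at least as tightly as the low-gamma one; whether it generalizes better is a separate question (check with a validation set).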
Computational Complexity
| Class | Time complexity | Out-of-core support | Scaling required | Kernel trick |
|---|---|---|---|---|
| LinearSVC | O(m × n) | No | Yes | No |
| SGDClassifier | O(m × n) | Yes | Yes | No |
| SVC | O(m² × n) to O(m³ × n) | No | Yes | Yes |
Decision Function and Predictions
1) New notation: the bias term will be called b, the feature weights vector will be called w, and no bias feature x_0 is added to the input vectors.
2) Several hyperplanes
Decision function: the decision function w^T x + b forms a hyperplane in n+1 dimensions (the n features plus the function value).
Decision boundary: the set of points where the decision function is equal to 0, a hyperplane in the n-dimensional feature space.
Margin boundaries: the edges of the street are the hyperplanes where the decision function is equal to 1 or −1; they are always parallel to the decision boundary.
3) Linear SVM classifier
||w|| determines the width of the street: the larger ||w|| is, the narrower the street.
$$
\hat{y} =
\begin{cases}
0 & \text{if } \mathbf{w}^T \mathbf{x} + b < 0 \\
1 & \text{if } \mathbf{w}^T \mathbf{x} + b \ge 0
\end{cases}
$$
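This prediction rule in NumPy, with made-up weights w and bias b (a trained model would supply real values):

```python
import numpy as np

# Hypothetical learned parameters for a 2-feature linear SVM
w = np.array([2.0, -1.0])
b = -0.5

def predict(X):
    scores = X @ w + b                 # decision function w^T x + b
    return (scores >= 0).astype(int)   # class 1 on or above the boundary

X_new = np.array([[1.0, 1.0],   # score = 2 - 1 - 0.5 = 0.5  -> class 1
                  [0.0, 1.0]])  # score = 0 - 1 - 0.5 = -1.5 -> class 0
preds = predict(X_new)
```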
Training Objective
1)Hard margin
The objective is to maximize the width of the street, which means minimizing ||w|| (in practice, minimizing ½ wᵀw, which has a simpler derivative).
define t(i) = –1 for negative instances (if y(i) = 0) and t(i) = 1 for positive instances (if y(i) = 1)
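With t(i) defined this way, the hard margin objective is the standard constrained optimization problem:

```latex
\min_{\mathbf{w},\, b} \;\; \frac{1}{2}\,\mathbf{w}^T \mathbf{w}
\quad \text{subject to} \quad
t^{(i)}\left(\mathbf{w}^T \mathbf{x}^{(i)} + b\right) \ge 1
\quad \text{for } i = 1, \dots, m
```

The constraint says every instance sits on the correct side of its margin boundary (decision function ≥ 1 for positives, ≤ −1 for negatives).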
2)Soft margin
Trade off two conflicting objectives: making the margin as large as possible while limiting how many instances fall inside it.
Define a slack variable ζ(i) ≥ 0 for each instance: ζ(i) measures how much the i-th instance is allowed to violate the margin (it is a distance, not a probability).
The hyperparameter C defines the trade-off between the two objectives: shrinking the slack variables (fewer margin violations) versus minimizing ½ wᵀw (a wider street).
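Putting both pieces together, the soft margin objective adds the C-weighted slack penalty to the hard margin problem:

```latex
\min_{\mathbf{w},\, b,\, \zeta} \;\;
\frac{1}{2}\,\mathbf{w}^T \mathbf{w} + C \sum_{i=1}^{m} \zeta^{(i)}
\quad \text{subject to} \quad
t^{(i)}\left(\mathbf{w}^T \mathbf{x}^{(i)} + b\right) \ge 1 - \zeta^{(i)}
\;\; \text{and} \;\; \zeta^{(i)} \ge 0
\quad \text{for } i = 1, \dots, m
```

As C grows, violations become expensive and the solution approaches the hard margin one; as C shrinks, the street widens at the cost of more violations.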