Model: $y_i = F(x_i) = wx_i$
Optimization: squared error
$\min_w \sum_{i=1}^n (y_i - wx_i)^2$
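As a minimal sketch (NumPy and the sample data are illustrative assumptions, not from the notes), the squared-error objective above has the closed-form minimizer $w = \sum_i x_i y_i / \sum_i x_i^2$ for this no-intercept model:

```python
import numpy as np

# Illustrative 1-D data (roughly y = 2x with noise).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Closed-form minimizer of sum_i (y_i - w*x_i)^2:
# setting the derivative -2 * sum_i x_i*(y_i - w*x_i) to zero gives
# w = sum_i x_i*y_i / sum_i x_i^2.
w = np.dot(x, y) / np.dot(x, x)

squared_error = np.sum((y - w * x) ** 2)
print(f"w = {w:.3f}, squared error = {squared_error:.3f}")
```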
Evaluation
②Decision Tree
Model
Example: here $c_1, c_2$ are the means, on the attribute being split, of the samples in the two groups produced by splitting at point $s$.
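A minimal sketch of this split rule (the function name and data are illustrative assumptions): for every candidate split point $s$, compute $c_1, c_2$ as the group means and keep the $s$ with the smallest total squared error.

```python
import numpy as np

def best_split(x, y):
    """Scan candidate split points s and return the one minimizing
    sum (y_i - c1)^2 over {x_i <= s}  +  sum (y_i - c2)^2 over {x_i > s},
    where c1, c2 are the means of the two groups."""
    best_s, best_loss = None, np.inf
    for s in np.unique(x)[:-1]:          # splitting above the largest x is pointless
        left, right = y[x <= s], y[x > s]
        c1, c2 = left.mean(), right.mean()
        loss = np.sum((left - c1) ** 2) + np.sum((right - c2) ** 2)
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s, best_loss

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.1, 0.9, 1.0, 5.2, 4.8, 5.0])
print(best_split(x, y))   # splits between 3.0 and 10.0
```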
③Nearest-Neighbor Classifiers
Distance
Classification
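A minimal 1-nearest-neighbor sketch (Euclidean distance; the data are illustrative assumptions): compute the distance from the query to every training point and return the label of the closest one.

```python
import numpy as np

def nn_classify(X_train, y_train, x_query):
    """1-NN: Euclidean distance to every training point, label of the closest."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(nn_classify(X_train, y_train, np.array([0.3, 0.1])))  # -> 0
print(nn_classify(X_train, y_train, np.array([4.9, 5.1])))  # -> 1
```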
④Logistic regression
Optimization
$\frac{P(y=1|x,w)}{P(y=0|x,w)} = \frac{1}{1-P(y=1|x,w)} - 1$
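A quick numeric check of the odds identity above, assuming the usual sigmoid parameterization $P(y=1|x,w) = 1/(1+e^{-w^\top x})$ (the notes do not state this form explicitly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights and input (assumed values).
w = np.array([0.7, -1.2])
x = np.array([2.0, 0.5])

p1 = sigmoid(np.dot(w, x))      # P(y=1 | x, w)
p0 = 1.0 - p1                   # P(y=0 | x, w)

odds = p1 / p0
rhs = 1.0 / (1.0 - p1) - 1.0    # right-hand side of the identity above
print(np.isclose(odds, rhs))    # True: both equal exp(w.x) under the sigmoid model
```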
(3)Data ①Distribution
Testing and Training Dataset
Observed and Real Dataset (hard to control)
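A small illustration of splitting the observed dataset into training and testing sets (illustrative code; how well the observed data reflect the real distribution remains hard to control):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed dataset (what we actually have); the real distribution is unknown.
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)

# Random 80/20 split into training and testing sets.
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:80], idx[80:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(X_train.shape, X_test.shape)   # (80, 3) (20, 3)
```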
②Sample complexity
Here, $\epsilon$ can be understood as the accuracy (error tolerance), $1-\delta$ as the confidence, and $VS_{H,D}$, the $\epsilon$-exhausted version space, is the set of all models satisfying the accuracy and confidence conditions. The higher the required confidence and accuracy, the larger the number of training samples $m$ that is needed.
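The formula these remarks refer to did not survive extraction; the standard sample-complexity bound for a finite hypothesis space $H$ ($\epsilon$-exhausting the version space, as in Mitchell's treatment) is stated here as an assumption consistent with the description:

```latex
% Probability that VS_{H,D} is not \epsilon-exhausted after m i.i.d. examples:
%   \Pr[\exists h \in VS_{H,D}: \mathrm{error}(h) > \epsilon] \le |H| e^{-\epsilon m}
% Requiring this to be at most \delta and solving for m gives:
m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

Smaller $\epsilon$ (higher accuracy) and smaller $\delta$ (higher confidence) both increase the right-hand side, which matches the statement that $m$ grows as the requirements tighten.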