I noticed that a colleague's SQL computes a click-through-rate (CTR) feature with the following expression:
$$
\left( \frac{click\_cnt\_in\_30d}{show\_cnt\_in\_30d} + \frac{1.96^2}{2 * show\_cnt\_in\_30d} \right) / \left(1 + \frac{1.96^2}{show\_cnt\_in\_30d}\right) - \frac{1.96}{1 + \frac{1.96^2}{show\_cnt\_in\_30d}} * \sqrt{ \frac{click\_cnt\_in\_30d}{show\_cnt\_in\_30d} * \frac{1 - \frac{click\_cnt\_in\_30d}{show\_cnt\_in\_30d}}{show\_cnt\_in\_30d} + \frac{1.96^2}{4 * show\_cnt\_in\_30d^2} }
$$
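To sanity-check the expression, here is a minimal Python transcription that follows it term by term. The function name `ctr_wilson_feature` is hypothetical and reuses the SQL column names as parameters. It assumes `show_cnt_in_30d > 0`, because the expression divides by it.

```python
import math


def ctr_wilson_feature(click_cnt_in_30d: int, show_cnt_in_30d: int) -> float:
    # Hypothetical helper mirroring the SQL expression term by term (z hard-coded as 1.96).
    # Assumes show_cnt_in_30d > 0; the original expression divides by it.
    ctr = click_cnt_in_30d / show_cnt_in_30d
    z_sq = 1.96 ** 2
    denom = 1 + z_sq / show_cnt_in_30d
    # First term: (ctr + 1.96^2 / (2 * show)) / (1 + 1.96^2 / show)
    center = (ctr + z_sq / (2 * show_cnt_in_30d)) / denom
    # Second term: 1.96 / (1 + 1.96^2 / show) * sqrt(ctr * (1 - ctr) / show + 1.96^2 / (4 * show^2))
    margin = 1.96 / denom * math.sqrt(
        ctr * (1 - ctr) / show_cnt_in_30d + z_sq / (4 * show_cnt_in_30d ** 2)
    )
    return center - margin


print(ctr_wilson_feature(5, 10))
```

For example, 5 clicks out of 10 shows gives roughly 0.237, which is well below the raw CTR of 0.5.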
Cleaning up the notation a bit:
$$
\frac{ \frac{click_{30d}}{show_{30d}} + \frac{1.96^2}{2 * show_{30d}} }{1 + \frac{1.96^2}{show_{30d}}} - \frac{1.96}{1 + \frac{1.96^2}{show_{30d}}} * \sqrt{ \frac{\frac{click_{30d}}{show_{30d}} * \left(1 - \frac{click_{30d}}{show_{30d}}\right)}{show_{30d}} + \frac{1.96^2}{4 * show_{30d}^2} }
$$
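You can go one step further by multiplying both the numerator and the denominator by $show_{30d}$. This gives an algebraically equivalent form with fewer nested fractions, which may be easier to write in SQL. It still requires $show_{30d} > 0$:

$$
\frac{ click_{30d} + \frac{1.96^2}{2} - 1.96 * \sqrt{ \frac{click_{30d} * \left(show_{30d} - click_{30d}\right)}{show_{30d}} + \frac{1.96^2}{4} } }{ show_{30d} + 1.96^2 }
$$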
Let $\hat{p} = \frac{click}{show}$, $n = show$, and $z = 1.96$. The expression then becomes:
$$
\frac{ \hat{p} + \frac{z^2}{2 * n} }{1 + \frac{z^2}{n}} - \frac{z}{1 + \frac{z^2}{n}} * \sqrt{ \frac{\hat{p} * \left(1 - \hat{p}\right)}{n} + \frac{z^2}{4 * n^2} }
$$
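This is the lower bound used in so-called Wilson smoothing, that is, the lower end of the Wilson score interval for the true CTR. The value $z = 1.96$ corresponds to a 95% confidence level.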
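Below is a minimal Python sketch of the general $\hat{p}$, $n$, $z$ form, used to rank a few items. The function name, the item counts, and returning 0 when $n = 0$ are all assumptions made for illustration. The example shows why the lower bound acts as smoothing. Items with few shows are pulled down sharply, while items with many shows stay close to their raw CTR.

```python
import math


def wilson_lower_bound(p_hat: float, n: int, z: float = 1.96) -> float:
    # Lower bound of the Wilson score interval, written in the p_hat / n / z form.
    # Assumption: with no shows (n == 0) there is no evidence, so return 0.0.
    if n <= 0:
        return 0.0
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half_width


# Hypothetical (click, show) counts: similar raw CTRs, very different exposure.
items = {"a": (1, 2), "b": (40, 100), "c": (450, 1000)}

for name, (click, show) in items.items():
    raw = click / show
    print(f"{name}: raw CTR={raw:.3f}  Wilson lower bound={wilson_lower_bound(raw, show):.3f}")

# Rank items by the smoothed score instead of the raw CTR.
ranked = sorted(
    items,
    key=lambda k: wilson_lower_bound(items[k][0] / items[k][1], items[k][1]),
    reverse=True,
)
print("ranking by Wilson lower bound:", ranked)
```

Ranked by raw CTR, the order is a, c, b. Ranked by the Wilson lower bound, it becomes c, b, a. One click out of 2 shows (lower bound about 0.09) is much weaker evidence than 450 out of 1000 (about 0.42).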
Reference:
https://blog.csdn.net/hero_myself/article/details/116264111