Segments + Links: merging the linked segments yields the final text-line box $(x, y, w, h, \theta)$;
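A minimal sketch of the combining step, assuming segments are given as `(x, y, w, h, θ)` tuples and positive links as `(i, j)` index pairs. Linked segments are grouped with union-find. For each group, the orientation comes from the principal direction of the segment centres (an SVD stand-in, not SegLink's exact least-squares line fit). The centres are projected onto that line, and the box spans the two extreme segments. `group_segments` and `combine_group` are hypothetical names, and angles are measured in whatever coordinate frame the inputs use.

```python
import numpy as np


class UnionFind:
    """Disjoint-set forest used to group segments connected by positive links."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def group_segments(num_segments, links):
    """Return lists of segment indices; `links` holds (i, j) pairs of linked segments."""
    uf = UnionFind(num_segments)
    for i, j in links:
        uf.union(i, j)
    groups = {}
    for i in range(num_segments):
        groups.setdefault(uf.find(i), []).append(i)
    return list(groups.values())


def combine_group(segs):
    """Merge (k, 5) segments (x, y, w, h, theta) into one box (x, y, w, h, theta).

    Simplified sketch: uses the principal direction of the centres instead of
    the paper's least-squares line fit.
    """
    segs = np.asarray(segs, dtype=float)
    if len(segs) == 1:
        return tuple(float(v) for v in segs[0])
    centers = segs[:, :2]
    mean = centers.mean(axis=0)
    _, _, vt = np.linalg.svd(centers - mean)
    direction = vt[0]
    theta = np.arctan2(direction[1], direction[0])
    # A line has no preferred sign, so fold the angle into (-pi/2, pi/2].
    if theta > np.pi / 2:
        theta -= np.pi
    elif theta <= -np.pi / 2:
        theta += np.pi
    # Project centres onto the line; the extreme projections are the end segments.
    proj = (centers - mean) @ direction
    p, q = int(np.argmin(proj)), int(np.argmax(proj))
    cx, cy = mean + (proj[p] + proj[q]) / 2 * direction
    # Distance between the end centres plus half the width of each end segment.
    w = proj[q] - proj[p] + (segs[p, 2] + segs[q, 2]) / 2
    h = segs[:, 3].mean()
    return (float(cx), float(cy), float(w), float(h), float(theta))


# Usage: three linked horizontal segments plus one isolated segment.
segments = [(10, 50, 20, 10, 0.0), (30, 50, 20, 10, 0.0),
            (50, 50, 20, 10, 0.0), (200, 200, 10, 10, 0.0)]
for group in group_segments(len(segments), [(0, 1), (1, 2)]):
    print(combine_group([segments[i] for i in group]))
```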
Loss function:

$$
L(y_s, c_s, y_l, c_l, \hat{s}, s) = \frac{1}{N_s} L_{conf}(y_s, c_s) + \lambda_1 \frac{1}{N_s} L_{loc}(\hat{s}, s) + \lambda_2 \frac{1}{N_l} L_{conf}(y_l, c_l)
$$

where $y_s, c_s$ are the segment labels and predicted segment scores, $y_l, c_l$ the link labels and predicted link scores, $\hat{s}, s$ the predicted and ground-truth segment geometry, $N_s$ the number of positive segments (default boxes) and $N_l$ the number of positive links.
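A NumPy sketch of how the three terms combine. Assumptions not stated in the formula above: both confidence terms are plain softmax cross-entropy summed over all samples (no hard-negative mining), $L_{loc}$ is smooth L1 summed over positive segments only, and $\lambda_1 = \lambda_2 = 1$ by default. `seglink_loss` and its helpers are hypothetical names.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def softmax_ce_sum(labels, logits):
    """Summed softmax cross-entropy; labels are int class ids, logits (n, classes)."""
    lp = log_softmax(np.asarray(logits, dtype=float))
    return -lp[np.arange(len(labels)), labels].sum()


def smooth_l1_sum(pred, target):
    """Summed smooth L1: 0.5*d^2 if |d| < 1, else |d| - 0.5."""
    d = np.abs(np.asarray(pred, float) - np.asarray(target, float))
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()


def seglink_loss(y_s, c_s, y_l, c_l, s_hat, s, lambda1=1.0, lambda2=1.0):
    """
    y_s: (n_seg,) segment labels (1 = text, 0 = background); c_s: (n_seg, 2) logits
    y_l: (n_link,) link labels (1 = positive);                c_l: (n_link, 2) logits
    s_hat, s: (n_seg, 5) predicted / ground-truth segment geometry
    """
    y_s, y_l = np.asarray(y_s), np.asarray(y_l)
    pos_seg = y_s == 1
    n_s = max(int(pos_seg.sum()), 1)       # N_s, guarded against zero
    n_l = max(int((y_l == 1).sum()), 1)    # N_l, guarded against zero
    l_conf_seg = softmax_ce_sum(y_s, c_s)
    # Localisation is only counted on positive segments (assumption).
    l_loc = smooth_l1_sum(np.asarray(s_hat)[pos_seg], np.asarray(s)[pos_seg])
    l_conf_link = softmax_ce_sum(y_l, c_l)
    return l_conf_seg / n_s + lambda1 * l_loc / n_s + lambda2 * l_conf_link / n_l
```

With all-zero logits each cross-entropy term equals $\log 2$, which makes the normalisation by $N_s$ and $N_l$ easy to check by hand.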
Loss function:

$$
\begin{aligned}
L_{pixel} &= \frac{1}{(1+r)S} W L_{pixel\_CE} \\
L_{link\_pos} &= W_{pos\_link} L_{link\_CE} \\
L_{link\_neg} &= W_{neg\_link} L_{link\_CE} \\
L_{link} &= \frac{L_{link\_pos}}{\mathrm{rsum}(W_{pos\_link})} + \frac{L_{link\_neg}}{\mathrm{rsum}(W_{neg\_link})}
\end{aligned}
$$

where $r$ is the negative-to-positive pixel ratio, $S$ is the total area of all text instances, $W$ is the pixel weight matrix, and $\mathrm{rsum}$ denotes reduce-sum (summing all elements of a tensor into a scalar).
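A NumPy sketch of the two terms, assuming the per-pixel and per-link cross-entropy maps are already computed. Assumptions, to be checked against the original paper: $W_{pos\_link} = W \cdot [Y_{link} = 1]$ and $W_{neg\_link} = W \cdot [Y_{link} = 0]$, link losses are evaluated on positive pixels only (so $W$ is masked by the pixel labels first), and the products in the formulas are summed over all elements. `pixellink_loss` and its parameter names are hypothetical.

```python
import numpy as np


def pixellink_loss(pixel_ce, weight, pixel_labels, link_ce, link_labels, r, S):
    """
    pixel_ce:     (H, W) per-pixel text/non-text cross-entropy
    weight:       (H, W) pixel weight matrix W
    pixel_labels: (H, W) 1 for text pixels, 0 otherwise
    link_ce:      (H, W, 8) per-link cross-entropy, one channel per neighbour
    link_labels:  (H, W, 8) 1 for positive links, 0 for negative links
    r:            negative-to-positive pixel ratio
    S:            total area of all text instances
    Returns (L_pixel, L_link).
    """
    pixel_ce = np.asarray(pixel_ce, float)
    weight = np.asarray(weight, float)
    link_ce = np.asarray(link_ce, float)
    # L_pixel: weighted CE summed over the map, normalised by (1 + r) * S.
    l_pixel = (weight * pixel_ce).sum() / ((1 + r) * S)

    # Assumption: link losses only on positive pixels, so mask W first.
    w_pix = weight * (np.asarray(pixel_labels) == 1)
    link_labels = np.asarray(link_labels)
    w_pos_link = w_pix[..., None] * (link_labels == 1)
    w_neg_link = w_pix[..., None] * (link_labels == 0)
    l_link_pos = (w_pos_link * link_ce).sum()
    l_link_neg = (w_neg_link * link_ce).sum()
    eps = 1e-12  # avoids 0/0 when there are no positive or no negative links
    l_link = (l_link_pos / max(w_pos_link.sum(), eps)
              + l_link_neg / max(w_neg_link.sum(), eps))
    return float(l_pixel), float(l_link)
```

Dividing by $\mathrm{rsum}(W_{pos\_link})$ and $\mathrm{rsum}(W_{neg\_link})$ separately turns each link term into a weighted average, so the many negative links cannot drown out the positive ones.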
8. TextBoxes
An improved SSD algorithm:
End-to-end training;
Detection + OCR;
Network architecture:
Backbone: VGG + 6 convolutional feature layers;
Text-box layer:
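predicts a 72-dimensional vector at each feature-map location;
12 default boxes per location;
for each default box, predicts 2 text/non-text scores and 4 coordinate offsets (12 × (2 + 4) = 72).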
NMS;
Default box aspect ratios changed to 1, 2, 3, 5, 7, 10;
1×5 filters instead of 3×3 filters;
Only one object class: text line;
Multi-scale input images;
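A sketch of the text-box layer output at one location, under several labeled assumptions. The 12 default boxes are taken as the 6 aspect ratios each placed twice (at the cell centre and shifted down by half a cell), with SSD-style sizes $w = s\sqrt{a}$, $h = s/\sqrt{a}$. The 72 values are assumed to be laid out as 12 × 2 scores followed by 12 × 4 offsets, and offsets are decoded as $x = x_0 + w_0\Delta x$, $y = y_0 + h_0\Delta y$, $w = w_0 e^{\Delta w}$, $h = h_0 e^{\Delta h}$. `default_boxes` and `decode_textbox_output` are hypothetical names.

```python
import numpy as np

ASPECT_RATIOS = (1, 2, 3, 5, 7, 10)


def default_boxes(cell_x, cell_y, cell_size, scale):
    """12 default boxes (cx, cy, w, h) for one feature-map cell.

    Assumption: each aspect ratio is used at the cell centre and again
    shifted down by half a cell; w = scale*sqrt(a), h = scale/sqrt(a).
    """
    boxes = []
    for shift in (0.0, 0.5):
        cx = (cell_x + 0.5) * cell_size
        cy = (cell_y + 0.5 + shift) * cell_size
        for a in ASPECT_RATIOS:
            boxes.append((cx, cy, scale * np.sqrt(a), scale / np.sqrt(a)))
    return np.array(boxes)


def decode_textbox_output(vec, defaults):
    """Split one 72-d output into text probabilities and decoded boxes.

    Assumed layout: 12 x 2 scores (non-text, text), then 12 x 4 offsets.
    """
    vec = np.asarray(vec, dtype=float)
    if vec.shape != (72,) or defaults.shape != (12, 4):
        raise ValueError("expected a 72-d vector and 12 default boxes")
    scores = vec[:24].reshape(12, 2)
    offsets = vec[24:].reshape(12, 4)
    # Softmax over the two classes; keep the "text" probability.
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    text_prob = (e / e.sum(axis=1, keepdims=True))[:, 1]
    # Offsets are relative to the default box geometry.
    x0, y0, w0, h0 = defaults.T
    dx, dy, dw, dh = offsets.T
    boxes = np.stack([x0 + w0 * dx, y0 + h0 * dy,
                      w0 * np.exp(dw), h0 * np.exp(dh)], axis=1)
    return text_prob, boxes


# Usage: an all-zero output decodes to the default boxes with probability 0.5.
d = default_boxes(0, 0, cell_size=16, scale=20)
prob, boxes = decode_textbox_output(np.zeros(72), d)
```

The very wide ratios (up to 10) fit long horizontal text lines, which is also why the 1×5 filters are used in place of square 3×3 ones.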
Loss:
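$$
L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)
$$

where $x$ is the match indicator between default boxes and ground-truth boxes, $c$ the class confidences, $l$ the predicted locations, $g$ the ground-truth locations, and $N$ the number of matched default boxes.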
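A per-image NumPy sketch of this SSD-style loss, to show what $x$, $N$ and $\alpha$ do. Assumptions: boxes are axis-aligned `(xmin, ymin, xmax, ymax)`, a default box is matched when its best IoU with a ground-truth box is at least 0.5, the confidence term is softmax cross-entropy over all default boxes without hard-negative mining, $L_{loc}$ is smooth L1 on the matched boxes, and the loss is 0 when $N = 0$. `textboxes_loss`, `iou` and `to_center` are hypothetical names.

```python
import numpy as np


def iou(a, b):
    """Pairwise IoU of (n, 4) and (m, 4) boxes given as (xmin, ymin, xmax, ymax)."""
    a = np.asarray(a, float)[:, None, :]
    b = np.asarray(b, float)[None, :, :]
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)


def to_center(boxes):
    """(xmin, ymin, xmax, ymax) -> (cx, cy, w, h)."""
    boxes = np.asarray(boxes, float)
    wh = boxes[:, 2:] - boxes[:, :2]
    return np.concatenate([boxes[:, :2] + wh / 2, wh], axis=1)


def textboxes_loss(defaults, gts, logits, loc_pred, alpha=1.0, iou_thresh=0.5):
    """L = (L_conf + alpha * L_loc) / N for one image.

    defaults: (n, 4) default boxes; gts: (m, 4) ground-truth boxes (xyxy)
    logits:   (n, 2) (non-text, text) scores; loc_pred: (n, 4) offsets l
    """
    gts = np.asarray(gts, float).reshape(-1, 4)
    if len(gts) == 0:
        return 0.0
    overlaps = iou(defaults, gts)               # (n, m)
    x = overlaps.max(axis=1) >= iou_thresh      # match indicator per default box
    n = int(x.sum())                            # N
    if n == 0:
        return 0.0
    # Confidence: softmax cross-entropy, label 1 for matched boxes, 0 otherwise.
    logits = np.asarray(logits, float)
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_conf = -log_p[np.arange(len(x)), x.astype(int)].sum()
    # Targets g: offsets of the matched ground truth relative to each default box.
    d = to_center(defaults)[x]
    g = to_center(gts)[overlaps.argmax(axis=1)[x]]
    target = np.concatenate([(g[:, :2] - d[:, :2]) / d[:, 2:],
                             np.log(g[:, 2:] / d[:, 2:])], axis=1)
    diff = np.abs(np.asarray(loc_pred, float)[x] - target)
    l_loc = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()
    return float((l_conf + alpha * l_loc) / n)
```

Normalising by $N$ rather than by the total number of default boxes keeps the loss scale stable across images with different numbers of text lines; $\alpha$ then only trades off classification against localisation.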