Premise:
Ideally, for a given weight tensor, we would like every channel's weight range to equal the range of the whole tensor. This motivates the following optimization objective:
$$\tilde{p}^{(k)}_{i}=\frac{\tilde{r}^{(k)}_{i}}{\tilde{R}^{(k)}},\qquad \max_{S}\sum_{i}\tilde{p}^{(1)}_{i}\,\tilde{p}^{(2)}_{i}$$
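To make the objective concrete, here is a small sketch of my own (not from the original; the tensor shapes, the helper name per_channel_precision, and the use of the per-channel max absolute value as the range $r_i$ are all assumptions). It computes $\tilde{p}_i=\tilde{r}_i/\tilde{R}$ for two weight tensors and sums their product:

import numpy as np

def per_channel_precision(w):
    # w has the equalized channel dimension first, e.g. (C, ...); the per-channel
    # range r_i is taken as the max absolute value, and R = max_i r_i
    r = np.abs(w.reshape(w.shape[0], -1)).max(axis=1)
    return r / r.max()                      # p_i = r_i / R

w1 = np.random.randn(8, 3, 3, 3)            # layer-1 weight, output channels first
w2 = np.random.randn(8, 4, 3, 3)            # layer-2 weight, input channels first
objective = (per_channel_precision(w1) * per_channel_precision(w2)).sum()
print(objective)                            # the quantity CLE maximizes over S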
Solving the optimization objective:
Rewrite the objective, based on equation (13):
$$\tilde{r}^{(1)}=S^{-1}r^{(1)},\qquad \tilde{r}^{(2)}=r^{(2)}S,\qquad \tilde{R}^{(k)}=\max_{i}\left(\tilde{r}^{(k)}_{i}\right)$$
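The reason the ranges can be rescaled at all is the positive homogeneity of ReLU that equation (13) relies on: for $s>0$, $\mathrm{relu}(x/s)\cdot s=\mathrm{relu}(x)$, so dividing layer 1 by $s_i$ and multiplying layer 2 by $s_i$ leaves the composed function unchanged while changing the per-channel ranges as above. A one-line numeric check (my own illustration, not from the original):

import numpy as np

relu = lambda x: np.maximum(x, 0.0)
x = np.random.randn(1000)
s = 2.5
# the scale applied before the ReLU can be undone after it,
# so S can be folded into the next layer without changing the output
print(np.allclose(relu(x / s) * s, relu(x)))    # True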
Substituting these definitions into the objective:
$$\max_{S}\sum_{i}\tilde{p}^{(1)}_{i}\,\tilde{p}^{(2)}_{i}
=\max_{S}\sum_{i}\frac{\tilde{r}^{(1)}_{i}\,\tilde{r}^{(2)}_{i}}{\tilde{R}^{(1)}\,\tilde{R}^{(2)}}
=\max_{S}\frac{\sum_{i}\frac{1}{s_{i}}r^{(1)}_{i}\cdot s_{i}r^{(2)}_{i}}{\max\limits_{j}\left(\frac{1}{s_{j}}r^{(1)}_{j}\right)\max\limits_{k}\left(s_{k}r^{(2)}_{k}\right)}
=\sum_{i}r^{(1)}_{i}r^{(2)}_{i}\cdot\max_{S}\frac{1}{\max\limits_{j}\left(\frac{1}{s_{j}}r^{(1)}_{j}\right)\max\limits_{k}\left(s_{k}r^{(2)}_{k}\right)}$$
Simplifying the objective (based on equation (19)):
$$\mathop{\arg\max}\limits_{S}\sum_{i}\tilde{p}^{(1)}_{i}\,\tilde{p}^{(2)}_{i}=\mathop{\arg\min}\limits_{S}\left[\max_{j}\left(\frac{1}{s_{j}}r^{(1)}_{j}\right)\max_{k}\left(s_{k}r^{(2)}_{k}\right)\right]$$
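Because $\sum_{i}r^{(1)}_{i}r^{(2)}_{i}$ does not depend on $S$, maximizing the precision sum is the same as minimizing the product of the two maxima. Below is a quick numeric check of this equivalence over random scale vectors (a sketch of mine; the ranges and candidate scales are random toy data):

import numpy as np

rng = np.random.default_rng(0)
r1 = rng.uniform(0.1, 2.0, 8)       # per-channel ranges r^(1)
r2 = rng.uniform(0.1, 2.0, 8)       # per-channel ranges r^(2)

def sum_precision(s):
    rt1, rt2 = r1 / s, r2 * s
    return (rt1 / rt1.max() * rt2 / rt2.max()).sum()

def max_product(s):
    return (r1 / s).max() * (r2 * s).max()

# over a pool of random positive scale vectors, the maximizer of the precision sum
# coincides with the minimizer of the product of the two maxima
candidates = list(rng.uniform(0.1, 2.0, size=(1000, 8)))
best_max = max(candidates, key=sum_precision)
best_min = min(candidates, key=max_product)
print(np.allclose(best_max, best_min))          # True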
By contradiction, the optimization objective yields the following conclusion (26):
Let
$$J=\mathop{\arg\max}\limits_{j}\left(\frac{1}{s_{j}}r^{(1)}_{j}\right),\qquad K=\mathop{\arg\max}\limits_{k}\left(s_{k}r^{(2)}_{k}\right)$$
Suppose that at the optimal $S$ we have $J\neq K$, and shrink $s_{K}$ slightly: $\tilde{s}_{K}=s_{K}-\varepsilon$. For a small enough $\varepsilon$ the first maximum is still attained at $J$,
$$\frac{1}{s_{J}}r^{(1)}_{J}>\frac{1}{\tilde{s}_{K}}r^{(1)}_{K}>\frac{1}{s_{K}}r^{(1)}_{K}$$
while the second maximum strictly decreases, so the product being minimized also decreases:
$$\frac{1}{s_{J}}r^{(1)}_{J}\cdot s_{K}r^{(2)}_{K}>\frac{1}{s_{J}}r^{(1)}_{J}\cdot\tilde{s}_{K}r^{(2)}_{K}$$
This contradicts the optimality of $S$, so at the optimum the two indices must coincide:
$$\mathop{\arg\max}\limits_{j}\left(\frac{1}{s_{j}}r^{(1)}_{j}\right)=\mathop{\arg\max}\limits_{k}\left(s_{k}r^{(2)}_{k}\right)$$
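The contradiction step can be seen on a two-channel toy example (my own numbers, for illustration): when the two arg-maxes fall on different channels, shrinking $s_K$ slightly still lowers the objective, so such an $S$ cannot be optimal.

import numpy as np

r1 = np.array([4.0, 1.0])      # layer-1 ranges: J = argmax(r1 / s) = 0
r2 = np.array([1.0, 4.0])      # layer-2 ranges: K = argmax(r2 * s) = 1
s = np.array([1.0, 1.0])

def max_product(s):
    return (r1 / s).max() * (r2 * s).max()

print(max_product(s))          # 16.0
s[1] -= 0.1                    # shrink s_K by a small epsilon
print(max_product(s))          # 14.4 < 16.0, so the original s was not optimal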
Using conclusion (26), if we simplify the objective further, we find that the optimum no longer pins down all of $S$: it depends only on the single channel $i=\mathop{\arg\max}\limits_{i}\left(r^{(1)}_{i}r^{(2)}_{i}\right)$. In order to solve for every entry of $S$, the following constraint (27) is therefore added:
$$\forall i:\ \tilde{r}^{(1)}_{i}=\tilde{r}^{(2)}_{i}$$
Combining (26) and (27), the solution that satisfies both conditions is:
$$s_{i}=\frac{1}{r^{(2)}_{i}}\sqrt{r^{(1)}_{i}r^{(2)}_{i}}$$
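Since $s_{i}=\frac{1}{r^{(2)}_{i}}\sqrt{r^{(1)}_{i}r^{(2)}_{i}}=\sqrt{r^{(1)}_{i}/r^{(2)}_{i}}$, both equalized ranges become $\sqrt{r^{(1)}_{i}r^{(2)}_{i}}$, which satisfies (27). A quick check with random ranges (illustration only):

import numpy as np

rng = np.random.default_rng(0)
r1 = rng.uniform(0.1, 2.0, 8)
r2 = rng.uniform(0.1, 2.0, 8)

s = np.sqrt(r1 * r2) / r2                      # s_i = (1/r2_i) * sqrt(r1_i * r2_i)
print(np.allclose(r1 / s, r2 * s))             # True: the two equalized ranges match
print(np.allclose(r1 / s, np.sqrt(r1 * r2)))   # True: both equal sqrt(r1 * r2)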
From the derivation above, we can see that CLE comes with some drawbacks and usage restrictions. The code below implements the per-channel equalization derived above and checks that the model output is unchanged after CLE:
import math
import numpy as np

def my_cle(pre_layer_weight, pre_layer_bias, cur_layer_weight):
    # Hand-rolled CLE: compute the per-channel scales and the equalized weights/bias.
    # pre_layer_weight is indexed by output channel; cur_layer_weight must already be
    # transposed so its leading dimension is the matching input channel.
    channel_num = pre_layer_weight.shape[0]
    # per-channel ranges r^(1)_i and r^(2)_i (max absolute value per channel)
    r1 = [np.abs(pre_layer_weight[i]).max() for i in range(channel_num)]
    r2 = [np.abs(cur_layer_weight[i]).max() for i in range(channel_num)]
    scale = []
    for a, b in zip(r1, r2):
        if a * b == 0.0:
            scale.append(1.0)                  # dead channel: leave it untouched
        else:
            scale.append(math.sqrt(a / b))     # s_i = (1/r2_i) * sqrt(r1_i * r2_i)
    scale = np.array(scale, dtype=np.float32)
    # W~(1) = S^-1 W(1), b~(1) = S^-1 b(1), W~(2) = W(2) S
    pre_layer_weight_res = np.array([pre_layer_weight[i] / s for i, s in enumerate(scale)])
    cur_layer_weight_res = np.array([cur_layer_weight[i] * s for i, s in enumerate(scale)])
    pre_layer_bias_res = np.array([pre_layer_bias[i] / s for i, s in enumerate(scale)])
    return scale, pre_layer_weight_res, pre_layer_bias_res, cur_layer_weight_res
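As a quick sanity check of my_cle (the shapes below are illustrative only, not taken from the original), the per-channel ranges of the two equalized weight tensors should coincide:

import numpy as np

w1 = np.random.randn(16, 3, 5, 5).astype(np.float32)   # first conv weight, (out, in, k, k)
b1 = np.random.randn(16).astype(np.float32)
w2 = np.random.randn(16, 4, 5, 5).astype(np.float32)   # second conv weight, already (in, out, k, k)
scale, w1_eq, b1_eq, w2_eq = my_cle(w1, b1, w2)
r1_eq = np.abs(w1_eq.reshape(16, -1)).max(axis=1)
r2_eq = np.abs(w2_eq.reshape(16, -1)).max(axis=1)
print(np.allclose(r1_eq, r2_eq, rtol=1e-5))             # expected: True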
import torch

class onnx2torch_cle(torch.nn.Module):
    # two stacked convolutions with a ReLU in between: the pattern CLE operates on
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5,
                                    stride=1, padding=2)
        self.relu1 = torch.nn.ReLU()
        self.conv2 = torch.nn.Conv2d(in_channels=16, out_channels=4, kernel_size=5,
                                     stride=1, padding=2)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu1(x)
        x = self.conv2(x)
        return x
import copy
import os

def test_cle():
    model = onnx2torch_cle()
    weight_all(model)            # helper (defined elsewhere): randomize the weights
    model = model.eval()
    model = model.cpu()
    model_aimet = copy.deepcopy(model)
    input_shape = [1, 3, 264, 264]
    dummy_input = torch.rand(input_shape)
    dummy_output = model(dummy_input)
    # visualize the original weights
    output_png_root = os.path.join(os.path.dirname(__file__), 'doc')
    if not os.path.exists(output_png_root):
        os.mkdir(output_png_root)
    file_name = 'org_pre_layer_w'
    show_weight(model.conv.weight.detach().numpy(), file_name,
                os.path.join(output_png_root, '{}.png'.format(file_name)))
    file_name = 'org_cur_layer_w'
    show_weight(model.conv2.weight.detach().numpy().transpose(1, 0, 2, 3), file_name,
                os.path.join(output_png_root, '{}.png'.format(file_name)))
    # adjust the weights with CLE; conv2's weight is transposed to (in, out, k, k)
    # so that its leading dimension lines up with conv's output channels
    scale, pre_layer_weight_res, pre_layer_bias_res, cur_layer_weight_res = my_cle(
        model.conv.weight.detach().numpy(),
        model.conv.bias.detach().numpy(),
        model.conv2.weight.detach().numpy().transpose(1, 0, 2, 3))
    cur_layer_weight_res = torch.from_numpy(cur_layer_weight_res.transpose(1, 0, 2, 3))
    pre_layer_weight_res = torch.from_numpy(pre_layer_weight_res)
    pre_layer_bias_res = torch.from_numpy(pre_layer_bias_res)
    model.conv.weight.data = pre_layer_weight_res
    model.conv.bias.data = pre_layer_bias_res
    model.conv2.weight.data = cur_layer_weight_res
    dummy_output_cle = model(dummy_input)
    a, b = compare_data(dummy_output.detach().numpy(), dummy_output_cle.detach().numpy())
    print(scale)
    print(a, b)
    file_name = 'cle_pre_layer_w'
    show_weight(model.conv.weight.detach().numpy(), file_name,
                os.path.join(output_png_root, '{}.png'.format(file_name)))
    file_name = 'cle_cur_layer_w'
    show_weight(model.conv2.weight.detach().numpy().transpose(1, 0, 2, 3), file_name,
                os.path.join(output_png_root, '{}.png'.format(file_name)))

if __name__ == "__main__":
    test_cle()
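The helpers weight_all, show_weight, and compare_data are not defined in this section (they come from elsewhere in the original project). To make the snippet above runnable on its own, here are minimal stand-ins based on my guess of their behavior; the real implementations may differ:

import numpy as np
import torch
import matplotlib
matplotlib.use('Agg')                      # render to files, no display needed
import matplotlib.pyplot as plt

def weight_all(model):
    # assumed behavior: re-randomize every convolution's weights and bias
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.normal_(m.weight, std=1.0)
            torch.nn.init.normal_(m.bias, std=0.5)

def compare_data(x, y):
    # assumed behavior: return (cosine similarity, max relative error) of two arrays
    x, y = x.reshape(-1), y.reshape(-1)
    cos = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    rel = float(np.abs(x - y).max() / (np.abs(x).max() + 1e-12))
    return cos, rel

def show_weight(weight, title, save_path):
    # assumed behavior: plot the per-channel weight range and save it as a png
    r = np.abs(weight.reshape(weight.shape[0], -1)).max(axis=1)
    plt.figure()
    plt.bar(range(len(r)), r)
    plt.title(title)
    plt.savefig(save_path)
    plt.close()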
From the comparison results, the model outputs before and after CLE are essentially identical for the same input (cosine similarity = 1.0, maximum relative error = 3.6033464e-07).
The visualization results are shown below:
[Figure: weight distributions of the original model — model.conv.weight and model.conv2.weight]
[Figure: weight distributions after CLE — model.conv.weight and model.conv2.weight]
The full code can be found via the download link.