Caffe: How to Write Common Layers

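1. Convolution layer:

The convolution layer is the core layer of a convolutional neural network (CNN).

Layer type: Convolution

lr_mult: the learning-rate multiplier. The effective learning rate is this value multiplied by base_lr from the solver.prototxt configuration file. If two lr_mult entries are given, the first applies to the weights and the second to the bias. The bias learning rate is usually set to twice the weight learning rate.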
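
As a quick illustration of the multiplication described above, here is a minimal sketch. The function name and the numeric values (base_lr = 0.01, lr_mult of 1 and 2) are hypothetical and chosen only for the example.

```python
def effective_learning_rates(base_lr, lr_mults):
    """Scale the solver's base_lr by each lr_mult (weights first, then bias)."""
    return [base_lr * m for m in lr_mults]


# Hypothetical values: base_lr as it might appear in solver.prototxt,
# lr_mult 1 for the weights and 2 for the bias.
weight_lr, bias_lr = effective_learning_rates(0.01, [1, 2])
print(weight_lr, bias_lr)
```
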

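The layer-specific parameters are set in the convolution_param block that follows.

Required parameters:

num_output: the number of convolution kernels (filters).

kernel_size: the kernel size. If the kernel height and width differ, set them separately with kernel_h and kernel_w.

Other parameters:

stride: the kernel stride, 1 by default. It can also be set with stride_h and stride_w.

pad: padding added to the borders; the default is 0 (no padding). Padding is symmetric, left/right and top/bottom. For example, with a 5*5 kernel and pad set to 2, each of the four borders is padded by 2 pixels, so the width and height each grow by 4 pixels and, at stride 1, the feature map after convolution is no smaller than the input. pad_h and pad_w set the two directions separately.

weight_filler: weight initialization. The default is "constant" with all values 0. The "xavier" algorithm is often used instead, and "gaussian" is also available.
bias_filler: bias initialization. Usually set to "constant" with all values 0.
bias_term: whether to enable the bias term; the default is true (enabled).
group: the number of groups, 1 by default. If it is greater than 1, the connections of the convolution are restricted to a subset: the input and output channels are split into groups, and the i-th output group is connected only to the i-th input group.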
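
To make the effect of group concrete, the sketch below computes the shape of the weight blob and its parameter count. It assumes the weight layout num_output x (input channels / group) x kernel_h x kernel_w and that both channel counts must be divisible by group; the function names and example sizes are illustrative only.

```python
def conv_weight_shape(in_channels, num_output, kernel_h, kernel_w, group=1):
    """Weight blob shape of a (possibly grouped) convolution."""
    if in_channels % group or num_output % group:
        raise ValueError("in_channels and num_output must be divisible by group")
    # Each filter only sees the input channels of its own group.
    return (num_output, in_channels // group, kernel_h, kernel_w)


def param_count(shape):
    """Number of scalars in a blob of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


# Illustrative sizes: 96 input channels, 256 filters of size 5x5.
print(param_count(conv_weight_shape(96, 256, 5, 5, group=1)))  # 614400
print(param_count(conv_weight_shape(96, 256, 5, 5, group=2)))  # 307200
```

With group set to 2, each filter connects to half of the input channels, so the number of weights is halved.
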

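Input: n * c0 * h0 * w0
Output: n * c1 * h1 * w1
Here c1 is the num_output parameter, i.e. the number of feature maps produced, and
h1 = (h0 + 2 * pad - kernel_size) / stride + 1;
w1 = (w0 + 2 * pad - kernel_size) / stride + 1;
where the division rounds down. With stride 1 and a kernel larger than 1, neighbouring convolution windows overlap. With stride 1 and pad = (kernel_size - 1) / 2 (odd kernel_size), the width and height are unchanged by the convolution.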
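
The output-size formula above can be checked with a few lines of code. This is a minimal sketch that assumes the division rounds down; the function name and the example sizes are illustrative only.

```python
def conv_output_size(in_size, kernel_size, stride=1, pad=0):
    """Spatial output size along one dimension (height or width)."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


# 5x5 kernel, stride 1, pad = (5 - 1) / 2 = 2: the size is preserved.
print(conv_output_size(28, kernel_size=5, stride=1, pad=2))   # 28
# Same kernel without padding: the map shrinks by kernel_size - 1.
print(conv_output_size(28, kernel_size=5, stride=1, pad=0))   # 24
# A larger stride reduces the size further.
print(conv_output_size(227, kernel_size=11, stride=4, pad=0))  # 55
```
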

2. BatchNorm

The definition of BatchNorm in caffe.proto:
```
message LayerParameter {
    optional BatchNormParameter batch_norm_param = 139;
}

message BatchNormParameter {
  // If false, normalization is performed over the current mini-batch
  // and global statistics are accumulated (but not yet used) by a moving
  // average.
  // If true, those accumulated mean and variance values are used for the
  // normalization.
  // By default, it is set to false when the network is in the training
  // phase and true when the network is in the testing phase.
  optional bool use_global_stats = 1;

  // What fraction of the moving average remains each iteration?
  // Smaller values make the moving average decay faster, giving more
  // weight to the recent values.
  // Each iteration updates the moving average @f$S_{t-1}@f$ with the
  // current mean @f$ Y_t @f$ by
  // @f$ S_t = (1-\beta)Y_t + \beta \cdot S_{t-1} @f$, where @f$ \beta @f$
  // is the moving_average_fraction parameter.
  optional float moving_average_fraction = 2 [default = .999];
  // Small value to add to the variance estimate so that we don't divide by
  // zero (keeps the computation numerically stable).
  optional float eps = 3 [default = 1e-5];
}
```
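
The moving-average update and the role of eps described in the comments above can be sketched as follows. This follows the formula in the proto comment only; Caffe's internal bookkeeping of the accumulated statistics may differ, and the function names are hypothetical.

```python
import math


def update_moving_average(prev, current, fraction=0.999):
    """S_t = (1 - beta) * Y_t + beta * S_{t-1}, with beta = fraction."""
    return (1.0 - fraction) * current + fraction * prev


def normalize(x, mean, var, eps=1e-5):
    """Normalize one value; eps keeps the denominator away from zero."""
    return (x - mean) / math.sqrt(var + eps)


# A smaller fraction gives more weight to the most recent mini-batch mean.
running = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    running = update_moving_average(running, batch_mean, fraction=0.5)
print(running)  # 0.875

# With use_global_stats = true, the accumulated statistics would be used here.
print(normalize(3.0, mean=1.0, var=4.0))
```
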
Filler types

First, the FillerParameter message from caffe.proto; the corresponding header in the Caffe source is filler.hpp.
```
message FillerParameter {
  // The filler type.
  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler
  optional float min = 3 [default = 0]; // the min value in uniform filler
  optional float max = 4 [default = 1]; // the max value in uniform filler
  optional float mean = 5 [default = 0]; // the mean value in Gaussian filler
  optional float std = 6 [default = 1]; // the std value in Gaussian filler
  // The expected number of non-zero output weights for a given input in
  // Gaussian filler -- the default -1 means don't perform sparsification.
  optional int32 sparse = 7 [default = -1];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];
}

```
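The listing above shows the filler parameters and their defaults: the default filler type is constant, value defaults to 0, the VarianceNorm used by xavier and msra defaults to FAN_IN, and so on. Pay particular attention to the sparse parameter, which controls whether the initialized values are sparsified.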

Constant

Effect: by default, initializes the Blob coefficients x to 0. If value is set, value = a, then x = a. (sparse is not supported)

Gaussian

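Effect: by default, initializes the Blob coefficients x from a Gaussian distribution with mean = 0 and std = 1, $x \sim N(\mathrm{mean}, \mathrm{std}^2)$. Both mean and std can be customized, and sparse is supported.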

Positive_unitball

Effect: initializes the Blob coefficients x so that $x \in [0, 1]$ and $\forall i\ \sum_j x_{ij} = 1$. (sparse is not supported)
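
One way to satisfy both conditions is to draw each value uniformly from [0, 1] and then divide every row by its sum. The sketch below illustrates that idea with the standard library only; the function name is hypothetical and this is not Caffe's code.

```python
import random


def positive_unitball_fill(num, dim, rng=None):
    """Return num rows of dim non-negative values, each row summing to 1."""
    rng = rng or random.Random()
    rows = []
    for _ in range(num):
        row = [rng.random() for _ in range(dim)]
        total = sum(row)  # assumed non-zero; all-zero draws are vanishingly rare
        rows.append([v / total for v in row])
    return rows


weights = positive_unitball_fill(num=4, dim=6, rng=random.Random(0))
print([round(sum(row), 6) for row in weights])
```
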

Uniform

Effect: by default, initializes the Blob coefficients x from a uniform distribution with min = 0 and max = 1, $x \sim U(\min, \max)$. (sparse is not supported)

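Xavier (the blob-shape assumption below does not hold for inner product layers.)

Effect: initializes the Blob coefficients x from the uniform distribution $x \sim U(-a, +a)$, where $a = \sqrt{3 / n}$. (sparse is not supported)
Assume the input blob has shape (num, a, b, c); here a, b and c are blob dimensions and are unrelated to the bound a above. The value of n depends on variance_norm:

FAN_IN: the default. In this case n = a * b * c
FAN_OUT: n = num * b * c
AVERAGE: n = ( FAN_IN + FAN_OUT ) / 2
Reference paper [Glorot and Bengio 2010]: Understanding the difficulty of training deep feedforward neural networks.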
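
The three choices of n and the resulting uniform bound can be sketched as below. This is a standard-library illustration of the formulas above, not Caffe's implementation; the function names and the example blob shape (64, 3, 3, 3) are assumptions made for the example.

```python
import math
import random


def fan_n(shape, variance_norm="FAN_IN"):
    """Compute n for a blob of shape (num, a, b, c)."""
    num, a, b, c = shape
    fan_in = a * b * c
    fan_out = num * b * c
    if variance_norm == "FAN_IN":
        return fan_in
    if variance_norm == "FAN_OUT":
        return fan_out
    if variance_norm == "AVERAGE":
        return (fan_in + fan_out) / 2.0
    raise ValueError("unknown variance_norm: %s" % variance_norm)


def xavier_fill(shape, variance_norm="FAN_IN", rng=None):
    """Sample every coefficient from U(-bound, +bound), bound = sqrt(3 / n)."""
    rng = rng or random.Random()
    bound = math.sqrt(3.0 / fan_n(shape, variance_norm))
    count = shape[0] * shape[1] * shape[2] * shape[3]
    return [rng.uniform(-bound, bound) for _ in range(count)]


shape = (64, 3, 3, 3)  # e.g. 64 filters of size 3x3 over 3 input channels
print(fan_n(shape, "FAN_IN"), fan_n(shape, "FAN_OUT"), fan_n(shape, "AVERAGE"))
values = xavier_fill(shape, rng=random.Random(0))
```
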

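Msra (the blob-shape assumption below does not hold for inner product layers.)

Effect: initializes the Blob coefficients x from the Gaussian distribution $x \sim N(0, \sigma^2)$, where $\sigma = \sqrt{2 / n}$. As with Xavier, the input blob is assumed to have shape (num, a, b, c) and n takes one of three forms:

FAN_IN: the default. In this case n = a * b * c
FAN_OUT: n = num * b * c
AVERAGE: n = ( FAN_IN + FAN_OUT ) / 2
Reference paper [He, Zhang, Ren and Sun 2015], which specifically accounts for ReLU nonlinearities.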
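
The same n, combined with a Gaussian instead of a uniform distribution, gives the msra scheme. The sketch below is a standard-library illustration of the formula above with the default FAN_IN only; the function names and the example shape are hypothetical.

```python
import math
import random


def msra_std(shape):
    """sigma = sqrt(2 / n) with n = a * b * c (FAN_IN) for shape (num, a, b, c)."""
    _, a, b, c = shape
    return math.sqrt(2.0 / (a * b * c))


def msra_fill(shape, rng=None):
    """Sample every coefficient from N(0, sigma^2)."""
    rng = rng or random.Random()
    sigma = msra_std(shape)
    count = shape[0] * shape[1] * shape[2] * shape[3]
    return [rng.gauss(0.0, sigma) for _ in range(count)]


shape = (64, 3, 3, 3)
print(msra_std(shape))
samples = msra_fill(shape, rng=random.Random(0))
```

Compared with the Xavier bound of sqrt(3 / n), whose uniform distribution has variance 1 / n, the msra variance of 2 / n is twice as large.
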

Bilinear

Effect: typically used in a Deconvolution layer to perform upsampling, for example:
```
layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}
```
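
The template expressions in the layer above can be evaluated for a concrete factor. The sketch below computes kernel_size, stride and pad from factor exactly as written in the template, and checks the resulting size with the usual transposed-convolution formula out = stride * (in - 1) + kernel_size - 2 * pad, which is an assumption not stated in the text. The function names are hypothetical.

```python
import math


def bilinear_upsample_params(factor):
    """kernel_size, stride and pad from the template, for an integer factor."""
    kernel_size = 2 * factor - factor % 2
    stride = factor
    pad = int(math.ceil((factor - 1) / 2.0))
    return kernel_size, stride, pad


def deconv_output_size(in_size, kernel_size, stride, pad):
    """Output size of a transposed convolution along one dimension."""
    return stride * (in_size - 1) + kernel_size - 2 * pad


for factor in (2, 3, 4):
    k, s, p = bilinear_upsample_params(factor)
    print(factor, (k, s, p), deconv_output_size(16, k, s, p))
```

For these factors the output size is exactly factor times the input size, which is what makes the layer an upsampler.
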
References:
1. https://blog.csdn.net/zziahgf/article/details/78843350
2. https://blog.csdn.net/wenlin33/article/details/53378613
Original article: https://blog.csdn.net/Felaim/article/details/104395033
