Reference: resnet_v1.resnet_v1()
# Excerpted from TF-Slim's resnet_v1.py; it assumes the surrounding module's
# imports, roughly:
#   import tensorflow as tf
#   from nets import resnet_utils
#   slim = tf.contrib.slim
# (`bottleneck` and `NoOpScope` are defined elsewhere in that module.)
def resnet_v1(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              store_non_strided_activations=False,
              reuse=None,
              scope=None):
  """Generator for v1 ResNet models.

  This function generates a family of ResNet v1 models. See the resnet_v1_*()
  methods for specific model instantiations, obtained by selecting different
  block instantiations that produce ResNets of various depths.

  Training for image classification on Imagenet is usually done with [224, 224]
  inputs, resulting in [7, 7] feature maps at the output of the last ResNet
  block for the ResNets defined in [1] that have nominal stride equal to 32.
  However, for dense prediction tasks we advise that one uses inputs with
  spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In
  this case the feature maps at the ResNet output will have spatial shape
  [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
  and corners exactly aligned with the input image corners, which greatly
  facilitates alignment of the features to the image. Using as input [225, 225]
  images results in [8, 8] feature maps at the output of the last ResNet block.

  For dense prediction tasks, the ResNet needs to run in fully-convolutional
  (FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2]
  all have nominal stride equal to 32 and a good choice in FCN mode is to use
  output_stride=16 in order to increase the density of the computed features
  at small computational and memory overhead,
  cf. http://arxiv.org/abs/1606.00915.

  Args:
    inputs: A tensor of size [batch, height_in, width_in, channels].
    blocks: A list of length equal to the number of ResNet blocks. Each element
      is a resnet_utils.Block object describing the units in the block.
    num_classes: Number of predicted classes for classification tasks.
      If 0 or None, we return the features before the logit layer.
    is_training: whether batch_norm layers are in training mode. If this is set
      to None, the callers can specify slim.batch_norm's is_training parameter
      from an outer slim.arg_scope.
    global_pool: If True, we perform global average pooling before computing
      the logits. Set to True for image classification, False for dense
      prediction.
    output_stride: If None, then the output will be computed at the nominal
      network stride. If output_stride is not None, it specifies the requested
      ratio of input to output spatial resolution.
    include_root_block: If True, include the initial convolution followed by
      max-pooling; if False, exclude it.
    spatial_squeeze: if True, logits is of shape [B, C], if False logits is
      of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
      To use this parameter, the input images must be smaller than 300x300
      pixels, in which case the output logit layer does not contain spatial
      information and can be removed.
    store_non_strided_activations: If True, we compute non-strided
      (undecimated) activations at the last unit of each block and store them
      in the `outputs_collections` before subsampling them. This gives us
      access to higher resolution intermediate activations which are useful in
      some dense prediction problems but increases 4x the computation and
      memory cost at the last unit of each block.
    reuse: whether or not the network and its variables should be reused. To
      be able to reuse, 'scope' must be given.
    scope: Optional variable_scope.

  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
      If global_pool is False, then height_out and width_out are reduced by a
      factor of output_stride compared to the respective height_in and
      width_in, else both height_out and width_out equal one. If num_classes
      is 0 or None, then net is the output of the last ResNet block,
      potentially after global average pooling. If num_classes is a non-zero
      integer, net contains the pre-softmax activations.
    end_points: A dictionary from components of the network to the
      corresponding activation.

  Raises:
    ValueError: If the target output_stride is not valid.
  """
  with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    with slim.arg_scope([slim.conv2d, bottleneck,
                         resnet_utils.stack_blocks_dense],
                        outputs_collections=end_points_collection):
      with (slim.arg_scope([slim.batch_norm], is_training=is_training)
            if is_training is not None else NoOpScope()):
        net = inputs
        if include_root_block:
          if output_stride is not None:
            if output_stride % 4 != 0:
              raise ValueError('The output_stride needs to be a multiple '
                               'of 4.')
            # conv1 (stride 2) and pool1 (stride 2) already consume a factor
            # of 4, so the remaining blocks realize output_stride / 4.
            output_stride /= 4
          net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
        net = resnet_utils.stack_blocks_dense(net, blocks, output_stride,
                                              store_non_strided_activations)
        # Convert end_points_collection into a dictionary of end_points.
        end_points = slim.utils.convert_collection_to_dict(
            end_points_collection)

        if global_pool:
          # Global average pooling.
          net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
          end_points['global_pool'] = net
        if num_classes:
          net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                            normalizer_fn=None, scope='logits')
          end_points[sc.name + '/logits'] = net
          if spatial_squeeze:
            net = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
            end_points[sc.name + '/spatial_squeeze'] = net
          end_points['predictions'] = slim.softmax(net, scope='predictions')
        return net, end_points
resnet_v1.default_image_size = 224
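The docstring's sizing rule for dense prediction can be checked with a small helper. This is only a sketch of the arithmetic; `dense_feature_shape` is a hypothetical name, not part of the slim API:

```python
def dense_feature_shape(height, width, output_stride):
    # For inputs with spatial dims of the form 32*k + 1, the output feature
    # map has shape [(dim - 1) / output_stride + 1, ...], with its corners
    # aligned to the input image corners.
    return ((height - 1) // output_stride + 1,
            (width - 1) // output_stride + 1)

print(dense_feature_shape(321, 321, 16))  # -> (21, 21)
print(dense_feature_shape(321, 321, 32))  # -> (11, 11)
```

At output_stride=16 the feature maps are roughly twice as dense in each spatial dimension as at the nominal stride of 32, which is the trade-off the docstring recommends for FCN mode.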
References:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Identity Mappings in Deep Residual Networks. arXiv:1603.05027
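For readers unfamiliar with the shapes involved, the global_pool / spatial_squeeze steps in the function above can be sketched in NumPy. This is a stand-in for illustration only, not the slim API; the array contents are dummy values:

```python
import numpy as np

# Last-block output for a 224x224 input: [batch, 7, 7, channels].
net = np.ones((2, 7, 7, 2048))

# Global average pooling over the spatial axes [1, 2] with keepdims,
# analogous to tf.reduce_mean(net, [1, 2], keep_dims=True).
pooled = net.mean(axis=(1, 2), keepdims=True)
print(pooled.shape)    # (2, 1, 1, 2048)

# Spatial squeeze, analogous to tf.squeeze(net, [1, 2]): drops the two
# singleton spatial dimensions, giving logits-shaped [B, C] output.
squeezed = pooled.squeeze(axis=(1, 2))
print(squeezed.shape)  # (2, 2048)
```

This is why `spatial_squeeze` requires the spatial dimensions to already be 1x1: `tf.squeeze` on axes [1, 2] is only valid when those axes are singletons.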