Setting training parameters in the config file: image_resizer

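When working with the TF framework, you will inevitably use the config files under the configs directory. To train on your own dataset with them, you should understand how each parameter in the config is set.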

This post records how the image_resizer parameter is configured, for future reference.

The ../object_detection/protos directory collects the files that declare the configurable model parameters. It contains image_resizer_pb2.py and image_resizer.proto; the former is generated by compiling the latter. image_resizer.proto declares five resizing methods for image_resizer:
1) KeepAspectRatioResizer keep_aspect_ratio_resizer = 1;
2) FixedShapeResizer fixed_shape_resizer = 2;
3) IdentityResizer identity_resizer = 3;
4) ConditionalShapeResizer conditional_shape_resizer = 4;
5) PadToMultipleResizer pad_to_multiple_resizer = 5.
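Because the five entries are alternatives, a config selects exactly one of them. The sketch below illustrates that rule with a plain dict standing in for the protobuf message; the dict layout and the helper name which_resizer are hypothetical and are not part of the Object Detection API.

```python
# Field names as declared in image_resizer.proto.
RESIZER_FIELDS = (
    "keep_aspect_ratio_resizer",
    "fixed_shape_resizer",
    "identity_resizer",
    "conditional_shape_resizer",
    "pad_to_multiple_resizer",
)


def which_resizer(image_resizer_config):
    """Return the name of the single resizer set in a dict-shaped config.

    The dict is only a stand-in for the real protobuf message (assumption).
    """
    chosen = [name for name in RESIZER_FIELDS if name in image_resizer_config]
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one resizer, got {chosen}")
    return chosen[0]


config = {"fixed_shape_resizer": {"height": 300, "width": 300}}
print(which_resizer(config))  # fixed_shape_resizer
```

Setting two resizers at once, or none, raises ValueError in this sketch.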

The implementation behind each method can be found in ../object_detection/builders/image_resizer_builder.py:

Method 1: KeepAspectRatioResizer

image_resizer_builder.py shows that this method calls resize_to_range in ../object_detection/core/preprocessor.py. Its parameters are listed below (the value after the colon is the default):

a. min_dimension: 600
b. max_dimension: 1024
c. resize_method: BILINEAR
d. pad_to_max_dimension: false
e. convert_to_grayscale: false
f. per_channel_pad_value: the value used to pad each channel when d is set to true; usually given as (int(x), int(y), int(z)).

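Details: the resize is ultimately performed by tf.image.resize_images, using the configured resize_method and a target size derived from min_dimension and max_dimension. The aspect ratio is preserved: the image is scaled so that its shorter side equals min_dimension, unless that would push the longer side beyond max_dimension, in which case the longer side is scaled to max_dimension instead.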
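To make the sizing rule concrete, here is a minimal pure-Python sketch of how the target size could be computed. It is not the library's code: the function name keep_aspect_ratio_size is hypothetical, the rounding is an assumption, and pad_to_max_dimension is not modeled.

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Return the (height, width) after an aspect-preserving resize.

    Assumed rule: scale the shorter side to min_dimension, unless that pushes
    the longer side past max_dimension; in that case scale the longer side to
    max_dimension instead.
    """
    short_side, long_side = min(height, width), max(height, width)
    scale = min_dimension / short_side
    if round(long_side * scale) > max_dimension:
        scale = max_dimension / long_side
    return round(height * scale), round(width * scale)


print(keep_aspect_ratio_size(480, 640))   # (600, 800): shorter side scaled to 600
print(keep_aspect_ratio_size(400, 1600))  # (256, 1024): longer side capped at 1024
```

With the defaults, a 480 x 640 image grows until its shorter side reaches 600, while a very wide 400 x 1600 image is limited by max_dimension.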

Method 2: FixedShapeResizer

Parameters (the value after the colon is the default):

a. height: 300
b. width: 300
c. resize_method: BILINEAR
d. convert_to_grayscale: false

Details: using the configured resize_method, the image is resized to (height, width).

convert_to_grayscale: whether to convert the image to grayscale, [height, width, 3] -> [height, width, 1].
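The effect on the image shape can be sketched as follows. The helper fixed_shape_output is hypothetical and only tracks shapes; it does not touch pixel data.

```python
def fixed_shape_output(input_shape, height=300, width=300, convert_to_grayscale=False):
    """Return the output shape of a fixed-shape resize for an [H, W, C] image."""
    _, _, channels = input_shape
    if convert_to_grayscale:
        channels = 1  # [height, width, 3] -> [height, width, 1]
    return (height, width, channels)


print(fixed_shape_output((480, 640, 3)))                             # (300, 300, 3)
print(fixed_shape_output((480, 640, 3), convert_to_grayscale=True))  # (300, 300, 1)
```

Unlike KeepAspectRatioResizer, the output size here ignores the input's aspect ratio, so a non-square image is stretched to fit.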

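Method 3: IdentityResizer

The file currently gives no description of this method, so it is not covered or used here. Judging by its name, it passes the image through unchanged.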

Method 4: ConditionalShapeResizer

Parameters (the value after the colon is the default):

a. condition: GREATER
b. size_threshold: 300
c. resize_method: BILINEAR
d. convert_to_grayscale: false

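Details: condition can be set to 'GREATER' or 'SMALLER'. size_threshold defaults to 300 and can be adjusted to suit your own data. With condition set to 'GREATER', an image whose size exceeds 300 is resized down to 300 (preserving the aspect ratio); likewise, with condition set to 'SMALLER', an image smaller than 300 is resized up to 300 (preserving the aspect ratio).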
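The conditional behaviour can be sketched in pure Python. Two points are assumptions rather than statements from the text above: that 'GREATER' compares the longer side against size_threshold, and that 'SMALLER' compares the shorter side. The function name conditional_shape_size is hypothetical.

```python
def conditional_shape_size(height, width, condition="GREATER", size_threshold=300):
    """Return the (height, width) after a conditional, aspect-preserving resize."""
    if condition == "GREATER":
        reference = max(height, width)  # assumption: longer side is compared
        needs_resize = reference > size_threshold
    elif condition == "SMALLER":
        reference = min(height, width)  # assumption: shorter side is compared
        needs_resize = reference < size_threshold
    else:
        raise ValueError(f"unknown condition: {condition}")
    if not needs_resize:
        return height, width
    scale = size_threshold / reference
    return round(height * scale), round(width * scale)


print(conditional_shape_size(600, 900, "GREATER"))  # (200, 300)
print(conditional_shape_size(150, 200, "GREATER"))  # (150, 200), left unchanged
print(conditional_shape_size(150, 200, "SMALLER"))  # (300, 400)
```

Images that do not meet the condition keep their original size, which is what distinguishes this resizer from FixedShapeResizer.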

Method 5: PadToMultipleResizer

Parameters (the value after the colon is the default):

a. multiple: 1
b. convert_to_grayscale: false

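Details: the image is padded (with zeros) so that its height and width are divisible by the configured multiple. For example, with an input image of shape (101, 199, 3) and multiple set to 4, the image is padded to (104, 200, 3).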
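The padded shape in this example can be reproduced with a few lines of integer arithmetic. The helper name pad_to_multiple_shape is hypothetical; it only computes the resulting shape and does not pad any pixels.

```python
def pad_to_multiple_shape(shape, multiple=1):
    """Return the [H, W, C] shape after padding H and W up to a multiple."""
    height, width, channels = shape
    # Integer ceiling division: round each spatial dimension up to the next multiple.
    padded_height = (height + multiple - 1) // multiple * multiple
    padded_width = (width + multiple - 1) // multiple * multiple
    return (padded_height, padded_width, channels)


print(pad_to_multiple_shape((101, 199, 3), multiple=4))  # (104, 200, 3)
```

With the default multiple of 1, every size is already divisible and the shape is returned unchanged.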

Note: the options for resize_method are also described in image_resizer.proto. There are four:

1) BILINEAR: tf.image.ResizeMethod.BILINEAR, bilinear interpolation
2) NEAREST_NEIGHBOR: tf.image.ResizeMethod.NEAREST_NEIGHBOR, nearest-neighbor interpolation
3) BICUBIC: tf.image.ResizeMethod.BICUBIC, bicubic interpolation
4) AREA: tf.image.ResizeMethod.AREA, area interpolation


Original article: https://blog.csdn.net/weixin_38739735/article/details/126387742