4. Transforms

transforms is a Python file (transforms.py) inside torchvision. It defines many classes and methods, mostly for applying transformations to images.

1. Transforms overview

from torchvision import transforms  # hold Ctrl and click transforms

This opens the __init__.py file:

from .transforms import *  # hold Ctrl again and click .transforms
from .autoaugment import *

This opens transforms.py. So transforms is really just the single Python file transforms.py, which you can think of as a toolbox.
Click Structure, or press Alt+7, to see the overall structure of the file.

File > Settings > Keymap > Structure shows the shortcut.

In plain terms: transforms refers to the transforms.py file, which contains many classes that perform all kinds of operations on images.

2. The ToTensor class

Take a look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method expects.

    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor. This transform does not support torchscript.
#In other words, it converts an image of type PIL Image or numpy.ndarray into a tensor
#PIL Image is what Image.open from PIL (the Pillow library) returns; numpy.ndarray is what OpenCV's imread returns

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
    if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
    or if the numpy.ndarray has dtype = np.uint8

    In the other cases, tensors are returned without scaling.

    .. note::
        Because the input image is scaled to [0.0, 1.0], this transformation should not be used when
        transforming target image masks. See the `references`_ for implementing the transforms for image masks.

    .. _references: https://github.com/pytorch/vision/tree/main/references/segmentation
    """
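
To make the docstring concrete, here is a minimal NumPy sketch of the conversion it describes: an (H x W x C) uint8 array in [0, 255] becomes a (C x H x W) float array in [0.0, 1.0]. The function name to_tensor_like and the fake image are made up for illustration; this is not torchvision's implementation, only the same arithmetic.

```python
import numpy as np


def to_tensor_like(img_hwc: np.ndarray) -> np.ndarray:
    """NumPy stand-in for the ToTensor conversion (hypothetical helper)."""
    chw = np.transpose(img_hwc, (2, 0, 1))  # (H, W, C) -> (C, H, W)
    if img_hwc.dtype == np.uint8:
        # uint8 input is scaled from [0, 255] to [0.0, 1.0]
        return chw.astype(np.float32) / 255.0
    # other dtypes are returned without scaling, as the docstring says
    return chw


# A fake 299 x 300 RGB image with one coloured pixel in the top-left corner
fake_img = np.zeros((299, 300, 3), dtype=np.uint8)
fake_img[0, 0] = (255, 128, 0)

out = to_tensor_like(fake_img)
print(out.shape)     # (3, 299, 300)
print(out[:, 0, 0])  # roughly 1.0, 0.502, 0.0
```

The shape matches the torch.Size([3, 299, 300]) printed in the example that follows, because the channel axis moves to the front.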

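Ⅰ Reading an image with PIL's Image, converting it to a tensor with ToTensor, and uploading it to TensorBoard with add_image

import cv2 as cv
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

img_path = "G:/PyCharm/workspace/learning_pytorch/dataset/a/3.jpg"

# An image opened with PIL's Image.open is a PIL image
img = Image.open(img_path)
print(type(img))  # a PIL image class, e.g. <class 'PIL.JpegImagePlugin.JpegImageFile'>

# # An image read with OpenCV's imread is a numpy.ndarray
# img = cv.imread(img_path)
# print(type(img))  # <class 'numpy.ndarray'>

# transforms.ToTensor converts either one to a Tensor
tensor_trans = transforms.ToTensor()  # create a ToTensor object
tensor_img = tensor_trans(img)  # Ctrl+P shows the expected argument; pass in the image
print(type(tensor_img))  # <class 'torch.Tensor'>
print(tensor_img.shape)  # torch.Size([3, 299, 300])

"""
add_image() requirements:
① the image must be a torch.Tensor, numpy.array, or string/blobname
② the expected layout is (3, H, W); any other layout must be declared through the dataformats parameter
tensor_img clearly meets both requirements, so it can be passed in directly
"""

writer = SummaryWriter("y_log")

writer.add_image("tensor_img", tensor_img)  # no global_step passed; the image shows up at step 0
writer.close()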

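Run tensorboard --logdir=y_log --port=2312 in the Terminal. logdir is the directory that holds the event files, and port is the port TensorBoard is served on.
Here TensorBoard is served on port 2312; if the port parameter is omitted, port 6006 is used by default.
Click the link that is printed, or copy it into a browser.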

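Ⅱ Why must image data passed to a neural network be a tensor?

Open the Python Console and run the code above.
The tensor carries attributes such as grad (the gradient). In other words, the tensor type wraps the extra information a neural network needs.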

Ⅲ What the __call__ method does

The ToTensor class in transforms.py defines a __call__ method. Let's look at what that method does.

class Band:
    def __call__(self, bandname):
        print("call-"+bandname)

    def music_band(self,bandname):
        print("hello-"+bandname)


band = Band()
band("beyond")#call-beyond
band.music_band("huangjiaju")#hello-huangjiaju

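The output shows that calling a Band object directly with an argument runs __call__, whereas any other method runs only when it is named explicitly. In other words, __call__ makes the object itself callable like a function, which is why tensor_trans(img) works without naming a method.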
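
The same idea can be shown with a class that behaves like a tiny transform. This is only a sketch: Scale is a hypothetical class invented for illustration, not something from torchvision, but it follows the same pattern of configuring the object in __init__ and doing the work in __call__.

```python
class Scale:
    """Hypothetical transform-like class: the instance itself is callable."""

    def __init__(self, factor):
        self.factor = factor  # configuration is stored when the object is created

    def __call__(self, values):
        # the actual work runs when the object is called like a function
        return [v * self.factor for v in values]


double = Scale(2)
print(callable(double))            # True
print(double([1, 2, 3]))           # [2, 4, 6]
print(double.__call__([1, 2, 3]))  # [2, 4, 6], the explicit form of the same call
```

Splitting configuration from application is what lets one object be created once and then reused on many images.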

3. The ToPILImage class

Take a look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method expects.

    """Convert a tensor or an ndarray to PIL Image. This transform does not support torchscript.
#Converts a tensor or ndarray to a PIL image

    Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
    H x W x C to a PIL Image while preserving the value range.

    Args:
        mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
            If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
            - If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
            - If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
            - If the input has 2 channels, the ``mode`` is assumed to be ``LA``.
            - If the input has 1 channel, the ``mode`` is determined by the data type (i.e ``int``, ``float``,
            ``short``).

    .. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
    """

ToPILImage converts a tensor or ndarray image to a PIL image

from torch.utils.tensorboard import SummaryWriter
from PIL import Image
import cv2 as cv
import numpy as np
from torchvision import transforms

img_path = "G:/PyCharm/workspace/learning_pytorch/dataset/a/3.jpg"

img = cv.imread(img_path)  # OpenCV returns the channels in BGR order
type(img)  # numpy.ndarray

trans_pil = transforms.ToPILImage()
PIL_img = trans_pil(img)  # the array is read as RGB, so red and blue look swapped unless converted first
type(PIL_img)  # PIL.Image.Image

PIL_img.show()  # display the PIL image

cv.imshow("img", img)  # display the original array with OpenCV
cv.waitKey(0)
cv.destroyAllWindows()
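
One thing to watch in the example above: OpenCV stores channels as B, G, R, while a 3-channel array handed to ToPILImage is assumed to be RGB. A small NumPy sketch of the channel swap follows; the 2 x 2 array is a made-up stand-in for the result of cv.imread, and cv.cvtColor(img, cv.COLOR_BGR2RGB) is the usual OpenCV way to do the same thing.

```python
import numpy as np

# Stand-in for an image returned by cv.imread: pure red, stored as (B, G, R)
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 2] = 255

# Reverse the last axis to get (R, G, B); copy so the result is contiguous in memory
rgb = np.ascontiguousarray(bgr[..., ::-1])

print(bgr[0, 0].tolist())  # [0, 0, 255]
print(rgb[0, 0].tolist())  # [255, 0, 0]
```

Passing rgb rather than bgr to the conversion keeps the colours the same in both display windows.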

4. The Normalize class

Take a look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method expects.

    """Normalize a tensor image with mean and standard deviation.
#Normalizes a tensor image channel by channel using the given mean and standard deviation
    This transform does not support PIL Image.
    Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
    channels, this transform will normalize each channel of the input
    ``torch.*Tensor`` i.e.,
    ``output[channel] = (input[channel] - mean[channel]) / std[channel]``

    .. note::
        This transform acts out of place, i.e., it does not mutate the input tensor.

    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
        inplace(bool,optional): Bool to make this operation in-place.

    """

Requirement: the input must be a tensor. From the docstring:

output[channel] = (input[channel] - mean[channel]) / std[channel]

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

write = SummaryWriter("y_log")

img_path = "dataset/b/6.jpg"

img = cv.imread(img_path)
print(type(img))  # <class 'numpy.ndarray'>
print(img.size)  # 561375 (= 375 * 499 * 3)
print(img.shape)  # (375, 499, 3)

trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
print(type(img_tensor))  # <class 'torch.Tensor'>

print(img_tensor[0][0][0])  # tensor(0.5255)
trans_normalize = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_normalize = trans_normalize(img_tensor)
print(img_normalize[0][0][0])  # tensor(0.0510)

# Formula: output[channel] = (input[channel] - mean[channel]) / std[channel]
# (0.5255 - 0.5) / 0.5 = 0.051

print(img_normalize.shape)  # torch.Size([3, 375, 499])
# The shape matches the (C, H, W) layout add_image expects, so it can be passed in directly

write.add_image("img_normalize", img_normalize)

write.close()
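
The arithmetic can be checked without an image. Below is a rough NumPy sketch of the per-channel formula; normalize_like is a hypothetical helper written for this check, not torchvision's code. With mean 0.5 and std 0.5, values in [0, 1] are mapped to [-1, 1].

```python
import numpy as np


def normalize_like(chw, mean, std):
    """Apply (input - mean) / std per channel to a (C, H, W) array (hypothetical helper)."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)  # one value per channel
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (chw - mean) / std  # returns a new array; the input is not modified


# Three channels, each holding the values 0.0, 0.5255 and 1.0
x = np.array([0.0, 0.5255, 1.0], dtype=np.float32).reshape(1, 1, 3).repeat(3, axis=0)
y = normalize_like(x, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

print(x.shape)  # (3, 1, 3)
print(y[0, 0])  # approximately -1.0, 0.051, 1.0
```

The middle value reproduces the (0.5255 - 0.5) / 0.5 = 0.051 calculation from the example above.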

5. The Resize class

Take a look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method expects.

    """Resize the input image to the given size.
#Resizes the input image to the given size, i.e. changes its dimensions
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions

    .. warning::
        The output image might be different depending on its type: when downsampling, the interpolation of PIL images
        and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences
        in the performance of a network. Therefore, it is preferable to train and serve a model with the same input
        types. See also below the ``antialias`` parameter, which can help making the output of PIL images and tensors
        closer.

    Args:
        size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            smaller edge of the image will be matched to this number.
            i.e, if height > width, then image will be rescaled to
            (size * height / width, size).
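#Pass the target shape (h, w); if only one int is given, the smaller edge is matched to it and the aspect ratio is kept, so the result is not necessarily a square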

            .. note::
                In torchscript mode size as single int is not supported, use a sequence of length 1: ``[size, ]``.
        interpolation (InterpolationMode): Desired interpolation enum defined by
            :class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
            If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` and
            ``InterpolationMode.BICUBIC`` are supported.
            For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
        max_size (int, optional): The maximum allowed for the longer edge of
            the resized image: if the longer edge of the image is greater
            than ``max_size`` after being resized according to ``size``, then
            the image is resized again so that the longer edge is equal to
            ``max_size``. As a result, ``size`` might be overruled, i.e the
            smaller edge may be shorter than ``size``. This is only supported
            if ``size`` is an int (or a sequence of length 1 in torchscript
            mode).
        antialias (bool, optional): antialias flag. If ``img`` is PIL Image, the flag is ignored and anti-alias
            is always used. If ``img`` is Tensor, the flag is False by default and can be set to True for
            ``InterpolationMode.BILINEAR`` only mode. This can help making the output for PIL images and tensors
            closer.

            .. warning::
                There is no autodiff support for ``antialias=True`` option with input ``img`` as Tensor.

    """

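
The difference between passing a sequence and passing a single int as size is easy to get wrong, so here is a small pure-Python sketch of the output size in each case. resized_hw is a hypothetical helper that follows the rule stated in the docstring; it ignores max_size, and the exact rounding inside torchvision may differ between versions.

```python
def resized_hw(height, width, size):
    """Return the (h, w) an image would have after resizing (hypothetical helper)."""
    if isinstance(size, int):
        # a single int is matched to the smaller edge; the aspect ratio is kept
        if height <= width:
            return size, int(size * width / height)
        return int(size * height / width), size
    # a sequence (h, w) is used as-is, which may change the aspect ratio
    h, w = size
    return h, w


print(resized_hw(375, 499, (300, 300)))  # (300, 300)
print(resized_hw(375, 499, 300))         # (300, 399)
print(resized_hw(499, 375, 300))         # (399, 300)
```

So Resize((300, 300)) squashes a 499 x 375 picture into a square, while Resize(300) keeps its proportions.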

The input is a PIL image: Resize changes its size, then ToTensor converts it to a tensor for uploading to TensorBoard

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

write = SummaryWriter("y_log")

img_path = "dataset/b/6.jpg"

img = Image.open(img_path)
print(type(img))  # a PIL image class, e.g. <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original size as (width, height)
trans_resize = transforms.Resize((300, 300))
img_PIL_resize = trans_resize(img)  # resize (this scales the image, it does not crop it)
print(img_PIL_resize)  # the image is now 300 x 300, but it is still a PIL image

# add_image needs a tensor or numpy.array, so convert to a tensor with ToTensor
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img_PIL_resize)
print(type(img_tensor))  # <class 'torch.Tensor'>

write.add_image("img_PIL_resize", img_tensor)  # no global_step passed; the image shows up at step 0

write.close()

6. The Compose class

Take a look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method expects.

    """Composes several transforms together. This transform does not support torchscript.
#Chains several transforms together

    Please, see the note below.

    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.

    Example:
        >>> transforms.Compose([
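        >>>     transforms.CenterCrop(10),  # first, center-crop the image
        >>>     transforms.PILToTensor(),  # then convert the image to a tensor
        >>>     transforms.ConvertImageDtype(torch.float),  # then convert the image to the given dtype, scaling the values if needed
        >>> ])  # a single Compose applies several transforms to the image in sequence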

    .. note::
        In order to script the transformations, please use ``torch.nn.Sequential`` as below.

        >>> transforms = torch.nn.Sequential(
        >>>     transforms.CenterCrop(10),
        >>>     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
        >>> )
        >>> scripted_transforms = torch.jit.script(transforms)

        Make sure to use only scriptable transformations, i.e. that work with ``torch.Tensor``, does not require
        `lambda` functions or ``PIL.Image``.

    """

Put simply, it combines several transform operations

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

writer = SummaryWriter('y_log')

img_path = "dataset/b/6.jpg"

img = Image.open(img_path)
print(type(img))  # a PIL image class, e.g. <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original size as (width, height)
# ① resize
trans_resize = transforms.Resize((300, 300))
img_PIL_resize = trans_resize(img)  # resize the image
print(img_PIL_resize)  # the image is now 300 x 300, but it is still a PIL image

# ② PIL to Tensor
trans_tensor = transforms.ToTensor()

trans_compose = transforms.Compose([trans_resize, trans_tensor])
# Every argument to Compose is a transform object, and each transform's output must be a valid input for the next one
# trans_resize is a Resize object; given a PIL image, it returns a PIL image
# trans_tensor is a ToTensor object; it takes a PIL image and returns a tensor

img_all = trans_compose(img)
# The final output is a tensor, which is why it can be uploaded to TensorBoard with add_image

writer.add_image("compose_img", img_all)
writer.close()
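
Conceptually, Compose just calls each transform in order and feeds the result to the next one. The sketch below shows that idea in plain Python; ComposeLike is a hypothetical stand-in that works on strings instead of images so it runs without torchvision, and it is not the library's implementation.

```python
class ComposeLike:
    """Hypothetical stand-in for Compose: applies callables one after another."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)  # the output of one step is the input of the next
        return x


# Three steps: strip whitespace, upper-case, split on commas
pipeline = ComposeLike([str.strip, str.upper, lambda s: s.split(",")])
print(pipeline("  a,b,c  "))  # ['A', 'B', 'C']
```

Reordering the steps matters for the same reason as with real transforms: the last step here returns a list, which the string steps could not accept as input.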

8. The RandomCrop class

Take a look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method expects.

    """Crop the given image at a random location.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions,
    but if non-constant padding is used, the input is expected to have at most 2 leading dimensions

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
        padding (int or sequence, optional): Optional padding on each border
            of the image. Default is None. If a single int is provided this
            is used to pad all borders. If sequence of length 2 is provided this is the padding
            on left/right and top/bottom respectively. If a sequence of length 4 is provided
            this is the padding for the left, top, right and bottom borders respectively.
#For size, pass the crop shape (h, w); if only one number is given, the crop is a square

            .. note::
                In torchscript mode padding as single int is not supported, use a sequence of
                length 1: ``[padding, ]``.
        pad_if_needed (boolean): It will pad the image if smaller than the
            desired size to avoid raising an exception. Since cropping is done
            after padding, the padding seems to be done at a random offset.
        fill (number or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
            This value is only used when the padding_mode is constant.
            Only number is supported for torch Tensor.
            Only int or str or tuple value is supported for PIL Image.
        padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric.
            Default is constant.

            - constant: pads with a constant value, this value is specified with fill

            - edge: pads with the last value at the edge of the image.
              If input a 5D torch Tensor, the last 3 dimensions will be padded instead of the last 2

            - reflect: pads with reflection of image without repeating the last value on the edge.
              For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
              will result in [3, 2, 1, 2, 3, 4, 3, 2]

            - symmetric: pads with reflection of image repeating the last value on the edge.
              For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
              will result in [2, 1, 1, 2, 3, 4, 4, 3]
    """

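
The four padding modes in the docstring are easier to compare side by side. NumPy's np.pad happens to use the same mode names, so the short sketch below reproduces the docstring's [1, 2, 3, 4] examples; it illustrates the modes only and is not how RandomCrop pads internally.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

constant = np.pad(row, 2, mode="constant", constant_values=0)  # fill with a fixed value
edge = np.pad(row, 2, mode="edge")                             # repeat the border value
reflect = np.pad(row, 2, mode="reflect")                       # mirror, border value not repeated
symmetric = np.pad(row, 2, mode="symmetric")                   # mirror, border value repeated

print(constant.tolist())   # [0, 0, 1, 2, 3, 4, 0, 0]
print(edge.tolist())       # [1, 1, 1, 2, 3, 4, 4, 4]
print(reflect.tolist())    # [3, 2, 1, 2, 3, 4, 3, 2]
print(symmetric.tolist())  # [2, 1, 1, 2, 3, 4, 4, 3]
```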

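Put simply, it crops the image at a random location

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

writer = SummaryWriter('y_log')

img_path = "dataset/b/6.jpg"

img = Image.open(img_path)
print(type(img))  # a PIL image class, e.g. <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original size as (width, height)
# ① random crop
trans_random = transforms.RandomCrop((200, 250))  # (h, w)
img_PIL_random = trans_random(img)  # crop at a random location
print(img_PIL_random)
# PIL reports size as (w, h), so the (h, w) = (200, 250) crop prints as 250x200; it is still a PIL image

# ② PIL to Tensor
trans_tensor = transforms.ToTensor()

trans_compose = transforms.Compose([trans_random, trans_tensor])
# Every argument to Compose is a transform object, and each transform's output must be a valid input for the next one
# trans_random is a RandomCrop object; given a PIL image, it returns a PIL image
# trans_tensor is a ToTensor object; it takes a PIL image and returns a tensor

for i in range(10):
    img_randomcrop = trans_compose(img)
    # The final output is a tensor, which is why it can be uploaded to TensorBoard with add_image
    writer.add_image("img_randomcrop", img_randomcrop, i)

writer.close()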
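
Each pass through the loop above gets a different crop because the top-left corner is drawn at random. A rough pure-Python sketch of that choice follows; random_crop_box is a hypothetical helper that assumes the crop fits inside the image and uses no padding, so it only illustrates the idea rather than reproducing torchvision's code.

```python
import random


def random_crop_box(img_h, img_w, crop_h, crop_w, rng):
    """Pick a random (top, left, bottom, right) box of the requested size (hypothetical helper)."""
    top = rng.randint(0, img_h - crop_h)   # randint includes both endpoints
    left = rng.randint(0, img_w - crop_w)
    return top, left, top + crop_h, left + crop_w


rng = random.Random(0)  # fixed seed so repeated runs give the same boxes
boxes = [random_crop_box(375, 499, 200, 250, rng) for _ in range(3)]
for step, box in enumerate(boxes):
    print(step, box)  # the position differs per step; the size is always 200 x 250
```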

7. The CenterCrop class

Take a look at the usage notes in the docstring.
Ctrl+P: shows the parameters a method expects.

    """Crops the given image at the center.
#Crops the image around its center
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions.
    If image size is smaller than output size along any edge, image is padded with 0 and then center cropped.

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
    """

Put simply, it crops the image around its center

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms

writer = SummaryWriter('y_log')

img_path = "dataset/b/6.jpg"

img = Image.open(img_path)
print(type(img))  # a PIL image class, e.g. <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original size as (width, height)
# ① center crop
trans_center = transforms.CenterCrop((200, 250))  # (h, w)
img_PIL_center = trans_center(img)  # crop around the center
print(img_PIL_center)
# PIL reports size as (w, h), so the (h, w) = (200, 250) crop prints as 250x200; it is still a PIL image

# ② PIL to Tensor
trans_tensor = transforms.ToTensor()

trans_compose = transforms.Compose([trans_center, trans_tensor])
# Every argument to Compose is a transform object, and each transform's output must be a valid input for the next one
# trans_center is a CenterCrop object; given a PIL image, it returns a PIL image
# trans_tensor is a ToTensor object; it takes a PIL image and returns a tensor

img_centercrop = trans_compose(img)

writer.add_image("img_centercrop", img_centercrop)
writer.close()
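
Unlike a random crop, a center crop always picks the same box. Here is a minimal sketch of where that box sits for the 375 x 499 image used above; center_crop_box is a hypothetical helper that assumes the crop fits inside the image (so no zero padding is needed), and the rounding of odd margins may not match torchvision exactly.

```python
def center_crop_box(img_h, img_w, crop_h, crop_w):
    """Return the (top, left, bottom, right) box centered in the image (hypothetical helper)."""
    top = int(round((img_h - crop_h) / 2.0))   # half of the spare rows go above the crop
    left = int(round((img_w - crop_w) / 2.0))  # half of the spare columns go to the left
    return top, left, top + crop_h, left + crop_w


box = center_crop_box(375, 499, 200, 250)
print(box)  # (88, 124, 288, 374)
```

Calling it again with the same arguments returns the same box, which is why this example logs a single image instead of looping.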

Original article: https://blog.csdn.net/qq_41264055/article/details/126420475
