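transforms is a subpackage of torchvision. Its main module, transforms.py, defines many classes and functions that apply transformations to images.
from torchvision import transforms  # hold Ctrl and click on transforms
This opens the package's __init__.py file:
from .transforms import *  # hold Ctrl again and click on .transforms
from .autoaugment import *
This opens transforms.py. So transforms is essentially the single Python file transforms.py, which can be thought of as a toolbox.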

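Click Structure, or press Alt+7, to see the overall layout of this file.

File > Settings > Keymap > Structure shows the shortcut.

Put simply, transforms refers to the file transforms.py, which contains many classes for performing all kinds of operations on images.
Start with ToTensor and the usage notes in its docstring.
Ctrl+P: shows the parameters a method expects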

"""Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor. This transform does not support torchscript.
# In other words, it converts an image of type PIL Image or numpy.ndarray into a tensor
# A PIL Image is what Image.open from the Pillow library returns; a numpy.ndarray is what OpenCV's imread returns
Converts a PIL Image or numpy.ndarray (H x W x C) in the range
[0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
or if the numpy.ndarray has dtype = np.uint8
In the other cases, tensors are returned without scaling.
.. note::
Because the input image is scaled to [0.0, 1.0], this transformation should not be used when
transforming target image masks. See the `references`_ for implementing the transforms for image masks.
.. _references: https://github.com/pytorch/vision/tree/main/references/segmentation
"""
import cv2 as cv
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
img_path = "G:/PyCharm/workspace/learning_pytorch/dataset/a/3.jpg"
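# An image opened with Image.open is a PIL image
img = Image.open(img_path)
print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
# An image opened with OpenCV's imread is a numpy.ndarray
# img = cv.imread(img_path)
# print(type(img))  # <class 'numpy.ndarray'>
# transforms.ToTensor converts either type to a Tensor
tensor_trans = transforms.ToTensor()  # create a ToTensor object
tensor_img = tensor_trans(img)  # press Ctrl+P to see the expected argument, then pass in the image
print(type(tensor_img))  # <class 'torch.Tensor'>
print(tensor_img.shape)  # torch.Size([3, 299, 300])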
"""
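add_image() requirements:
① The image type must be torch.Tensor, numpy.array, or string/blobname
② The image shape must be (3, H, W); any other layout has to be declared with the dataformats argument
tensor_img clearly meets both requirements, so it can be passed in directly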
"""
writer = SummaryWriter("y_log")
writer.add_image("tensor_img", tensor_img)  # no global_step given, so it is shown as step 0
writer.close()
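Run tensorboard --logdir=y_log --port=2312 in the Terminal. logdir is the directory that holds the event files, and port is the port TensorBoard is served on.
Here that is port 2312; if the port argument is omitted, TensorBoard defaults to port 6006.

Click the link that is printed, or copy it into a browser, to open TensorBoard.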

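Open the Python Console and run the code above there.
You can see that the tensor carries grad and other attributes. In other words, the tensor type wraps the extra information a neural network needs.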
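To make the docstring's conversion concrete, here is a minimal pure-Python sketch of the layout and range change that ToTensor performs on a uint8 image: (H, W, C) values in [0, 255] become (C, H, W) values in [0.0, 1.0]. The helper name hwc_uint8_to_chw_float is hypothetical and nested lists stand in for tensors. It is an illustration, not torchvision's implementation.

```python
def hwc_uint8_to_chw_float(image):
    """Rearrange an H x W x C nested list of 0-255 ints into C x H x W floats in [0, 1]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[y][x][c] / 255 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 image with 3 channels per pixel (hypothetical pixel values).
tiny = [[[134, 0, 255], [0, 51, 102]]]
chw = hwc_uint8_to_chw_float(tiny)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 1 2
print(round(chw[0][0][0], 4))  # 0.5255
```

The channel axis moves to the front and every value is divided by 255, which is why the shape printed above is (3, H, W) and the values are floats.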

The ToTensor class in transforms.py has a __call__ method. Let's look at what this method does.

class Band:
    def __call__(self, bandname):
        print("call-" + bandname)

    def music_band(self, bandname):
        print("hello-" + bandname)

band = Band()
band("beyond")  # call-beyond
band.music_band("huangjiaju")  # hello-huangjiaju
As the output shows, calling a Band object directly with arguments invokes its __call__ method, while any other method has to be called by name. In other words, __call__ makes an instance callable like a function.
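This is the same mechanism that lets a transform object be used like a function: you configure it once, then call the instance on your data. A minimal sketch, using a hypothetical Scale class that is not part of torchvision:

```python
class Scale:
    """A toy transform: configure it once, then call the instance on data."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, values):
        # Runs when the instance itself is called: scale(values)
        return [v * self.factor for v in values]


scale = Scale(2)         # create the object, much like transforms.ToTensor()
print(scale([1, 2, 3]))  # [2, 4, 6]
print(callable(scale))   # True
```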
Next is ToPILImage. Here are the usage notes from its docstring.
Ctrl+P: shows the parameters a method expects

"""Convert a tensor or an ndarray to PIL Image. This transform does not support torchscript.
# Converts a tensor or ndarray into a PIL image
Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
H x W x C to a PIL Image while preserving the value range.
Args:
mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
- If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
- If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
- If the input has 2 channels, the ``mode`` is assumed to be ``LA``.
- If the input has 1 channel, the ``mode`` is determined by the data type (i.e ``int``, ``float``,
``short``).
.. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
"""
from torch.utils.tensorboard import SummaryWriter
from PIL import Image
import cv2 as cv
import numpy as np
from torchvision import transforms
img_path = "G:/PyCharm/workspace/learning_pytorch/dataset/a/3.jpg"
img = cv.imread(img_path)
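type(img)  # numpy.ndarray
to_pil = transforms.ToPILImage()
PIL_img = to_pil(img)
type(PIL_img)  # PIL.Image.Image
PIL_img.show()  # display the image (cv.imread returns BGR, so red and blue look swapped here)
cv.imshow("img", img)  # display the image with OpenCV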
cv.waitKey(0)
cv.destroyAllWindows()
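One caveat with the snippet above: cv.imread returns channels in BGR order, while ToPILImage assumes a 3-channel input is RGB, so the PIL image shows red and blue swapped. With OpenCV this is usually fixed by calling cv.cvtColor(img, cv.COLOR_BGR2RGB) before the conversion. The pure-Python sketch below only illustrates the reordering on nested lists; bgr_to_rgb is a hypothetical helper.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in an H x W x 3 nested list."""
    return [[pixel[::-1] for pixel in row] for row in image]


# One pure-blue pixel and one pure-red pixel, stored in BGR order.
bgr = [[[255, 0, 0], [0, 0, 255]]]
print(bgr_to_rgb(bgr))  # [[[0, 0, 255], [255, 0, 0]]]
```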
Next is Normalize. Here are the usage notes from its docstring.
Ctrl+P: shows the parameters a method expects

"""Normalize a tensor image with mean and standard deviation.
# Normalizes a tensor image channel by channel using the given mean and standard deviation
This transform does not support PIL Image.
Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
channels, this transform will normalize each channel of the input
``torch.*Tensor`` i.e.,
``output[channel] = (input[channel] - mean[channel]) / std[channel]``
.. note::
This transform acts out of place, i.e., it does not mutate the input tensor.
Args:
mean (sequence): Sequence of means for each channel.
std (sequence): Sequence of standard deviations for each channel.
inplace(bool,optional): Bool to make this operation in-place.
"""
output[channel] = (input[channel] - mean[channel]) / std[channel]
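The formula can be checked by hand with a small pure-Python sketch. Nested lists stand in for a C x H x W tensor, and normalize is a hypothetical helper, not the torchvision function. With mean 0.5 and std 0.5 on every channel, values in [0, 1] are mapped to [-1, 1]; the value 0.5255 is the first pixel printed in this section's example.

```python
def normalize(image, mean, std):
    """Apply (value - mean[c]) / std[c] to every value of channel c in a C x H x W nested list."""
    return [
        [[(value - mean[c]) / std[c] for value in row] for row in channel]
        for c, channel in enumerate(image)
    ]


# One pixel per channel.
image = [[[0.5255]], [[0.0]], [[1.0]]]
result = normalize(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(round(result[0][0][0], 4))         # 0.051
print(result[1][0][0], result[2][0][0])  # -1.0 1.0
```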
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms
write = SummaryWriter("y_log")
img_path = "dataset/b/6.jpg"
img = cv.imread(img_path)
print(type(img))  # <class 'numpy.ndarray'>
print(img.size)  # 561375, the number of elements (375 * 499 * 3)
print(img.shape)  # (375, 499, 3)
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img)
print(type(img_tensor))  # <class 'torch.Tensor'>
print(img_tensor[0][0][0])#tensor(0.5255)
trans_normalize = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_normalize = trans_normalize(img_tensor)
print(img_normalize[0][0][0])#tensor(0.0510)
# Formula: output[channel] = (input[channel] - mean[channel]) / std[channel]
# (0.5255 - 0.5) / 0.5 = 0.051
print(img_normalize.shape)  # torch.Size([3, 375, 499])
# The shape matches the (C, H, W) layout add_image expects, so it can be passed in directly
write.add_image("img_normalize",img_normalize)
write.close()
Run tensorboard --logdir=y_log --port=2312 in the Terminal and open the printed link in a browser to view the result.

Next is Resize. Here are the usage notes from its docstring.
Ctrl+P: shows the parameters a method expects

"""Resize the input image to the given size.
# Resizes the input image to the given size, i.e. changes its dimensions
If the image is torch Tensor, it is expected
to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions
.. warning::
The output image might be different depending on its type: when downsampling, the interpolation of PIL images
and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences
in the performance of a network. Therefore, it is preferable to train and serve a model with the same input
types. See also below the ``antialias`` parameter, which can help making the output of PIL images and tensors
closer.
Args:
size (sequence or int): Desired output size. If size is a sequence like
(h, w), output size will be matched to this. If size is an int,
smaller edge of the image will be matched to this number.
i.e, if height > width, then image will be rescaled to
(size * height / width, size).
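# Give the target shape as (h, w); if only one int is given, the shorter edge is scaled to that value and the aspect ratio is kept, so the result is not forced to be square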
.. note::
In torchscript mode size as single int is not supported, use a sequence of length 1: ``[size, ]``.
interpolation (InterpolationMode): Desired interpolation enum defined by
:class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` and
``InterpolationMode.BICUBIC`` are supported.
For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
max_size (int, optional): The maximum allowed for the longer edge of
the resized image: if the longer edge of the image is greater
than ``max_size`` after being resized according to ``size``, then
the image is resized again so that the longer edge is equal to
``max_size``. As a result, ``size`` might be overruled, i.e the
smaller edge may be shorter than ``size``. This is only supported
if ``size`` is an int (or a sequence of length 1 in torchscript
mode).
antialias (bool, optional): antialias flag. If ``img`` is PIL Image, the flag is ignored and anti-alias
is always used. If ``img`` is Tensor, the flag is False by default and can be set to True for
``InterpolationMode.BILINEAR`` only mode. This can help making the output for PIL images and tensors
closer.
.. warning::
There is no autodiff support for ``antialias=True`` option with input ``img`` as Tensor.
"""
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms
write = SummaryWriter("y_log")
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
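print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)
trans_resize = transforms.Resize((300,300))
img_PIL_resize = trans_resize(img)  # resize the image
print(img_PIL_resize)  # the image is now (300, 300), but it is still a PIL image
# To log it to TensorBoard it must be a tensor or numpy.array, so convert it with ToTensor
trans_tensor = transforms.ToTensor()
img_tensor = trans_tensor(img_PIL_resize)
print(type(img_tensor))  # <class 'torch.Tensor'>
write.add_image("img_PIL_resize", img_tensor)  # no global_step given, so it is shown as step 0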
write.close()
Run tensorboard --logdir=y_log --port=2312 in the Terminal and open the printed link in a browser to view the result.

Compared with the normalized image from the previous example, the size has clearly changed.
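The example above passes a full (h, w) pair to Resize. When size is a single int, the docstring says the shorter edge is matched to it and the aspect ratio is kept, with max_size optionally capping the longer edge. A minimal sketch of that rule follows; resized_size is a hypothetical helper, and truncating with int() is an assumption about the rounding, which may differ between torchvision versions.

```python
def resized_size(height, width, size, max_size=None):
    """Return the (h, w) produced by an int `size`, following the docstring's rule."""
    short, long = (height, width) if height <= width else (width, height)
    new_short, new_long = size, int(size * long / short)
    if max_size is not None and new_long > max_size:
        # The longer edge is capped, so the shorter edge may end up below `size`.
        new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_short, new_long) if height <= width else (new_long, new_short)


print(resized_size(375, 499, 300))                # (300, 399)
print(resized_size(375, 499, 300, max_size=350))  # (263, 350)
print(resized_size(499, 375, 300))                # (399, 300)
```

For the 375 x 499 image used in this post, Resize(300) would therefore give roughly 300 x 399 rather than a 300 x 300 square.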
Next is Compose. Here are the usage notes from its docstring.
Ctrl+P: shows the parameters a method expects

"""Composes several transforms together. This transform does not support torchscript.
# Chains several transforms so they can be used together
Please, see the note below.
Args:
transforms (list of ``Transform`` objects): list of transforms to compose.
Example:
>>> transforms.Compose([
>>> transforms.CenterCrop(10),  # first center-crop the image
>>> transforms.PILToTensor(),  # then convert the image to a tensor
>>> transforms.ConvertImageDtype(torch.float),  # then convert the tensor to the given dtype, rescaling its values if needed
>>> ])  # a single Compose applies several transforms to an image in sequence
.. note::
In order to script the transformations, please use ``torch.nn.Sequential`` as below.
>>> transforms = torch.nn.Sequential(
>>> transforms.CenterCrop(10),
>>> transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
>>> )
>>> scripted_transforms = torch.jit.script(transforms)
Make sure to use only scriptable transformations, i.e. that work with ``torch.Tensor``, does not require
`lambda` functions or ``PIL.Image``.
"""
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms
writer = SummaryWriter('y_log')
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
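print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)
# ① Resize
trans_resize = transforms.Resize((300,300))
img_PIL_resize = trans_resize(img)  # resize the image
print(img_PIL_resize)  # the image is now (300, 300), but it is still a PIL image
# ② PIL to Tensor
trans_tensor = transforms.ToTensor()
trans_compose = transforms.Compose([trans_resize,trans_tensor])
# Every element passed to Compose is a transform object, and each transform's output must be a valid input for the next one
# trans_resize is a Resize object; its output is a PIL image
# trans_tensor is a ToTensor object; it takes a PIL image and returns a tensor
img_all = trans_compose(img)
# The final output is a tensor, so it can be logged to TensorBoard with add_image
writer.add_image("compose_img", img_all)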
writer.close()
Run tensorboard --logdir=y_log --port=2312 in the Terminal and open the printed link in a browser to view the result. This example simply chains Resize and ToTensor into a single step.
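The chaining itself is simple: each transform is called in order and its output becomes the input of the next. A minimal pure-Python sketch of the idea, where SimpleCompose and the two stand-in functions are hypothetical and not the torchvision classes:

```python
class SimpleCompose:
    """Apply a list of callables in order, feeding each output to the next."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


def keep_first_two(values):
    # Stand-in for a resize/crop step: list in, shorter list out.
    return values[:2]


def to_unit_range(values):
    # Stand-in for ToTensor: scale 0-255 ints to floats in [0, 1].
    return [v / 255 for v in values]


pipeline = SimpleCompose([keep_first_two, to_unit_range])
print(pipeline([255, 0, 51]))  # [1.0, 0.0]
```

This also shows why the order matters: swapping the two steps only works if the second one accepts whatever type the first one returns.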

Next is RandomCrop. Here are the usage notes from its docstring.
Ctrl+P: shows the parameters a method expects

"""Crop the given image at a random location.
If the image is torch Tensor, it is expected
to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions,
but if non-constant padding is used, the input is expected to have at most 2 leading dimensions
Args:
size (sequence or int): Desired output size of the crop. If size is an
int instead of sequence like (h, w), a square crop (size, size) is
made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
padding (int or sequence, optional): Optional padding on each border
of the image. Default is None. If a single int is provided this
is used to pad all borders. If sequence of length 2 is provided this is the padding
on left/right and top/bottom respectively. If a sequence of length 4 is provided
this is the padding for the left, top, right and bottom borders respectively.
# Note on size above: give the crop shape as (h, w); if only one int is given, a square (size, size) crop is made
.. note::
In torchscript mode padding as single int is not supported, use a sequence of
length 1: ``[padding, ]``.
pad_if_needed (boolean): It will pad the image if smaller than the
desired size to avoid raising an exception. Since cropping is done
after padding, the padding seems to be done at a random offset.
fill (number or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
length 3, it is used to fill R, G, B channels respectively.
This value is only used when the padding_mode is constant.
Only number is supported for torch Tensor.
Only int or str or tuple value is supported for PIL Image.
padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric.
Default is constant.
- constant: pads with a constant value, this value is specified with fill
- edge: pads with the last value at the edge of the image.
If input a 5D torch Tensor, the last 3 dimensions will be padded instead of the last 2
- reflect: pads with reflection of image without repeating the last value on the edge.
For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
will result in [3, 2, 1, 2, 3, 4, 3, 2]
- symmetric: pads with reflection of image repeating the last value on the edge.
For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
will result in [2, 1, 1, 2, 3, 4, 4, 3]
"""
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms
writer = SummaryWriter('y_log')
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
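print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)
# ① Random crop
trans_random = transforms.RandomCrop((200,250))  # (h, w)
img_PIL_random = trans_random(img)  # crop at a random location
print(img_PIL_random)  # prints a PIL image of size 250x200
# PIL reports size as (w, h), so the (h, w) = (200, 250) crop is shown as 250x200; it is still a PIL image
# ② PIL to Tensor
trans_tensor = transforms.ToTensor()
trans_compose = transforms.Compose([trans_random,trans_tensor])
# Every element passed to Compose is a transform object, and each transform's output must be a valid input for the next one
# trans_random is a RandomCrop object; its output is a PIL image
# trans_tensor is a ToTensor object; it takes a PIL image and returns a tensor
for i in range(10):
    img_randomcrop = trans_compose(img)
    # The final output is a tensor, so it can be logged to TensorBoard with add_image
    writer.add_image("img_randomcrop", img_randomcrop, i)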
writer.close()
Run tensorboard --logdir=y_log --port=2312 in the Terminal and open the printed link in a browser to view the result.
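Two details from the RandomCrop docstring can be sketched in plain Python: how reflect and symmetric padding differ, and how a random crop window that fits inside the image can be picked. Both pad_1d and random_crop_box are hypothetical helpers working on lists and plain integers. They illustrate the idea under the assumption that offsets are drawn uniformly, and are not torchvision's implementation.

```python
import random


def pad_1d(values, n, mode):
    """Pad a list with n elements on both sides using 'reflect' or 'symmetric' mode."""
    if not 1 <= n < len(values):
        raise ValueError("this sketch assumes 1 <= n < len(values)")
    if mode == "reflect":
        # Mirror around the edge element without repeating it.
        left = values[1:n + 1][::-1]
        right = values[-n - 1:-1][::-1]
    elif mode == "symmetric":
        # Mirror and repeat the edge element.
        left = values[:n][::-1]
        right = values[-n:][::-1]
    else:
        raise ValueError("mode must be 'reflect' or 'symmetric'")
    return left + values + right


def random_crop_box(height, width, crop_h, crop_w, rng=random):
    """Pick a random (top, left, crop_h, crop_w) window that fits inside the image."""
    if crop_h > height or crop_w > width:
        raise ValueError("crop size is larger than the image")
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


print(pad_1d([1, 2, 3, 4], 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_1d([1, 2, 3, 4], 2, "symmetric"))  # [2, 1, 1, 2, 3, 4, 4, 3]
top, left, h, w = random_crop_box(375, 499, 200, 250)
print(0 <= top <= 175 and 0 <= left <= 249)  # True
```

The two padded lists match the examples given in the docstring, and the random offsets explain why each of the ten logged steps shows a different region of the same image.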

Finally, CenterCrop. Here are the usage notes from its docstring.
Ctrl+P: shows the parameters a method expects

"""Crops the given image at the center.
# Crops the image around its center
If the image is torch Tensor, it is expected
to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions.
If image size is smaller than output size along any edge, image is padded with 0 and then center cropped.
Args:
size (sequence or int): Desired output size of the crop. If size is an
int instead of sequence like (h, w), a square crop (size, size) is
made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
"""
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
import cv2 as cv
import numpy as np
from torchvision import transforms
writer = SummaryWriter('y_log')
img_path = "dataset/b/6.jpg"
img = Image.open(img_path)
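print(type(img))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>
print(img.size)  # (499, 375), the original image size as (w, h)
# ① Center crop
trans_center = transforms.CenterCrop((200,250))  # (h, w)
img_PIL_center = trans_center(img)  # crop around the center
print(img_PIL_center)  # prints a PIL image of size 250x200
# PIL reports size as (w, h), so the (h, w) = (200, 250) crop is shown as 250x200; it is still a PIL image
# ② PIL to Tensor
trans_tensor = transforms.ToTensor()
trans_compose = transforms.Compose([trans_center,trans_tensor])
# Every element passed to Compose is a transform object, and each transform's output must be a valid input for the next one
# trans_center is a CenterCrop object; its output is a PIL image
# trans_tensor is a ToTensor object; it takes a PIL image and returns a tensor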
img_centercrop = trans_compose(img)
writer.add_image("img_centercrop",img_centercrop)
writer.close()
Run tensorboard --logdir=y_log --port=2312 in the Terminal and open the printed link in a browser to view the result.
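Unlike RandomCrop, the center crop window is fully determined by the image size and the crop size. A minimal sketch of the arithmetic follows. center_crop_box is a hypothetical helper, the rounding shown is an assumption rather than a guarantee about torchvision, and the zero-padding that the docstring describes for images smaller than the crop is not handled.

```python
def center_crop_box(height, width, crop_h, crop_w):
    """Return (top, left, bottom, right) of a centered crop_h x crop_w window."""
    # Python's round() sends exact halves to the nearest even integer.
    top = int(round((height - crop_h) / 2.0))
    left = int(round((width - crop_w) / 2.0))
    return top, left, top + crop_h, left + crop_w


print(center_crop_box(375, 499, 200, 250))  # (88, 124, 288, 374)
print(center_crop_box(300, 300, 100, 100))  # (100, 100, 200, 200)
```

Because nothing is random here, calling the transform repeatedly on the same image always yields the same crop.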
