Transforms are common image transformations, available in the torchvision.transforms module. They can be chained together using Compose. Most transforms accept both PIL Images and tensor images, although some transforms are PIL-only and others are tensor-only. The conversion transforms can be used to convert to and from PIL Images.
Transforms that accept tensor images also accept batches of tensor images. A tensor image is a tensor with shape (C, H, W), where C is the number of channels and H and W are the image height and width. A batch of tensor images is a tensor of shape (B, C, H, W), where B is the number of images in the batch.
Random transforms apply the same transformation to all the images of a given batch, but they produce different transformations across calls. For reproducible transformations across calls, use the functional transforms.
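As a minimal sketch of that last point: the random parameters can be sampled once with get_params and then applied through the functional API, so exactly the same transformation is used on every call (the tensor shapes here are illustrative):
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as F

img1 = torch.rand(3, 256, 256)  # illustrative tensor images (C, H, W)
img2 = torch.rand(3, 256, 256)

# Sample the crop parameters once, then reuse them across calls/images.
i, j, h, w = T.RandomCrop.get_params(img1, output_size=(128, 128))
crop1 = F.crop(img1, i, j, h, w)
crop2 = F.crop(img2, i, j, h, w)  # identical crop region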
In order to script the transformations, please use torch.nn.Sequential instead of Compose.
import torch
from torchvision import transforms as T

transform = torch.nn.Sequential(
    T.CenterCrop(10),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
)
scripted_transforms = torch.jit.script(transform)
Make sure to use only scriptable transformations, i.e. those that work with torch.Tensor and do not require lambda functions or PIL.Image.
For any custom transformation to be used with torch.jit.script, it should be derived from torch.nn.Module.
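A minimal sketch of such a custom transform (the AddGaussianNoise class and its std parameter are hypothetical, not part of torchvision):
import torch
import torch.nn as nn

class AddGaussianNoise(nn.Module):
    """Hypothetical example: additive noise as a scriptable transform."""
    def __init__(self, std: float = 0.1):
        super().__init__()
        self.std = std

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # Pure tensor ops, so torch.jit.script can compile it.
        return img + torch.randn_like(img) * self.std

scripted = torch.jit.script(AddGaussianNoise(std=0.05))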
torchvision.transforms.Compose(transforms)
Composes several transforms together. This transform does not support torchscript; see the scripting note above.
Parameters
transforms (list of Transform objects) – list of transforms to compose.
>>> transforms.Compose([
>>> transforms.CenterCrop(10),
>>> transforms.PILToTensor(),
>>> transforms.ConvertImageDtype(torch.float),
>>> ])
torchvision.transforms.CenterCrop(size)
Parameters
size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
forward(img)
Parameters
img (PIL Image or Tensor) – Image to be cropped.
Returns
Cropped image.
Return type
PIL Image or Tensor
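A short usage sketch, reusing the orig_img and plot helpers that the other examples in this document assume:
center_crops = [T.CenterCrop(size=size)(orig_img) for size in (30, 100)]
plot(center_crops)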
import torch
import torch.nn as nn
from torchvision import transforms as T

transform = nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# dog1, dog2 and show() come from the surrounding tutorial.
dog1 = dog1.to(device)
dog2 = dog2.to(device)
transformed_dog1 = transform(dog1)
transformed_dog2 = transform(dog2)
show([transformed_dog1, transformed_dog2])
torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Parameters
brightness (float or tuple of python:float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.
contrast (float or tuple of python:float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
saturation (float or tuple of python:float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.
hue (float or tuple of python:float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
jitter = T.ColorJitter(brightness=.5, hue=.3)
jitted_imgs = [jitter(orig_img) for _ in range(4)]
plot(jitted_imgs)
torchvision.transforms.FiveCrop(size)
(top_left, top_right, bottom_left, bottom_right, center) = T.FiveCrop(size=(100, 100))(orig_img)
plot([top_left, top_right, bottom_left, bottom_right, center])
gray_img = T.Grayscale()(orig_img)
plot([gray_img], cmap='gray')
padded_imgs = [T.Pad(padding=padding)(orig_img) for padding in (3, 10, 30, 50)]
plot(padded_imgs)
torchvision.transforms.RandomAffine(degrees, translate=None, scale=None, shear=None, interpolation=<InterpolationMode.NEAREST: 'nearest'>, fill=0, fillcolor=None, resample=None, center=None)
Parameters
degrees (sequence or number) – Range of degrees to select from. If degrees is a number instead of sequence like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.
translate (tuple, optional) – tuple of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Will not translate by default.
scale (tuple, optional) – scaling factor interval, e.g (a, b), then scale is randomly sampled from the range a <= scale <= b. Will keep original scale by default.
shear (sequence or number, optional) – Range of degrees to select from. If shear is a number, a shear parallel to the x axis in the range (-shear, +shear) will be applied. Else if shear is a sequence of 2 values a shear parallel to the x axis in the range (shear[0], shear[1]) will be applied. Else if shear is a sequence of 4 values, a x-axis shear in (shear[0], shear[1]) and y-axis shear in (shear[2], shear[3]) will be applied. Will not apply shear by default.
interpolation (InterpolationMode) – Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Default is InterpolationMode.NEAREST. If input is Tensor, only InterpolationMode.NEAREST, InterpolationMode.BILINEAR are supported. For backward compatibility integer values (e.g. PIL.Image.NEAREST) are still acceptable.
fill (sequence or number) – Pixel fill value for the area outside the transformed image. Default is 0. If given a number, the value is used for all bands respectively.
fillcolor (sequence or number, optional) – Deprecated; use the fill parameter instead.
affine_transformer = T.RandomAffine(degrees=(30, 70), translate=(0.1, 0.3), scale=(0.5, 0.75))
affine_imgs = [affine_transformer(orig_img) for _ in range(4)]
plot(affine_imgs)
torchvision.transforms.RandomApply(transforms, p=0.5)
applier = T.RandomApply(transforms=[T.RandomCrop(size=(64, 64))], p=0.5)
transformed_imgs = [applier(orig_img) for _ in range(4)]
plot(transformed_imgs)
torchvision.transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
Parameters
size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
padding (int or sequence, optional) –
Optional padding on each border of the image. Default is None. If a single int is provided this is used to pad all borders. If sequence of length 2 is provided this is the padding on left/right and top/bottom respectively. If a sequence of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.
NOTE
In torchscript mode padding as single int is not supported, use a sequence of length 1: [padding, ].
pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset.
fill (number or str or tuple) – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only number is supported for torch Tensor. Only int or str or tuple value is supported for PIL Image.
padding_mode (str) –
Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
constant: pads with a constant value, this value is specified with fill
edge: pads with the last value at the edge of the image. If input a 5D torch Tensor, the last 3 dimensions will be padded instead of the last 2
reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]
symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
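The reflect and symmetric examples above can be checked directly with the functional pad; a minimal sketch on a 1 x 1 x 4 tensor image:
import torch
import torchvision.transforms.functional as F

img = torch.tensor([1., 2., 3., 4.]).reshape(1, 1, 4)  # (C, H, W)
print(F.pad(img, [2, 0], padding_mode='reflect'))    # [3., 2., 1., 2., 3., 4., 3., 2.]
print(F.pad(img, [2, 0], padding_mode='symmetric'))  # [2., 1., 1., 2., 3., 4., 4., 3.]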
cropper = T.RandomCrop(size=(128, 128))
crops = [cropper(orig_img) for _ in range(4)]
plot(crops)
torchvision.transforms.TenCrop(size, vertical_flip=False)
>>> transform = Compose([
>>> TenCrop(size), # this is a list of PIL Images
>>> Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops
torchvision.transforms.RandomChoice(transforms)
Apply single transformation randomly picked from a list.
torchvision.transforms.RandomOrder(transforms)
Apply a list of transformations in a random order.
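A brief sketch of both (the sizes are arbitrary):
import torch
from torchvision import transforms as T

pick_one = T.RandomChoice([T.CenterCrop(64), T.RandomCrop(64)])
shuffled = T.RandomOrder([T.RandomHorizontalFlip(), T.CenterCrop(64)])

img = torch.rand(3, 128, 128)
out1 = pick_one(img)   # exactly one of the two crops is applied
out2 = shuffled(img)   # both are applied, in a random order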
torchvision.transforms.LinearTransformation(transformation_matrix, mean_vector)
Parameters
transformation_matrix (Tensor) – tensor [D x D], D = C x H x W
mean_vector (Tensor) – tensor [D], D = C x H x W
forward(tensor: torch.Tensor) → torch.Tensor
Parameters
tensor (Tensor) – Tensor image to be whitened.
Returns
Transformed image.
Return type
Tensor
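A minimal sketch with a trivial whitening matrix; in practice, transformation_matrix would be computed from the data covariance (e.g. for ZCA whitening):
import torch
from torchvision import transforms as T

C, H, W = 3, 8, 8
D = C * H * W
whiten = T.LinearTransformation(torch.eye(D), torch.zeros(D))
out = whiten(torch.rand(C, H, W))  # identity matrix + zero mean: output equals input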
torchvision.transforms.Normalize(mean, std, inplace=False)
Normalize a tensor image with mean and standard deviation: output[channel] = (input[channel] - mean[channel]) / std[channel].
Parameters
mean (sequence) – Sequence of means for each channel.
std (sequence) – Sequence of standard deviations for each channel.
inplace (bool,optional) – Bool to make this operation in-place.
import torch
from torchvision import transforms as T

def preprocess(batch):
    transforms = T.Compose([
        T.ConvertImageDtype(torch.float32),
        T.Normalize(mean=0.5, std=0.5),  # map [0, 1] into [-1, 1]
        T.Resize(size=(520, 960)),
    ])
    return transforms(batch)
Function: randomly erases (occludes) a region of the image.
• p: probability of performing the operation
• scale: range of the erased area as a proportion of the input image
• ratio: range of the aspect ratio of the erased region
• value: pixel value for the erased region, (R, G, B) or (Gray)
torchvision.transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0, inplace=False)
Parameters
p – probability that the random erasing operation will be performed.
scale – range of proportion of erased area against input image.
ratio – range of aspect ratio of erased area.
value – erasing value. Default is 0. If a single int, it is used to erase all pixels. If a tuple of length 3, it is used to erase R, G, B channels respectively. If a str of 'random', erasing each pixel with random values.
inplace – boolean to make this transform inplace. Default set to False.
>>> transform = transforms.Compose([
>>> transforms.RandomHorizontalFlip(),
>>> transforms.PILToTensor(),
>>> transforms.ConvertImageDtype(torch.float),
>>> transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
>>> transforms.RandomErasing(),
>>> ])
RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0, inplace=False)
torchvision.transforms.ConvertImageDtype(dtype: torch.dtype)
Convert a tensor image to the given dtype and scale the values accordingly.
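A minimal sketch of the dtype conversion and value rescaling:
import torch
from torchvision import transforms as T

img_uint8 = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)
img_float = T.ConvertImageDtype(torch.float32)(img_uint8)  # values scaled to [0.0, 1.0]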