• Light Self-Gaussian-Attention (LSGA) Mechanism


    Light Self-Gaussian-Attention Vision Transformer for Hyperspectral Image Classification

    Paper: Light Self-Gaussian-Attention Vision Transformer for Hyperspectral Image Classification

    Code: GitHub - machao132/LSGA-VIT

    Contents

    I. Abstract

    II. Innovations

    III. Method

    1. Hybrid Spectral–Spatial Tokenizer

    2. Light Self-Attention Mechanism

    3. Gaussian Absolute Position Bias

    4. LSGA Transformer Module Overview

    IV. Experiments

    1. Ablation Experiments

    2. Comparison Experiments

    V. Summary

    VI. Code Implementation


    I. Abstract

    Research status: in recent years, convolutional neural networks (CNNs) have been widely used in hyperspectral image classification thanks to their excellent local feature extraction. However, because of the local connectivity and weight sharing of convolution kernels, CNNs are limited in modeling long-range dependencies, and deeper networks tend to increase the computational cost.

    Main work: to address these problems, the paper proposes a vision transformer (ViT) based on a light self-Gaussian-attention (LSGA) mechanism that extracts global deep semantic features.

    1. First, a hybrid spatial-spectral tokenizer module extracts shallow spatial-spectral features and expands image patches to generate tokens.

    2. Second, light self-attention uses Q (query) and X (origin input), rather than Q, K (key), and V (value), to reduce computation and parameters.

    3. In addition, to prevent the lack of position information from causing the center features and neighborhood features to be confused, the authors design a Gaussian absolute position bias that simulates the HSI data distribution and makes the attention weights closer to the center query patch.

    Results: multiple experiments verify the effectiveness of the method, which outperforms state-of-the-art methods on four datasets. Specifically, accuracy improves by 0.62% over A2S2K and by 0.11% over SSFTT. In summary, LSGA-VIT is promising for HSI classification and shows potential for addressing position-aware long-range modeling and computational cost.

       

    II. Innovations

    1) The authors use a hybrid spatial-spectral tokenizer in place of plain patch embedding, preserving the complete relationships within the input image patch and capturing local feature relationships, so that the subsequent ViT blocks can perform global feature extraction.

    2) The LSGA-VIT block is designed to combine Gaussian position information with a light SA (LSA) mechanism, modeling the relationship between the center position and the local positions of the receptive field. By raising the spectral feature weight of the center position, the amount of computation and the number of parameters are effectively reduced.

    3) The method fuses global and local feature representations, effectively improving HSI classification accuracy. Both ablation and comparison experiments verify its effectiveness and superiority.

        

      

    III. Method

    The LSGA-VIT framework consists mainly of the hybrid spectral-spatial tokenizer and the LSGA module (light self-Gaussian-attention module), as shown in the figure below.

    1. Hybrid Spectral–Spatial Tokenizer

    Input: the hyperspectral image X_0 \in R^{H \times W \times S} is reduced by PCA (principal component analysis) to X_{PCA} \in R^{H \times W \times s}, where s is the number of bands (the bands of an image are also called channels).

    Input samples: before being fed into the network, the image is cropped into n samples X_1 \in R^{h \times w \times s} of size h×w.

    Process: in the hybrid spatial-spectral tokenization module, X_1 is first reshaped to X_2 \in R^{s \times h \times w} as the input of the convolutional layers; a 3-D convolutional layer then extracts spatial-spectral features.
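
    As a concrete illustration of this preprocessing step, here is a minimal sketch (assuming scikit-learn's PCA and NumPy; the helper name, the reflect padding, and the per-pixel cropping are illustrative choices, not taken from the repository):

    import numpy as np
    from sklearn.decomposition import PCA

    def pca_and_crop(X0, s=36, patch=8):
        """X0: [H, W, S] hyperspectral cube -> one [patch, patch, s] sample per pixel."""
        H, W, S = X0.shape
        X_pca = PCA(n_components=s).fit_transform(X0.reshape(-1, S)).reshape(H, W, s)
        pad = patch // 2
        X_pad = np.pad(X_pca, ((pad, pad), (pad, pad), (0, 0)), mode='reflect')
        samples = [X_pad[i:i + patch, j:j + patch] for i in range(H) for j in range(W)]
        return np.stack(samples)  # [n, patch, patch, s], one sample per pixel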

    First layer: a 3-D convolutional layer. Let x_{m,n}^{i,j,k} denote the activation value at position (i, j, k) of the n-th feature map in the m-th convolutional layer. From convolution to activation it is defined (reconstructed from the paper's notation) as

    x_{m,n}^{i,j,k} = \varphi \Big( b_{m,n} + \sum_{\tau=1}^{d} \sum_{\lambda'=-\lambda}^{\lambda} \sum_{\mu'=-\mu}^{\mu} \sum_{\eta'=-\eta}^{\eta} \kappa_{m,n,\tau}^{\lambda',\mu',\eta'} \, x_{m-1,\tau}^{i+\lambda',\, j+\mu',\, k+\eta'} \Big)

    where \varphi is the activation function, b_{m,n} is the bias parameter, 2\lambda+1, 2\mu+1, and 2\eta+1 are the channel depth, height, and width of the convolution kernel \kappa, and d is the number of input 3-D data blocks (here d = 1). After the 3-D convolutional layer, the first two dimensions are merged, giving an output of size (\theta \times s', h', w'), which is then fed into the 2-D convolutional layer.

    Second layer: a 2-D convolutional layer, defined analogously from convolution to activation:

    x_{m,n}^{i,j} = \varphi \Big( b_{m,n} + \sum_{\tau} \sum_{\mu'=-\mu}^{\mu} \sum_{\eta'=-\eta}^{\eta} \kappa_{m,n,\tau}^{\mu',\eta'} \, x_{m-1,\tau}^{i+\mu',\, j+\eta'} \Big)

    After the 2-D convolutional layer the spatial size is restored to the original, giving an output of size (c', h, w), where c' is the output channel count of the linear projection, i.e., the patch embedding dimension.

    Finally, the last two dimensions are flattened and transposed to obtain X \in R^{t \times c'}, where t = h \times w (X consists of t token vectors). The t tokens of length c' are fed into the LSGA Transformer module.

    Q: What is the linear projection output channel count?

    The LSGA module that follows contains residual connections around its linear layers, meaning its input and output channel counts are the same, so c' can also be described as the output channel count of the subsequent linear projections.

    Q: What is the patch embedding dimension?

    In a transformer, patch embedding means splitting the image into fixed-size patches and obtaining each patch's embedding through a linear transform; the patch embedding dimension is the dimension of this embedding.

    Q: What is the role of the tokens?

    Directly unfolding each pixel as a token preserves the spatial correlation of every point in the image.

    Code:

    class PatchEmbed(nn.Module):
        def __init__(self, img_size, patch_size, conv_embed_dim=4, in_chans=3, embed_dim=96, norm_layer=None):
            super().__init__()
            img_size = to_2tuple(img_size)
            patch_size = to_2tuple(patch_size)
            patches_resolution = [img_size[0], img_size[1]]
            self.img_size = img_size
            self.patch_size = patch_size
            self.patches_resolution = patches_resolution
            self.num_patches = patches_resolution[0] * patches_resolution[1]
            self.conv_embed_dim = conv_embed_dim
            self.in_chans = in_chans
            self.embed_dim = embed_dim
            self.conv3d_features = nn.Sequential(
                nn.Conv3d(1, out_channels=conv_embed_dim, kernel_size=(3, 3, 3), padding=1, stride=1),
                nn.BatchNorm3d(conv_embed_dim),
                nn.ReLU(),
            )
            self.conv2d_features = nn.Sequential(
                nn.Conv2d(in_channels=in_chans * conv_embed_dim, out_channels=embed_dim, kernel_size=(3, 3), padding=1,
                          stride=1),
                nn.BatchNorm2d(embed_dim),
                nn.ReLU(),
            )
            self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, padding=1, stride=1)
            if norm_layer is not None:
                self.norm = norm_layer(embed_dim)
            else:
                self.norm = None

        def forward(self, x):
            B, C, H, W = x.shape
            # FIXME look at relaxing size constraints
            assert H == self.img_size[0] and W == self.img_size[1], \
                f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
            x = x.unsqueeze(1)  # add a channel dim, [b, 1, 36, w, h]
            x = self.conv3d_features(x)  # 3-D convolutional layer, [b, 4, 36, w, h]
            x = x.view(B, -1, H, W)  # merge the first two dims for the 2-D conv, [b, 144, w, h]
            x = self.conv2d_features(x)  # 2-D convolutional layer, [b, 96, w, h]
            x = x.flatten(2).transpose(1, 2)  # flatten and transpose the spatial dims: w*h tokens of length 96, [b, w*h, 96]
            # x = self.proj(x).flatten(2).transpose(1, 2)  # B Ph*Pw C
            if self.norm is not None:
                x = self.norm(x)  # normalization, [b, w*h, 96]
            return x

        def flops(self):
            Ho, Wo = self.patches_resolution
            flops = Ho * Wo * self.embed_dim * self.in_chans * (self.patch_size[0] * self.patch_size[1])
            if self.norm is not None:
                flops += Ho * Wo * self.embed_dim
            return flops
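
    A quick shape check of the tokenizer (a sketch; the 8×8 patch size and 36 bands match the main script in Section VI, the batch size of 2 is arbitrary):

    embed = PatchEmbed(img_size=8, patch_size=3, in_chans=36, embed_dim=96)
    x = torch.randn(2, 36, 8, 8)   # [B, s, h, w]: a batch of PCA-reduced samples
    tokens = embed(x)              # 3-D conv -> 2-D conv -> flatten
    print(tokens.shape)            # torch.Size([2, 64, 96]): t = 8*8 tokens of length c' = 96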

       

    2. Light Self-Attention Mechanism

    Conventional self-attention (Eqs. (9) and (10) of the paper, reconstructed from the surrounding definitions):

    Q = XA_q, \quad K = XA_k, \quad V = XA_v

    Attention(Q, K, V) = softmax(QK^T / \sqrt{d}) \, V \qquad (9)

    QK^T = XA_q (XA_k)^T = XA_qA_k^TX^T \qquad (10)

    where A_q, A_k, A_v \in R^{c' \times c'} are weight matrices (yes, weight matrices: Q, K, and V can equivalently be obtained through linear layers).

       

    Derivation of the simplification:

    Let A = A_qA_k^T \in R^{c' \times c'}; A is a single weight matrix. Eq. (10) then simplifies to

    QK^T = XAX^T

    Let \hat{Q} = XA. Since Q and \hat{Q} are both obtained from X by a linear transform, we can treat \hat{Q} directly as Q, which gives

    QK^T = QX^T

    ----------------------------------------- (the weight matrix A_k has been simplified away: the K branch takes only X, with no A_k operation) ------------------------------------

    Substituting into Eq. (9), we obtain

    Attention(Q, X, V) = softmax(QX^T / \sqrt{d}) \, V = softmax(QX^T / \sqrt{d}) \, XA_v

    The authors add one extra linear layer A_t at the end of the attention block. Letting A' = A_vA_t, the trailing linear operation A_t and the V branch's linear operation A_v can be merged, so the authors use A_t in place of A_v. In other words, the linear layer A_v is moved to the end of the block (outside the attention computation itself), and the V branch no longer applies A_v, which yields the definition below.

    ----------------------------------- (the linear layer A_v has been moved to the end of the block: the V branch takes only X, with no A_v operation) -------------------------------

    Finally, with A_v moved to the trailing projection, light self-attention is defined as

    LSA(X) = softmax(QX^T / \sqrt{d}) \, X

    followed by the output projection A_t.

    Summary: the derivation eliminates the two weight matrices A_k and A_v.
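
    To make the simplification concrete, below is a minimal single-head PyTorch sketch of light self-attention (my own illustration of the formula above, not code from the repository; LightSelfAttention is a hypothetical name):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LightSelfAttention(nn.Module):
        # Only the Q branch keeps a projection (absorbing A = A_q A_k^T);
        # K and V both take the raw input X, and A_t is the trailing projection.
        def __init__(self, dim):
            super().__init__()
            self.a_q = nn.Linear(dim, dim)   # A (the merged A_q A_k^T)
            self.a_t = nn.Linear(dim, dim)   # A_t (absorbs A_v)
            self.scale = dim ** -0.5

        def forward(self, x):                # x: [B, t, c']
            q = self.a_q(x)                  # Q = XA
            attn = F.softmax(q @ x.transpose(-2, -1) * self.scale, dim=-1)  # softmax(QX^T / sqrt(d))
            return self.a_t(attn @ x)        # attention applied to X, then A_t

    x = torch.randn(2, 64, 96)
    print(LightSelfAttention(96)(x).shape)   # torch.Size([2, 64, 96])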

      

    3. Gaussian Absolute Position Bias

    Purpose: since the Transformer cannot capture the positional information of tokens, a module representing relative or absolute position must be added.

    Method: in multi-head self-attention, a relative position bias parameter is introduced for each head:

    Attention(Q, K, V) = softmax(QK^T / \sqrt{d} + B) \, V

    where d is the dimension of each head in the multi-head attention mechanism and B \in R^{t \times t} is the relative position bias parameter.

    In this paper, however, tokens are not generated from image patches: each pixel is a token, which preserves the spatial relationships in the image. To model the spatial relationship of every pixel, a 2-D Gaussian function is used to obtain Gaussian absolute position information, and this Gaussian absolute position replaces the relative position bias. The 2-D Gaussian function is defined as

    G(x, y) = \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right)

    where \sigma is the standard deviation, (x, y) are spatial position coordinates, and G is the Gaussian position matrix.

    Code:

    ## Light self-Gaussian-attention (LSGA)
    class LSGAttention(nn.Module):
        def __init__(self, dim, att_inputsize, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.):
            super().__init__()
            self.dim = dim
            self.att_inputsize = att_inputsize[0]
            self.num_heads = num_heads
            head_dim = dim // num_heads
            self.scale = qk_scale or head_dim ** -0.5
            self.qkv = nn.Linear(dim, dim, bias=qkv_bias)  # linear layer (used by the Q branch only)
            self.attn_drop = nn.Dropout(attn_drop)  # dropout on attention weights to reduce overfitting
            self.proj = nn.Linear(dim, dim)  # trailing linear layer (A_t)
            self.proj_drop = nn.Dropout(proj_drop)
            self.softmax = nn.Softmax(dim=-1)
            totalpixel = self.att_inputsize * self.att_inputsize
            gauss_coords_h = torch.arange(totalpixel) - int((totalpixel - 1) / 2)
            gauss_coords_w = torch.arange(totalpixel) - int((totalpixel - 1) / 2)
            gauss_x, gauss_y = torch.meshgrid([gauss_coords_h, gauss_coords_w])
            sigma = 10
            gauss_pos_index = torch.exp(torch.true_divide(-(gauss_x ** 2 + gauss_y ** 2), (2 * sigma ** 2)))  # 2-D Gaussian function
            self.register_buffer("gauss_pos_index", gauss_pos_index)
            self.token_wA = nn.Parameter(torch.empty(1, self.att_inputsize * self.att_inputsize, dim),
                                         requires_grad=True)  # tokenization parameters
            torch.nn.init.xavier_normal_(self.token_wA)
            self.token_wV = nn.Parameter(torch.empty(1, dim, dim),
                                         requires_grad=True)  # tokenization parameters
            torch.nn.init.xavier_normal_(self.token_wV)

        def forward(self, x, mask=None):
            B_, N, C = x.shape
            wa = repeat(self.token_wA, '() n d -> b n d', b=B_)  # wa: (B, t, dim)
            wa = rearrange(wa, 'b h w -> b w h')  # transpose: (B, dim, t)
            A = torch.einsum('bij,bjk->bik', x, wa)  # A: (B, t, t)
            A = rearrange(A, 'b h w -> b w h')  # transpose
            A = A.softmax(dim=-1)
            VV = repeat(self.token_wV, '() n d -> b n d', b=B_)  # VV: (B, dim, dim)
            VV = torch.einsum('bij,bjk->bik', x, VV)  # VV: (B, t, dim)
            x = torch.einsum('bij,bjk->bik', A, VV)  # re-tokenized x: (B, t, dim)
            absolute_pos_bias = self.gauss_pos_index.unsqueeze(0)  # Gaussian absolute position bias
            q = self.qkv(x).reshape(B_, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)  # only the Q branch is linearly projected
            k = x.reshape(B_, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)  # the K and V branches take x directly
            v = x.reshape(B_, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)
            q = q * self.scale  # scale by 1/sqrt(d) for numerical stability
            attn = (q @ k.transpose(-2, -1))  # matrix product: similarity matrix
            attn = attn + absolute_pos_bias.unsqueeze(0)  # add the Gaussian absolute position bias
            if mask is not None:
                nW = mask.shape[0]
                attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
                attn = attn.view(-1, self.num_heads, N, N)
                attn = self.softmax(attn)
            else:
                attn = self.softmax(attn)  # softmax normalization
            attn = self.attn_drop(attn)  # dropout to reduce overfitting
            x = (attn @ v).transpose(1, 2).reshape(B_, N, C)  # apply the attention weights to V
            x = self.proj(x)  # final linear layer (A_t)
            x = self.proj_drop(x)  # dropout to reduce overfitting
            return x

        def extra_repr(self) -> str:
            return f'dim={self.dim}, num_heads={self.num_heads}'
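
    A forward-pass sanity check for the class above (a sketch, assuming the imports from Section VI; the 8×8 token grid and dim=96 follow the main script there, while num_heads=8 is just an arbitrary divisor of 96):

    attn = LSGAttention(dim=96, att_inputsize=(8, 8), num_heads=8)
    x = torch.randn(2, 64, 96)   # [B, t, c'] with t = 8*8 pixel tokens
    out = attn(x)                # tokenization, light attention, Gaussian bias
    print(out.shape)             # torch.Size([2, 64, 96])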

      

         

    4. LSGA Transformer Module Overview

    The LSGA transformer module is shown in the figure below:

    In the LSGA Transformer module, the input feature map is layer-normalized, passed through LSGA to obtain X' \in R^{81 \times 96}, and then residual-connected. The tail of the module then sequentially applies normalization, an MLP (multilayer perceptron), and another residual connection. After two LSGA Transformer modules, the feature map is finally flattened and passed through a linear layer for the final classification.

    Code:

    ## LSGA transformer block
    class LSGAVITBlock(nn.Module):
        def __init__(self, dim, input_resolution, num_heads,
                     mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0.,
                     act_layer=nn.GELU, norm_layer=nn.LayerNorm,
                     fused_window_process=False):
            super().__init__()
            self.dim = dim
            self.input_resolution = input_resolution
            self.num_heads = num_heads
            self.mlp_ratio = mlp_ratio
            self.norm1 = norm_layer(dim)
            self.attn = LSGAttention(
                dim, att_inputsize=input_resolution, num_heads=num_heads,
                qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)
            self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()  # stochastic depth: randomly zeroes residual branches during training to regularize the model
            self.norm2 = norm_layer(dim)
            mlp_hidden_dim = int(dim * mlp_ratio)
            self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

        def forward(self, x):
            H, W = self.input_resolution
            B, L, C = x.shape
            assert L == H * W
            shortcut = x
            x = self.norm1(x)  # layer normalization
            x = self.attn(x)  # light self-Gaussian-attention
            x = shortcut + self.drop_path(x)  # residual connection
            x = x + self.drop_path(self.mlp(self.norm2(x)))  # layer norm + MLP, then another residual connection
            return x

        def extra_repr(self) -> str:
            return f"dim={self.dim}, input_resolution={self.input_resolution}, num_heads={self.num_heads}, " \
                   f"mlp_ratio={self.mlp_ratio}"

      

      

    IV. Experiments

    Datasets: Indian Pines (IP), Salinas Scene (SA), Pavia University (PU), and Houston 2013 (Houston).

    Evaluation metrics: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (K) are used as quantitative indicators to verify the experimental performance of LSGA-VIT.

    Experimental setup: all experiments were run on a computer with an Intel Core i7-10700K CPU, an RTX 2070 GPU, and 32 GB of RAM. In all experiments in this article, the training set is set to 10% of the dataset.

    1. Ablation Experiments

    SA (self-attention), LSA (light self-attention), SGA (self-Gaussian attention), and LSGA (light self-Gaussian attention) are compared.

    LSGA reduces computation by about 50% and parameters by about 30%, at the cost of only 0.02% OA.
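
    A back-of-the-envelope check of where the savings come from (my own estimate, not a table from the paper): standard self-attention needs four c'×c' linear layers (A_q, A_k, A_v, and the output projection), while LSA keeps only A_q and the trailing A_t, halving the projection parameters; the overall model reduction is smaller (~30%) because the tokenizer, MLPs, and norms are unchanged.

    import torch.nn as nn

    def n_params(m):
        return sum(p.numel() for p in m.parameters())

    c = 96
    sa = nn.ModuleList([nn.Linear(c, c) for _ in range(4)])   # A_q, A_k, A_v + output projection
    lsa = nn.ModuleList([nn.Linear(c, c) for _ in range(2)])  # A_q + A_t only
    print(n_params(sa), n_params(lsa))  # 37248 18624: 50% fewer attention projection parameters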

    2. Comparison Experiments

    The experimental data show that the method achieves good classification accuracy on all four datasets, and in particular achieves the best OA on three of them.

       

      

    V. Summary

    1. The hybrid spectral-spatial tokenizer produces X (t token vectors), preserving the spatial correlation of every point in the image, which is then passed to the LSGA module for attention.

    2. The LSGA module (light self-Gaussian-attention module) first removes, via the derivation above, the linear operations A_k and A_v from the K and V branches. (Yes, linear: the authors use linear transforms rather than convolutions to keep the parameter count down.)

    3. A 2-D Gaussian function provides the Gaussian absolute position, capturing the spatial relationships between pixels.

      

       

    VI. Code Implementation

    import torch
    from einops import rearrange, repeat
    import torch.nn as nn
    import torch.utils.checkpoint as checkpoint
    from timm.models.layers import DropPath, to_2tuple, trunc_normal_
    from torchsummary import summary


    class Mlp(nn.Module):
        def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
            super().__init__()
            out_features = out_features or in_features
            hidden_features = hidden_features or in_features
            self.fc1 = nn.Linear(in_features, hidden_features)
            self.act = act_layer()
            self.fc2 = nn.Linear(hidden_features, out_features)
            self.drop = nn.Dropout(drop)

        def forward(self, x):
            x = self.fc1(x)
            x = self.act(x)
            x = self.drop(x)
            x = self.fc2(x)
            x = self.drop(x)
            return x


    ## Light self-Gaussian-attention (LSGA)
    class LSGAttention(nn.Module):
        def __init__(self, dim, att_inputsize, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.):
            super().__init__()
            self.dim = dim
            self.att_inputsize = att_inputsize[0]
            self.num_heads = num_heads
            head_dim = dim // num_heads
            self.scale = qk_scale or head_dim ** -0.5
            self.qkv = nn.Linear(dim, dim, bias=qkv_bias)  # linear layer (used by the Q branch only)
            self.attn_drop = nn.Dropout(attn_drop)  # dropout on attention weights to reduce overfitting
            self.proj = nn.Linear(dim, dim)  # trailing linear layer (A_t)
            self.proj_drop = nn.Dropout(proj_drop)
            self.softmax = nn.Softmax(dim=-1)
            totalpixel = self.att_inputsize * self.att_inputsize
            gauss_coords_h = torch.arange(totalpixel) - int((totalpixel - 1) / 2)
            gauss_coords_w = torch.arange(totalpixel) - int((totalpixel - 1) / 2)
            gauss_x, gauss_y = torch.meshgrid([gauss_coords_h, gauss_coords_w])
            sigma = 10
            gauss_pos_index = torch.exp(torch.true_divide(-(gauss_x ** 2 + gauss_y ** 2), (2 * sigma ** 2)))  # 2-D Gaussian function
            self.register_buffer("gauss_pos_index", gauss_pos_index)
            self.token_wA = nn.Parameter(torch.empty(1, self.att_inputsize * self.att_inputsize, dim),
                                         requires_grad=True)  # tokenization parameters
            torch.nn.init.xavier_normal_(self.token_wA)
            self.token_wV = nn.Parameter(torch.empty(1, dim, dim),
                                         requires_grad=True)  # tokenization parameters
            torch.nn.init.xavier_normal_(self.token_wV)

        def forward(self, x, mask=None):
            B_, N, C = x.shape
            wa = repeat(self.token_wA, '() n d -> b n d', b=B_)  # wa: (B, t, dim)
            wa = rearrange(wa, 'b h w -> b w h')  # transpose: (B, dim, t)
            A = torch.einsum('bij,bjk->bik', x, wa)  # A: (B, t, t)
            A = rearrange(A, 'b h w -> b w h')  # transpose
            A = A.softmax(dim=-1)
            VV = repeat(self.token_wV, '() n d -> b n d', b=B_)  # VV: (B, dim, dim)
            VV = torch.einsum('bij,bjk->bik', x, VV)  # VV: (B, t, dim)
            x = torch.einsum('bij,bjk->bik', A, VV)  # re-tokenized x: (B, t, dim)
            absolute_pos_bias = self.gauss_pos_index.unsqueeze(0)  # Gaussian absolute position bias
            q = self.qkv(x).reshape(B_, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)  # only the Q branch is linearly projected
            k = x.reshape(B_, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)  # the K and V branches take x directly
            v = x.reshape(B_, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)
            q = q * self.scale  # scale by 1/sqrt(d) for numerical stability
            attn = (q @ k.transpose(-2, -1))  # matrix product: similarity matrix
            attn = attn + absolute_pos_bias.unsqueeze(0)  # add the Gaussian absolute position bias
            if mask is not None:
                nW = mask.shape[0]
                attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
                attn = attn.view(-1, self.num_heads, N, N)
                attn = self.softmax(attn)
            else:
                attn = self.softmax(attn)  # softmax normalization
            attn = self.attn_drop(attn)  # dropout to reduce overfitting
            x = (attn @ v).transpose(1, 2).reshape(B_, N, C)  # apply the attention weights to V
            x = self.proj(x)  # final linear layer (A_t)
            x = self.proj_drop(x)  # dropout to reduce overfitting
            return x

        def extra_repr(self) -> str:
            return f'dim={self.dim}, num_heads={self.num_heads}'


    ## LSGA transformer block
    class LSGAVITBlock(nn.Module):
        def __init__(self, dim, input_resolution, num_heads,
                     mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0.,
                     act_layer=nn.GELU, norm_layer=nn.LayerNorm,
                     fused_window_process=False):
            super().__init__()
            self.dim = dim
            self.input_resolution = input_resolution
            self.num_heads = num_heads
            self.mlp_ratio = mlp_ratio
            self.norm1 = norm_layer(dim)
            self.attn = LSGAttention(
                dim, att_inputsize=input_resolution, num_heads=num_heads,
                qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)
            self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()  # stochastic depth: randomly zeroes residual branches during training to regularize the model
            self.norm2 = norm_layer(dim)
            mlp_hidden_dim = int(dim * mlp_ratio)
            self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

        def forward(self, x):
            H, W = self.input_resolution
            B, L, C = x.shape
            assert L == H * W
            shortcut = x
            x = self.norm1(x)  # layer normalization
            x = self.attn(x)  # light self-Gaussian-attention
            x = shortcut + self.drop_path(x)  # residual connection
            x = x + self.drop_path(self.mlp(self.norm2(x)))  # layer norm + MLP, then another residual connection
            return x

        def extra_repr(self) -> str:
            return f"dim={self.dim}, input_resolution={self.input_resolution}, num_heads={self.num_heads}, " \
                   f"mlp_ratio={self.mlp_ratio}"


    class PatchMerging(nn.Module):
        def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm):
            super().__init__()
            self.input_resolution = input_resolution
            self.dim = dim
            self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)
            self.norm = norm_layer(4 * dim)

        def forward(self, x):
            """
            x: B, H*W, C
            """
            H, W = self.input_resolution
            B, L, C = x.shape
            assert L == H * W, "input feature has wrong size"
            assert H % 2 == 0 and W % 2 == 0, f"x size ({H}*{W}) are not even."
            x = x.view(B, H, W, C)
            x0 = x[:, 0::2, 0::2, :]  # B H/2 W/2 C
            x1 = x[:, 1::2, 0::2, :]  # B H/2 W/2 C
            x2 = x[:, 0::2, 1::2, :]  # B H/2 W/2 C
            x3 = x[:, 1::2, 1::2, :]  # B H/2 W/2 C
            x = torch.cat([x0, x1, x2, x3], -1)  # B H/2 W/2 4*C
            x = x.view(B, -1, 4 * C)  # B H/2*W/2 4*C
            x = self.norm(x)
            x = self.reduction(x)
            return x

        def extra_repr(self) -> str:
            return f"input_resolution={self.input_resolution}, dim={self.dim}"

        def flops(self):
            H, W = self.input_resolution
            flops = H * W * self.dim
            flops += (H // 2) * (W // 2) * 4 * self.dim * 2 * self.dim
            return flops


    class BasicLayer(nn.Module):
        def __init__(self, dim, input_resolution, depth, num_heads,
                     mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0.,
                     drop_path=0., norm_layer=nn.LayerNorm, downsample=None, use_checkpoint=False,
                     fused_window_process=False):
            super().__init__()
            self.dim = dim
            self.input_resolution = input_resolution
            self.depth = depth
            self.use_checkpoint = use_checkpoint
            # build blocks
            self.blocks = nn.ModuleList([
                LSGAVITBlock(dim=dim, input_resolution=input_resolution,
                             num_heads=num_heads,
                             mlp_ratio=mlp_ratio,
                             qkv_bias=qkv_bias, qk_scale=qk_scale,
                             drop=drop, attn_drop=attn_drop,
                             drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,
                             norm_layer=norm_layer,
                             fused_window_process=fused_window_process)
                for i in range(depth)])
            # patch merging layer
            if downsample is not None:
                self.downsample = downsample(input_resolution, dim=dim, norm_layer=norm_layer)
            else:
                self.downsample = None

        def forward(self, x):
            for blk in self.blocks:
                if self.use_checkpoint:
                    x = checkpoint.checkpoint(blk, x)
                else:
                    x = blk(x)
            if self.downsample is not None:
                x = self.downsample(x)
            return x

        def extra_repr(self) -> str:
            return f"dim={self.dim}, input_resolution={self.input_resolution}, depth={self.depth}"

        def flops(self):
            flops = 0
            for blk in self.blocks:
                flops += blk.flops()
            if self.downsample is not None:
                flops += self.downsample.flops()
            return flops


    ## Hybrid spectral-spatial tokenizer
    class PatchEmbed(nn.Module):
        def __init__(self, img_size, patch_size, conv_embed_dim=4, in_chans=3, embed_dim=96, norm_layer=None):
            super().__init__()
            img_size = to_2tuple(img_size)
            patch_size = to_2tuple(patch_size)
            patches_resolution = [img_size[0], img_size[1]]
            self.img_size = img_size
            self.patch_size = patch_size
            self.patches_resolution = patches_resolution
            self.num_patches = patches_resolution[0] * patches_resolution[1]
            self.conv_embed_dim = conv_embed_dim
            self.in_chans = in_chans
            self.embed_dim = embed_dim
            self.conv3d_features = nn.Sequential(
                nn.Conv3d(1, out_channels=conv_embed_dim, kernel_size=(3, 3, 3), padding=1, stride=1),
                nn.BatchNorm3d(conv_embed_dim),
                nn.ReLU(),
            )
            self.conv2d_features = nn.Sequential(
                nn.Conv2d(in_channels=in_chans * conv_embed_dim, out_channels=embed_dim, kernel_size=(3, 3), padding=1,
                          stride=1),
                nn.BatchNorm2d(embed_dim),
                nn.ReLU(),
            )
            self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, padding=1, stride=1)
            if norm_layer is not None:
                self.norm = norm_layer(embed_dim)
            else:
                self.norm = None

        def forward(self, x):
            B, C, H, W = x.shape
            # FIXME look at relaxing size constraints
            assert H == self.img_size[0] and W == self.img_size[1], \
                f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
            x = x.unsqueeze(1)  # add a channel dim, [b, 1, 36, w, h]
            x = self.conv3d_features(x)  # 3-D convolutional layer, [b, 4, 36, w, h]
            x = x.view(B, -1, H, W)  # merge the first two dims for the 2-D conv, [b, 144, w, h]
            x = self.conv2d_features(x)  # 2-D convolutional layer, [b, 96, w, h]
            x = x.flatten(2).transpose(1, 2)  # flatten and transpose the spatial dims: w*h tokens of length 96, [b, w*h, 96]
            # x = self.proj(x).flatten(2).transpose(1, 2)  # B Ph*Pw C
            if self.norm is not None:
                x = self.norm(x)  # normalization, [b, w*h, 96]
            return x

        def flops(self):
            Ho, Wo = self.patches_resolution
            flops = Ho * Wo * self.embed_dim * self.in_chans * (self.patch_size[0] * self.patch_size[1])
            if self.norm is not None:
                flops += Ho * Wo * self.embed_dim
            return flops


    ## LSGAVIT main model
    class LSGAVIT(nn.Module):
        def __init__(self, img_size, patch_size, in_chans, num_classes,
                     embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                     mlp_ratio=4., qkv_bias=True, qk_scale=None,
                     drop_rate=0., attn_drop_rate=0., drop_path_rate=0.1,
                     norm_layer=nn.LayerNorm, ape=False, patch_norm=True,
                     use_checkpoint=False, fused_window_process=False, **kwargs):
            super().__init__()
            self.num_classes = num_classes
            self.num_layers = len(depths)
            self.embed_dim = embed_dim
            self.ape = ape
            self.patch_norm = patch_norm
            self.num_features = int(embed_dim * 2 ** (self.num_layers - 1))
            self.mlp_ratio = mlp_ratio
            # split image into non-overlapping patches
            self.patch_embed = PatchEmbed(
                img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim,
                norm_layer=norm_layer if self.patch_norm else None)
            num_patches = self.patch_embed.num_patches
            patches_resolution = self.patch_embed.patches_resolution
            self.patches_resolution = patches_resolution
            # absolute position embedding
            if self.ape:
                self.absolute_pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
                trunc_normal_(self.absolute_pos_embed, std=.02)
            self.pos_drop = nn.Dropout(p=drop_rate)
            # stochastic depth
            dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]  # stochastic depth decay rule
            # build layers
            self.layers = nn.ModuleList()
            for i_layer in range(self.num_layers):
                layer = BasicLayer(dim=int(embed_dim * 2 ** i_layer),
                                   input_resolution=(patches_resolution[0] // (2 ** i_layer),
                                                     patches_resolution[1] // (2 ** i_layer)),
                                   depth=depths[i_layer],
                                   num_heads=num_heads[i_layer],
                                   mlp_ratio=self.mlp_ratio,
                                   qkv_bias=qkv_bias, qk_scale=qk_scale,
                                   drop=drop_rate, attn_drop=attn_drop_rate,
                                   drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])],
                                   norm_layer=norm_layer,
                                   downsample=PatchMerging if (i_layer < self.num_layers - 1) else None,
                                   use_checkpoint=use_checkpoint,
                                   fused_window_process=fused_window_process)
                self.layers.append(layer)
            self.norm = norm_layer(self.num_features)
            self.avgpool = nn.AdaptiveAvgPool1d(1)
            self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity()
            self.apply(self._init_weights)

        def _init_weights(self, m):
            if isinstance(m, nn.Linear):
                trunc_normal_(m.weight, std=.02)
                if isinstance(m, nn.Linear) and m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.LayerNorm):
                nn.init.constant_(m.bias, 0)
                nn.init.constant_(m.weight, 1.0)

        @torch.jit.ignore
        def no_weight_decay(self):
            return {'absolute_pos_embed'}

        def forward_features(self, x):
            x = self.patch_embed(x)  # 1. tokenize: hybrid spectral-spatial patch embedding
            if self.ape:
                x = x + self.absolute_pos_embed  # learnable absolute position embedding (off by default; the Gaussian bias lives inside LSGAttention)
            x = self.pos_drop(x)  # dropout on the token embeddings (rate 0 by default)
            for layer in self.layers:
                x = layer(x)  # 2. stacked LSGA transformer blocks
            x = self.norm(x)  # B L C
            x = self.avgpool(x.transpose(1, 2))  # B C 1
            x = torch.flatten(x, 1)  # 3. flatten the feature map for the final classification
            return x

        def forward(self, x):
            x = self.forward_features(x)
            x = self.head(x)
            return x

        def flops(self):
            flops = 0
            flops += self.patch_embed.flops()
            for i, layer in enumerate(self.layers):
                flops += layer.flops()
            flops += self.num_features * self.patches_resolution[0] * self.patches_resolution[1] // (2 ** self.num_layers)
            flops += self.num_features * self.num_classes
            return flops


    if __name__ == '__main__':
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        net = LSGAVIT(8, 3, 36, 16).to(device)
        # print the network structure and parameters
        summary(net, (36, 8, 8))

  • Original article: https://blog.csdn.net/qq_45981086/article/details/133436621