• STViT-R Code Reading Notes


    Contents

    I. SwinTransformer

    1. Principle

    2. Code

    II. STViT-R

    1. Core Idea

    2. Code vs. the Paper


    No actual training is done here; the goal is only to read the code, so it is enough to build the network and run a single forward pass.

    I. SwinTransformer

    1. Principle

    Main idea: the tokens are partitioned by region into windows, and self-attention is computed only among the tokens inside each window.

    However, tokens in different windows never interact. To solve this, Swin introduces shifted windows: the window grid is shifted in alternating blocks so that neighbouring windows exchange information.
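    For reference, the shift in the Swin codebase is implemented with torch.roll. A minimal sketch of the idea (the tensor shapes here are illustrative, chosen to match the first stage of the model printed later):

    import torch

    # Cyclically shift the feature map before window partitioning so that adjacent
    # windows exchange information; after attention, shift back with +shift_size.
    x = torch.randn(1, 56, 56, 96)   # (B, H, W, C), as in the first stage
    shift_size = 3                   # window_size // 2 for window_size = 7
    shifted_x = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))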

    2. Code

    1. Uniform window partitioning

    x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C; window_size = 7; partition into windows (64,7,7,96)
    x_windows = x_windows.view(-1, self.window_size * self.window_size, C)  # nW*B, window_size*window_size, C  (64,49,96)
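    For reference, window_partition in the official Swin implementation looks essentially like the following (reproduced from memory, so treat it as a sketch):

    import torch

    def window_partition(x, window_size):
        """Split a (B, H, W, C) feature map into non-overlapping windows,
        returning (num_windows * B, window_size, window_size, C)."""
        B, H, W, C = x.shape
        x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
        windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)
        return windows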

    II. STViT-R

    1. Core Idea

    The shallow transformer layers are kept unchanged to extract low-level features, so the image tokens retain rich spatial information. In the deeper layers, the paper proposes the STGM (Semantic Token Generation Module) to generate semantic tokens: through clustering, the whole image is represented by a small number of tokens carrying high-level semantic information. In the first STGM, the semantic tokens are initialized by intra- and inter-window spatial pooling. Thanks to this spatial initialization, the semantic tokens mainly contain local semantic information and are distributed discretely and evenly in space. In the following attention layers, besides further clustering, the semantic tokens are also equipped with global cluster centers, and the network can adaptively select part of the semantic tokens to focus on global semantic information.
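    To make the "dumbbell" organization concrete before going into the code, here is a purely schematic sketch; all class and variable names are placeholders, not the repository's:

    import torch.nn as nn

    class DumbbellUnit(nn.Module):
        """Schematic only: one dumbbell = STGM (image tokens -> few semantic tokens),
        several cheap blocks on the semantic tokens, then a recovery block in which
        the image tokens attend back to the semantic tokens."""
        def __init__(self, stgm, semantic_blocks, recovery_block):
            super().__init__()
            self.stgm = stgm                                       # pooling-initialized cross-attention
            self.semantic_blocks = nn.ModuleList(semantic_blocks)  # operate on e.g. 36 tokens instead of 196
            self.recovery = recovery_block                         # restores full-resolution image tokens

        def forward(self, x):
            s = self.stgm(x)                 # (B, 196, C) -> (B, 36, C)
            for blk in self.semantic_blocks:
                s = blk(s)
            return self.recovery(x, s)       # back to (B, 196, C)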

    2. Code vs. the Paper

    The spatial-pooling initialization of the semantic tokens corresponds to:

    xx = x.reshape(B, H // self.window_size, self.window_size, W // self.window_size, self.window_size, C)  # (1,2,7,2,7,384)
    windows = xx.permute(0, 1, 3, 2, 4, 5).contiguous().reshape(-1, self.window_size, self.window_size, C).permute(0, 3, 1, 2)  # (4,384,7,7)
    shortcut = self.multi_scale(windows)  # B*nW, W*W, C  multi_scale.py -- 13  (4,9,384)
    if self.use_conv_pos:  # False
        shortcut = self.conv_pos(shortcut)
    pool_x = self.norm1(shortcut.reshape(B, -1, C)).reshape(-1, self.multi_scale.num_samples, C)  # (4,9,384)

    class multi_scale_semantic_token1(nn.Module):
        def __init__(self, sample_window_size):
            super().__init__()
            self.sample_window_size = sample_window_size  # 3
            self.num_samples = sample_window_size * sample_window_size

        def forward(self, x):  # (4,384,7,7)
            B, C, _, _ = x.size()
            pool_x = F.adaptive_max_pool2d(x, (self.sample_window_size, self.sample_window_size)).view(B, C, self.num_samples).transpose(2, 1)  # (4,9,384)
            return pool_x
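    A quick standalone check of the pooling shapes annotated above (illustrative only, not from the repository):

    import torch
    import torch.nn.functional as F

    x = torch.randn(4, 384, 7, 7)                        # 4 windows of size 7x7, C = 384
    pooled = F.adaptive_max_pool2d(x, (3, 3))            # each 7x7 window pooled down to 3x3
    print(pooled.view(4, 384, 9).transpose(2, 1).shape)  # torch.Size([4, 9, 384])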

    Note that the pooling is performed within each window. In the code, the window size is 7 and the feature map is split into 4 windows, so x before pooling has shape (4,384,7,7); after per-window pooling, each window is pooled down to size 3, giving an output of (4,9,384). As for the parameter settings, since the local variant is used, the paper takes the keys and values for each group of semantic tokens from a window larger than the original one, so that some surrounding context is included. This leads to the following operation, which enlarges the original windows:

    k_windows = F.unfold(x.permute(0, 3, 1, 2), kernel_size=10, stride=4).view(B, C, 10, 10, -1).permute(0, 4, 2, 3, 1)  # (1,4,10,10,384)
    k_windows = k_windows.reshape(-1, 100, C)  # (4,100,384)
    k_windows = torch.cat([shortcut, k_windows], dim=1)  # (4,109,384)
    k_windows = self.norm1(k_windows.reshape(B, -1, C)).reshape(-1, 100 + self.multi_scale.num_samples, C)  # (4,109,384)
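    A standalone check of how the unfold call produces the enlarged windows (shapes follow the annotations above; this snippet is illustrative, not taken from the repository):

    import torch
    import torch.nn.functional as F

    B, H, W, C = 1, 14, 14, 384
    x = torch.randn(B, H, W, C)

    # kernel_size=10, stride=4 on a 14x14 map yields 2x2 = 4 overlapping 10x10 windows,
    # i.e. one enlarged window per original 7x7 window.
    patches = F.unfold(x.permute(0, 3, 1, 2), kernel_size=10, stride=4)  # (1, 384*100, 4)
    k_windows = patches.view(B, C, 10, 10, -1).permute(0, 4, 2, 3, 1)    # (1, 4, 10, 10, 384)
    print(k_windows.reshape(-1, 100, C).shape)                           # torch.Size([4, 100, 384])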


    Equation (1) in the paper, roughly S = P + MHA(P, X, X), where P is the pooled initialization and X the enlarged-window image tokens.

    The first part (the attention residual) corresponds to:

    # P
    shortcut = self.multi_scale(windows)
    # MHA(P, X, X)
    pool_x = self.norm1(shortcut.reshape(B, -1, C)).reshape(-1, self.multi_scale.num_samples, C)
    if self.shortcut:
        x = shortcut + self.drop_path(self.layer_scale_1 * self.attn(pool_x, k_windows))

    The Norm layers are omitted in this notation, so the P inside MHA(·) is the normalized one (pool_x), while the outer P is shortcut.
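    Here self.attn(pool_x, k_windows) is a cross-attention with separate q and kv projections (matching the (q)/(kv) Linear layers in the structure dumps further down). A minimal sketch of that pattern, written from scratch rather than copied from the repository:

    import torch
    import torch.nn as nn

    class CrossAttention(nn.Module):
        """Sketch of an Attention(q_tokens, kv_tokens) block with separate q and kv projections."""
        def __init__(self, dim, num_heads=12):
            super().__init__()
            self.num_heads = num_heads
            self.scale = (dim // num_heads) ** -0.5
            self.q = nn.Linear(dim, dim)
            self.kv = nn.Linear(dim, dim * 2)
            self.proj = nn.Linear(dim, dim)

        def forward(self, q_tokens, kv_tokens):
            B, Nq, C = q_tokens.shape
            Nk = kv_tokens.shape[1]
            q = self.q(q_tokens).reshape(B, Nq, self.num_heads, C // self.num_heads).transpose(1, 2)
            kv = self.kv(kv_tokens).reshape(B, Nk, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
            k, v = kv[0], kv[1]
            attn = (q @ k.transpose(-2, -1)) * self.scale
            out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, Nq, C)
            return self.proj(out)

    With pool_x of shape (4,9,384) as queries and k_windows of shape (4,109,384) as keys/values, the output has shape (4,9,384), which matches shortcut for the residual addition above.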

    The second part (the FFN/MLP residual) corresponds to:

    x = x + self.drop_path(self.layer_scale_2 * self.mlp(self.norm2(x)))  # (1,36,384)

    The next attention layer, in which the semantic tokens are combined with the global cluster centers, corresponds to:

    elif i == 2:
        if self.use_global:
            semantic_token = blk(semantic_token + self.semantic_token2, torch.cat([semantic_token, x], dim=1))
        else:  # True
            semantic_token = blk(semantic_token, torch.cat([semantic_token, x], dim=1))

    The global semantic tokens from the paper are defined as follows (used only when use_global is set):

    if self.use_global:
        self.semantic_token2 = nn.Parameter(torch.zeros(1, self.num_samples, embed_dim))
        trunc_normal_(self.semantic_token2, std=.02)

    Finally, the residual updates correspond to:

    x = shortcut + self.drop_path(self.layer_scale_1 * attn)
    x = x + self.drop_path(self.layer_scale_2 * self.mlp(self.norm2(x)))

    Note that the layers between i = 1 and i = 5 form the STGM; at i = 5, the other side of the dumbbell begins.

    The corresponding code:

    elif i == 5:
        x = blk(x, semantic_token)  # to layers.py -- 132

    As indicated by the blue line in the figure, the original image tokens act as Q, while the semantic tokens produced by the STGM act as K and V.
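    At the shape level, this recovery step looks roughly like the toy snippet below (single head, no projections; purely illustrative):

    import torch

    x = torch.randn(1, 196, 384)              # image tokens (Q)
    semantic_token = torch.randn(1, 36, 384)  # semantic tokens (K and V)
    attn = (x @ semantic_token.transpose(-2, -1)) * (384 ** -0.5)  # (1, 196, 36)
    out = attn.softmax(dim=-1) @ semantic_token                    # (1, 196, 384)
    print(out.shape)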


    Repeating the above process yields a stack of dumbbell structures; in the 18-block stage, i = 0-5, 6-11, and 12-17 each form one dumbbell:

    if i == 0:
        x = blk(x)  # (1,196,384)  to swin_transformer -- 242
    elif i == 1:
        semantic_token = blk(x)  # to layers.py -- 179
    elif i == 2:
        if self.use_global:  # True
            semantic_token = blk(semantic_token + self.semantic_token2, torch.cat([semantic_token, x], dim=1))  # to layers.py -- 132
        else:  # True
            semantic_token = blk(semantic_token, torch.cat([semantic_token, x], dim=1))  # to layers.py -- 132
    elif i > 2 and i < 5:
        semantic_token = blk(semantic_token)  # to layers.py -- 132
    elif i == 5:
        x = blk(x, semantic_token)  # to layers.py -- 132
    elif i == 6:
        x = blk(x)
    elif i == 7:
        semantic_token = blk(x)
    elif i == 8:
        semantic_token = blk(semantic_token, torch.cat([semantic_token, x], dim=1))
    elif i > 8 and i < 11:
        semantic_token = blk(semantic_token)
    elif i == 11:
        x = blk(x, semantic_token)
    elif i == 12:
        x = blk(x)
    elif i == 13:
        semantic_token = blk(x)
    elif i == 14:
        semantic_token = blk(semantic_token, torch.cat([semantic_token, x], dim=1))
    elif i > 14 and i < 17:
        semantic_token = blk(semantic_token)
    else:
        x = blk(x, semantic_token)

    Tiny model structure:

    1. SwinTransformer(
    2. (patch_embed): PatchEmbed(
    3. (proj): Sequential(
    4. (0): Conv2d_BN(
    5. (c): Conv2d(3, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    6. (bn): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    7. )
    8. (1): Hardswish()
    9. (2): Conv2d_BN(
    10. (c): Conv2d(48, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    11. (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    12. )
    13. (3): Hardswish()
    14. )
    15. )
    16. (pos_drop): Dropout(p=0.0, inplace=False)
    17. (layers): ModuleList(
    18. (0): BasicLayer(
    19. dim=96, input_resolution=(56, 56), depth=2
    20. (blocks): ModuleList(
    21. (0): SwinTransformerBlock(
    22. dim=96, input_resolution=(56, 56), num_heads=3, window_size=7, shift_size=0, mlp_ratio=4.0
    23. (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    24. (attn): WindowAttention(
    25. dim=96, window_size=(7, 7), num_heads=3
    26. (qkv): Linear(in_features=96, out_features=288, bias=True)
    27. (attn_drop): Dropout(p=0.0, inplace=False)
    28. (proj): Linear(in_features=96, out_features=96, bias=True)
    29. (proj_drop): Dropout(p=0.0, inplace=False)
    30. (softmax): Softmax(dim=-1)
    31. )
    32. (drop_path): Identity()
    33. (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    34. (mlp): Mlp(
    35. (fc1): Linear(in_features=96, out_features=384, bias=True)
    36. (act): GELU()
    37. (fc2): Linear(in_features=384, out_features=96, bias=True)
    38. (drop): Dropout(p=0.0, inplace=False)
    39. )
    40. )
    41. (1): SwinTransformerBlock(
    42. dim=96, input_resolution=(56, 56), num_heads=3, window_size=7, shift_size=3, mlp_ratio=4.0
    43. (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    44. (attn): WindowAttention(
    45. dim=96, window_size=(7, 7), num_heads=3
    46. (qkv): Linear(in_features=96, out_features=288, bias=True)
    47. (attn_drop): Dropout(p=0.0, inplace=False)
    48. (proj): Linear(in_features=96, out_features=96, bias=True)
    49. (proj_drop): Dropout(p=0.0, inplace=False)
    50. (softmax): Softmax(dim=-1)
    51. )
    52. (drop_path): DropPath(drop_prob=0.018)
    53. (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    54. (mlp): Mlp(
    55. (fc1): Linear(in_features=96, out_features=384, bias=True)
    56. (act): GELU()
    57. (fc2): Linear(in_features=384, out_features=96, bias=True)
    58. (drop): Dropout(p=0.0, inplace=False)
    59. )
    60. )
    61. )
    62. (downsample): PatchMerging(
    63. input_resolution=(56, 56), dim=96
    64. (reduction): Linear(in_features=384, out_features=192, bias=False)
    65. (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    66. )
    67. )
    68. (1): BasicLayer(
    69. dim=192, input_resolution=(28, 28), depth=2
    70. (blocks): ModuleList(
    71. (0): SwinTransformerBlock(
    72. dim=192, input_resolution=(28, 28), num_heads=6, window_size=7, shift_size=0, mlp_ratio=4.0
    73. (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    74. (attn): WindowAttention(
    75. dim=192, window_size=(7, 7), num_heads=6
    76. (qkv): Linear(in_features=192, out_features=576, bias=True)
    77. (attn_drop): Dropout(p=0.0, inplace=False)
    78. (proj): Linear(in_features=192, out_features=192, bias=True)
    79. (proj_drop): Dropout(p=0.0, inplace=False)
    80. (softmax): Softmax(dim=-1)
    81. )
    82. (drop_path): DropPath(drop_prob=0.036)
    83. (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    84. (mlp): Mlp(
    85. (fc1): Linear(in_features=192, out_features=768, bias=True)
    86. (act): GELU()
    87. (fc2): Linear(in_features=768, out_features=192, bias=True)
    88. (drop): Dropout(p=0.0, inplace=False)
    89. )
    90. )
    91. (1): SwinTransformerBlock(
    92. dim=192, input_resolution=(28, 28), num_heads=6, window_size=7, shift_size=3, mlp_ratio=4.0
    93. (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    94. (attn): WindowAttention(
    95. dim=192, window_size=(7, 7), num_heads=6
    96. (qkv): Linear(in_features=192, out_features=576, bias=True)
    97. (attn_drop): Dropout(p=0.0, inplace=False)
    98. (proj): Linear(in_features=192, out_features=192, bias=True)
    99. (proj_drop): Dropout(p=0.0, inplace=False)
    100. (softmax): Softmax(dim=-1)
    101. )
    102. (drop_path): DropPath(drop_prob=0.055)
    103. (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    104. (mlp): Mlp(
    105. (fc1): Linear(in_features=192, out_features=768, bias=True)
    106. (act): GELU()
    107. (fc2): Linear(in_features=768, out_features=192, bias=True)
    108. (drop): Dropout(p=0.0, inplace=False)
    109. )
    110. )
    111. )
    112. (downsample): PatchMerging(
    113. input_resolution=(28, 28), dim=192
    114. (reduction): Linear(in_features=768, out_features=384, bias=False)
    115. (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    116. )
    117. )
    118. (2): Deit(
    119. (blocks): ModuleList(
    120. (0): SwinTransformerBlock(
    121. dim=384, input_resolution=(14, 14), num_heads=12, window_size=7, shift_size=0, mlp_ratio=4.0
    122. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    123. (attn): WindowAttention(
    124. dim=384, window_size=(7, 7), num_heads=12
    125. (qkv): Linear(in_features=384, out_features=1152, bias=True)
    126. (attn_drop): Dropout(p=0.0, inplace=False)
    127. (proj): Linear(in_features=384, out_features=384, bias=True)
    128. (proj_drop): Dropout(p=0.0, inplace=False)
    129. (softmax): Softmax(dim=-1)
    130. )
    131. (drop_path): DropPath(drop_prob=0.073)
    132. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    133. (mlp): Mlp(
    134. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    135. (act): GELU()
    136. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    137. (drop): Dropout(p=0.0, inplace=False)
    138. )
    139. )
    140. (1): SemanticAttentionBlock(
    141. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    142. (multi_scale): multi_scale_semantic_token1()
    143. (attn): Attention(
    144. (q): Linear(in_features=384, out_features=384, bias=True)
    145. (kv): Linear(in_features=384, out_features=768, bias=True)
    146. (attn_drop): Dropout(p=0.0, inplace=False)
    147. (proj): Linear(in_features=384, out_features=384, bias=True)
    148. (proj_drop): Dropout(p=0.0, inplace=False)
    149. )
    150. (drop_path): DropPath(drop_prob=0.091)
    151. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    152. (mlp): Mlp(
    153. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    154. (act): GELU()
    155. (drop1): Dropout(p=0.0, inplace=False)
    156. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    157. (drop2): Dropout(p=0.0, inplace=False)
    158. )
    159. )
    160. (2): Block(
    161. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    162. (attn): Attention(
    163. (q): Linear(in_features=384, out_features=384, bias=True)
    164. (kv): Linear(in_features=384, out_features=768, bias=True)
    165. (attn_drop): Dropout(p=0.0, inplace=False)
    166. (proj): Linear(in_features=384, out_features=384, bias=True)
    167. (proj_drop): Dropout(p=0.0, inplace=False)
    168. )
    169. (drop_path): DropPath(drop_prob=0.109)
    170. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    171. (mlp): Mlp(
    172. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    173. (act): GELU()
    174. (drop1): Dropout(p=0.0, inplace=False)
    175. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    176. (drop2): Dropout(p=0.0, inplace=False)
    177. )
    178. )
    179. (3): Block(
    180. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    181. (attn): Attention(
    182. (q): Linear(in_features=384, out_features=384, bias=True)
    183. (kv): Linear(in_features=384, out_features=768, bias=True)
    184. (attn_drop): Dropout(p=0.0, inplace=False)
    185. (proj): Linear(in_features=384, out_features=384, bias=True)
    186. (proj_drop): Dropout(p=0.0, inplace=False)
    187. )
    188. (drop_path): DropPath(drop_prob=0.127)
    189. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    190. (mlp): Mlp(
    191. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    192. (act): GELU()
    193. (drop1): Dropout(p=0.0, inplace=False)
    194. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    195. (drop2): Dropout(p=0.0, inplace=False)
    196. )
    197. )
    198. (4): Block(
    199. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    200. (attn): Attention(
    201. (q): Linear(in_features=384, out_features=384, bias=True)
    202. (kv): Linear(in_features=384, out_features=768, bias=True)
    203. (attn_drop): Dropout(p=0.0, inplace=False)
    204. (proj): Linear(in_features=384, out_features=384, bias=True)
    205. (proj_drop): Dropout(p=0.0, inplace=False)
    206. )
    207. (drop_path): DropPath(drop_prob=0.145)
    208. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    209. (mlp): Mlp(
    210. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    211. (act): GELU()
    212. (drop1): Dropout(p=0.0, inplace=False)
    213. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    214. (drop2): Dropout(p=0.0, inplace=False)
    215. )
    216. )
    217. (5): Block(
    218. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    219. (attn): Attention(
    220. (q): Linear(in_features=384, out_features=384, bias=True)
    221. (kv): Linear(in_features=384, out_features=768, bias=True)
    222. (attn_drop): Dropout(p=0.0, inplace=False)
    223. (proj): Linear(in_features=384, out_features=384, bias=True)
    224. (proj_drop): Dropout(p=0.0, inplace=False)
    225. )
    226. (drop_path): DropPath(drop_prob=0.164)
    227. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    228. (mlp): Mlp(
    229. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    230. (act): GELU()
    231. (drop1): Dropout(p=0.0, inplace=False)
    232. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    233. (drop2): Dropout(p=0.0, inplace=False)
    234. )
    235. )
    236. )
    237. (downsample): PatchMerging(
    238. input_resolution=(14, 14), dim=384
    239. (reduction): Linear(in_features=1536, out_features=768, bias=False)
    240. (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
    241. )
    242. )
    243. (3): BasicLayer(
    244. dim=768, input_resolution=(7, 7), depth=2
    245. (blocks): ModuleList(
    246. (0): SwinTransformerBlock(
    247. dim=768, input_resolution=(7, 7), num_heads=24, window_size=7, shift_size=0, mlp_ratio=4.0
    248. (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    249. (attn): WindowAttention(
    250. dim=768, window_size=(7, 7), num_heads=24
    251. (qkv): Linear(in_features=768, out_features=2304, bias=True)
    252. (attn_drop): Dropout(p=0.0, inplace=False)
    253. (proj): Linear(in_features=768, out_features=768, bias=True)
    254. (proj_drop): Dropout(p=0.0, inplace=False)
    255. (softmax): Softmax(dim=-1)
    256. )
    257. (drop_path): DropPath(drop_prob=0.182)
    258. (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    259. (mlp): Mlp(
    260. (fc1): Linear(in_features=768, out_features=3072, bias=True)
    261. (act): GELU()
    262. (fc2): Linear(in_features=3072, out_features=768, bias=True)
    263. (drop): Dropout(p=0.0, inplace=False)
    264. )
    265. )
    266. (1): SwinTransformerBlock(
    267. dim=768, input_resolution=(7, 7), num_heads=24, window_size=7, shift_size=0, mlp_ratio=4.0
    268. (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    269. (attn): WindowAttention(
    270. dim=768, window_size=(7, 7), num_heads=24
    271. (qkv): Linear(in_features=768, out_features=2304, bias=True)
    272. (attn_drop): Dropout(p=0.0, inplace=False)
    273. (proj): Linear(in_features=768, out_features=768, bias=True)
    274. (proj_drop): Dropout(p=0.0, inplace=False)
    275. (softmax): Softmax(dim=-1)
    276. )
    277. (drop_path): DropPath(drop_prob=0.200)
    278. (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    279. (mlp): Mlp(
    280. (fc1): Linear(in_features=768, out_features=3072, bias=True)
    281. (act): GELU()
    282. (fc2): Linear(in_features=3072, out_features=768, bias=True)
    283. (drop): Dropout(p=0.0, inplace=False)
    284. )
    285. )
    286. )
    287. )
    288. )
    289. (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    290. (avgpool): AdaptiveAvgPool1d(output_size=1)
    291. (head): Linear(in_features=768, out_features=100, bias=True)
    292. )

    Network structure (small model):

    1. SwinTransformer(
    2. (patch_embed): PatchEmbed(
    3. (proj): Sequential(
    4. (0): Conv2d_BN(
    5. (c): Conv2d(3, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    6. (bn): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    7. )
    8. (1): Hardswish()
    9. (2): Conv2d_BN(
    10. (c): Conv2d(48, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    11. (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    12. )
    13. (3): Hardswish()
    14. )
    15. )
    16. (pos_drop): Dropout(p=0.0, inplace=False)
    17. (layers): ModuleList(
    18. (0): BasicLayer(
    19. dim=96, input_resolution=(56, 56), depth=2
    20. (blocks): ModuleList(
    21. (0): SwinTransformerBlock(
    22. dim=96, input_resolution=(56, 56), num_heads=3, window_size=7, shift_size=0, mlp_ratio=4.0
    23. (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    24. (attn): WindowAttention(
    25. dim=96, window_size=(7, 7), num_heads=3
    26. (qkv): Linear(in_features=96, out_features=288, bias=True)
    27. (attn_drop): Dropout(p=0.0, inplace=False)
    28. (proj): Linear(in_features=96, out_features=96, bias=True)
    29. (proj_drop): Dropout(p=0.0, inplace=False)
    30. (softmax): Softmax(dim=-1)
    31. )
    32. (drop_path): Identity()
    33. (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    34. (mlp): Mlp(
    35. (fc1): Linear(in_features=96, out_features=384, bias=True)
    36. (act): GELU()
    37. (fc2): Linear(in_features=384, out_features=96, bias=True)
    38. (drop): Dropout(p=0.0, inplace=False)
    39. )
    40. )
    41. (1): SwinTransformerBlock(
    42. dim=96, input_resolution=(56, 56), num_heads=3, window_size=7, shift_size=3, mlp_ratio=4.0
    43. (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    44. (attn): WindowAttention(
    45. dim=96, window_size=(7, 7), num_heads=3
    46. (qkv): Linear(in_features=96, out_features=288, bias=True)
    47. (attn_drop): Dropout(p=0.0, inplace=False)
    48. (proj): Linear(in_features=96, out_features=96, bias=True)
    49. (proj_drop): Dropout(p=0.0, inplace=False)
    50. (softmax): Softmax(dim=-1)
    51. )
    52. (drop_path): DropPath(drop_prob=0.013)
    53. (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    54. (mlp): Mlp(
    55. (fc1): Linear(in_features=96, out_features=384, bias=True)
    56. (act): GELU()
    57. (fc2): Linear(in_features=384, out_features=96, bias=True)
    58. (drop): Dropout(p=0.0, inplace=False)
    59. )
    60. )
    61. )
    62. (downsample): PatchMerging(
    63. input_resolution=(56, 56), dim=96
    64. (reduction): Linear(in_features=384, out_features=192, bias=False)
    65. (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    66. )
    67. )
    68. (1): BasicLayer(
    69. dim=192, input_resolution=(28, 28), depth=2
    70. (blocks): ModuleList(
    71. (0): SwinTransformerBlock(
    72. dim=192, input_resolution=(28, 28), num_heads=6, window_size=7, shift_size=0, mlp_ratio=4.0
    73. (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    74. (attn): WindowAttention(
    75. dim=192, window_size=(7, 7), num_heads=6
    76. (qkv): Linear(in_features=192, out_features=576, bias=True)
    77. (attn_drop): Dropout(p=0.0, inplace=False)
    78. (proj): Linear(in_features=192, out_features=192, bias=True)
    79. (proj_drop): Dropout(p=0.0, inplace=False)
    80. (softmax): Softmax(dim=-1)
    81. )
    82. (drop_path): DropPath(drop_prob=0.026)
    83. (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    84. (mlp): Mlp(
    85. (fc1): Linear(in_features=192, out_features=768, bias=True)
    86. (act): GELU()
    87. (fc2): Linear(in_features=768, out_features=192, bias=True)
    88. (drop): Dropout(p=0.0, inplace=False)
    89. )
    90. )
    91. (1): SwinTransformerBlock(
    92. dim=192, input_resolution=(28, 28), num_heads=6, window_size=7, shift_size=3, mlp_ratio=4.0
    93. (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    94. (attn): WindowAttention(
    95. dim=192, window_size=(7, 7), num_heads=6
    96. (qkv): Linear(in_features=192, out_features=576, bias=True)
    97. (attn_drop): Dropout(p=0.0, inplace=False)
    98. (proj): Linear(in_features=192, out_features=192, bias=True)
    99. (proj_drop): Dropout(p=0.0, inplace=False)
    100. (softmax): Softmax(dim=-1)
    101. )
    102. (drop_path): DropPath(drop_prob=0.039)
    103. (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    104. (mlp): Mlp(
    105. (fc1): Linear(in_features=192, out_features=768, bias=True)
    106. (act): GELU()
    107. (fc2): Linear(in_features=768, out_features=192, bias=True)
    108. (drop): Dropout(p=0.0, inplace=False)
    109. )
    110. )
    111. )
    112. (downsample): PatchMerging(
    113. input_resolution=(28, 28), dim=192
    114. (reduction): Linear(in_features=768, out_features=384, bias=False)
    115. (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    116. )
    117. )
    118. (2): Deit(
    119. (blocks): ModuleList(
    120. (0): SwinTransformerBlock(
    121. dim=384, input_resolution=(14, 14), num_heads=12, window_size=7, shift_size=0, mlp_ratio=4.0
    122. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    123. (attn): WindowAttention(
    124. dim=384, window_size=(7, 7), num_heads=12
    125. (qkv): Linear(in_features=384, out_features=1152, bias=True)
    126. (attn_drop): Dropout(p=0.0, inplace=False)
    127. (proj): Linear(in_features=384, out_features=384, bias=True)
    128. (proj_drop): Dropout(p=0.0, inplace=False)
    129. (softmax): Softmax(dim=-1)
    130. )
    131. (drop_path): DropPath(drop_prob=0.052)
    132. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    133. (mlp): Mlp(
    134. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    135. (act): GELU()
    136. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    137. (drop): Dropout(p=0.0, inplace=False)
    138. )
    139. )
    140. (1): SemanticAttentionBlock(
    141. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    142. (multi_scale): multi_scale_semantic_token1()
    143. (attn): Attention(
    144. (q): Linear(in_features=384, out_features=384, bias=True)
    145. (kv): Linear(in_features=384, out_features=768, bias=True)
    146. (attn_drop): Dropout(p=0.0, inplace=False)
    147. (proj): Linear(in_features=384, out_features=384, bias=True)
    148. (proj_drop): Dropout(p=0.0, inplace=False)
    149. )
    150. (drop_path): DropPath(drop_prob=0.065)
    151. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    152. (mlp): Mlp(
    153. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    154. (act): GELU()
    155. (drop1): Dropout(p=0.0, inplace=False)
    156. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    157. (drop2): Dropout(p=0.0, inplace=False)
    158. )
    159. )
    160. (2): Block(
    161. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    162. (attn): Attention(
    163. (q): Linear(in_features=384, out_features=384, bias=True)
    164. (kv): Linear(in_features=384, out_features=768, bias=True)
    165. (attn_drop): Dropout(p=0.0, inplace=False)
    166. (proj): Linear(in_features=384, out_features=384, bias=True)
    167. (proj_drop): Dropout(p=0.0, inplace=False)
    168. )
    169. (drop_path): DropPath(drop_prob=0.078)
    170. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    171. (mlp): Mlp(
    172. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    173. (act): GELU()
    174. (drop1): Dropout(p=0.0, inplace=False)
    175. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    176. (drop2): Dropout(p=0.0, inplace=False)
    177. )
    178. )
    179. (3): Block(
    180. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    181. (attn): Attention(
    182. (q): Linear(in_features=384, out_features=384, bias=True)
    183. (kv): Linear(in_features=384, out_features=768, bias=True)
    184. (attn_drop): Dropout(p=0.0, inplace=False)
    185. (proj): Linear(in_features=384, out_features=384, bias=True)
    186. (proj_drop): Dropout(p=0.0, inplace=False)
    187. )
    188. (drop_path): DropPath(drop_prob=0.091)
    189. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    190. (mlp): Mlp(
    191. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    192. (act): GELU()
    193. (drop1): Dropout(p=0.0, inplace=False)
    194. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    195. (drop2): Dropout(p=0.0, inplace=False)
    196. )
    197. )
    198. (4): Block(
    199. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    200. (attn): Attention(
    201. (q): Linear(in_features=384, out_features=384, bias=True)
    202. (kv): Linear(in_features=384, out_features=768, bias=True)
    203. (attn_drop): Dropout(p=0.0, inplace=False)
    204. (proj): Linear(in_features=384, out_features=384, bias=True)
    205. (proj_drop): Dropout(p=0.0, inplace=False)
    206. )
    207. (drop_path): DropPath(drop_prob=0.104)
    208. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    209. (mlp): Mlp(
    210. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    211. (act): GELU()
    212. (drop1): Dropout(p=0.0, inplace=False)
    213. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    214. (drop2): Dropout(p=0.0, inplace=False)
    215. )
    216. )
    217. (5): Block(
    218. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    219. (attn): Attention(
    220. (q): Linear(in_features=384, out_features=384, bias=True)
    221. (kv): Linear(in_features=384, out_features=768, bias=True)
    222. (attn_drop): Dropout(p=0.0, inplace=False)
    223. (proj): Linear(in_features=384, out_features=384, bias=True)
    224. (proj_drop): Dropout(p=0.0, inplace=False)
    225. )
    226. (drop_path): DropPath(drop_prob=0.117)
    227. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    228. (mlp): Mlp(
    229. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    230. (act): GELU()
    231. (drop1): Dropout(p=0.0, inplace=False)
    232. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    233. (drop2): Dropout(p=0.0, inplace=False)
    234. )
    235. )
    236. (6): SwinTransformerBlock(
    237. dim=384, input_resolution=(14, 14), num_heads=12, window_size=7, shift_size=0, mlp_ratio=4.0
    238. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    239. (attn): WindowAttention(
    240. dim=384, window_size=(7, 7), num_heads=12
    241. (qkv): Linear(in_features=384, out_features=1152, bias=True)
    242. (attn_drop): Dropout(p=0.0, inplace=False)
    243. (proj): Linear(in_features=384, out_features=384, bias=True)
    244. (proj_drop): Dropout(p=0.0, inplace=False)
    245. (softmax): Softmax(dim=-1)
    246. )
    247. (drop_path): DropPath(drop_prob=0.130)
    248. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    249. (mlp): Mlp(
    250. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    251. (act): GELU()
    252. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    253. (drop): Dropout(p=0.0, inplace=False)
    254. )
    255. )
    256. (7): SemanticAttentionBlock(
    257. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    258. (multi_scale): multi_scale_semantic_token1()
    259. (attn): Attention(
    260. (q): Linear(in_features=384, out_features=384, bias=True)
    261. (kv): Linear(in_features=384, out_features=768, bias=True)
    262. (attn_drop): Dropout(p=0.0, inplace=False)
    263. (proj): Linear(in_features=384, out_features=384, bias=True)
    264. (proj_drop): Dropout(p=0.0, inplace=False)
    265. )
    266. (drop_path): DropPath(drop_prob=0.143)
    267. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    268. (mlp): Mlp(
    269. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    270. (act): GELU()
    271. (drop1): Dropout(p=0.0, inplace=False)
    272. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    273. (drop2): Dropout(p=0.0, inplace=False)
    274. )
    275. )
    276. (8): Block(
    277. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    278. (attn): Attention(
    279. (q): Linear(in_features=384, out_features=384, bias=True)
    280. (kv): Linear(in_features=384, out_features=768, bias=True)
    281. (attn_drop): Dropout(p=0.0, inplace=False)
    282. (proj): Linear(in_features=384, out_features=384, bias=True)
    283. (proj_drop): Dropout(p=0.0, inplace=False)
    284. )
    285. (drop_path): DropPath(drop_prob=0.157)
    286. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    287. (mlp): Mlp(
    288. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    289. (act): GELU()
    290. (drop1): Dropout(p=0.0, inplace=False)
    291. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    292. (drop2): Dropout(p=0.0, inplace=False)
    293. )
    294. )
    295. (9): Block(
    296. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    297. (attn): Attention(
    298. (q): Linear(in_features=384, out_features=384, bias=True)
    299. (kv): Linear(in_features=384, out_features=768, bias=True)
    300. (attn_drop): Dropout(p=0.0, inplace=False)
    301. (proj): Linear(in_features=384, out_features=384, bias=True)
    302. (proj_drop): Dropout(p=0.0, inplace=False)
    303. )
    304. (drop_path): DropPath(drop_prob=0.170)
    305. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    306. (mlp): Mlp(
    307. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    308. (act): GELU()
    309. (drop1): Dropout(p=0.0, inplace=False)
    310. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    311. (drop2): Dropout(p=0.0, inplace=False)
    312. )
    313. )
    314. (10): Block(
    315. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    316. (attn): Attention(
    317. (q): Linear(in_features=384, out_features=384, bias=True)
    318. (kv): Linear(in_features=384, out_features=768, bias=True)
    319. (attn_drop): Dropout(p=0.0, inplace=False)
    320. (proj): Linear(in_features=384, out_features=384, bias=True)
    321. (proj_drop): Dropout(p=0.0, inplace=False)
    322. )
    323. (drop_path): DropPath(drop_prob=0.183)
    324. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    325. (mlp): Mlp(
    326. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    327. (act): GELU()
    328. (drop1): Dropout(p=0.0, inplace=False)
    329. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    330. (drop2): Dropout(p=0.0, inplace=False)
    331. )
    332. )
    333. (11): Block(
    334. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    335. (attn): Attention(
    336. (q): Linear(in_features=384, out_features=384, bias=True)
    337. (kv): Linear(in_features=384, out_features=768, bias=True)
    338. (attn_drop): Dropout(p=0.0, inplace=False)
    339. (proj): Linear(in_features=384, out_features=384, bias=True)
    340. (proj_drop): Dropout(p=0.0, inplace=False)
    341. )
    342. (drop_path): DropPath(drop_prob=0.196)
    343. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    344. (mlp): Mlp(
    345. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    346. (act): GELU()
    347. (drop1): Dropout(p=0.0, inplace=False)
    348. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    349. (drop2): Dropout(p=0.0, inplace=False)
    350. )
    351. )
    352. (12): SwinTransformerBlock(
    353. dim=384, input_resolution=(14, 14), num_heads=12, window_size=7, shift_size=0, mlp_ratio=4.0
    354. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    355. (attn): WindowAttention(
    356. dim=384, window_size=(7, 7), num_heads=12
    357. (qkv): Linear(in_features=384, out_features=1152, bias=True)
    358. (attn_drop): Dropout(p=0.0, inplace=False)
    359. (proj): Linear(in_features=384, out_features=384, bias=True)
    360. (proj_drop): Dropout(p=0.0, inplace=False)
    361. (softmax): Softmax(dim=-1)
    362. )
    363. (drop_path): DropPath(drop_prob=0.209)
    364. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    365. (mlp): Mlp(
    366. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    367. (act): GELU()
    368. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    369. (drop): Dropout(p=0.0, inplace=False)
    370. )
    371. )
    372. (13): SemanticAttentionBlock(
    373. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    374. (multi_scale): multi_scale_semantic_token1()
    375. (attn): Attention(
    376. (q): Linear(in_features=384, out_features=384, bias=True)
    377. (kv): Linear(in_features=384, out_features=768, bias=True)
    378. (attn_drop): Dropout(p=0.0, inplace=False)
    379. (proj): Linear(in_features=384, out_features=384, bias=True)
    380. (proj_drop): Dropout(p=0.0, inplace=False)
    381. )
    382. (drop_path): DropPath(drop_prob=0.222)
    383. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    384. (mlp): Mlp(
    385. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    386. (act): GELU()
    387. (drop1): Dropout(p=0.0, inplace=False)
    388. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    389. (drop2): Dropout(p=0.0, inplace=False)
    390. )
    391. )
    392. (14): Block(
    393. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    394. (attn): Attention(
    395. (q): Linear(in_features=384, out_features=384, bias=True)
    396. (kv): Linear(in_features=384, out_features=768, bias=True)
    397. (attn_drop): Dropout(p=0.0, inplace=False)
    398. (proj): Linear(in_features=384, out_features=384, bias=True)
    399. (proj_drop): Dropout(p=0.0, inplace=False)
    400. )
    401. (drop_path): DropPath(drop_prob=0.235)
    402. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    403. (mlp): Mlp(
    404. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    405. (act): GELU()
    406. (drop1): Dropout(p=0.0, inplace=False)
    407. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    408. (drop2): Dropout(p=0.0, inplace=False)
    409. )
    410. )
    411. (15): Block(
    412. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    413. (attn): Attention(
    414. (q): Linear(in_features=384, out_features=384, bias=True)
    415. (kv): Linear(in_features=384, out_features=768, bias=True)
    416. (attn_drop): Dropout(p=0.0, inplace=False)
    417. (proj): Linear(in_features=384, out_features=384, bias=True)
    418. (proj_drop): Dropout(p=0.0, inplace=False)
    419. )
    420. (drop_path): DropPath(drop_prob=0.248)
    421. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    422. (mlp): Mlp(
    423. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    424. (act): GELU()
    425. (drop1): Dropout(p=0.0, inplace=False)
    426. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    427. (drop2): Dropout(p=0.0, inplace=False)
    428. )
    429. )
    430. (16): Block(
    431. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    432. (attn): Attention(
    433. (q): Linear(in_features=384, out_features=384, bias=True)
    434. (kv): Linear(in_features=384, out_features=768, bias=True)
    435. (attn_drop): Dropout(p=0.0, inplace=False)
    436. (proj): Linear(in_features=384, out_features=384, bias=True)
    437. (proj_drop): Dropout(p=0.0, inplace=False)
    438. )
    439. (drop_path): DropPath(drop_prob=0.261)
    440. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    441. (mlp): Mlp(
    442. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    443. (act): GELU()
    444. (drop1): Dropout(p=0.0, inplace=False)
    445. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    446. (drop2): Dropout(p=0.0, inplace=False)
    447. )
    448. )
    449. (17): Block(
    450. (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    451. (attn): Attention(
    452. (q): Linear(in_features=384, out_features=384, bias=True)
    453. (kv): Linear(in_features=384, out_features=768, bias=True)
    454. (attn_drop): Dropout(p=0.0, inplace=False)
    455. (proj): Linear(in_features=384, out_features=384, bias=True)
    456. (proj_drop): Dropout(p=0.0, inplace=False)
    457. )
    458. (drop_path): DropPath(drop_prob=0.274)
    459. (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    460. (mlp): Mlp(
    461. (fc1): Linear(in_features=384, out_features=1536, bias=True)
    462. (act): GELU()
    463. (drop1): Dropout(p=0.0, inplace=False)
    464. (fc2): Linear(in_features=1536, out_features=384, bias=True)
    465. (drop2): Dropout(p=0.0, inplace=False)
    466. )
    467. )
    468. )
    469. (downsample): PatchMerging(
    470. input_resolution=(14, 14), dim=384
    471. (reduction): Linear(in_features=1536, out_features=768, bias=False)
    472. (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
    473. )
    474. )
    475. (3): BasicLayer(
    476. dim=768, input_resolution=(7, 7), depth=2
    477. (blocks): ModuleList(
    478. (0): SwinTransformerBlock(
    479. dim=768, input_resolution=(7, 7), num_heads=24, window_size=7, shift_size=0, mlp_ratio=4.0
    480. (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    481. (attn): WindowAttention(
    482. dim=768, window_size=(7, 7), num_heads=24
    483. (qkv): Linear(in_features=768, out_features=2304, bias=True)
    484. (attn_drop): Dropout(p=0.0, inplace=False)
    485. (proj): Linear(in_features=768, out_features=768, bias=True)
    486. (proj_drop): Dropout(p=0.0, inplace=False)
    487. (softmax): Softmax(dim=-1)
    488. )
    489. (drop_path): DropPath(drop_prob=0.287)
    490. (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    491. (mlp): Mlp(
    492. (fc1): Linear(in_features=768, out_features=3072, bias=True)
    493. (act): GELU()
    494. (fc2): Linear(in_features=3072, out_features=768, bias=True)
    495. (drop): Dropout(p=0.0, inplace=False)
    496. )
    497. )
    498. (1): SwinTransformerBlock(
    499. dim=768, input_resolution=(7, 7), num_heads=24, window_size=7, shift_size=0, mlp_ratio=4.0
    500. (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    501. (attn): WindowAttention(
    502. dim=768, window_size=(7, 7), num_heads=24
    503. (qkv): Linear(in_features=768, out_features=2304, bias=True)
    504. (attn_drop): Dropout(p=0.0, inplace=False)
    505. (proj): Linear(in_features=768, out_features=768, bias=True)
    506. (proj_drop): Dropout(p=0.0, inplace=False)
    507. (softmax): Softmax(dim=-1)
    508. )
    509. (drop_path): DropPath(drop_prob=0.300)
    510. (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    511. (mlp): Mlp(
    512. (fc1): Linear(in_features=768, out_features=3072, bias=True)
    513. (act): GELU()
    514. (fc2): Linear(in_features=3072, out_features=768, bias=True)
    515. (drop): Dropout(p=0.0, inplace=False)
    516. )
    517. )
    518. )
    519. )
    520. )
    521. (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    522. (avgpool): AdaptiveAvgPool1d(output_size=1)
    523. (head): Linear(in_features=768, out_features=100, bias=True)
    524. )

• Original article: https://blog.csdn.net/allrubots/article/details/132636017