• Style-transfer AdaIN and DiT's adaLN


    Differences between BN, LN, IN, and GN

    • BatchNorm: normalizes along the batch dimension, computing statistics over N×H×W; it works poorly for small batch sizes. BN's main drawback is its sensitivity to batch size: since the mean and variance are computed over a single batch, a batch that is too small yields statistics that do not represent the full data distribution.

    • LayerNorm: normalizes along the channel dimension, computing statistics over C×H×W per sample; it is particularly effective for RNNs (sequence models), and the now-dominant Transformer also uses this normalization.

    • InstanceNorm: normalizes within a single channel, computing statistics over H×W; it is used in style transfer. Because a stylized result depends mainly on an individual image instance, normalizing over the whole batch is unsuitable, so statistics are computed over H×W instead. This speeds up convergence and keeps image instances independent of one another.

    • GroupNorm: splits the channels into groups and normalizes within each group, computing statistics over (C//G)×H×W; this makes it independent of (and unconstrained by) batch size, and it works well in segmentation and detection.
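    The four variants differ only in the axes over which the statistics are computed. A minimal NumPy sketch (assuming NCHW tensors and omitting the learnable affine parameters) makes the contrast concrete:

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the std computed over `axes`."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(2, 4, 8, 8)       # (N, C, H, W)

bn = normalize(x, axes=(0, 2, 3))     # BatchNorm: stats over N, H, W per channel
ln = normalize(x, axes=(1, 2, 3))     # LayerNorm: stats over C, H, W per sample
inorm = normalize(x, axes=(2, 3))     # InstanceNorm: stats over H, W per (sample, channel)

G = 2                                 # GroupNorm with 2 groups of 2 channels
xg = x.reshape(2, G, 4 // G, 8, 8)
gn = normalize(xg, axes=(2, 3, 4)).reshape(x.shape)
```

    Note that only BatchNorm's axes include the batch dimension (axis 0), which is exactly why the other three are insensitive to batch size.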

    AdaIN for image style transfer


    • Comparing the loss behavior of IN and BN experimentally, the authors found that the mean/variance computed from an image's feature maps encode its style; AdaIN therefore normalizes away the content image's style statistics and then transfers the target style's statistics onto it;
    def forward(self, content, style, alpha=1.0):
        assert 0 <= alpha <= 1
        style_feats = self.encode_with_intermediate(style)
        content_feat = self.encode(content)
        t = adain(content_feat, style_feats[-1])
        t = alpha * t + (1 - alpha) * content_feat # controls the content/style trade-off
    
        g_t = self.decoder(t)
        g_t_feats = self.encode_with_intermediate(g_t)
    
        loss_c = self.calc_content_loss(g_t_feats[-1], t)
        loss_s = self.calc_style_loss(g_t_feats[0], style_feats[0])
        for i in range(1, 4):
            loss_s += self.calc_style_loss(g_t_feats[i], style_feats[i])
        return loss_c, loss_s
    
    • AdaIN is used only between the encoder and decoder; experiments found that encoder-adain-decoder(IN), i.e. also applying IN inside the decoder, degrades results;
    • The content/style ratio is controlled by t = alpha * t + (1 - alpha) * content_feat
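    The `adain` helper called in the forward pass above is not shown in the snippet; a standalone NumPy sketch of the operation (a hypothetical version, assuming (N, C, H, W) feature maps) replaces the content features' per-channel mean/std with the style features' statistics:

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """AdaIN: align the per-channel (instance) mean/std of the content
    features with those of the style features. Shapes: (N, C, H, W)."""
    c_mean = content_feat.mean(axis=(2, 3), keepdims=True)
    c_std = content_feat.std(axis=(2, 3), keepdims=True)
    s_mean = style_feat.mean(axis=(2, 3), keepdims=True)
    s_std = style_feat.std(axis=(2, 3), keepdims=True)
    normalized = (content_feat - c_mean) / (c_std + eps)  # instance-normalize content
    return s_std * normalized + s_mean                    # re-scale/shift to style stats
```

    After this operation the output's per-channel statistics match the style features, which is exactly the "style is mean/variance" observation the paper builds on.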

    DiT adaLN


    import numpy as np
    
    class LayerNorm:
        def __init__(self, epsilon=1e-6):
            self.epsilon = epsilon
    
        def __call__(self, x: np.ndarray, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
            """
            Args:
                x (np.ndarray): shape: (batch_size, sequence_length, feature_dim)
                gamma (np.ndarray): shape: (batch_size, 1, feature_dim), generated by condition embedding
                beta (np.ndarray): shape: (batch_size, 1, feature_dim), generated by condition embedding
            return:
                x_layer_norm (np.ndarray): shape: (batch_size, sequence_length, feature_dim)
            """
            _mean = np.mean(x, axis=-1, keepdims=True)
            _var = np.var(x, axis=-1, keepdims=True)
            x_layer_norm = gamma * (x - _mean) / np.sqrt(_var + self.epsilon) + beta
            return x_layer_norm
    
    class DiTAdaLayerNorm:
        def __init__(self, feature_dim, epsilon=1e-6):
            self.epsilon = epsilon
            self.weight = np.random.rand(feature_dim, feature_dim * 2)
    
        def __call__(self, x, condition):
            """
            Args:
                x (np.ndarray): shape: (batch_size, sequence_length, feature_dim)
                condition (np.ndarray): shape: (batch_size, 1, feature_dim)
                    Ps: condition = time_cond_embedding + class_cond_embedding
            return:
                x_layer_norm (np.ndarray): shape: (batch_size, sequence_length, feature_dim)
            """
            affine = condition @ self.weight  # shape: (batch_size, 1, feature_dim * 2)
            gamma, beta = np.split(affine, 2, axis=-1)
            _mean = np.mean(x, axis=-1, keepdims=True)
            _var = np.var(x, axis=-1, keepdims=True)
            x_layer_norm = gamma * (x - _mean) / np.sqrt(_var + self.epsilon) + beta
            return x_layer_norm
    
    class DiTBlock:
        def __init__(self, feature_dim):
            self.MultiHeadSelfAttention = lambda x: x # mock multi-head self-attention
            self.layer_norm = LayerNorm()
            self.MLP = lambda x: x # mock multi-layer perceptron
            self.weight = np.random.rand(feature_dim, feature_dim * 6)
    
        def __call__(self, x: np.ndarray, time_embedding: np.ndarray, class_embedding: np.ndarray) -> np.ndarray:
            """
            Args:
                x (np.ndarray): shape: (batch_size, sequence_length, feature_dim)
                time_embedding (np.ndarray): shape: (batch_size, 1, feature_dim)
                class_embedding (np.ndarray): shape: (batch_size, 1, feature_dim)
            return:
                x (np.ndarray): shape: (batch_size, sequence_length, feature_dim)
            """
            condition_embedding = time_embedding + class_embedding
            affine_params = condition_embedding @ self.weight  # shape: (batch_size, 1, feature_dim * 6)
            gamma_1, beta_1, alpha_1, gamma_2, beta_2, alpha_2 = np.split(affine_params, 6, axis=-1)
            x = x + alpha_1 * self.MultiHeadSelfAttention(self.layer_norm(x, gamma_1, beta_1))
            x = x + alpha_2 * self.MLP(self.layer_norm(x, gamma_2, beta_2))
            return x
    
    
    • To inject the class condition, adaLN replaces the standard LN. The difference between adaLN and adaLN-Zero lies in how the conditioning projection that produces the scale/shift parameters is initialized: adaLN initializes it randomly, while adaLN-Zero initializes it to zero so that each residual branch is the identity at the start of training;
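    In the `DiTBlock` above the conditioning projection `self.weight` is randomly initialized (plain adaLN). A sketch of the adaLN-Zero variant (my reading of the DiT design, not code from the original post) simply zero-initializes that projection, so gamma, beta, and the residual gates alpha all start at zero and each block behaves as the identity at initialization:

```python
import numpy as np

class AdaLNZeroProjection:
    """Maps the condition embedding to (gamma_1, beta_1, alpha_1,
    gamma_2, beta_2, alpha_2). Zero initialization means alpha = 0
    at the start of training, so x + alpha * f(...) reduces to x
    and every DiT block is initially the identity."""
    def __init__(self, feature_dim):
        self.weight = np.zeros((feature_dim, feature_dim * 6))
        self.bias = np.zeros(feature_dim * 6)

    def __call__(self, condition):
        # condition: (batch_size, 1, feature_dim)
        return condition @ self.weight + self.bias  # (batch_size, 1, feature_dim * 6)
```

    Swapping this projection into `DiTBlock` in place of the random `self.weight` is all that separates the two variants.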
  • Original article: https://blog.csdn.net/qq_40168949/article/details/138168206