深度学习 - Deep Learning: A Step-by-Step Walkthrough of RNN Training


    1. Data Preparation

    The character sequence "hello" is converted into a one-hot encoded representation:

    • Input: ['h', 'e', 'l', 'l']
    • Output: ['e', 'l', 'l', 'o']
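
    As a concrete reference, here is a minimal NumPy sketch of this encoding (the helper name one_hot and the index order are illustrative choices, not fixed by the text above):

    import numpy as np

    # Character set and char-to-index mapping (index order is an illustrative choice)
    chars = ['h', 'e', 'l', 'o']
    char_to_idx = {ch: i for i, ch in enumerate(chars)}

    def one_hot(ch):
        """Return a length-4 one-hot vector for a single character."""
        v = np.zeros(len(chars))
        v[char_to_idx[ch]] = 1.0
        return v

    inputs  = [one_hot(ch) for ch in "hell"]   # x_1 ... x_4
    targets = [one_hot(ch) for ch in "ello"]   # y_1 ... y_4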

    2. Parameter Initialization

    We use a single-layer RNN in the N-to-N configuration (one output per input step), with a hidden size of 2, feeding one character at a time. The initial parameters are as follows:

    $$
    W_{xh} = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.5 & 0.6 & 0.7 & 0.8 \end{pmatrix}, \quad
    W_{hh} = \begin{pmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \end{pmatrix}, \quad
    W_{hy} = \begin{pmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \\ 0.5 & 0.6 \\ 0.7 & 0.8 \end{pmatrix}
    $$

    Here $W_{xh} \in \mathbb{R}^{2\times 4}$ maps the one-hot input to the hidden state, $W_{hh} \in \mathbb{R}^{2\times 2}$ is the recurrent weight matrix, and $W_{hy} \in \mathbb{R}^{4\times 2}$ maps the hidden state to the output logits.

    The bias terms are initialized to 0.
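
    For reference, a minimal NumPy sketch of this initialization, assuming the 2×4 / 2×2 / 4×2 layout described above (the variable names are illustrative):

    import numpy as np

    hidden_size, vocab_size = 2, 4

    # Weight matrices from the example above; bias vectors start at zero
    W_xh = np.array([[0.1, 0.2, 0.3, 0.4],
                     [0.5, 0.6, 0.7, 0.8]])   # (hidden, input)
    W_hh = np.array([[0.1, 0.2],
                     [0.3, 0.4]])             # (hidden, hidden)
    W_hy = np.array([[0.1, 0.2],
                     [0.3, 0.4],
                     [0.5, 0.6],
                     [0.7, 0.8]])             # (output, hidden)
    b_h = np.zeros(hidden_size)
    b_y = np.zeros(vocab_size)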

    3. Forward and Backward Propagation

    Time step 1 (input 'h'):

    Input vector $x_1 = [1, 0, 0, 0]$ (the one-hot encoding of 'h').

    With the initial hidden state $h_0 = [0, 0]^T$:

    $$
    h_1 = \tanh(W_{xh} x_1 + W_{hh} h_0)
        = \tanh\left( W_{xh} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + W_{hh} \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right)
        = \tanh\left( \begin{pmatrix} 0.1 \\ 0.3 \end{pmatrix} \right)
        = \begin{pmatrix} 0.0997 \\ 0.2913 \end{pmatrix}
    $$

    $$
    o_1 = W_{hy} h_1 = W_{hy} \begin{pmatrix} 0.0997 \\ 0.2913 \end{pmatrix}
        = \begin{pmatrix} 0.1695 \\ 0.3889 \\ 0.6083 \\ 0.8277 \end{pmatrix}
    $$

    Predicted distribution: $\hat{y}_1 = \text{softmax}(o_1)$
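
    A sketch of this forward step in NumPy, reusing W_xh, W_hh, W_hy, b_h, and b_y from the initialization sketch above (the exact numbers it produces depend on how the weight matrices are laid out, so they may differ slightly from the hand-worked values):

    def softmax(z):
        e = np.exp(z - np.max(z))   # subtract the max for numerical stability
        return e / e.sum()

    x1 = np.array([1.0, 0.0, 0.0, 0.0])   # one-hot for 'h'
    h0 = np.zeros(2)                       # initial hidden state

    h1 = np.tanh(W_xh @ x1 + W_hh @ h0 + b_h)   # hidden state at step 1
    o1 = W_hy @ h1 + b_y                        # logits at step 1
    y_hat1 = softmax(o1)                        # predicted distribution over {'h', 'e', 'l', 'o'}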

    Assume the true next character is 'e', whose one-hot encoding is $y_1 = [0, 1, 0, 0]$.

    Cross-entropy loss:

    $$
    \text{loss}_1 = - \sum_{i} y_{1i} \log(\hat{y}_{1i})
    $$

    Gradient computation:

    $$
    \frac{\partial \text{loss}_1}{\partial W_{hy}} = (\hat{y}_1 - y_1)\, h_1^T
    $$

    $$
    \frac{\partial \text{loss}_1}{\partial W_{xh}} = \frac{\partial \text{loss}_1}{\partial h_1} \cdot \frac{\partial h_1}{\partial W_{xh}}
    $$

    $$
    \frac{\partial \text{loss}_1}{\partial W_{hh}} = \frac{\partial \text{loss}_1}{\partial h_1} \cdot \frac{\partial h_1}{\partial W_{hh}}
    $$

    Parameter update:

    $$
    W_{xh} = W_{xh} - \eta \frac{\partial \text{loss}_1}{\partial W_{xh}}
    $$

    $$
    W_{hh} = W_{hh} - \eta \frac{\partial \text{loss}_1}{\partial W_{hh}}
    $$

    $$
    W_{hy} = W_{hy} - \eta \frac{\partial \text{loss}_1}{\partial W_{hy}}
    $$
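
    The same backward pass and update written as code, continuing the sketch above. It assumes a plain per-step SGD update with an illustrative learning rate; since this is the first time step there is no earlier hidden state to backpropagate into, so truncating the gradient here is exact:

    eta = 0.1                                   # learning rate (illustrative value)
    y1_true = np.array([0.0, 1.0, 0.0, 0.0])    # one-hot label for 'e'

    loss1 = -np.sum(y1_true * np.log(y_hat1))   # cross-entropy loss

    # Gradient of softmax + cross-entropy with respect to the logits
    do1 = y_hat1 - y1_true

    dW_hy = np.outer(do1, h1)                   # (y_hat - y) h^T
    db_y = do1

    # Backpropagate into the hidden state and through tanh
    dh1 = W_hy.T @ do1
    dz1 = dh1 * (1.0 - h1 ** 2)                 # tanh'(z) = 1 - tanh(z)^2

    dW_xh = np.outer(dz1, x1)
    dW_hh = np.outer(dz1, h0)
    db_h = dz1

    # SGD updates
    W_hy -= eta * dW_hy;  b_y -= eta * db_y
    W_xh -= eta * dW_xh;  W_hh -= eta * dW_hh;  b_h -= eta * db_h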

    Time step 2 (input 'e'):

    Use the updated parameters $W_{xh}$, $W_{hh}$, and $W_{hy}$.

    Input vector $x_2 = [0, 1, 0, 0]$ (the one-hot encoding of 'e').

    $$
    h_2 = \tanh(W_{xh} x_2 + W_{hh} h_1)
        = \tanh\left( W_{xh} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + W_{hh} \begin{pmatrix} 0.0997 \\ 0.2913 \end{pmatrix} \right)
    $$

    Carrying out the computation:

    $$
    h_2 = \tanh\left( \begin{pmatrix} 0.3 \\ 0.7 \end{pmatrix} + \begin{pmatrix} 0.1283 \\ 0.2147 \end{pmatrix} \right)
        = \tanh\left( \begin{pmatrix} 0.4283 \\ 0.9147 \end{pmatrix} \right)
    $$

    $o_2 = W_{hy} h_2$

    Predicted distribution: $\hat{y}_2 = \text{softmax}(o_2)$

    Assume the true next character is 'l', whose one-hot encoding is $y_2 = [0, 0, 1, 0]$.

    Cross-entropy loss:

    $$
    \text{loss}_2 = - \sum_{i} y_{2i} \log(\hat{y}_{2i})
    $$

    Gradient computation:

    $$
    \frac{\partial \text{loss}_2}{\partial W_{hy}} = (\hat{y}_2 - y_2)\, h_2^T
    $$

    $$
    \frac{\partial \text{loss}_2}{\partial W_{xh}} = \frac{\partial \text{loss}_2}{\partial h_2} \cdot \frac{\partial h_2}{\partial W_{xh}}
    $$

    $$
    \frac{\partial \text{loss}_2}{\partial W_{hh}} = \frac{\partial \text{loss}_2}{\partial h_2} \cdot \frac{\partial h_2}{\partial W_{hh}}
    $$

    Parameter update:

    $$
    W_{xh} = W_{xh} - \eta \frac{\partial \text{loss}_2}{\partial W_{xh}}
    $$

    $$
    W_{hh} = W_{hh} - \eta \frac{\partial \text{loss}_2}{\partial W_{hh}}
    $$

    $$
    W_{hy} = W_{hy} - \eta \frac{\partial \text{loss}_2}{\partial W_{hy}}
    $$

    Time step 3 (input 'l'):

    Use the updated parameters $W_{xh}$, $W_{hh}$, and $W_{hy}$.

    Input vector $x_3 = [0, 0, 1, 0]$ (the one-hot encoding of 'l').

    $$
    h_3 = \tanh(W_{xh} x_3 + W_{hh} h_2)
    $$

    Carrying out the computation:

    $$
    h_3 = \tanh\left( \begin{pmatrix} 0.5 \\ 1.2 \end{pmatrix} + W_{hh} h_2 \right)
    $$

    $o_3 = W_{hy} h_3$

    Predicted distribution: $\hat{y}_3 = \text{softmax}(o_3)$

    Assume the true next character is 'l', whose one-hot encoding is $y_3 = [0, 0, 1, 0]$.

    Cross-entropy loss:

    $$
    \text{loss}_3 = - \sum_{i} y_{3i} \log(\hat{y}_{3i})
    $$

    Gradient computation:

    $$
    \frac{\partial \text{loss}_3}{\partial W_{hy}} = (\hat{y}_3 - y_3)\, h_3^T
    $$

    $$
    \frac{\partial \text{loss}_3}{\partial W_{xh}} = \frac{\partial \text{loss}_3}{\partial h_3} \cdot \frac{\partial h_3}{\partial W_{xh}}
    $$

    $$
    \frac{\partial \text{loss}_3}{\partial W_{hh}} = \frac{\partial \text{loss}_3}{\partial h_3} \cdot \frac{\partial h_3}{\partial W_{hh}}
    $$

    Parameter update:

    $$
    W_{xh} = W_{xh} - \eta \frac{\partial \text{loss}_3}{\partial W_{xh}}
    $$

    $$
    W_{hh} = W_{hh} - \eta \frac{\partial \text{loss}_3}{\partial W_{hh}}
    $$

    $$
    W_{hy} = W_{hy} - \eta \frac{\partial \text{loss}_3}{\partial W_{hy}}
    $$

    Time step 4 (input 'l'):

    Use the updated parameters $W_{xh}$, $W_{hh}$, and $W_{hy}$.

    Input vector $x_4 = [0, 0, 1, 0]$ (the one-hot encoding of 'l').

    $$
    h_4 = \tanh(W_{xh} x_4 + W_{hh} h_3)
    $$

    Carrying out the computation:

    $$
    h_4 = \tanh\left( \begin{pmatrix} 0.5 \\ 1.2 \end{pmatrix} + W_{hh} h_3 \right)
    $$

    $o_4 = W_{hy} h_4$

    Predicted distribution: $\hat{y}_4 = \text{softmax}(o_4)$

    Assume the true next character is 'o', whose one-hot encoding is $y_4 = [0, 0, 0, 1]$.

    Cross-entropy loss:

    $$
    \text{loss}_4 = - \sum_{i} y_{4i} \log(\hat{y}_{4i})
    $$

    Gradient computation:

    $$
    \frac{\partial \text{loss}_4}{\partial W_{hy}} = (\hat{y}_4 - y_4)\, h_4^T
    $$

    $$
    \frac{\partial \text{loss}_4}{\partial W_{xh}} = \frac{\partial \text{loss}_4}{\partial h_4} \cdot \frac{\partial h_4}{\partial W_{xh}}
    $$

    $$
    \frac{\partial \text{loss}_4}{\partial W_{hh}} = \frac{\partial \text{loss}_4}{\partial h_4} \cdot \frac{\partial h_4}{\partial W_{hh}}
    $$

    Parameter update:

    $$
    W_{xh} = W_{xh} - \eta \frac{\partial \text{loss}_4}{\partial W_{xh}}
    $$

    $$
    W_{hh} = W_{hh} - \eta \frac{\partial \text{loss}_4}{\partial W_{hh}}
    $$

    $$
    W_{hy} = W_{hy} - \eta \frac{\partial \text{loss}_4}{\partial W_{hy}}
    $$
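
    Putting the four steps together: a compact NumPy sketch of this per-time-step training scheme, reusing one_hot, softmax, and the parameters defined in the earlier sketches. Note that it updates the weights after every single character, as in the walkthrough above; standard BPTT would instead accumulate gradients over the whole sequence before updating. The learning rate and epoch count are illustrative:

    eta = 0.1
    input_str, target_str = "hell", "ello"

    for epoch in range(100):
        h_prev = np.zeros(hidden_size)
        total_loss = 0.0
        for in_ch, out_ch in zip(input_str, target_str):
            x = one_hot(in_ch)
            y_true = one_hot(out_ch)

            # Forward pass for this time step
            h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)
            y_hat = softmax(W_hy @ h + b_y)
            total_loss += -np.sum(y_true * np.log(y_hat))

            # Backward pass, truncated to a single time step as in the walkthrough
            do = y_hat - y_true
            dh = W_hy.T @ do
            dz = dh * (1.0 - h ** 2)

            # Immediate SGD update, then move on to the next character
            W_hy -= eta * np.outer(do, h);  b_y -= eta * do
            W_xh -= eta * np.outer(dz, x);  W_hh -= eta * np.outer(dz, h_prev)
            b_h -= eta * dz
            h_prev = h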

    4. Code Implementation

    Below is example code for a simple RNN (recurrent neural network) implemented in PyTorch. It takes a character sequence as input and predicts the next character. We will use a small character set for the demonstration.

    Installing PyTorch

    Before starting, make sure PyTorch is installed. You can install it with the following command:

    pip install torch
    
    RNN Implementation Example

    We will implement a character-level RNN that learns to predict the next character from the sequence "hello". The character set is {'h', 'e', 'l', 'o'}.

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import numpy as np
    
    # Define the character set and the mappings between characters and indices
    chars = ['h', 'e', 'l', 'o']
    char_to_idx = {ch: idx for idx, ch in enumerate(chars)}
    idx_to_char = {idx: ch for idx, ch in enumerate(chars)}
    
    # Hyperparameters
    input_size = len(chars)
    hidden_size = 10
    output_size = len(chars)
    num_layers = 1
    learning_rate = 0.01
    num_epochs = 100
    
    # Prepare the data
    def char_to_tensor(char):
        tensor = torch.zeros(input_size)
        tensor[char_to_idx[char]] = 1.0
        return tensor
    
    def string_to_tensor(string):
        tensor = torch.zeros(len(string), input_size)
        for idx, char in enumerate(string):
            tensor[idx][char_to_idx[char]] = 1.0
        return tensor
    
    input_seq = "hell"
    target_seq = "ello"
    
    input_tensor = string_to_tensor(input_seq)
    target_tensor = torch.tensor([char_to_idx[ch] for ch in target_seq])
    
    # Define the RNN model
    class RNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, output_size)
    
        def forward(self, x, hidden):
            out, hidden = self.rnn(x, hidden)
            out = self.fc(out)  # apply the output layer at every time step so each position predicts the next character
            return out, hidden
    
        def init_hidden(self):
            return torch.zeros(num_layers, 1, hidden_size)
    
    # Initialize the model, loss function, and optimizer
    model = RNN(input_size, hidden_size, output_size)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    # Train the model
    for epoch in range(num_epochs):
        hidden = model.init_hidden()
        model.zero_grad()
        
        # Add a batch dimension: (1, seq_len, input_size)
        input_batch = input_tensor.unsqueeze(0)
        output, hidden = model(input_batch, hidden)

        # output has shape (1, seq_len, output_size); compare every time step with its target
        loss = criterion(output.view(-1, output_size), target_tensor)
        loss.backward()
        optimizer.step()
        
        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')
    
    # Test the model
    def predict(model, char, hidden=None):
        if hidden is None:
            hidden = model.init_hidden()
        input_tensor = char_to_tensor(char).unsqueeze(0).unsqueeze(0)
        output, hidden = model(input_tensor, hidden)
        _, predicted_idx = torch.max(output[:, -1, :], 1)  # prediction at the last (here, only) time step
        return idx_to_char[predicted_idx.item()], hidden
    
    hidden = model.init_hidden()
    input_char = 'h'
    predicted_seq = input_char
    for _ in range(len(input_seq)):
        next_char, hidden = predict(model, input_char, hidden)
        predicted_seq += next_char
        input_char = next_char
    
    print(f'Predicted sequence: {predicted_seq}')
    
    Code Walkthrough
    1. Data preparation

      • We define a simple character set {'h', 'e', 'l', 'o'} and build the char-to-index and index-to-char mappings.
      • The char_to_tensor function converts a character into a one-hot vector.
      • The string_to_tensor function converts a string into a sequence of one-hot vectors.
    2. Defining the RNN model

      • The RNN class inherits from nn.Module and contains an RNN layer and a fully connected layer.
      • The forward method performs the forward pass, producing a prediction at every time step.
      • The init_hidden method initializes the hidden state.
    3. Training the model

      • We use the cross-entropy loss function and the Adam optimizer.
      • In each training epoch we run the forward pass, compute the loss, backpropagate, and update the parameters.
    4. Testing the model

      • The predict function generates the next character from a given input character.
      • We use the trained model to generate a character sequence starting from the character 'h'.

    After running the code, you will see the character sequence predicted by the model; over the course of training it gradually learns to predict the next character of the input sequence.


  • Original article: https://blog.csdn.net/weixin_47552266/article/details/139659451