Stock Price Prediction with a Recurrent Neural Network (RNN)


    I. Preface

    My environment:

    • Language: Python 3.6.5
    • Editor: Jupyter Notebook
    • Deep learning framework: TensorFlow 2.4.1


    II. Preparation

    1. Configure the GPU (skip this step if you are running on a CPU)

    import tensorflow as tf
    
    gpus = tf.config.list_physical_devices("GPU")
    
    if gpus:
        tf.config.experimental.set_memory_growth(gpus[0], True)  # allocate GPU memory on demand instead of all at once
        tf.config.set_visible_devices([gpus[0]], "GPU")
    
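    A quick environment check (my addition, not from the original post) confirms the TensorFlow version and whether a GPU was picked up:

    print(tf.__version__)                          # the post was written against 2.4.1
    print(tf.config.list_physical_devices("GPU"))  # an empty list means CPU-only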

    2. Import the data

    import os,math
    from tensorflow.keras.layers import Dropout, Dense, SimpleRNN
    from sklearn.preprocessing   import MinMaxScaler
    from sklearn                 import metrics
    import numpy             as np
    import pandas            as pd
    import tensorflow        as tf
    import matplotlib.pyplot as plt
    # Chinese font support for matplotlib
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
    plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly
    
    data = pd.read_csv('./datasets/SH600519.csv')  # read the stock data file
    
    data
    
          Unnamed: 0        date      open     close      high       low     volume    code
    0             74  2010-04-26    88.702    87.381    89.072    87.362  107036.13  600519
    1             75  2010-04-27    87.355    84.841    87.355    84.681   58234.48  600519
    2             76  2010-04-28    84.235    84.318    85.128    83.597   26287.43  600519
    3             77  2010-04-29    84.592    85.671    86.315    84.592   34501.20  600519
    4             78  2010-04-30    83.871    82.340    83.871    81.523   85566.70  600519
    ...          ...         ...       ...       ...       ...       ...        ...     ...
    2421        2495  2020-04-20  1221.000  1227.300  1231.500  1216.800   24239.00  600519
    2422        2496  2020-04-21  1221.020  1200.000  1223.990  1193.000   29224.00  600519
    2423        2497  2020-04-22  1206.000  1244.500  1249.500  1202.220   44035.00  600519
    2424        2498  2020-04-23  1250.000  1252.260  1265.680  1247.770   26899.00  600519
    2425        2499  2020-04-24  1248.000  1250.560  1259.890  1235.180   19122.00  600519

    2426 rows × 8 columns
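
    The file contains ten years of daily data for stock SH600519 (2010-04-26 through 2020-04-24): open, close, high, and low prices plus trading volume. The chronological split below assumes the rows are already sorted by date; a quick sanity check (a sketch, not in the original post):

    dates = pd.to_datetime(data['date'])
    assert dates.is_monotonic_increasing  # rows must be in chronological order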

    III. Set up the training and test sets

    training_set = data.iloc[0:2426 - 300, 2:3].values  # opening prices (column 2) of the first 2426-300 days as the training set
    test_set = data.iloc[2426 - 300:, 2:3].values       # opening prices of the last 300 days as the test set
    
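    The split is chronological: the model is trained on the first 2426-300 = 2126 days and evaluated on the last 300, so no future data leaks into training. A quick shape check (a sketch):

    print(training_set.shape)  # expected: (2126, 1)
    print(test_set.shape)      # expected: (300, 1)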

    IV. Data preprocessing

    1. Normalization

    sc           = MinMaxScaler(feature_range=(0, 1))
    training_set = sc.fit_transform(training_set)
    test_set     = sc.transform(test_set) 
    
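    Note that the scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training. MinMaxScaler maps each value x to (x - min) / (max - min); a minimal illustration (a sketch, not from the original post):

    demo = MinMaxScaler(feature_range=(0, 1))
    scaled = demo.fit_transform(np.array([[10.0], [20.0], [40.0]]))
    print(scaled.ravel())                          # approximately [0, 1/3, 1]
    print(demo.inverse_transform(scaled).ravel())  # back to [10. 20. 40.]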

    2. Build the training and test sequences

    x_train = []
    y_train = []
    
    x_test = []
    y_test = []
    
    """
    使用前60天的开盘价作为输入特征x_train
        第61天的开盘价作为输入标签y_train
        
    for循环共构建2426-300-60=2066组训练数据。
           共构建300-60=260组测试数据
    """
    for i in range(60, len(training_set)):
        x_train.append(training_set[i - 60:i, 0])
        y_train.append(training_set[i, 0])
        
    for i in range(60, len(test_set)):
        x_test.append(test_set[i - 60:i, 0])
        y_test.append(test_set[i, 0])
        
    # shuffle the training set (reusing the same seed keeps features and labels aligned)
    np.random.seed(7)
    np.random.shuffle(x_train)
    np.random.seed(7)
    np.random.shuffle(y_train)
    tf.random.set_seed(7)
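
    The same windows can also be built without a Python loop; a vectorized equivalent (a sketch, assuming NumPy >= 1.20 for sliding_window_view):

    from numpy.lib.stride_tricks import sliding_window_view

    series  = training_set[:, 0]               # shape (2126,)
    windows = sliding_window_view(series, 60)  # windows[j] = series[j:j+60], shape (2067, 60)
    x_train_alt = windows[:-1]                 # series[i-60:i] for i = 60 .. 2125
    y_train_alt = series[60:]                  # series[i]      for i = 60 .. 2125
    # x_train_alt has shape (2066, 60), matching the loop above (before reshaping)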
    
    """
    将训练数据调整为数组(array)
    
    调整后的形状:
    x_train:(2066, 60, 1)
    y_train:(2066,)
    x_test :(240, 60, 1)
    y_test :(240,)
    """
    x_train, y_train = np.array(x_train), np.array(y_train) # x_train形状为:(2066, 60, 1)
    x_test,  y_test  = np.array(x_test),  np.array(y_test)
    
    """
    输入要求:[送入样本数, 循环核时间展开步数, 每个时间步输入特征个数]
    """
    x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
    x_test  = np.reshape(x_test,  (x_test.shape[0], 60, 1))
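
    A quick assertion (a sketch) to confirm the arrays match the shapes listed in the docstring above:

    assert x_train.shape == (2066, 60, 1) and y_train.shape == (2066,)
    assert x_test.shape  == (240, 60, 1)  and y_test.shape  == (240,)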
    

    V. Build the model

    model = tf.keras.Sequential([
        SimpleRNN(80, return_sequences=True), # return the full output sequence (one output per time step) so the next RNN layer receives a sequence
        Dropout(0.2),                         # mitigate overfitting
        SimpleRNN(80),                        # return_sequences defaults to False: only the last output
        Dropout(0.2),
        Dense(1)
    ])
    
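    As written, the layers infer their input shape from the first batch, so the model is only built when fit() is called (which is why model.summary() appears after training below). If you want the summary up front, a sketch passing the input shape explicitly:

    model = tf.keras.Sequential([
        SimpleRNN(80, return_sequences=True, input_shape=(60, 1)),  # 60 time steps, 1 feature per step
        Dropout(0.2),
        SimpleRNN(80),
        Dropout(0.2),
        Dense(1)
    ])
    model.summary()  # works before training because the weights are built eagerly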

    VI. Compile the model

    # We only track the loss here, not accuracy, so the metrics argument is omitted;
    # each epoch will then report only the loss value.
    model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
                  loss='mean_squared_error')  # mean squared error as the loss function
    
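    Mean squared error averages the squared differences between predictions and targets; a minimal illustration (a sketch, not from the original post):

    def mse(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return np.mean((y_true - y_pred) ** 2)

    print(mse([1.0, 2.0], [1.5, 1.5]))  # (0.5**2 + 0.5**2) / 2 = 0.25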

    VII. Train the model

    history = model.fit(x_train, y_train, 
                        batch_size=64, 
                        epochs=20, 
                        validation_data=(x_test, y_test), 
                        validation_freq=1)                  # run validation every epoch
    
    model.summary()
    
    Epoch 1/20
    33/33 [==============================] - 6s 123ms/step - loss: 0.1809 - val_loss: 0.0310
    Epoch 2/20
    33/33 [==============================] - 3s 105ms/step - loss: 0.0257 - val_loss: 0.0721
    Epoch 3/20
    33/33 [==============================] - 3s 85ms/step - loss: 0.0165 - val_loss: 0.0059
    Epoch 4/20
    33/33 [==============================] - 3s 85ms/step - loss: 0.0097 - val_loss: 0.0111
    Epoch 5/20
    33/33 [==============================] - 3s 90ms/step - loss: 0.0099 - val_loss: 0.0139
    Epoch 6/20
    33/33 [==============================] - 3s 105ms/step - loss: 0.0067 - val_loss: 0.0167
    Epoch 7/20
    33/33 [==============================] - 3s 86ms/step - loss: 0.0067 - val_loss: 0.0095
    Epoch 8/20
    33/33 [==============================] - 3s 91ms/step - loss: 0.0063 - val_loss: 0.0218
    Epoch 9/20
    33/33 [==============================] - 3s 99ms/step - loss: 0.0052 - val_loss: 0.0109
    Epoch 10/20
    33/33 [==============================] - 3s 99ms/step - loss: 0.0043 - val_loss: 0.0120
    Epoch 11/20
    33/33 [==============================] - 3s 92ms/step - loss: 0.0044 - val_loss: 0.0167
    Epoch 12/20
    33/33 [==============================] - 3s 89ms/step - loss: 0.0039 - val_loss: 0.0032
    Epoch 13/20
    33/33 [==============================] - 3s 88ms/step - loss: 0.0041 - val_loss: 0.0052
    Epoch 14/20
    33/33 [==============================] - 3s 93ms/step - loss: 0.0035 - val_loss: 0.0179
    Epoch 15/20
    33/33 [==============================] - 4s 110ms/step - loss: 0.0033 - val_loss: 0.0124
    Epoch 16/20
    33/33 [==============================] - 3s 95ms/step - loss: 0.0035 - val_loss: 0.0149
    Epoch 17/20
    33/33 [==============================] - 4s 111ms/step - loss: 0.0028 - val_loss: 0.0111
    Epoch 18/20
    33/33 [==============================] - 4s 110ms/step - loss: 0.0029 - val_loss: 0.0061
    Epoch 19/20
    33/33 [==============================] - 3s 104ms/step - loss: 0.0027 - val_loss: 0.0110
    Epoch 20/20
    33/33 [==============================] - 3s 90ms/step - loss: 0.0028 - val_loss: 0.0037
    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    simple_rnn (SimpleRNN)       (None, 60, 80)            6560      
    _________________________________________________________________
    dropout (Dropout)            (None, 60, 80)            0         
    _________________________________________________________________
    simple_rnn_1 (SimpleRNN)     (None, 80)                12880     
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 80)                0         
    _________________________________________________________________
    dense (Dense)                (None, 1)                 81        
    =================================================================
    Total params: 19,521
    Trainable params: 19,521
    Non-trainable params: 0
    _________________________________________________________________
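    The parameter counts can be verified by hand: a SimpleRNN layer has units * (units + input_dim + 1) weights (recurrent weights, input weights, and bias), as this quick check shows:

    print(80 * (80 + 1 + 1))   # 6560  -- first SimpleRNN, input_dim=1
    print(80 * (80 + 80 + 1))  # 12880 -- second SimpleRNN, input_dim=80
    print(80 + 1)              # 81    -- Dense layer (80 weights + 1 bias)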
    
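    The validation loss fluctuates from epoch to epoch (the best value, 0.0032, occurs at epoch 12, not at the end), so the final weights are not necessarily the best ones. A sketch (not in the original post) that keeps the best epoch using standard Keras callbacks; the checkpoint filename is hypothetical:

    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    callbacks = [
        EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
        ModelCheckpoint('best_rnn.h5', monitor='val_loss', save_best_only=True),  # hypothetical filename
    ]
    history = model.fit(x_train, y_train, batch_size=64, epochs=20,
                        validation_data=(x_test, y_test), callbacks=callbacks)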

    VIII. Visualize the results

    1. Plot the loss curves

    plt.plot(history.history['loss']    , label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.legend()
    plt.show()
    

    2. Prediction

    predicted_stock_price = model.predict(x_test)                       # feed the test set to the model
    predicted_stock_price = sc.inverse_transform(predicted_stock_price) # undo the normalization: map predictions from (0,1) back to the original price range
    real_stock_price = sc.inverse_transform(test_set[60:])              # undo the normalization for the ground-truth prices as well
    
    # plot the real prices against the predicted prices
    plt.plot(real_stock_price, color='red', label='Stock Price')
    plt.plot(predicted_stock_price, color='blue', label='Predicted Stock Price')
    plt.title('Stock Price Prediction by K同学啊')
    plt.xlabel('Time')
    plt.ylabel('Stock Price')
    plt.legend()
    plt.show()
    

    [Figure: real vs. predicted opening prices over the test period]

    3. Evaluation

    # sklearn metrics expect (y_true, y_pred). The order is symmetric for MSE/RMSE/MAE
    # but not for r2_score; the original post passed (predicted, real), so the R2
    # printed below reflects the reversed argument order.
    MSE   = metrics.mean_squared_error(real_stock_price, predicted_stock_price)
    RMSE  = metrics.mean_squared_error(real_stock_price, predicted_stock_price)**0.5
    MAE   = metrics.mean_absolute_error(real_stock_price, predicted_stock_price)
    R2    = metrics.r2_score(real_stock_price, predicted_stock_price)
    
    print('MSE : %.5f' % MSE)
    print('RMSE: %.5f' % RMSE)
    print('MAE : %.5f' % MAE)
    print('R2  : %.5f' % R2)
    
    MSE : 1833.92534
    RMSE: 42.82435
    MAE : 36.23424
    R2  : 0.72347
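
    For prices, relative error is often easier to interpret; a sketch (not in the original post) adding the mean absolute percentage error, which should come out on the order of 3% given an MAE of about 36 on prices mostly in the 1000-1250 range:

    MAPE = np.mean(np.abs((real_stock_price - predicted_stock_price) / real_stock_price)) * 100
    print('MAPE: %.2f%%' % MAPE)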
    
  • Original article: https://blog.csdn.net/weixin_45822638/article/details/134545938