• Step 86 Time Series Modeling in Practice: Transformer Regression


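    Demonstrated on 64-bit Windows 10

    I. Introduction

    In this installment we introduce Transformer regression.

    As before, we use the following data:

    The demonstration uses the public data from the 2015 PLoS One article "Comparison of Two Hybrid Models for Forecasting the Incidence of Hemorrhagic Fever with Renal Syndrome in Jiangsu Province, China". The data are the monthly incidence of hemorrhagic fever with renal syndrome in Jiangsu Province from January 2004 to December 2012. Data from January 2004 to December 2011 are used to predict the incidence for the 12 months of 2012.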

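    II. Transformer Regression

    (1) Principle

    The Transformer architecture was originally designed for NLP tasks, machine translation in particular. Because its self-attention mechanism handles sequential data well, it is now widely applied to many kinds of sequence tasks, including regression.

    (a) Transformers in regression tasks:

    (a1) In a regression task, a Transformer can capture long-range dependencies in the data. In a time series, for example, it can relate time points to one another even when they are far apart.

    (a2) Using a Transformer for regression usually requires small changes to the model structure, especially the output part. The original Transformer generates a sequence, whereas a regression task usually needs a single real number as output.

    (b) Advantages of the Transformer:

    (b1) Self-attention: it captures dependencies between any two positions in a sequence directly, instead of relaying information step by step from earlier positions as an RNN does.

    (b2) Parallel computation: unlike an RNN or LSTM, a Transformer does not have to process the data in order, so it is easier to parallelize and faster to train.

    (b3) Scalability: stacking multiple Transformer layers lets the model capture complex patterns and relationships.

    (b4) Interpretability: the attention weights can be visualized to show which input positions matter most for a given output, which makes the model easier to interpret.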
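    To make the self-attention and interpretability points concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. It is an illustration only, not the Keras layer used later in this post; the function name, the random projection matrices, and the toy input shape are all assumptions chosen for the example.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (timesteps, d_in); w_q, w_k, w_v: (d_in, d_k) projection matrices.
    Returns the attended output and the attention weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (timesteps, timesteps)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))   # hypothetical input: 6 time steps, 4 features
w_q, w_k, w_v = (rng.normal(size=(4, 8)) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape, weights.shape)   # (6, 8) (6, 6)
```

    Each row of `weights` sums to 1 and shows how strongly one time step attends to every other step, which is the quantity one would plot when inspecting the model.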

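    (c) Disadvantages of the Transformer:

    (c1) Computational cost: although they can be parallelized, Transformer models, large ones in particular, still need substantial computing resources.

    (c2) Overfitting: on small datasets, especially without enough regularization, a Transformer may overfit.

    (c3) Long sequences: although a Transformer can process long sequences, the cost of self-attention grows quadratically with sequence length, so very long sequences remain a challenge. Researchers have proposed many variants to address this, such as Reformer.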
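    The long-sequence issue can be illustrated with simple arithmetic: standard self-attention stores one weight for every pair of positions. The sketch below counts only the attention weight matrices of one layer for one sample; the helper name and the defaults (4 heads, 4-byte floats) are assumptions for illustration.

```python
def attention_matrix_bytes(seq_len, num_heads=4, bytes_per_value=4):
    """Memory for the attention weights of one layer and one sample."""
    return num_heads * seq_len * seq_len * bytes_per_value

for seq_len in (6, 512, 4096):
    mib = attention_matrix_bytes(seq_len) / 2**20
    print(f"seq_len={seq_len:>5}: {mib:.4f} MiB")
```

    With only 6 lagged months as input, as in this post, the cost is negligible; doubling the sequence length quadruples it.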

    Overall, the Transformer provides a powerful framework for a wide range of sequence tasks.

    (2) Single-step rolling forecast

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    # Import from the public tensorflow.keras API rather than the private tensorflow.python.keras path
    from tensorflow.keras import layers, models, optimizers

    # Read the data
    data = pd.read_csv('data.csv')
    # Convert the time column to datetime
    data['time'] = pd.to_datetime(data['time'], format='%b-%y')

    # Create lag features: lag_1 is the oldest value (6 months back), lag_6 the most recent (1 month back)
    lag_period = 6
    for i in range(lag_period, 0, -1):
        data[f'lag_{i}'] = data['incidence'].shift(lag_period - i + 1)
    # Drop rows that contain NaN
    data = data.dropna().reset_index(drop=True)

    # Split into training and validation sets
    train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
    validation_data = data[(data['time'] >= '2012-01-01') & (data['time'] <= '2012-12-31')]

    # Define features and target
    lag_cols = [f'lag_{i}' for i in range(1, lag_period + 1)]
    X_train = train_data[lag_cols].values
    y_train = train_data['incidence'].values
    X_validation = validation_data[lag_cols].values
    y_validation = validation_data['incidence'].values

    # The attention layer expects input shaped [samples, timesteps, features]
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
    X_validation = X_validation.reshape(X_validation.shape[0], X_validation.shape[1], 1)

    # Transformer hyperparameters
    d_model = 128
    num_heads = 4

    # Build the Transformer regression model
    input_layer = layers.Input(shape=(X_train.shape[1], 1))
    # Linear embedding
    x = layers.Dense(d_model)(input_layer)
    # Multi-head self-attention
    x = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(x, x)
    # Pool over the time steps, then a small feed-forward regression head
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dropout(0.1)(x)
    x = layers.Dense(50, activation='relu')(x)
    x = layers.Dropout(0.1)(x)
    output_layer = layers.Dense(1)(x)
    model = models.Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001), loss='mse')

    # Train the model (the validation set is passed only to monitor the loss)
    history = model.fit(X_train, y_train, epochs=200, batch_size=32,
                        validation_data=(X_validation, y_validation), verbose=0)

    # Single-step rolling forecast: each prediction is fed back in as the newest lag
    def rolling_forecast(model, initial_features, n_forecasts):
        forecasts = []
        current_features = initial_features.copy()
        for i in range(n_forecasts):
            # Predict from the current window
            forecast = model.predict(current_features.reshape(1, len(current_features), 1), verbose=0).flatten()[0]
            forecasts.append(forecast)
            # Shift the window: drop the oldest value and append the new prediction
            current_features = np.roll(current_features, shift=-1)
            current_features[-1] = forecast
        return np.array(forecasts)

    # Use the last 6 observed training values (Jul-Dec 2011) as the initial window.
    # X_train[-1] alone only covers Jun-Nov 2011, so drop its oldest value and append y_train[-1].
    initial_features = np.append(X_train[-1].flatten()[1:], y_train[-1])

    # Forecast the validation period with the single-step rolling method
    y_validation_pred = rolling_forecast(model, initial_features, len(X_validation))

    # MAE, MAPE, MSE and RMSE on the training set
    y_train_pred = model.predict(X_train, verbose=0).flatten()
    mae_train = mean_absolute_error(y_train, y_train_pred)
    mape_train = np.mean(np.abs((y_train - y_train_pred) / y_train))
    mse_train = mean_squared_error(y_train, y_train_pred)
    rmse_train = np.sqrt(mse_train)

    # MAE, MAPE, MSE and RMSE on the validation set
    mae_validation = mean_absolute_error(y_validation, y_validation_pred)
    mape_validation = np.mean(np.abs((y_validation - y_validation_pred) / y_validation))
    mse_validation = mean_squared_error(y_validation, y_validation_pred)
    rmse_validation = np.sqrt(mse_validation)

    print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
    print("Training set:", mae_train, mape_train, mse_train, rmse_train)
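    The window update inside the rolling forecast can be checked in isolation, without training a network. The sketch below swaps in a trivial stand-in predictor (it simply continues the last observed step of a trend); `recursive_forecast` and `linear_trend` are hypothetical names used only for this illustration.

```python
import numpy as np

def recursive_forecast(predict_one, window, n_forecasts):
    """Roll a window (oldest value first) forward, feeding predictions back in."""
    window = np.asarray(window, dtype=float).copy()
    forecasts = []
    for _ in range(n_forecasts):
        y_hat = float(predict_one(window))
        forecasts.append(y_hat)
        window = np.roll(window, -1)   # the oldest value wraps to the end...
        window[-1] = y_hat             # ...and is overwritten by the new prediction
    return np.array(forecasts)

# Stand-in model (assumption): repeat the last observed increment
def linear_trend(window):
    return window[-1] + (window[-1] - window[-2])

preds = recursive_forecast(linear_trend, [1, 2, 3, 4, 5, 6], n_forecasts=3)
print(preds)   # [7. 8. 9.]
```

    Because every forecast after the first depends on earlier forecasts, errors can accumulate over the 12-month horizon.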


    (3) Multi-step rolling forecast, vol. 1

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    # Import from the public tensorflow.keras API rather than the private tensorflow.python.keras path
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, MultiHeadAttention, Dense, Dropout, LayerNormalization, Flatten
    from tensorflow.keras.optimizers import Adam

    # Read the data
    data = pd.read_csv('data.csv')
    data['time'] = pd.to_datetime(data['time'], format='%b-%y')
    n = 6  # number of lagged months used as input
    m = 2  # number of months predicted per forecast

    # Create lag features
    for i in range(n, 0, -1):
        data[f'lag_{i}'] = data['incidence'].shift(n - i + 1)
    data = data.dropna().reset_index(drop=True)
    train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
    validation_data = data[(data['time'] >= '2012-01-01') & (data['time'] <= '2012-12-31')]
    lag_cols = [f'lag_{j}' for j in range(1, n + 1)]

    # Prepare the training data: the lags stored in row i+n-1 are paired with the incidence of the next m rows
    X_train = []
    y_train = []
    for i in range(len(train_data) - n - m + 1):
        X_train.append(train_data.iloc[i + n - 1][lag_cols].values)
        y_train.append(train_data.iloc[i + n:i + n + m]['incidence'].values)
    # Reshape explicitly to [samples, timesteps, features]
    X_train = np.array(X_train).astype(np.float32).reshape(-1, n, 1)
    y_train = np.array(y_train).astype(np.float32)

    # Build the Transformer model
    inputs = Input(shape=(n, 1))
    x = MultiHeadAttention(num_heads=8, key_dim=64)(inputs, inputs)
    x = Dropout(0.1)(x)
    x = LayerNormalization(epsilon=1e-6)(x + inputs)  # residual connection + layer normalization
    x = Flatten()(x)
    x = Dense(50, activation='relu')(x)
    x = Dropout(0.1)(x)
    outputs = Dense(m)(x)  # one output per forecast step
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')

    # Train the model
    model.fit(X_train, y_train, epochs=200, batch_size=32, verbose=0)

    def transformer_rolling_forecast(data, model, n, m):
        y_pred = []
        # Advance m rows at a time so that consecutive m-step blocks do not overlap
        # and y_pred[k] lines up with data['incidence'].values[n + k]
        for i in range(0, len(data) - n, m):
            input_data = data.iloc[i + n - 1][lag_cols].values.astype(np.float32).reshape(1, n, 1)
            pred = model.predict(input_data, verbose=0)
            y_pred.extend(pred[0])
        return np.array(y_pred)[:len(data) - n]

    # Predict for the training and validation data
    y_train_pred_transformer = transformer_rolling_forecast(train_data, model, n, m)
    y_validation_pred_transformer = transformer_rolling_forecast(validation_data, model, n, m)

    # Observed values aligned with the predictions (the first n rows have no forecast)
    y_train_true = train_data['incidence'].values[n:n + len(y_train_pred_transformer)]
    y_validation_true = validation_data['incidence'].values[n:n + len(y_validation_pred_transformer)]

    # Performance metrics on the training set
    mae_train = mean_absolute_error(y_train_true, y_train_pred_transformer)
    mape_train = np.mean(np.abs((y_train_true - y_train_pred_transformer) / y_train_true))
    mse_train = mean_squared_error(y_train_true, y_train_pred_transformer)
    rmse_train = np.sqrt(mse_train)

    # Performance metrics on the validation set
    mae_validation = mean_absolute_error(y_validation_true, y_validation_pred_transformer)
    mape_validation = np.mean(np.abs((y_validation_true - y_validation_pred_transformer) / y_validation_true))
    mse_validation = mean_squared_error(y_validation_true, y_validation_pred_transformer)
    rmse_validation = np.sqrt(mse_validation)

    print("Training set:", mae_train, mape_train, mse_train, rmse_train)
    print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
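    The pairing of an n-month input window with an m-month target block is easiest to verify on a toy series. The following sketch builds such pairs with plain NumPy, using a contiguous layout in which the targets directly follow the window; `make_windows` is a hypothetical helper and not part of the listing above.

```python
import numpy as np

def make_windows(series, n, m):
    """Return X of shape (samples, n) and Y of shape (samples, m).

    Row k of X holds series[k : k + n] (oldest first); row k of Y holds the
    m values that immediately follow that window.
    """
    series = np.asarray(series, dtype=np.float32)
    n_samples = len(series) - n - m + 1
    X = np.stack([series[k:k + n] for k in range(n_samples)])
    Y = np.stack([series[k + n:k + n + m] for k in range(n_samples)])
    return X, Y

X, Y = make_windows(np.arange(10), n=6, m=2)
print(X.shape, Y.shape)   # (3, 6) (3, 2)
print(X[0], Y[0])         # [0. 1. 2. 3. 4. 5.] [6. 7.]
```

    Before passing `X` to a model with `Input(shape=(n, 1))`, it would still need a trailing feature axis, for example `X[..., None]`.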


    (4) Multi-step rolling forecast, vol. 2

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    # Import from the public tensorflow.keras API rather than the private tensorflow.python.keras path
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Dense, Flatten, Input, MultiHeadAttention, LayerNormalization, Dropout
    from tensorflow.keras.optimizers import Adam

    # Load and preprocess the data
    data = pd.read_csv('data.csv')
    data['time'] = pd.to_datetime(data['time'], format='%b-%y')
    n = 6  # number of lagged months used as input
    m = 2  # number of months predicted per forecast

    # Create lag features
    for i in range(n, 0, -1):
        data[f'lag_{i}'] = data['incidence'].shift(n - i + 1)
    data = data.dropna().reset_index(drop=True)
    train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
    validation_data = data[(data['time'] >= '2012-01-01') & (data['time'] <= '2012-12-31')]
    lag_cols = [f'lag_{i}' for i in range(1, n + 1)]

    # Keep every m-th row of X_train, y_train and X_validation so that the m-step blocks do not overlap
    X_train = train_data[lag_cols].iloc[::m].reset_index(drop=True).values
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
    # Targets: the current month plus the following m - 1 months
    y_train_list = [train_data['incidence'].shift(-i) for i in range(m)]
    y_train = pd.concat(y_train_list, axis=1)
    y_train.columns = [f'target_{i+1}' for i in range(m)]
    # Keep all m target columns; taking only column 0 would reduce this to a one-step model
    y_train = y_train.iloc[::m].reset_index(drop=True).dropna().values
    X_train = X_train[:len(y_train)]  # stay aligned if dropna removed trailing rows
    X_validation = validation_data[lag_cols].iloc[::m].reset_index(drop=True).values
    X_validation = X_validation.reshape(X_validation.shape[0], X_validation.shape[1], 1)
    y_validation = validation_data['incidence'].values

    # Build the Transformer model
    inputs = Input(shape=(n, 1))
    x = MultiHeadAttention(num_heads=8, key_dim=64)(inputs, inputs)
    x = Dropout(0.1)(x)
    x = LayerNormalization(epsilon=1e-6)(x + inputs)
    x = Flatten()(x)
    x = Dense(50, activation='relu')(x)
    outputs = Dense(m)(x)  # one output per forecast step
    model = Model(inputs=inputs, outputs=outputs)
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')

    # Train the model
    model.fit(X_train, y_train, epochs=200, batch_size=32, verbose=0)

    # Predict on the validation set; flattening the (rows, m) blocks gives one value per month
    y_validation_pred = model.predict(X_validation, verbose=0).flatten()[:len(y_validation)]

    # Metrics for the validation set
    y_validation_true = y_validation[:len(y_validation_pred)]
    mae_validation = mean_absolute_error(y_validation_true, y_validation_pred)
    mape_validation = np.mean(np.abs((y_validation_true - y_validation_pred) / y_validation_true))
    mse_validation = mean_squared_error(y_validation_true, y_validation_pred)
    rmse_validation = np.sqrt(mse_validation)

    # Predict on the training set
    y_train_pred = model.predict(X_train, verbose=0).flatten()
    y_train_true = y_train.flatten()

    # Metrics for the training set
    mae_train = mean_absolute_error(y_train_true, y_train_pred)
    mape_train = np.mean(np.abs((y_train_true - y_train_pred) / y_train_true))
    mse_train = mean_squared_error(y_train_true, y_train_pred)
    rmse_train = np.sqrt(mse_train)

    print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
    print("Training set:", mae_train, mape_train, mse_train, rmse_train)
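    Why take every second row when m = 2? A quick check on a toy series shows that m-step target blocks taken at every m-th forecast origin tile the timeline without gaps or overlap, so flattening them gives one value per time step. The variable names and the toy series below are assumptions for illustration.

```python
import numpy as np

n, m = 6, 2
series = np.arange(20, dtype=float)   # toy series standing in for the incidence

# Target block for every possible forecast origin t: series[t], ..., series[t + m - 1]
origins = np.arange(n, len(series) - m + 1)
targets = np.stack([series[t:t + m] for t in origins])

# Keep every m-th origin, then flatten the blocks into one series
stitched = targets[::m].reshape(-1)
print(np.array_equal(stitched, series[n:n + len(stitched)]))   # True
```

    Flattening the blocks of every origin instead (without the `[::m]` step) would repeat time steps and misalign the predictions with the observed values.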


    (5) Multi-step rolling forecast, vol. 3

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    # Import from the public tensorflow.keras API rather than the private tensorflow.python.keras path
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Dense, Flatten, Input, MultiHeadAttention, LayerNormalization, Dropout
    from tensorflow.keras.optimizers import Adam

    # Read and preprocess the data
    data = pd.read_csv('data.csv')
    data_y = pd.read_csv('data.csv')  # untouched copy used to look up the targets
    data['time'] = pd.to_datetime(data['time'], format='%b-%y')
    data_y['time'] = pd.to_datetime(data_y['time'], format='%b-%y')
    n = 6  # number of lagged months used as input
    for i in range(n, 0, -1):
        data[f'lag_{i}'] = data['incidence'].shift(n - i + 1)
    data = data.dropna().reset_index(drop=True)
    train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
    X_train = train_data[[f'lag_{i}' for i in range(1, n+1)]]

    m = 3  # forecast horizon; one model is trained per step
    X_train_list = []
    y_train_list = []
    for i in range(m):
        X_temp = X_train
        # Target for model i: the incidence i months after each feature row
        y_temp = data_y['incidence'].iloc[n + i:len(data_y) - m + 1 + i]
        X_train_list.append(X_temp)
        y_train_list.append(y_temp)
    for i in range(m):
        # Drop the last m-1 rows so every model's targets stay inside the training period
        X_train_list[i] = X_train_list[i].iloc[:-(m-1)].values
        X_train_list[i] = X_train_list[i].reshape(X_train_list[i].shape[0], X_train_list[i].shape[1], 1)
        y_train_list[i] = y_train_list[i].iloc[:len(X_train_list[i])].values

    # Train one Transformer model per forecast step
    models = []
    for i in range(m):
        inputs = Input(shape=(n, 1))
        x = MultiHeadAttention(num_heads=8, key_dim=64)(inputs, inputs)
        x = Dropout(0.1)(x)
        x = LayerNormalization(epsilon=1e-6)(x + inputs)
        x = Flatten()(x)
        x = Dense(50, activation='relu')(x)
        outputs = Dense(1)(x)
        model = Model(inputs=inputs, outputs=outputs)
        optimizer = Adam(learning_rate=0.001)
        model.compile(optimizer=optimizer, loss='mse')
        model.fit(X_train_list[i], y_train_list[i], epochs=200, batch_size=32, verbose=0)
        models.append(model)

    validation_start_time = train_data['time'].iloc[-1] + pd.DateOffset(months=1)
    validation_data = data[data['time'] >= validation_start_time]
    X_validation = validation_data[[f'lag_{i}' for i in range(1, n+1)]].values
    X_validation = X_validation.reshape(X_validation.shape[0], X_validation.shape[1], 1)
    y_validation_pred_list = [model.predict(X_validation, verbose=0) for model in models]
    y_train_pred_list = [model.predict(X_train_list[i], verbose=0) for i, model in enumerate(models)]

    def concatenate_predictions(pred_list, m):
        concatenated = []
        # Step through the rows m at a time: row j supplies the forecasts for months j, j+1, ..., j+m-1.
        # Visiting every row would interleave overlapping forecasts and misalign them with the observations.
        for j in range(0, len(pred_list[0]), m):
            for i in range(m):
                concatenated.append(pred_list[i][j])
        return concatenated

    y_validation_pred = np.array(concatenate_predictions(y_validation_pred_list, m))[:len(validation_data['incidence'])]
    y_train_pred = np.array(concatenate_predictions(y_train_pred_list, m))[:len(train_data['incidence']) - m + 1]
    y_validation_pred = y_validation_pred.flatten()
    y_train_pred = y_train_pred.flatten()

    y_validation_true = validation_data['incidence'].values
    y_train_true = train_data['incidence'].values[:-(m-1)]

    mae_validation = mean_absolute_error(y_validation_true, y_validation_pred)
    mape_validation = np.mean(np.abs((y_validation_true - y_validation_pred) / y_validation_true))
    mse_validation = mean_squared_error(y_validation_true, y_validation_pred)
    rmse_validation = np.sqrt(mse_validation)

    mae_train = mean_absolute_error(y_train_true, y_train_pred)
    mape_train = np.mean(np.abs((y_train_true - y_train_pred) / y_train_true))
    mse_train = mean_squared_error(y_train_true, y_train_pred)
    rmse_train = np.sqrt(mse_train)

    print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
    print("Training set:", mae_train, mape_train, mse_train, rmse_train)
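    The one-model-per-step idea can be tried without a neural network. The sketch below fits a separate linear least-squares model for each horizon on a toy trend; the linear models are a stand-in chosen only to keep the example fast and checkable, and `fit_direct_models` and `predict_direct` are hypothetical helper names.

```python
import numpy as np

def fit_direct_models(series, n, m):
    """Fit one linear least-squares model per horizon h = 0..m-1."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n - m + 1
    X = np.stack([series[k:k + n] for k in range(n_samples)])
    X = np.hstack([X, np.ones((n_samples, 1))])        # intercept column
    coefs = []
    for h in range(m):
        y = series[n + h:n + h + n_samples]            # value h steps after each window
        coefs.append(np.linalg.lstsq(X, y, rcond=None)[0])
    return coefs

def predict_direct(coefs, window):
    """Return one forecast per horizon for a single window (oldest value first)."""
    x = np.append(np.asarray(window, dtype=float), 1.0)
    return np.array([x @ c for c in coefs])

series = 2.0 * np.arange(30) + 1.0                     # toy linear trend: 1, 3, ..., 59
coefs = fit_direct_models(series, n=6, m=3)
forecast = predict_direct(coefs, series[-6:])
print(forecast)                                        # close to [61. 63. 65.]
```

    Unlike the single-step rolling forecast, no prediction is fed back as an input here: each horizon has its own model that sees only observed values.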


    III. Data

    Link: https://pan.baidu.com/s/1EFaWfHoG14h15KCEhn1STg?pwd=q41n

    Extraction code: q41n

  • Original article: https://blog.csdn.net/qq_30452897/article/details/133636381