Simple Regression: Forecasting Electricity Meter Readings

Contents

Preface
1. Scenario Description
2. Implementation
3. Complete Code
Summary

Preface

The following are notes I took while learning, so they may contain mistakes or omissions. Apologies for any confusion they cause.

1. Scenario Description


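Using the daily electricity meter consumption data for the current month, we estimate the meter readings for each of the following days. First, here is the resulting chart.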

The chart contains three lines:

Training data - blue
Test (ground-truth) data - orange
Predicted values - green

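The predicted values are still somewhat too high, mainly because the data has very few dimensions and very few samples, but the overall trajectory looks roughly right. With a larger training set, the predictions should become more accurate.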

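Building on code shared by others online, I tried two approaches:

Autoregressive moving average model (seasonal ARIMA, i.e. SARIMA)
Seasonal model (Holt-Winters exponential smoothing)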

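However, the data has very few dimensions: the only independent and dependent variables are time and the energy consumption value, so in the end I chose the SARIMA model.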

2. Implementation


First, we split the process into the following steps:

Import the required libraries

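```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.holtwinters import ExponentialSmoothing  # used only by the Holt-Winters variant
```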

Load the dataset

```python
# Load the dataset (first 33 rows only)
df = pd.read_csv('能耗数据01.csv', nrows=33)
```
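The CSV itself is not shown in the article. Judging from the code, it needs at least a `Datetime` column (date strings parsed later with `format='%Y/%m/%d'`) and a numeric `Count` column. The sketch below uses a hypothetical in-memory sample in that assumed layout, with invented values, to show what `nrows` does: it limits the number of data rows read, not counting the header.

```python
import io

import pandas as pd

# Hypothetical sample in the layout the article's code assumes (values invented)
csv_text = """Datetime,Count
2022/07/11,120.5
2022/07/12,118.2
2022/07/13,125.0
2022/07/14,130.4
"""

# nrows=3 reads only the first three data rows; the header line is not counted
df = pd.read_csv(io.StringIO(csv_text), nrows=3)
print(df.shape)  # (3, 2)
print(df.dtypes)
```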

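Split the data into a training set and a test set: the training set is what the model learns from, and the test set is used to check how accurate the predictions are.
```python
# Training / test split (chronological: first 20 rows train, the rest test).
# .copy() makes independent frames, so adding columns below does not
# raise a SettingWithCopyWarning.
train = df[0:20].copy()
test = df[20:].copy()
```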
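For time series the split has to follow time order: shuffling the rows would let the model train on days that come after the ones it is tested on. A minimal sketch of the same 20/13 split as above, wrapped in a hypothetical helper `chronological_split`, using `.iloc` for positional slicing and `.copy()` so the results are independent of the source frame:

```python
import pandas as pd


def chronological_split(frame: pd.DataFrame, n_train: int):
    """Split a time-ordered frame into train/test without shuffling.

    .copy() returns independent frames, so later column assignments
    do not touch the original frame or trigger SettingWithCopyWarning.
    """
    train = frame.iloc[:n_train].copy()
    test = frame.iloc[n_train:].copy()
    return train, test


df = pd.DataFrame({"Count": range(33)})
train, test = chronological_split(df, 20)
print(len(train), len(test))  # 20 13
```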

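Process the date format

```python
# Timestamp processing: parse the date strings, use them as the index,
# and aggregate to one row per day. Only the numeric 'Count' column is kept,
# because averaging the string 'Datetime' column can raise a TypeError in pandas 2.x.
df['Timestamp'] = pd.to_datetime(df['Datetime'], format='%Y/%m/%d')
df.index = df['Timestamp']
df = df[['Count']].resample('D').mean()

train['Timestamp'] = pd.to_datetime(train['Datetime'], format='%Y/%m/%d')
train.index = train['Timestamp']
train = train[['Count']].resample('D').mean()

test['Timestamp'] = pd.to_datetime(test['Datetime'], format='%Y/%m/%d')
test.index = test['Timestamp']
test = test[['Count']].resample('D').mean()
```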
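`resample('D').mean()` does two things: it averages any readings that fall on the same day, and it creates a row for every calendar day in the range, so days without a reading come out as `NaN`. SARIMAX can handle missing observations, but such gaps are easy to overlook. The sketch below uses a made-up sample with one missing day; the `interpolate(method="time")` step is one possible gap-filling choice, not something the article does.

```python
import pandas as pd

# Hypothetical readings (invented) with 2022/07/13 missing
raw = pd.DataFrame({
    "Datetime": ["2022/07/11", "2022/07/12", "2022/07/14"],
    "Count": [10.0, 12.0, 16.0],
})

raw["Timestamp"] = pd.to_datetime(raw["Datetime"], format="%Y/%m/%d")
# Keep only the numeric column, then build one row per calendar day
daily = raw.set_index("Timestamp")[["Count"]].resample("D").mean()
missing = daily["Count"].isna().sum()
print(missing)  # 1: the missing day appears as NaN

# Optional gap filling (an assumption): linear interpolation along the time axis
daily["Count"] = daily["Count"].interpolate(method="time")
print(daily.loc[pd.Timestamp("2022-07-13"), "Count"])  # 14.0
```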

Forecast and view the results

```python
# Plot the raw training and test series
train.Count.plot(figsize=(15, 8), title='Daily Train', fontsize=14)
test.Count.plot(figsize=(15, 8), title='Daily Test', fontsize=14)
# plt.show()

# Seasonal ARIMA (SARIMA) model: order (p, d, q) = (2, 1, 4),
# seasonal order (P, D, Q, s) = (0, 1, 1, 7) for a weekly cycle
y_hat_avg = test.copy()
fit1 = sm.tsa.statespace.SARIMAX(train.Count, order=(2, 1, 4), seasonal_order=(0, 1, 1, 7)).fit()
# Forecast over exactly the dates covered by the test set
y_hat_avg['SARIMA'] = fit1.predict(start=test.index[0], end=test.index[-1], dynamic=True)
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['SARIMA'], label='SARIMA forecast')
plt.legend(loc='best')
plt.show()

# Holt-Winters (seasonal exponential smoothing) model, kept as the alternative
# y_hat_avg = test.copy()
# fit1 = ExponentialSmoothing(np.asarray(train['Count']), seasonal_periods=7, trend='add', seasonal='add').fit()
# y_hat_avg['Holt_Winter'] = fit1.forecast(len(test))
# plt.figure(figsize=(16, 8))
# plt.plot(train['Count'], label='Train')
# plt.plot(test['Count'], label='Test')
# plt.plot(y_hat_avg['Holt_Winter'], label='Holt-Winters forecast')
# plt.legend(loc='best')
# plt.show()
```
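The chart shows the fit only visually. To put numbers on how far the forecast is from the test data, and on whether it is systematically too high as noted in the scenario section, a few standard error metrics can be computed. A minimal NumPy sketch; the helper name `forecast_errors` is hypothetical:

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, MAPE (%) and mean bias of a forecast.

    A positive bias means the forecast is, on average, above the actual values.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE_%": float(np.mean(np.abs(err / actual)) * 100),  # undefined if any actual is 0
        "bias": float(np.mean(err)),
    }


print(forecast_errors([100, 110, 120], [105, 115, 126]))
```

With the article's variables this would be called as `forecast_errors(test['Count'], y_hat_avg['SARIMA'])`, assuming the forecast covers every test date; any `NaN` in either series propagates into the results.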

Diagrammed, the process looks roughly like this:

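With more dimensions (and complete label categories), the predictions should become more accurate, although that also takes time to accumulate enough data. The example above is only a univariate model of a single series. Multivariate forecasting is inevitably more complex, but it also better reflects real business conditions, so I plan to add some general-purpose business code combined with machine learning later that readers can reuse directly.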

3. Complete Code


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.holtwinters import ExponentialSmoothing  # used only by the Holt-Winters variant

# Load the dataset (first 33 rows only)
df = pd.read_csv('能耗数据01.csv', nrows=33)

# Training / test split (chronological); .copy() avoids SettingWithCopyWarning below
train = df[0:20].copy()
test = df[20:].copy()

# Timestamp processing: parse dates, index by them, aggregate 'Count' to daily means
df['Timestamp'] = pd.to_datetime(df['Datetime'], format='%Y/%m/%d')
df.index = df['Timestamp']
df = df[['Count']].resample('D').mean()

train['Timestamp'] = pd.to_datetime(train['Datetime'], format='%Y/%m/%d')
train.index = train['Timestamp']
train = train[['Count']].resample('D').mean()

test['Timestamp'] = pd.to_datetime(test['Datetime'], format='%Y/%m/%d')
test.index = test['Timestamp']
test = test[['Count']].resample('D').mean()

# Plot the raw training and test series
train.Count.plot(figsize=(15, 8), title='Daily Train', fontsize=14)
test.Count.plot(figsize=(15, 8), title='Daily Test', fontsize=14)
# plt.show()

# Seasonal ARIMA (SARIMA) model: order (p, d, q) = (2, 1, 4),
# seasonal order (P, D, Q, s) = (0, 1, 1, 7) for a weekly cycle
y_hat_avg = test.copy()
fit1 = sm.tsa.statespace.SARIMAX(train.Count, order=(2, 1, 4), seasonal_order=(0, 1, 1, 7)).fit()
# Forecast over exactly the dates covered by the test set
y_hat_avg['SARIMA'] = fit1.predict(start=test.index[0], end=test.index[-1], dynamic=True)
plt.figure(figsize=(16, 8))
plt.plot(train['Count'], label='Train')
plt.plot(test['Count'], label='Test')
plt.plot(y_hat_avg['SARIMA'], label='SARIMA forecast')
plt.legend(loc='best')
plt.show()

# Holt-Winters (seasonal exponential smoothing) model, kept as the alternative
# y_hat_avg = test.copy()
# fit1 = ExponentialSmoothing(np.asarray(train['Count']), seasonal_periods=7, trend='add', seasonal='add').fit()
# y_hat_avg['Holt_Winter'] = fit1.forecast(len(test))
# plt.figure(figsize=(16, 8))
# plt.plot(train['Count'], label='Train')
# plt.plot(test['Count'], label='Test')
# plt.plot(y_hat_avg['Holt_Winter'], label='Holt-Winters forecast')
# plt.legend(loc='best')
# plt.show()
```

Summary


Original article: https://blog.csdn.net/weixin_48518621/article/details/126330327