flyfish
The equation of linear regression (Linear Regression):
y = wx + b
First, a short piece of code to verify the entire hand calculation that follows.
import torch
from torch import nn
from torch import optim
import numpy as np
from matplotlib import pyplot as plt
from torch.nn.parameter import Parameter

# Define the data
x = torch.linspace(1, 3, 3).reshape(3, 1)
y = x * 2 + 1
print(x)
print(y)

# Define the model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(1, 1)
        self.linear.weight = Parameter(torch.tensor([[0.2055]]))
        self.linear.bias = Parameter(torch.tensor([0.7159]))

    def forward(self, x):
        result = self.linear(x)
        print("weight:", self.linear.weight)
        print("bias:", self.linear.bias)
        return result

learning_rate = 0.02
epochs = 500
model = LinearRegression()

# Loss function
criterion = nn.MSELoss()

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Train the model
for i in range(epochs):
    y_hat = model(x)  # predicted values
    print("y_hat:", y_hat)
    loss = criterion(y_hat, y)  # compute the loss; (input, target) is the conventional order
    optimizer.zero_grad()  # zero the gradients
    loss.backward()  # compute the gradients
    print("weight = ", model.linear.weight)
    print("weight.grad = ", model.linear.weight.grad)
    print("bias = ", model.linear.bias)
    print("bias.grad = ", model.linear.bias.grad)
    optimizer.step()  # update the parameters
    if (i + 1) % 20 == 0:
        print(f"loss: {loss:>9f} [{i:>5d}/{epochs:>5d}]")

# Evaluate the model
model.eval()
y_hat = model(x)
plt.scatter(x.data.numpy(), y.data.numpy(), c="r")
plt.plot(x.data.numpy(), y_hat.data.numpy())
plt.show()

The steps of linear regression are as follows:

1. Define the data.
2. Initialize w and b.
   Then loop: for i = 0; i < epochs; i++
3. \hat{y} = wx + b
4. J = (\hat{y} - y)^2
5. \Delta w = 0, \quad \Delta b = 0
6. \frac{\partial J}{\partial \hat{y}} = 2(\hat{y} - y)
7. \Delta w = \frac{\partial J}{\partial w} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w} = \frac{\partial J}{\partial \hat{y}} \times x, \qquad \Delta b = \frac{\partial J}{\partial b} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial b} = \frac{\partial J}{\partial \hat{y}} \times 1
8. w \leftarrow w - \eta \Delta w, \qquad b \leftarrow b - \eta \Delta b
where x is the training data, y is the actual value, \hat{y} is the predicted value, and w and b are the weight and bias respectively. epochs is the number of iterations of the for loop.
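Before walking through the numbers, here is a minimal sketch of these eight steps written directly in PyTorch with manual gradients instead of autograd and optim.SGD. It is for comparison only; names such as eta and dJ_dy_hat are illustrative and do not appear in the program above.

import torch

# Step 1: define the data
x = torch.linspace(1, 3, 3).reshape(3, 1)
y = x * 2 + 1

# Step 2: initialize w and b (same fixed values as the program above)
w = torch.tensor(0.2055)
b = torch.tensor(0.7159)
eta = 0.02  # learning rate

for i in range(500):
    y_hat = w * x + b              # step 3: forward pass
    J = ((y_hat - y) ** 2).mean()  # step 4: mean squared error
    dJ_dy_hat = 2 * (y_hat - y)    # step 6: dJ/dy_hat, before averaging
    dw = (dJ_dy_hat * x).mean()    # step 7: dJ/dw, summed then averaged
    db = dJ_dy_hat.mean()          # step 7: dJ/db, summed then averaged
    w = w - eta * dw               # step 8: update w
    b = b - eta * db               # step 8: update b
    if (i + 1) % 100 == 0:
        print(f"epoch {i + 1}: J = {J:.6f}, w = {w:.4f}, b = {b:.4f}")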
The detailed process is as follows.
x = torch.linspace(1,3,3).reshape(3,1)
y = x*2+1
x is [1, 2, 3] and y is [3, 5, 7].
Program output:
tensor([[1.],
[2.],
[3.]])
tensor([[3.],
[5.],
[7.]])
Because the calculation process is demonstrated here, the parameters are first initialized with fixed values:
self.linear.weight=Parameter(torch.tensor([[0.2055]]))
self.linear.bias=Parameter(torch.tensor([0.7159]))
Program output:
weight: Parameter containing:
tensor([[0.2055]], requires_grad=True)
bias: Parameter containing:
tensor([0.7159], requires_grad=True)
\text{3.} \qquad \hat{y} = wx + b
Substituting the values of x, w, and b into the formula above gives:
\hat{y} = wx + b = 0.2055 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + 0.7159 = \begin{bmatrix} 0.2055 \times 1 + 0.7159 \\ 0.2055 \times 2 + 0.7159 \\ 0.2055 \times 3 + 0.7159 \end{bmatrix} = \begin{bmatrix} 0.9214 \\ 1.1269 \\ 1.3324 \end{bmatrix}
Program output:
y_hat: tensor([
[0.9214],
[1.1269],
[1.3324]], grad_fn=)
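The first forward pass can be reproduced independently in a couple of lines (a small verification sketch; y_hat_manual is just an illustrative name):

import torch
x = torch.linspace(1, 3, 3).reshape(3, 1)
y_hat_manual = 0.2055 * x + 0.7159
print(y_hat_manual)  # tensor([[0.9214], [1.1269], [1.3324]])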
\text{4.} \qquad J = (\hat{y} - y)^2
J = (\hat{y} - y)^2 = \begin{bmatrix} (0.9214 - 3)^2 \\ (1.1269 - 5)^2 \\ (1.3324 - 7)^2 \end{bmatrix} = \begin{bmatrix} 4.32057796 \\ 15.00090361 \\ 32.12168976 \end{bmatrix}
Then sum the elements and take the mean:
J = \frac{4.32057796 + 15.00090361 + 32.12168976}{3} \approx 17.14772
Program output:
loss: 17.147722
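This value can be checked directly: nn.MSELoss with its default settings averages the squared errors over the 3 samples (a small verification sketch):

import torch
y_hat = torch.tensor([[0.9214], [1.1269], [1.3324]])
y = torch.tensor([[3.0], [5.0], [7.0]])
print(((y_hat - y) ** 2).mean())     # tensor(17.1477)
print(torch.nn.MSELoss()(y_hat, y))  # same value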
\text{5.} \qquad \Delta w = 0 \qquad \Delta b = 0
In practice the gradients are summed and then averaged; this step is deferred to the end of the calculation.
6.
∂
J
∂
y
^
=
2
(
y
^
−
y
)
\text{6.} \qquad \frac{\partial J}{\partial \hat{y}} = 2(\hat{y} - y)
6.∂y^∂J=2(y^−y)
\frac{\partial J}{\partial \hat{y}} = 2(\hat{y} - y) = \begin{bmatrix} 2(0.9214 - 3) \\ 2(1.1269 - 5) \\ 2(1.3324 - 7) \end{bmatrix} = \begin{bmatrix} -4.1572 \\ -7.7462 \\ -11.3352 \end{bmatrix}
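These three values are easy to reproduce (sketch):

import torch
y_hat = torch.tensor([[0.9214], [1.1269], [1.3324]])
y = torch.tensor([[3.0], [5.0], [7.0]])
print(2 * (y_hat - y))  # tensor([[-4.1572], [-7.7462], [-11.3352]])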
\text{7.} \qquad \Delta w = \frac{\partial J}{\partial w} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w} = \frac{\partial J}{\partial \hat{y}} \times x \qquad \Delta b = \frac{\partial J}{\partial b} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial b} = \frac{\partial J}{\partial \hat{y}} \times 1
\Delta w = \frac{\partial J}{\partial \hat{y}} \times x = \begin{bmatrix} -4.1572 \times 1 \\ -7.7462 \times 2 \\ -11.3352 \times 3 \end{bmatrix} = \begin{bmatrix} -4.1572 \\ -15.4924 \\ -34.0056 \end{bmatrix}
Sum the elements and take the mean.
The actual complete expression is:
\frac{\partial J}{\partial \hat{y}} = \begin{bmatrix} \frac{\partial J}{\partial \hat{y}_1} \\ \frac{\partial J}{\partial \hat{y}_2} \\ \frac{\partial J}{\partial \hat{y}_3} \end{bmatrix} = \frac{2}{3} \begin{bmatrix} \hat{y}_1 - y_1 \\ \hat{y}_2 - y_2 \\ \hat{y}_3 - y_3 \end{bmatrix}
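The factor 2/3 appears because nn.MSELoss uses reduction='mean' by default, so J already divides the sum of squared errors by the number of samples (3), and autograd folds that 1/3 into the gradient of each element. A small sketch to verify this (the leaf tensor y_hat here is constructed just for the check):

import torch
y = torch.tensor([[3.0], [5.0], [7.0]])
y_hat = torch.tensor([[0.9214], [1.1269], [1.3324]], requires_grad=True)
loss = torch.nn.functional.mse_loss(y_hat, y)
loss.backward()
print(y_hat.grad)  # (2/3)*(y_hat - y) = tensor([[-1.3857], [-2.5821], [-3.7784]])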
\Delta w = \frac{(-4.1572) + (-15.4924) + (-34.0056)}{3} \approx -17.8851
\Delta b = \frac{(-4.1572) + (-7.7462) + (-11.3352)}{3} \approx -7.7462
The program output is:
weight = Parameter containing:
tensor([[0.2055]], requires_grad=True)
weight.grad = tensor([[-17.8851]])
bias = Parameter containing:
tensor([0.7159], requires_grad=True)
bias.grad = tensor([-7.7462])
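Both gradients can be reproduced in one step each, matching mean(2(ŷ − y)·x) and mean(2(ŷ − y)) (a verification sketch):

import torch
x = torch.linspace(1, 3, 3).reshape(3, 1)
y = torch.tensor([[3.0], [5.0], [7.0]])
y_hat = torch.tensor([[0.9214], [1.1269], [1.3324]])
print((2 * (y_hat - y) * x).mean())  # tensor(-17.8851)
print((2 * (y_hat - y)).mean())      # tensor(-7.7462)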
\text{8.} \qquad w \leftarrow w - \eta \Delta w \qquad b \leftarrow b - \eta \Delta b
w - \eta \Delta w = 0.2055 - 0.02 \times (-17.8851) = 0.563202
b - \eta \Delta b = 0.7159 - 0.02 \times (-7.7462) = 0.870824
Program output:
weight: Parameter containing:
tensor([[0.5632]], requires_grad=True)
bias: Parameter containing:
tensor([0.8708], requires_grad=True)
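The SGD update itself is plain arithmetic and can be checked in two lines (sketch; the values are copied from the gradients above):

w_new = 0.2055 - 0.02 * (-17.8851)  # 0.563202
b_new = 0.7159 - 0.02 * (-7.7462)   # 0.870824
print(w_new, b_new)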
The entire hand calculation is consistent with the program output.