Suppose you are the CEO of a restaurant franchise and are considering opening a new outlet; you want to predict its profit from the city's population. We have population and profit data for a number of different cities: ex1data1.txt
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('../data/ex1data1.txt', names=['population', 'profit'])
data.head()
plt.scatter(data['population'], data['profit'], label='population')
plt.legend()
plt.show()
The linear regression model has the form:
y = \theta_0 + \theta_1 x
Written as a matrix-vector product, this becomes:
y = \begin{bmatrix} 1 & x_0 \\ 1 & x_1 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}
This is why a column of all ones is inserted as the first column of the data.
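To see why the column of ones reproduces the intercept term, here is a minimal sketch; the populations and theta values below are made up purely for illustration and are not part of the dataset:
import numpy as np
x_demo = np.array([5.0, 10.0, 15.0])                    # hypothetical populations
X_demo = np.column_stack([np.ones_like(x_demo), x_demo]) # prepend a column of ones
theta_demo = np.array([[-1.0], [2.0]])                   # theta_0 = -1, theta_1 = 2
# X @ theta computes theta_0 * 1 + theta_1 * x for every row at once
print(X_demo @ theta_demo)        # [[ 9.], [19.], [29.]]
print(-1.0 + 2.0 * x_demo)        # same values, computed element by element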
data.insert(0, 'ones', 1)
data.head()
# Take every column of data except the last as X, and the last column as y
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
X = X.values
y = y.values
# Reshape y from a 1-D array into a column vector
y = y.reshape(97, 1)
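A quick sanity check on the resulting shapes (the expected sizes assume the 97-row ex1data1.txt loaded above):
print(X.shape, y.shape)   # expected: (97, 2) (97, 1)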
For the linear regression model, we assume the hypothesis:
h_\theta(x) = \theta_0 + \theta_1 x
Its cost function is:
J(\theta) = \frac{1}{2m}\sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
where m is the number of training examples (the number of rows of X).
# Define the cost function
def costFunction(X, y, theta):
    inner = np.power(X @ theta - y, 2)
    return np.sum(inner) / (2 * len(X))
theta = np.zeros((2, 1))
cost_init = costFunction(X, y, theta)
The initial cost computed here is 32.072733877455676.
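This value has a simple cross-check: with theta initialized to zeros the hypothesis is identically 0, so the cost reduces to the sum of the squared targets divided by 2m. The sketch below uses only the X, y, and cost_init already defined above:
# With theta = 0, h_theta(x) = 0 for every example, so J(0) = sum(y_i^2) / (2m)
print(np.sum(y ** 2) / (2 * len(y)))   # should match cost_init (about 32.07)
print(cost_init)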
The gradient descent update rule is:
\theta := \theta - \alpha \frac{\partial}{\partial \theta} J(\theta)
\frac{\partial}{\partial \theta} J(\theta) = \frac{1}{m} X^T (h_\theta(X) - y)
where \alpha is the learning rate.
def gradientDescent(X, y, theta, alpha, iters):
    # Record the cost at every iteration (progress is printed every 100 iterations)
    costs = []
    for i in range(iters):
        theta = theta - (X.T @ (X @ theta - y)) * alpha / len(X)
        cost = costFunction(X, y, theta)
        costs.append(cost)
        if i % 100 == 0:
            print(f'iteration i={i}, cost={cost}')
    return theta, costs
alpha = 0.02
iters = 2000
theta, costs = gradientDescent(X, y, theta, alpha, iters)
iteration i=0, cost=16.769642371667455
iteration i=100, cost=5.170668092303261
iteration i=200, cost=4.813840215803055
iteration i=300, cost=4.640559602034057
iteration i=400, cost=4.556412109403549
iteration i=500, cost=4.5155489085988645
iteration i=600, cost=4.4957051660486735
iteration i=700, cost=4.486068766778817
iteration i=800, cost=4.481389196347322
iteration i=900, cost=4.479116731414093
iteration i=1000, cost=4.478013190619409
iteration i=1100, cost=4.477477295755764
iteration i=1200, cost=4.477217057705422
iteration i=1300, cost=4.47709068246386
iteration i=1400, cost=4.477029312876825
iteration i=1500, cost=4.476999510945953
iteration i=1600, cost=4.476985038710984
iteration i=1700, cost=4.476978010791015
iteration i=1800, cost=4.476974597934661
iteration i=1900, cost=4.476972940603823
plt.plot(np.arange(iters), costs)
plt.xlabel('iters')
plt.ylabel('cost')
plt.title('cost vs iters')
plt.show()
# Evaluate the fitted line over the range of the population feature (not over y)
x = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
y_ = theta[0, 0] + theta[1, 0] * x
fig, ax = plt.subplots()
ax.scatter(X[:, 1], y, label='training data')
ax.plot(x, y_, 'r', label='predict')
ax.legend()
ax.set(xlabel='population', ylabel='profit')
plt.show()
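With the fitted theta, a prediction is simply theta_0 + theta_1 * population. The two populations below are hypothetical values chosen only to illustrate, not part of the dataset:
# Predict profit for two hypothetical city populations (same units as the data)
for population in [3.5, 7.0]:
    profit = theta[0, 0] + theta[1, 0] * population
    print(f'population={population}, predicted profit={profit:.4f}')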
Now suppose you want to sell a house and would like to know how much it can sell for. We have data relating house size and number of bedrooms to the sale price: ex1data2.txt
data = pd.read_csv('../data/ex1data2.txt', names=['size', 'bedrooms', 'price'])
data.head()
When the features differ greatly in magnitude, feature normalization makes gradient descent converge faster and prevents any single feature from dominating the others. The procedure: subtract each feature's mean, then divide by its standard deviation.
def normalize_feature(data):
    return (data - data.mean()) / data.std()
data = normalize_feature(data)
data.head()
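A quick check that the normalization behaved as expected: after the transformation each column should have mean roughly 0 and standard deviation roughly 1.
# Each column should now have mean ~0 and std ~1
print(data.mean())
print(data.std())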
data.plot.scatter('size', 'price', label='size')
plt.show()
data.plot.scatter('bedrooms', 'price', label='bedrooms')
plt.show()
# Add a column of all ones (for the intercept term)
data.insert(0, 'ones', 1)
# Take every column of data except the last as X, and the last column as y
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
# Convert the DataFrame/Series to NumPy arrays
X = X.values
y = y.values
y = y.reshape(47, 1)
For multivariate linear regression (two features here), we assume the hypothesis:
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2
Its cost function is the same as before:
J(\theta) = \frac{1}{2m}\sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
where m is the number of training examples (the number of rows of X).
def costFunction(X, y, theta):
    inner = np.power(X @ theta - y, 2)
    return np.sum(inner) / (2 * len(X))
theta = np.zeros((3, 1))
cost_init = costFunction(X, y, theta)
The initial cost computed here is 0.48936170212765967.
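This value can also be verified analytically: with theta = 0 the cost is sum(y^2) / (2m), and because y was normalized using pandas' std() (which divides by m - 1), sum(y^2) equals m - 1 exactly, giving (m - 1) / (2m) = 46 / 94 ≈ 0.4894 for the 47 examples here.
print(np.sum(y ** 2) / (2 * len(y)))   # ≈ 0.48936..., matches cost_init
print((len(y) - 1) / (2 * len(y)))     # 46/94, the same value obtained analytically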
The gradient descent function for the multivariate case is the same as in the single-variable case.
# alpha -- learning rate
# iters -- number of iterations
def gradientDescent(X, y, theta, alpha, iters, isprint=False):
    costs = []
    for i in range(iters):
        theta = theta - (X.T @ (X @ theta - y)) * alpha / len(X)
        cost = costFunction(X, y, theta)
        costs.append(cost)
        if i % 100 == 0:
            if isprint:
                print(f'iteration i={i}, cost={cost}')
    return theta, costs
# Compare the effect of different learning rates
candidate_alpha = [0.0003, 0.003, 0.03, 0.3, 0.0001, 0.001, 0.01]
iters = 2000
fig, ax = plt.subplots()
for alpha in candidate_alpha:
    _, costs = gradientDescent(X, y, theta, alpha, iters)
    ax.plot(np.arange(iters), costs, label=alpha)
ax.legend()
ax.set(xlabel='iters', ylabel='cost', title='cost vs iters')
plt.show()
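Based on these curves, one of the larger learning rates that still converges smoothly can be used for a final training run. The choice of 0.03 below is just one reasonable pick from the candidates, not a value prescribed by the exercise:
# Final run with a chosen learning rate, printing progress every 100 iterations
theta_final, costs = gradientDescent(X, y, theta, 0.03, iters, isprint=True)
print(theta_final)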