人工智能-线性回归2--房价预测、欠拟合过拟合、正则化、模型保存加载

人工智能-线性回归1–损失函数、正规方程、梯度下降法
 人工智能-线性回归2–房价预测、欠拟合过拟合、正则化、模型保存加载

7，案例：波士顿房价预测

回归性能评估MSE

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression,SGDRegressor
from sklearn.meyrics import mean_squared_error

#线性回归：正规方程
def linear_model():
	#1，获取数据
	boson = load_boston()
	print(boston)
	#2，数据基本处理
	#2.1 分割数据
	x_train,x_test,y_train.y_test = 			 train_test_split(boston.data,boston.target,test_size=0.2)
	#3，特征工程-标准化
	transfer = StandardScaler()
	x_train = transfer.fit_transform(x_train)
	x_test = transfer.fit_transform(x_test)

	##4，机器学习-线性回归
	estimator = LinearRegression()
	estimator.fit(x_train,y_train)
	print('这个模型的偏置是：\n',estimator.intercept_)
	print('这个模型的系数是：\n',estimator.coef_)
	#5，模型评估
	#5.1 预测值
	y_pre = estimator.predict(x_test)
	print('预测值是：\n',y_pre)
	#5.2 均方误差
	ret = mean_squared_error(y_test,y_pre)
	print('均方误差是：\n',ret)

linear_model()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32

和上面的区别就是，线性回归调用模型名不同，其余代码一样

from sklearn.linear_model import SGDRegressor
#线性回归：梯度下降法
def linear_model2():
	#1，获取数据
	boson = load_boston()
	print(boston)
	#2，数据基本处理
	#2.1 分割数据
	x_train,x_test,y_train.y_test = 			 train_test_split(boston.data,boston.target,test_size=0.2)
	#3，特征工程-标准化
	transfer = StandardScaler()
	x_train = transfer.fit_transform(x_train)
	x_test = transfer.fit_transform(x_test)

	##4，机器学习-线性回归
	#estimator = SGDRegressor(max_iter = 1000,learning_rate='constant',eta0=1)
	estimator = SGDRegressor(max_iter = 1000)
	estimator.fit(x_train,y_train)
	print('这个模型的偏置是：\n',estimator.intercept_)
	print('这个模型的系数是：\n',estimator.coef_)
	#5，模型评估
	#5.1 预测值
	y_pre = estimator.predict(x_test)
	print('预测值是：\n',y_pre)
	#5.2 均方误差
	ret = mean_squared_error(y_test,y_pre)
	print('均方误差是：\n',ret)

linear_model2()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29

8，欠拟合与过拟合

过拟合—模型过于复杂
欠拟合—模型过于简单

欠拟合

原因：学习到数据的特征过少
解决：添加其他特征项，添加多项式特征

欠拟合

原因：原始特征太多，存在一些嘈杂特征
解决：重新清洗数据、增大数据的训练量、正则化、减少特征维度，防止维灾难

正则化

正则化：解决回归过拟合问题，数据提供的特征有些影响模型复杂度或者这个特征的数据点异常较多，算法在学习时尽量减少这个特征的影响（删除某个特征的影响）
类别：

L2正则化
作用：可以使得其中一些w都很小，接近0，削弱某个特征的影响
优点：越小的参赛说明模型越简单，越简单的模型则越不容易产生过拟合
Ridge回归
L1正则化
作用：可以使得其中一些W的值直接为0，删除这个特征的影响
LASSO回归

维灾难：随着特征越来越多，一开始效果会变好，到达某顶点后，后面则会变差

9，正则化线性模型

岭回归、Lasso回归、弹性网络

10，岭回归

在这里插入图片描述
正则化力度a越大，权重系数越小
正则化力度越小，权重系数越大

from sklearn.linear_model import Ridge
#线性回归：岭回归
def linear_model3():
	#1，获取数据
	boson = load_boston()
	print(boston)
	#2，数据基本处理
	#2.1 分割数据
	x_train,x_test,y_train.y_test = 			 train_test_split(boston.data,boston.target,test_size=0.2)
	#3，特征工程-标准化
	transfer = StandardScaler()
	x_train = transfer.fit_transform(x_train)
	x_test = transfer.fit_transform(x_test)

	##4，机器学习-线性回归
	#estimator = Ridge(alpha=1.0)
	estimator = RidgeCV(alphas=(0.001,0.01,0.1,1,10,100))
	estimator.fit(x_train,y_train)
	print('这个模型的偏置是：\n',estimator.intercept_)
	print('这个模型的系数是：\n',estimator.coef_)
	#5，模型评估
	#5.1 预测值
	y_pre = estimator.predict(x_test)
	print('预测值是：\n',y_pre)
	#5.2 均方误差
	ret = mean_squared_error(y_test,y_pre)
	print('均方误差是：\n',ret)
if __name__ = '__main__':
	linear_model3()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

11，模型的保存和加载

API
在这里插入图片描述
增加个4.2

from sklearn.externals import joblib
def dump_load():
	#1，获取数据
	boson = load_boston()
	print(boston)
	#2，数据基本处理
	#2.1 分割数据
	x_train,x_test,y_train.y_test = 			 train_test_split(boston.data,boston.target,test_size=0.2)
	#3，特征工程-标准化
	transfer = StandardScaler()
	x_train = transfer.fit_transform(x_train)
	x_test = transfer.fit_transform(x_test)

	##4，机器学习-线性回归
	#4.1 模型训练
	#estimator = Ridge(alpha=1.0)
	estimator = RidgeCV(alphas=(0.001,0.01,0.1,1,10,100))
	estimator.fit(x_train,y_train)
	print('这个模型的偏置是：\n',estimator.intercept_)
	print('这个模型的系数是：\n',estimator.coef_)

	#4.2 模型保存
	joblib.dump(estimator,'./data/test.pkl')  #运行后目录上多出个test.pkl文件
	
	#5，模型评估
	#5.1 预测值
	y_pre = estimator.predict(x_test)
	print('预测值是：\n',y_pre)
	#5.2 均方误差
	ret = mean_squared_error(y_test,y_pre)
	print('均方误差是：\n',ret)
if __name__ = '__main__':
	linear_model3()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34

#调用
estimator = joblib.load('./data/test.pkl')  #加载出了保存着的模型
1
2

保存模型，后缀是pkl

相关阅读:
iTOP-i.MX6ULL开发板修改 samba 配置文件
 华为数通方向HCIP-DataCom H12-831题库(单选题：221-240）
MySQL牛客网刷题3
Java-Day19 Java集合（集合框架、Collection接口、List接口及List接口实现类）
【排序算法】详解冒泡排序及其多种优化&稳定性分析
 tailwindcss安装完插件代码不提示
 创建github个人博客
 双指针_快乐数
 rxjs 关于防抖的方法列举说明
 python常用函数总结大全
原文地址：https://blog.csdn.net/Sun123234/article/details/127838683