Python Statistical Applications

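1. Short-answer question
Open: Materials - Course Data Set 1 - Incomregression.csv
Using the data in this CSV file, write a Python program in a Python environment of your choice that completes the following tasks:
1. Read the data, select the variables whose type is "float64", and run a descriptive analysis on them (10 points)
2. Compute the pairwise correlation coefficients of the "float64" variables above and list the correlation matrix (10 points)
3. Using a plotting library (matplotlib or another third-party Python package), draw a 3D scatter plot of the three variables MonthlyIncome, DebtRatio and RevolvingUtilizationOfUnsecuredLines (20 points)
4. Draw three 2D scatter plots: MonthlyIncome against DebtRatio, MonthlyIncome against RevolvingUtilizationOfUnsecuredLines, and MonthlyIncome against age (20 points)
5. Using the statsmodels module, fit a linear regression model by ordinary least squares, with MonthlyIncome as the dependent variable and age, RevolvingUtilizationOfUnsecuredLines and DebtRatio as the independent variables, and print the full summary report of the fitted model (20 points)
6. Using the scikit-learn module, fit the same linear regression model, with MonthlyIncome as the dependent variable and age, RevolvingUtilizationOfUnsecuredLines and DebtRatio as the independent variables, and run 5-fold cross-validation (20 points)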


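import pandas as pd
import numpy as np

# Task 1: read the data, keep only the float64 columns, describe them
df = pd.read_csv('Incomregression.csv')
df_float = df.select_dtypes(include='float64')
print(df_float.describe())

# Task 2: pairwise (Pearson) correlation matrix of the float64 columns
print(df_float.corr())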
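
To make the "select the float64 variables" step concrete, here is a minimal sketch on a small made-up table. The values in `toy` are hypothetical and only stand in for Incomregression.csv, whose actual column types are not shown in this post. It relies on `DataFrame.select_dtypes`, which filters columns by dtype, so an integer column such as `age` is left out here.

```python
import pandas as pd

# Hypothetical toy data; the real columns come from Incomregression.csv.
toy = pd.DataFrame({
    "age": [25, 40, 58, 33],                             # int64
    "DebtRatio": [0.10, 0.35, 0.80, 0.25],               # float64
    "MonthlyIncome": [3000.0, 5200.0, 9100.0, 4100.0],   # float64
})

# Keep only the float64 columns, as task 1 asks.
float_cols = toy.select_dtypes(include="float64")
print(float_cols.dtypes)

# Descriptive statistics (task 1) and Pearson correlation matrix (task 2).
summary = float_cols.describe()
corr = float_cols.corr()
print(summary)
print(corr)
```

If every column should be treated as numeric regardless of how it was stored, converting explicitly with `astype('float64')` before this step is one option, but that changes which variables count as "float64".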

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib versions

# Task 3: 3D scatter plot of the three variables
ax = plt.figure().add_subplot(111, projection='3d')
ax.scatter(df['MonthlyIncome'], df['DebtRatio'], df['RevolvingUtilizationOfUnsecuredLines'], c='r', marker='^')
ax.set_xlabel('MonthlyIncome')
ax.set_ylabel('DebtRatio')
ax.set_zlabel('RevolvingUtilizationOfUnsecuredLines')
plt.show()

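import matplotlib.pyplot as plt

# Task 4: three 2D scatter plots of MonthlyIncome against each other variable
# (plt.plot would join the points with lines, so scatter is used instead)
fig, axes = plt.subplots(3, 1, figsize=(6, 12))
for ax, col in zip(axes, ['DebtRatio', 'RevolvingUtilizationOfUnsecuredLines', 'age']):
    ax.scatter(df['MonthlyIncome'], df[col], s=5)
    ax.set_xlabel('MonthlyIncome')
    ax.set_ylabel(col)
plt.tight_layout()
plt.show()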

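import statsmodels.formula.api as smf

# Task 5: ordinary least squares with MonthlyIncome as the dependent variable
formula = "MonthlyIncome ~ age + RevolvingUtilizationOfUnsecuredLines + DebtRatio"
model = smf.ols(formula, data=df)
results = model.fit()
print(results.summary())  # full regression report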
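
For intuition about what the least-squares fit above computes, the following is a rough NumPy-only sketch on synthetic data. The arrays `X`, `y` and the coefficients in `true_beta` are made up for illustration and have nothing to do with the real data set. It is not a substitute for the statsmodels report, which also gives standard errors, t-statistics and p-values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors standing in for age,
# RevolvingUtilizationOfUnsecuredLines and DebtRatio (hypothetical values).
X = rng.normal(size=(n, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ true_beta + rng.normal(scale=0.1, size=n)

# Add a column of ones for the intercept
# (the formula interface adds an intercept automatically).
X_design = np.column_stack([np.ones(n), X])

# Least squares: choose beta to minimize ||y - X_design @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# R^2 = 1 - SS_res / SS_tot
residuals = y - X_design @ beta_hat
r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
print(beta_hat, r_squared)
```

The first entry of `beta_hat` plays the role of the Intercept row in the summary table, and the remaining entries correspond to the coefficient rows.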

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn import metrics

# Task 6: one multiple linear regression with all three predictors
# (assumes the columns used contain no missing values)
X = df[['age', 'RevolvingUtilizationOfUnsecuredLines', 'DebtRatio']]
y = df['MonthlyIncome']
model = LinearRegression()
model.fit(X, y)
print(model.coef_)
print(model.intercept_)

# 5-fold cross-validation: every row is predicted by a model that did not see it
predicted = cross_val_predict(model, X, y, cv=5)
cross_mse = metrics.mean_squared_error(y, predicted)
cross_rmse = np.sqrt(cross_mse)
print('CROSS_MSE', cross_mse)
print('CROSS_RMSE', cross_rmse)
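
As an illustration of what 5-fold cross-validation does under the hood, here is a hand-rolled sketch in plain NumPy on synthetic data. The data, the coefficient values and the variable names are assumptions for the example. The folds are contiguous and unshuffled, which to my understanding matches what scikit-learn does by default when `cv=5` is passed for a regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Synthetic data standing in for the three predictors and MonthlyIncome.
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, 2.0, -1.5]) + rng.normal(scale=0.2, size=n)

k = 5
# Split the row indices into 5 contiguous folds.
folds = np.array_split(np.arange(n), k)
predicted = np.empty(n)

for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    # Fit on the 4 training folds (intercept via a column of ones).
    A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
    beta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
    # Predict the held-out fold with the model that never saw it.
    A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
    predicted[test_idx] = A_test @ beta

cross_mse = np.mean((y - predicted) ** 2)
cross_rmse = np.sqrt(cross_mse)
print("CROSS_MSE", cross_mse)
print("CROSS_RMSE", cross_rmse)
```

Because each prediction comes from a model trained without that row, the resulting RMSE is an out-of-sample error estimate rather than a training error.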


Original post: https://blog.csdn.net/qq_15719613/article/details/127585997