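Precision for a class: among the samples predicted as that class, the proportion that actually belong to it.
Recall for a class: among the samples that actually belong to that class, the proportion the model correctly predicts.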
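The two definitions can be checked on a tiny example. The sketch below uses made-up labels (the `y_true` and `y_pred` lists are illustrative assumptions, not data from this article) and compares a hand count against `precision_score` and `recall_score` from sklearn.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for illustration: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Count the outcomes for class 1 by hand.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

# Precision: correct predictions of class 1 / all predictions of class 1.
precision_manual = tp / (tp + fp)
# Recall: correct predictions of class 1 / all true members of class 1.
recall_manual = tp / (tp + fn)

# The same quantities computed by sklearn.
precision_sklearn = precision_score(y_true, y_pred, pos_label=1)
recall_sklearn = recall_score(y_true, y_pred, pos_label=1)

print(precision_manual, precision_sklearn)
print(recall_manual, recall_sklearn)
```

Here class 1 is predicted three times and two of those are right, so precision is 2/3; four samples truly belong to class 1 and two are found, so recall is 1/2.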
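(I) Logistic regression has "regression" in its name, but it is actually a binary classification algorithm. It can also handle multi-class problems by applying the binary classifier several times.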
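One common way to build a multi-class classifier out of repeated binary classification is the one-vs-rest scheme. The sketch below is only an illustration of that idea: it assumes sklearn's `OneVsRestClassifier` wrapper and the built-in iris dataset, neither of which is used elsewhere in this article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Iris has three classes, so one-vs-rest trains three binary models,
# each separating one class from the other two.
X_demo, y_demo = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_demo, y_demo)

# One fitted binary logistic regression per class.
n_binary_models = len(ovr.estimators_)
print(n_binary_models)
print(ovr.predict(X_demo[:5]))
```

At prediction time the wrapper picks the class whose binary model is most confident.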
(II) The logistic regression function
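For reference, the logistic (sigmoid) function is usually written as follows. The symbols here are notation assumptions: `x` is the feature vector, `w` the weight vector, and `b` the bias.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^{\top} x + b
```

It maps any real-valued `z` into the interval (0, 1), so the output can be read as the probability of the positive class.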
The graph of the function is shown in the figure:
The resulting decision rule is therefore:
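A minimal sketch of the usual rule, assuming the conventional threshold of 0.5: predict class 1 when the sigmoid output is at least 0.5 (equivalently, when `z >= 0`), otherwise class 0. The helper names `sigmoid` and `decide` are hypothetical and exist only for this illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def decide(z, threshold=0.5):
    """Return 1 where sigmoid(z) >= threshold, else 0 (threshold is an assumption)."""
    return (sigmoid(z) >= threshold).astype(int)

# A few linear scores z = w.x + b, chosen by hand for illustration.
z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)
labels = decide(z)

print(probs)
print(labels)
```

Negative scores fall below 0.5 and map to class 0; a score of exactly zero gives a probability of 0.5 and, under this rule, class 1.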
III. Credit card fraud classification
First, a brief look at the classifiers in the sklearn package:
(1) Import packages
- %matplotlib inline
- import matplotlib.pyplot as plt
- import numpy as np
- import pandas as pd
- from sklearn.preprocessing import StandardScaler
- import imblearn
- from sklearn.metrics import classification_report
- from sklearn.model_selection import train_test_split
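(2) Load and preprocess the data
- data = pd.read_csv('./data_picture/chapter4/creditcard.csv')
- X = data.drop('Class', axis=1)  # features: every column except the label
- y = data['Class']  # label column
- X = X.drop('Time', axis=1)  # drop the Time column
- X['Amount'] = (X['Amount'] - X['Amount'].min()) / (X['Amount'].max() - X['Amount'].min())  # min-max scale Amount to [0, 1]
- data.head()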
The result is shown in the figure:
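To make the min-max scaling of the `Amount` column in the preprocessing code concrete, here is a small sketch on made-up values. The `toy` DataFrame and its numbers are assumptions for illustration only, not rows from creditcard.csv.

```python
import pandas as pd

# Hypothetical transaction amounts, used only for illustration.
toy = pd.DataFrame({'Amount': [10.0, 25.0, 40.0, 110.0]})

amount_min = toy['Amount'].min()
amount_max = toy['Amount'].max()

# Min-max scaling: the smallest value maps to 0, the largest to 1.
toy['Amount_scaled'] = (toy['Amount'] - amount_min) / (amount_max - amount_min)

print(toy)
```

The scaled column keeps the ordering and relative spacing of the original amounts but confines them to [0, 1].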
(3) Split the dataset
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
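The sketch below shows what a `test_size=0.25` split produces on synthetic, heavily imbalanced data (all arrays here are made up). Note that `stratify=y_demo` is an addition that the call above does not use; it is included as one option for keeping the class ratio the same in both subsets, which can matter when one class is rare.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 1000 samples, 20 of them positive (2%).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.zeros(1000, dtype=int)
y_demo[:20] = 1

# 25% of the rows go to the test set; stratify keeps the 2% ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.sum(), y_te.sum())
```

Fixing `random_state` makes the split reproducible across runs.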
(4) Train the model
- from sklearn.linear_model import LogisticRegression, LogisticRegressionCV, SGDClassifier
- model = LogisticRegression(penalty='l2', random_state=33)  # L2-regularized logistic regression
- model.fit(X_train, y_train)  # fit on the training set
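After fitting, the learned weights and bias can be inspected through the `coef_` and `intercept_` attributes. The sketch below runs on synthetic data from `make_classification` (an assumption, since the credit card file is not bundled here) and raises `max_iter` above the default as a precaution against convergence warnings; L2 regularization is the default penalty, so it is not passed explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data with 5 features, for illustration only.
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)

# C is the inverse regularization strength: smaller C means stronger regularization.
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X_demo, y_demo)

# For a binary problem there is one weight per feature and a single bias term.
print(clf.coef_.shape)
print(clf.intercept_.shape)
print(clf.classes_)
```

These are the `w` and `b` that enter the linear score before the sigmoid is applied.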
(5) Evaluate the model
- ypred1 = model.predict(X_test)  # predicted class for each test sample
- ypred2 = model.predict_proba(X_test)  # predicted probability of each class for each test sample
- train_score = model.score(X_train, y_train)  # mean accuracy on the training set
- test_score = model.score(X_test, y_test)  # mean accuracy on the test set
- print('train_score=', train_score)
- print('test_score=', test_score)
- print('------------------------------------------------------')
- y_predict = model.predict(X_test)
- model_report = classification_report(y_test, y_predict)  # per-class precision, recall, and F1
- print(model_report)
The result is shown in the figure:
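As a self-contained sketch of the evaluation step, the code below repeats the same calls on synthetic imbalanced data (generated with `make_classification`, an assumption made so the example runs without the CSV file). It illustrates that `score` is plain accuracy, that each row of `predict_proba` sums to 1, and that the per-class numbers are available as a dictionary through `output_dict=True`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 5% positives, for illustration only.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

labels = clf.predict(X_te)        # hard class labels
proba = clf.predict_proba(X_te)   # one column of probabilities per class

# score() returns the mean accuracy, the same value as accuracy_score.
acc = clf.score(X_te, y_te)
acc_check = accuracy_score(y_te, labels)

# Dictionary form of the report: per-class precision, recall, and F1.
report = classification_report(y_te, labels, output_dict=True, zero_division=0)

print('accuracy =', acc)
print('recall of class 1 =', report['1']['recall'])
```

With a rare positive class, accuracy alone can look high even when many positives are missed, which is why the per-class precision and recall in the report are worth reading.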