Python Data Analysis and Machine Learning 48 - Case Study: Maximizing Loan Profit


    1. Introduction to the Data Source

    Download link:
    https://www.lendingclub.com/info/prospectus.action

    LoanStats3a.csv
    This is a dataset of past loans.

    2. Data Preprocessing

    From the raw data we can already see several problems:

    1. The first row is a URL; the tabular data only starts on the second row.
    2. There are many mostly-empty columns.
    3. Some columns are of little analytical value: sequential IDs such as id and member_id, and descriptive fields such as url and desc.
    4. Some columns contain only a single unique value, which is equally uninformative (a quick inspection sketch follows this list).
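
    Before cleaning, it helps to confirm these problems directly. A minimal inspection sketch, assuming the file path used throughout this article:

    import pandas as pd

    # Skip the URL on the first line so the header row is parsed correctly
    loans_2007 = pd.read_csv('E:/file/LoanStats3a.csv', skiprows=1)

    print(loans_2007.shape)  # raw size before cleaning
    # The emptiest columns (candidates for dropping)
    print(loans_2007.isnull().sum().sort_values(ascending=False).head(10))
    # Columns with a single distinct value (also candidates for dropping)
    print(loans_2007.nunique().sort_values().head(10))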

    2.1 Dropping Empty Columns

    Code:

    import pandas as pd
    
    # Skip the first row (a URL) so the header is read correctly
    loans_2007 = pd.read_csv('E:/file/LoanStats3a.csv', skiprows=1)
    
    # Drop columns that are more than half empty, plus two descriptive
    # columns that add nothing to the analysis
    half_count = len(loans_2007) // 2
    loans_2007 = loans_2007.dropna(thresh=half_count, axis=1)
    loans_2007 = loans_2007.drop(['desc', 'url'], axis=1)
    
    # Write the processed data to a file
    loans_2007.to_csv('E:/file/loans_2007.csv', index=False)
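
    The thresh argument is easy to misread: dropna(thresh=k, axis=1) keeps only the columns that have at least k non-null values, so a threshold of half the row count drops every column that is more than half empty. A toy illustration (not from the loan data):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3, 4],
                       "b": [1, np.nan, np.nan, np.nan],
                       "c": [1, 2, np.nan, np.nan]})
    # thresh = 2: "b" has only 1 non-null value and is dropped,
    # "a" and "c" have at least 2 non-null values and are kept
    print(df.dropna(thresh=len(df) // 2, axis=1).columns.tolist())  # ['a', 'c']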
    

    2.2 Dropping Unnecessary Columns and Encoding the Label

    Code:

    import pandas as pd
    
    loans_2007 = pd.read_csv("E:/file/loans_2007.csv")
    
    # Drop ID fields and columns that either carry no predictive signal or
    # describe what happened after the loan was issued
    loans_2007 = loans_2007.drop(["id", "member_id", "funded_amnt", "funded_amnt_inv", "grade", "sub_grade", "emp_title", "issue_d"], axis=1)
    loans_2007 = loans_2007.drop(["zip_code", "out_prncp", "out_prncp_inv", "total_pymnt", "total_pymnt_inv", "total_rec_prncp"], axis=1)
    loans_2007 = loans_2007.drop(["total_rec_int", "total_rec_late_fee", "recoveries", "collection_recovery_fee", "last_pymnt_d", "last_pymnt_amnt"], axis=1)
    
    print(loans_2007['loan_status'].value_counts())
    
    # Keep only finished loans and encode the label as a number:
    # Fully Paid -> 1, Charged Off -> 0
    loans_2007 = loans_2007[(loans_2007['loan_status'] == "Fully Paid") | (loans_2007['loan_status'] == "Charged Off")]
    
    status_replace = {
        "loan_status" : {
            "Fully Paid": 1,
            "Charged Off": 0,
        }
    }
    
    loans_2007 = loans_2007.replace(status_replace)
    
    # Drop columns that contain only a single unique value
    orig_columns = loans_2007.columns
    drop_columns = []
    for col in orig_columns:
        col_series = loans_2007[col].dropna().unique()
        if len(col_series) == 1:
            drop_columns.append(col)
    loans_2007 = loans_2007.drop(drop_columns, axis=1)
    print(drop_columns)
    print(loans_2007.shape)
    
    # Write the processed data to a new file
    loans_2007.to_csv('E:/file/filtered_loans_2007.csv', index=False)
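
    The unique-value loop above can also be written as a single filtering pass with nunique; a minimal equivalent sketch (nunique counts distinct non-null values, which matches the dropna().unique() logic in the loop):

    # Keep every column whose number of distinct non-null values is not 1
    loans_2007 = loans_2007.loc[:, loans_2007.nunique(dropna=True) != 1]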
    

    2.3 Dropping Columns with Many Nulls and Encoding the Categorical Columns

    Code:

    import pandas as pd
    
    # Count the null values in each column, then drop the column with many nulls
    loans = pd.read_csv('E:/file/filtered_loans_2007.csv')
    null_counts = loans.isnull().sum()
    print(null_counts)
    
    loans = loans.drop("pub_rec_bankruptcies", axis=1)
    loans = loans.dropna(axis=0)   # drop the few remaining rows with nulls
    
    # Inspect the string (object) columns before converting them to numbers
    object_columns_df = loans.select_dtypes(include=["object"])
    print(object_columns_df.iloc[0])
    
    cols = ['home_ownership', 'verification_status', 'emp_length', 'term', 'addr_state']
    #for c in cols:
    #    print(loans[c].value_counts())
    
    # purpose and title carry similar information; title is dropped below
    print(loans["purpose"].value_counts())
    print(loans["title"].value_counts())
    
    # Map employment length onto an ordinal numeric scale
    mapping_dict = {
        "emp_length": {
            "10+ years": 10,
            "9 years": 9,
            "8 years": 8,
            "7 years": 7,
            "6 years": 6,
            "5 years": 5,
            "4 years": 4,
            "3 years": 3,
            "2 years": 2,
            "1 year": 1,
            "< 1 year": 0,
            "n/a": 0
        }
    }
    loans = loans.drop(["last_credit_pull_d", "earliest_cr_line", "addr_state", "title"], axis=1)
    # Strip the trailing "%" and parse the rate columns as floats
    loans["int_rate"] = loans["int_rate"].str.rstrip("%").astype("float")
    loans["revol_util"] = loans["revol_util"].str.rstrip("%").astype("float")
    loans = loans.replace(mapping_dict)
    
    # One-hot encode the categorical columns. Note that emp_length is already
    # numeric at this point, so get_dummies passes it through unchanged and the
    # drop below then removes it entirely; that is why it is absent from the
    # final column list in section 2.4
    cat_columns = ["home_ownership", "verification_status", "emp_length", "purpose", "term"]
    dummy_df = pd.get_dummies(loans[cat_columns])
    loans = pd.concat([loans, dummy_df], axis=1)
    loans = loans.drop(cat_columns, axis=1)
    loans = loans.drop("pymnt_plan", axis=1)
    
    # Write the cleaned data to a file
    loans.to_csv('E:/file/cleaned_loans2007.csv', index=False)
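
    The percent-string conversion above is worth a closer look: str.rstrip("%") removes the trailing percent sign and astype("float") then parses the number, with nulls passing through as NaN rather than raising. A toy demonstration (not from the loan data):

    import pandas as pd

    s = pd.Series(["10.65%", "15.27%", None])
    print(s.str.rstrip("%").astype("float"))  # 10.65, 15.27, NaN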
    

    2.4 The Preprocessed Data

    Code:

    import pandas as pd
    
    # Load the cleaned data
    loans = pd.read_csv("E:/file/cleaned_loans2007.csv")
    print(loans.info())
    

    Test output:
    As shown below, we end up with 37 numeric columns and no missing values: 36 features plus the loan_status label.

    
    RangeIndex: 38428 entries, 0 to 38427
    Data columns (total 37 columns):
     #   Column                               Non-Null Count  Dtype  
    ---  ------                               --------------  -----  
     0   loan_amnt                            38428 non-null  float64
     1   int_rate                             38428 non-null  float64
     2   installment                          38428 non-null  float64
     3   annual_inc                           38428 non-null  float64
     4   loan_status                          38428 non-null  int64  
     5   dti                                  38428 non-null  float64
     6   delinq_2yrs                          38428 non-null  float64
     7   inq_last_6mths                       38428 non-null  float64
     8   open_acc                             38428 non-null  float64
     9   pub_rec                              38428 non-null  float64
     10  revol_bal                            38428 non-null  float64
     11  revol_util                           38428 non-null  float64
     12  total_acc                            38428 non-null  float64
     13  home_ownership_MORTGAGE              38428 non-null  int64  
     14  home_ownership_NONE                  38428 non-null  int64  
     15  home_ownership_OTHER                 38428 non-null  int64  
     16  home_ownership_OWN                   38428 non-null  int64  
     17  home_ownership_RENT                  38428 non-null  int64  
     18  verification_status_Not Verified     38428 non-null  int64  
     19  verification_status_Source Verified  38428 non-null  int64  
     20  verification_status_Verified         38428 non-null  int64  
     21  purpose_car                          38428 non-null  int64  
     22  purpose_credit_card                  38428 non-null  int64  
     23  purpose_debt_consolidation           38428 non-null  int64  
     24  purpose_educational                  38428 non-null  int64  
     25  purpose_home_improvement             38428 non-null  int64  
     26  purpose_house                        38428 non-null  int64  
     27  purpose_major_purchase               38428 non-null  int64  
     28  purpose_medical                      38428 non-null  int64  
     29  purpose_moving                       38428 non-null  int64  
     30  purpose_other                        38428 non-null  int64  
     31  purpose_renewable_energy             38428 non-null  int64  
     32  purpose_small_business               38428 non-null  int64  
     33  purpose_vacation                     38428 non-null  int64  
     34  purpose_wedding                      38428 non-null  int64  
     35  term_ 36 months                      38428 non-null  int64  
     36  term_ 60 months                      38428 non-null  int64  
    dtypes: float64(12), int64(25)
    memory usage: 10.8 MB
    None
    

    3. Analysis with a Logistic Regression Model

    3.1 A Basic Logistic Regression Model

    Code:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict, KFold
    
    # Load the cleaned data
    loans = pd.read_csv("E:/file/cleaned_loans2007.csv")
    
    # Build a logistic regression model
    lr = LogisticRegression()
    cols = loans.columns
    train_cols = cols.drop("loan_status")
    features = loans[train_cols]
    target = loans["loan_status"]
    
    # Cross-validated predictions
    kf = KFold(n_splits=3)
    predictions = cross_val_predict(lr, features, target, cv=kf)
    predictions = pd.Series(predictions)
    
    # False positives.
    fp_filter = (predictions == 1) & (loans["loan_status"] == 0)
    fp = len(predictions[fp_filter])
    
    # True positives.
    tp_filter = (predictions == 1) & (loans["loan_status"] == 1)
    tp = len(predictions[tp_filter])
    
    # False negatives.
    fn_filter = (predictions == 0) & (loans["loan_status"] == 1)
    fn = len(predictions[fn_filter])
    
    # True negatives.
    tn_filter = (predictions == 0) & (loans["loan_status"] == 0)
    tn = len(predictions[tn_filter])
    
    # Rates
    tpr = tp / float(tp + fn)
    fpr = fp / float(fp + tn)
    
    print(tpr)
    print(fpr)
    print(predictions[:20])
    

    Test output:
    Both tpr and fpr come out very high. To maximize loan profit we want tpr to be as high as possible and fpr as low as possible, so this result is unusable. The cause is that the classes are unbalanced: positive samples (fully paid) far outnumber negative samples (charged off), so the model learns to predict 1 almost everywhere.

    0.9992428143077808
    0.9983367214932545
    0     1
    1     1
    2     1
    3     1
    4     1
    5     1
    6     1
    7     1
    8     1
    9     1
    10    1
    11    1
    12    1
    13    1
    14    1
    15    1
    16    1
    17    1
    18    1
    19    1
    dtype: int64
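
    A quick check confirms the imbalance; a minimal sketch on the cleaned file:

    import pandas as pd

    loans = pd.read_csv("E:/file/cleaned_loans2007.csv")
    # The positive class (fully paid, 1) far outnumbers the negative class
    # (charged off, 0), which is why predicting 1 everywhere scores so "well"
    print(loans["loan_status"].value_counts(normalize=True))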
    

    3.2 Logistic Regression with Adjusted Class Weights

    Compared with the previous section, only one line of code changes: the class weights used when training the model.

    Code:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict, KFold
    
    # Load the cleaned data
    loans = pd.read_csv("E:/file/cleaned_loans2007.csv")
    
    # Logistic regression again, but with balanced class weights
    lr = LogisticRegression(class_weight="balanced")
    cols = loans.columns
    train_cols = cols.drop("loan_status")
    features = loans[train_cols]
    target = loans["loan_status"]
    
    # Cross-validated predictions
    kf = KFold(n_splits=3)
    predictions = cross_val_predict(lr, features, target, cv=kf)
    predictions = pd.Series(predictions)
    
    # False positives.
    fp_filter = (predictions == 1) & (loans["loan_status"] == 0)
    fp = len(predictions[fp_filter])
    
    # True positives.
    tp_filter = (predictions == 1) & (loans["loan_status"] == 1)
    tp = len(predictions[tp_filter])
    
    # False negatives.
    fn_filter = (predictions == 0) & (loans["loan_status"] == 1)
    fn = len(predictions[fn_filter])
    
    # True negatives.
    tn_filter = (predictions == 0) & (loans["loan_status"] == 0)
    tn = len(predictions[tn_filter])
    
    # Rates
    tpr = tp / float(tp + fn)
    fpr = fp / float(fp + tn)
    
    print(tpr)
    print(fpr)
    print(predictions[:20])
    

    Test output:
    tpr is 0.5559, which is rather low.
    fpr is about 0.38, so the model now correctly identifies only about 62% of the negative samples (1 - 0.38 = 0.62).

    0.5559257352273071
    0.37737941230826094
    0     0
    1     0
    2     0
    3     1
    4     1
    5     0
    6     0
    7     0
    8     0
    9     1
    10    1
    11    0
    12    0
    13    1
    14    0
    15    0
    16    1
    17    1
    18    1
    19    0
    dtype: int64
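
    Besides "balanced", class_weight also accepts an explicit dictionary, which lets us tune the penalty ratio between the two classes ourselves. A minimal sketch; the 10:1 ratio here is an arbitrary choice for illustration, not a tuned value:

    from sklearn.linear_model import LogisticRegression

    # Treat misclassifying a charged-off loan (0) as ten times as costly
    # as misclassifying a fully paid loan (1)
    penalty = {0: 10, 1: 1}
    lr = LogisticRegression(class_weight=penalty)

    Raising the weight on class 0 pushes both rates down; the ratio is another knob for trading tpr against fpr.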
    

    4. Random Forest

    4.1 An Initial Random Forest

    Code:

    import pandas as pd
    from sklearn.model_selection import cross_val_predict, KFold
    from sklearn.ensemble import RandomForestClassifier
    
    # Load the cleaned data
    loans = pd.read_csv("E:/file/cleaned_loans2007.csv")
    
    cols = loans.columns
    train_cols = cols.drop("loan_status")
    features = loans[train_cols]
    target = loans["loan_status"]
    
    # Build a random forest with balanced class weights
    rf = RandomForestClassifier(n_estimators=10, class_weight="balanced", random_state=1)
    kf = KFold(n_splits=3)
    predictions = cross_val_predict(rf, features, target, cv=kf)
    predictions = pd.Series(predictions)
    
    # False positives.
    fp_filter = (predictions == 1) & (loans["loan_status"] == 0)
    fp = len(predictions[fp_filter])
    
    # True positives.
    tp_filter = (predictions == 1) & (loans["loan_status"] == 1)
    tp = len(predictions[tp_filter])
    
    # False negatives.
    fn_filter = (predictions == 0) & (loans["loan_status"] == 1)
    fn = len(predictions[fn_filter])
    
    # True negatives.
    tn_filter = (predictions == 0) & (loans["loan_status"] == 0)
    tn = len(predictions[tn_filter])
    
    # Rates
    tpr = tp / float(tp + fn)
    fpr = fp / float(fp + tn)
    
    print(tpr)
    print(fpr)
    print(predictions[:20])
    

    Test output:
    tpr is high but fpr is high as well, so this model is not satisfactory either.

    0.9763152315473846
    0.9406763999260765
    0     1
    1     0
    2     1
    3     1
    4     1
    5     1
    6     1
    7     1
    8     1
    9     1
    10    1
    11    1
    12    1
    13    1
    14    1
    15    1
    16    1
    17    1
    18    1
    19    1
    dtype: int64
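
    As an aside, the four-filter bookkeeping used in these scripts can be collapsed with sklearn's confusion_matrix; a minimal equivalent sketch, assuming predictions and loans are defined as above:

    from sklearn.metrics import confusion_matrix

    # For labels (0, 1) the matrix is [[tn, fp], [fn, tp]]
    tn, fp, fn, tp = confusion_matrix(loans["loan_status"], predictions).ravel()
    tpr = tp / float(tp + fn)
    fpr = fp / float(fp + tn)
    print(tpr, fpr)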
    

    4.2 Tuning the Parameters

    We loop over the parameter combinations with nested for loops to see which settings work best.

    Code:

    import pandas as pd
    from sklearn.model_selection import cross_val_predict, KFold
    from sklearn.ensemble import RandomForestClassifier
    
    # Load the cleaned data
    loans = pd.read_csv("E:/file/cleaned_loans2007.csv")
    
    cols = loans.columns
    train_cols = cols.drop("loan_status")
    features = loans[train_cols]
    target = loans["loan_status"]
    
    rf_trees = [2, 5, 8, 10, 12, 15]
    rf_depths = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    
    results = []
    
    for t in rf_trees:
        for d in rf_depths:
            # Build a random forest for this (trees, depth) combination
            rf = RandomForestClassifier(n_estimators=t, max_depth=d, class_weight="balanced", random_state=1)
            kf = KFold(n_splits=3)
            predictions = cross_val_predict(rf, features, target, cv=kf)
            predictions = pd.Series(predictions)
    
            # False positives.
            fp_filter = (predictions == 1) & (loans["loan_status"] == 0)
            fp = len(predictions[fp_filter])
    
            # True positives.
            tp_filter = (predictions == 1) & (loans["loan_status"] == 1)
            tp = len(predictions[tp_filter])
    
            # False negatives.
            fn_filter = (predictions == 0) & (loans["loan_status"] == 1)
            fn = len(predictions[fn_filter])
    
            # True negatives.
            tn_filter = (predictions == 0) & (loans["loan_status"] == 0)
            tn = len(predictions[tn_filter])
    
            # Rates
            tpr = tp / float(tp + fn)
            fpr = fp / float(fp + tn)
    
            # Record one result per combination ('trp' is the key used in the
            # output below; it holds the true positive rate)
            results.append({"trees": t, "depths": d, "trp": tpr, "fpr": fpr})
    
    for r in results:
        print(r)
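
    Before looking at the output, a side note: the hand-written nested loops can also be driven by sklearn's ParameterGrid, which enumerates every combination of the listed values. A minimal sketch:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import ParameterGrid

    param_grid = ParameterGrid({"n_estimators": [2, 5, 8, 10, 12, 15],
                                "max_depth": list(range(3, 16))})
    for params in param_grid:
        rf = RandomForestClassifier(class_weight="balanced", random_state=1, **params)
        # ... cross-validate and compute tpr/fpr exactly as in the loop above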
    

    Test output:

    {'trees': 2, 'depths': 3, 'trp': 0.6363691431686707, 'fpr': 0.40713361670670856}
    {'trees': 2, 'depths': 4, 'trp': 0.6348244843565436, 'fpr': 0.40177416374052854}
    {'trees': 2, 'depths': 5, 'trp': 0.6153193809249781, 'fpr': 0.3962299020513768}
    {'trees': 2, 'depths': 6, 'trp': 0.6316443044492231, 'fpr': 0.40620957309184996}
    {'trees': 2, 'depths': 7, 'trp': 0.6363994305963595, 'fpr': 0.42247274071336166}
    {'trees': 2, 'depths': 8, 'trp': 0.6567525820032105, 'fpr': 0.4514877102199224}
    {'trees': 2, 'depths': 9, 'trp': 0.6668382954235696, 'fpr': 0.46331546849011274}
    {'trees': 2, 'depths': 10, 'trp': 0.6982463579368204, 'fpr': 0.5139530585843651}
    {'trees': 2, 'depths': 11, 'trp': 0.691552836417603, 'fpr': 0.5058214747736093}
    {'trees': 2, 'depths': 12, 'trp': 0.742375140079353, 'fpr': 0.5989650711513583}
    {'trees': 2, 'depths': 13, 'trp': 0.7715722203713239, 'fpr': 0.6409166512659398}
    {'trees': 2, 'depths': 14, 'trp': 0.7682103158978708, 'fpr': 0.6521899833672149}
    {'trees': 2, 'depths': 15, 'trp': 0.7799012629857346, 'fpr': 0.6686379597116984}
    {'trees': 5, 'depths': 3, 'trp': 0.6375806402762213, 'fpr': 0.39807798928109406}
    {'trees': 5, 'depths': 4, 'trp': 0.6708362358784868, 'fpr': 0.42912585474034376}
    {'trees': 5, 'depths': 5, 'trp': 0.6424872035618014, 'fpr': 0.386065422287932}
    {'trees': 5, 'depths': 6, 'trp': 0.6694733016324923, 'fpr': 0.4245056366660506}
    {'trees': 5, 'depths': 7, 'trp': 0.6668382954235696, 'fpr': 0.42506006283496583}
    {'trees': 5, 'depths': 8, 'trp': 0.6898567404670322, 'fpr': 0.4588800591387914}
    {'trees': 5, 'depths': 9, 'trp': 0.7132386346427598, 'fpr': 0.4882646460912955}
    {'trees': 5, 'depths': 10, 'trp': 0.7385892116182573, 'fpr': 0.5316946959896507}
    {'trees': 5, 'depths': 11, 'trp': 0.7558227579731653, 'fpr': 0.5693956754758824}
    {'trees': 5, 'depths': 12, 'trp': 0.7921070963443074, 'fpr': 0.6135649602661246}
    {'trees': 5, 'depths': 13, 'trp': 0.8204258412333041, 'fpr': 0.6743670301238218}
    {'trees': 5, 'depths': 14, 'trp': 0.8450495199442711, 'fpr': 0.7048604694141564}
    {'trees': 5, 'depths': 15, 'trp': 0.8593148983856801, 'fpr': 0.7240805766032157}
    {'trees': 8, 'depths': 3, 'trp': 0.6597510373443983, 'fpr': 0.41323230456477544}
    {'trees': 8, 'depths': 4, 'trp': 0.6633249538116728, 'fpr': 0.41378673073369066}
    {'trees': 8, 'depths': 5, 'trp': 0.6701093376139564, 'fpr': 0.4130474958418037}
    {'trees': 8, 'depths': 6, 'trp': 0.6695035890601811, 'fpr': 0.4165588615782665}
    {'trees': 8, 'depths': 7, 'trp': 0.667444043977345, 'fpr': 0.4204398447606727}
    {'trees': 8, 'depths': 8, 'trp': 0.693248932368174, 'fpr': 0.4553686934023286}
    {'trees': 8, 'depths': 9, 'trp': 0.7205379047157525, 'fpr': 0.49011273332101274}
    {'trees': 8, 'depths': 10, 'trp': 0.7472514159372444, 'fpr': 0.5276289040842728}
    {'trees': 8, 'depths': 11, 'trp': 0.770784747251416, 'fpr': 0.5716133801515432}
    {'trees': 8, 'depths': 12, 'trp': 0.8038586182875488, 'fpr': 0.612640916651266}
    {'trees': 8, 'depths': 13, 'trp': 0.840264106369446, 'fpr': 0.6978377379412308}
    {'trees': 8, 'depths': 14, 'trp': 0.8618287548838477, 'fpr': 0.7238957678802439}
    {'trees': 8, 'depths': 15, 'trp': 0.8847866250719326, 'fpr': 0.7567917205692108}
    {'trees': 10, 'depths': 3, 'trp': 0.6630826543901627, 'fpr': 0.42080946220661614}
    {'trees': 10, 'depths': 4, 'trp': 0.6703819244631554, 'fpr': 0.4178525226390686}
    {'trees': 10, 'depths': 5, 'trp': 0.6711693975830633, 'fpr': 0.4126778783958603}
    {'trees': 10, 'depths': 6, 'trp': 0.6844958657661205, 'fpr': 0.4317131768619479}
    {'trees': 10, 'depths': 7, 'trp': 0.6763485477178424, 'fpr': 0.4245056366660506}
    {'trees': 10, 'depths': 8, 'trp': 0.6989126813459733, 'fpr': 0.4500092404361486}
    {'trees': 10, 'depths': 9, 'trp': 0.7281400490656329, 'fpr': 0.49454814267233416}
    {'trees': 10, 'depths': 10, 'trp': 0.7572765545022261, 'fpr': 0.5348364442801701}
    {'trees': 10, 'depths': 11, 'trp': 0.7821425326347033, 'fpr': 0.5806690075771577}
    {'trees': 10, 'depths': 12, 'trp': 0.8144289305509284, 'fpr': 0.6204028830160784}
    {'trees': 10, 'depths': 13, 'trp': 0.8475633764424387, 'fpr': 0.6995010164479764}
    {'trees': 10, 'depths': 14, 'trp': 0.872156767725717, 'fpr': 0.7333210127518018}
    {'trees': 10, 'depths': 15, 'trp': 0.8946300390707818, 'fpr': 0.766771391609684}
    {'trees': 12, 'depths': 3, 'trp': 0.6893418541963231, 'fpr': 0.44428017002402515}
    {'trees': 12, 'depths': 4, 'trp': 0.6845867280491867, 'fpr': 0.4278321936795417}
    {'trees': 12, 'depths': 5, 'trp': 0.6869491474089106, 'fpr': 0.42857142857142855}
    {'trees': 12, 'depths': 6, 'trp': 0.6898567404670322, 'fpr': 0.43522454259841065}
    {'trees': 12, 'depths': 7, 'trp': 0.6869491474089106, 'fpr': 0.4320827943078913}
    {'trees': 12, 'depths': 8, 'trp': 0.7059090771420783, 'fpr': 0.45721678063204585}
    {'trees': 12, 'depths': 9, 'trp': 0.7362570796862222, 'fpr': 0.49768989096285343}
    {'trees': 12, 'depths': 10, 'trp': 0.7641518005875761, 'fpr': 0.5427832193679542}
    {'trees': 12, 'depths': 11, 'trp': 0.793470030590302, 'fpr': 0.5893550175568287}
    {'trees': 12, 'depths': 12, 'trp': 0.8231819971529818, 'fpr': 0.6318610238403253}
    {'trees': 12, 'depths': 13, 'trp': 0.8541357482509011, 'fpr': 0.7039364257992977}
    {'trees': 12, 'depths': 14, 'trp': 0.8763970076021443, 'fpr': 0.7311033080761412}
    {'trees': 12, 'depths': 15, 'trp': 0.9027773571190599, 'fpr': 0.777490297542044}
    {'trees': 15, 'depths': 3, 'trp': 0.699215555622861, 'fpr': 0.457771206800961}
    {'trees': 15, 'depths': 4, 'trp': 0.6887966804979253, 'fpr': 0.4387359083348734}
    {'trees': 15, 'depths': 5, 'trp': 0.6972165853954023, 'fpr': 0.43688782110515617}
    {'trees': 15, 'depths': 6, 'trp': 0.6941575551988369, 'fpr': 0.43836629088893}
    {'trees': 15, 'depths': 7, 'trp': 0.6989126813459733, 'fpr': 0.4431713176861948}
    {'trees': 15, 'depths': 8, 'trp': 0.7145409940333768, 'fpr': 0.4668268342265755}
    {'trees': 15, 'depths': 9, 'trp': 0.7413150801102463, 'fpr': 0.5063759009425245}
    {'trees': 15, 'depths': 10, 'trp': 0.7693612381500439, 'fpr': 0.5494363333949362}
    {'trees': 15, 'depths': 11, 'trp': 0.7984977435866372, 'fpr': 0.5980410275364997}
    {'trees': 15, 'depths': 12, 'trp': 0.8308447163582396, 'fpr': 0.6423951210497135}
    {'trees': 15, 'depths': 13, 'trp': 0.8611018566193174, 'fpr': 0.709111070042506}
    {'trees': 15, 'depths': 14, 'trp': 0.8829390919829179, 'fpr': 0.7397893180558123}
    {'trees': 15, 'depths': 15, 'trp': 0.9112578368719144, 'fpr': 0.7952319349473295}
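
    Reading the table, tpr and fpr rise together: bigger, deeper forests predict 1 more often. One way to pick a setting is to fix an fpr budget and take the highest tpr under it; a minimal sketch over the results list from the loop above (the 0.45 budget is an arbitrary assumption):

    # Among the settings that keep fpr under the budget, take the highest tpr
    budget = 0.45
    candidates = [r for r in results if r["fpr"] < budget]
    best = max(candidates, key=lambda r: r["trp"])
    print(best)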
    
