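ML / LoR: How to develop a general-purpose credit risk scorecard with LoR (logistic regression) on a credit card dataset, a complete walkthrough using the toad framework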



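Contents

Developing a general-purpose credit risk scorecard with LoR (logistic regression) on a credit card dataset: a full walkthrough with the toad framework

# 1. Define the dataset
# 1.1 Load the German credit dataset
# 1.2 EDA of each variable
# 1.3 Output the mean, std, min, three quantiles, and max of the continuous variables
# 2. Data preprocessing
# 2.1 Map the categorical target variable to a numeric variable
# 2.2 Analyze each feature's IV, Gini coefficient, entropy, unique count, etc.
# 2.3 Screen features by IV, empty (missing rate), and corr (correlation)
# 2.4 Binning
# 2.5 Refine the bins using bad-rate plots
# 2.5.1 Example of custom bin adjustment
# 2.5.2 Plot each bin's share as a bar chart, with the corresponding bad rate as a line
# 2.5.3 Adjust the bins so that bad_rate follows a broadly monotonic trend
# 2.6 Apply the WOE transformation to the binned data
# 2.7 Feature selection
# 3. Model building, training, and evaluation
# 3.1 Split into training and test sets
# 3.2 Model training
# 3.3 Model evaluation: F1, KS, AUC
# 4. Pre-deployment evaluation and credit score calculation
# 4.1 Evaluate variable stability with PSI: training set vs. test set
# 4.2 Equal-frequency binning on the training set to compare the groups
# 4.3 Scorecard score transformation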



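Developing a general-purpose credit risk scorecard with LoR (logistic regression) on a credit card dataset: a full walkthrough with the toad framework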

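# 1. Define the dataset

# 1.1 Load the German credit dataset

Credit data that classifies debtors, each described by a set of attributes, as good or bad credit risks. https://archive.ics.uci.edu/ml/datasets/Statlog+

The first 20 rows: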

| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | personal.status.and.sex | other.debtors.or.guarantors | present.residence.since | property | age.in.years | other.installment.plans | housing | number.of.existing.credits.at.this.bank | job | number.of.people.being.liable.to.provide.maintenance.for | telephone | foreign.worker | creditability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ... < 0 DM | 6 | critical account/ other credits existing (not at this bank) | radio/television | 1169 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | real estate | 67 | none | own | 2 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 1 | 0 <= ... < 200 DM | 48 | existing credits paid back duly till now | radio/television | 5951 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | real estate | 22 | none | own | 1 | skilled employee / official | 1 | none | yes | bad |
| 2 | no checking account | 12 | critical account/ other credits existing (not at this bank) | education | 2096 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 3 | real estate | 49 | none | own | 1 | unskilled - resident | 2 | none | yes | good |
| 3 | ... < 0 DM | 42 | existing credits paid back duly till now | furniture/equipment | 7882 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | guarantor | 4 | building society savings agreement/ life insurance | 45 | none | for free | 1 | skilled employee / official | 2 | none | yes | good |
| 4 | ... < 0 DM | 24 | delay in paying off in the past | car (new) | 4870 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : divorced/separated | none | 4 | unknown / no property | 53 | none | for free | 2 | skilled employee / official | 2 | none | yes | bad |
| 5 | no checking account | 36 | existing credits paid back duly till now | education | 9055 | unknown/ no savings account | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | unknown / no property | 35 | none | for free | 1 | unskilled - resident | 2 | yes, registered under the customers name | yes | good |
| 6 | no checking account | 24 | existing credits paid back duly till now | furniture/equipment | 2835 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 1 | skilled employee / official | 1 | none | yes | good |
| 7 | 0 <= ... < 200 DM | 36 | existing credits paid back duly till now | car (used) | 6948 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 35 | none | rent | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | good |
| 8 | no checking account | 12 | existing credits paid back duly till now | radio/television | 3059 | ... >= 1000 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 4 | real estate | 61 | none | own | 1 | unskilled - resident | 1 | none | yes | good |
| 9 | 0 <= ... < 200 DM | 30 | critical account/ other credits existing (not at this bank) | car (new) | 5234 | ... < 100 DM | unemployed | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 28 | none | own | 2 | management/ self-employed/ highly qualified employee/ officer | 1 | none | yes | bad |
| 10 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | car (new) | 1295 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 25 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 11 | ... < 0 DM | 48 | existing credits paid back duly till now | business | 4308 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 24 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 12 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | radio/television | 1567 | ... < 100 DM | 1 <= ... < 4 years | 1 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 22 | none | own | 1 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 13 | ... < 0 DM | 24 | critical account/ other credits existing (not at this bank) | car (new) | 1199 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 60 | none | own | 2 | unskilled - resident | 1 | none | yes | bad |
| 14 | ... < 0 DM | 15 | existing credits paid back duly till now | car (new) | 1403 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 28 | none | rent | 1 | skilled employee / official | 1 | none | yes | good |
| 15 | ... < 0 DM | 24 | existing credits paid back duly till now | radio/television | 1282 | 100 <= ... < 500 DM | 1 <= ... < 4 years | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 32 | none | own | 1 | unskilled - resident | 1 | none | yes | bad |
| 16 | no checking account | 24 | critical account/ other credits existing (not at this bank) | radio/television | 2424 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 2 | skilled employee / official | 1 | none | yes | good |
| 17 | ... < 0 DM | 30 | no credits taken/ all credits paid back duly | business | 8072 | unknown/ no savings account | ... < 1 year | 2 | male : divorced/separated | none | 3 | car or other, not in attribute Savings account/bonds | 25 | bank | own | 3 | skilled employee / official | 1 | none | yes | good |
| 18 | 0 <= ... < 200 DM | 24 | existing credits paid back duly till now | car (used) | 12579 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 2 | unknown / no property | 44 | none | for free | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | bad |
| 19 | no checking account | 24 | existing credits paid back duly till now | radio/television | 3430 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 31 | none | own | 1 | skilled employee / official | 2 | yes, registered under the customers name | yes | good |

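# 1.2 EDA of each variable

# Numeric variables: data type, missing rate, unique count, mean, standard deviation, quantiles, etc.
# Categorical variables: data type, missing rate, unique count, top1 (the most frequent category), etc.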

| | type | size | missing | unique | mean_or_top1 | std_or_top2 | min_or_top3 | 1%_or_top4 | 10%_or_top5 | 50%_or_bottom5 | 75%_or_bottom4 | 90%_or_bottom3 | 99%_or_bottom2 | max_or_bottom1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| status.of.existing.checking.account | category | 1000 | 0.00% | 4 | no checking account:39.40% | ... < 0 DM:27.40% | 0 <= ... < 200 DM:26.90% | ... >= 200 DM / salary assignments for at least 1 year:6.30% | | | no checking account:39.40% | ... < 0 DM:27.40% | 0 <= ... < 200 DM:26.90% | ... >= 200 DM / salary assignments for at least 1 year:6.30% |
| duration.in.month | int64 | 1000 | 0.00% | 33 | 20.903 | 12.05881445 | 4 | 6 | 9 | 18 | 24 | 36 | 60 | 72 |
| credit.history | category | 1000 | 0.00% | 5 | existing credits paid back duly till now:53.00% | critical account/ other credits existing (not at this bank):29.30% | delay in paying off in the past:8.80% | all credits at this bank paid back duly:4.90% | no credits taken/ all credits paid back duly:4.00% | existing credits paid back duly till now:53.00% | critical account/ other credits existing (not at this bank):29.30% | delay in paying off in the past:8.80% | all credits at this bank paid back duly:4.90% | no credits taken/ all credits paid back duly:4.00% |
| purpose | object | 1000 | 0.00% | 10 | radio/television:28.00% | car (new):23.40% | furniture/equipment:18.10% | car (used):10.30% | business:9.70% | education:5.00% | repairs:2.20% | domestic appliances:1.20% | others:1.20% | retraining:0.90% |
| credit.amount | int64 | 1000 | 0.00% | 921 | 3271.258 | 2822.736876 | 250 | 425.83 | 932 | 2319.5 | 3972.25 | 7179.4 | 14180.39 | 18424 |
| savings.account.and.bonds | category | 1000 | 0.00% | 5 | ... < 100 DM:60.30% | unknown/ no savings account:18.30% | 100 <= ... < 500 DM:10.30% | 500 <= ... < 1000 DM:6.30% | ... >= 1000 DM:4.80% | ... < 100 DM:60.30% | unknown/ no savings account:18.30% | 100 <= ... < 500 DM:10.30% | 500 <= ... < 1000 DM:6.30% | ... >= 1000 DM:4.80% |
| present.employment.since | category | 1000 | 0.00% | 5 | 1 <= ... < 4 years:33.90% | ... >= 7 years:25.30% | 4 <= ... < 7 years:17.40% | ... < 1 year:17.20% | unemployed:6.20% | 1 <= ... < 4 years:33.90% | ... >= 7 years:25.30% | 4 <= ... < 7 years:17.40% | ... < 1 year:17.20% | unemployed:6.20% |
| installment.rate.in.percentage.of.disposable.income | int64 | 1000 | 0.00% | 4 | 2.973 | 1.118714674 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| personal.status.and.sex | category | 1000 | 0.00% | 4 | male : single:54.80% | female : divorced/separated/married:31.00% | male : married/widowed:9.20% | male : divorced/separated:5.00% | female : single:0.00% | male : single:54.80% | female : divorced/separated/married:31.00% | male : married/widowed:9.20% | male : divorced/separated:5.00% | female : single:0.00% |
| other.debtors.or.guarantors | category | 1000 | 0.00% | 3 | none:90.70% | guarantor:5.20% | co-applicant:4.10% | | | | | none:90.70% | guarantor:5.20% | co-applicant:4.10% |
| present.residence.since | int64 | 1000 | 0.00% | 4 | 2.845 | 1.103717896 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| property | category | 1000 | 0.00% | 4 | car or other, not in attribute Savings account/bonds:33.20% | real estate:28.20% | building society savings agreement/ life insurance:23.20% | unknown / no property:15.40% | | | car or other, not in attribute Savings account/bonds:33.20% | real estate:28.20% | building society savings agreement/ life insurance:23.20% | unknown / no property:15.40% |
| age.in.years | int64 | 1000 | 0.00% | 53 | 35.546 | 11.37546857 | 19 | 20 | 23 | 33 | 42 | 52 | 67.01 | 75 |
| other.installment.plans | category | 1000 | 0.00% | 3 | none:81.40% | bank:13.90% | stores:4.70% | | | | | none:81.40% | bank:13.90% | stores:4.70% |
| housing | category | 1000 | 0.00% | 3 | own:71.30% | rent:17.90% | for free:10.80% | | | | | own:71.30% | rent:17.90% | for free:10.80% |
| number.of.existing.credits.at.this.bank | int64 | 1000 | 0.00% | 4 | 1.407 | 0.577654468 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 |
| job | category | 1000 | 0.00% | 4 | skilled employee / official:63.00% | unskilled - resident:20.00% | management/ self-employed/ highly qualified employee/ officer:14.80% | unemployed/ unskilled - non-resident:2.20% | | | skilled employee / official:63.00% | unskilled - resident:20.00% | management/ self-employed/ highly qualified employee/ officer:14.80% | unemployed/ unskilled - non-resident:2.20% |
| number.of.people.being.liable.to.provide.maintenance.for | int64 | 1000 | 0.00% | 2 | 1.155 | 0.362085772 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |
| telephone | category | 1000 | 0.00% | 2 | none:59.60% | yes, registered under the customers name:40.40% | | | | | | | none:59.60% | yes, registered under the customers name:40.40% |
| foreign.worker | category | 1000 | 0.00% | 2 | yes:96.30% | no:3.70% | | | | | | | yes:96.30% | no:3.70% |
| creditability | object | 1000 | 0.00% | 2 | good:70.00% | bad:30.00% | | | | | | | good:70.00% | bad:30.00% |
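The table above follows the column layout of toad's `toad.detect` output. To make those columns concrete, here is a small standard-library sketch of the same idea for a single column. `summarize_column` is a hypothetical helper, not toad's implementation, and it reports quartiles rather than the 1%/10%/.../99% percentiles shown above.

```python
import statistics
from collections import Counter


def summarize_column(values):
    """Small EDA summary for one column (hypothetical helper).

    Missing values are represented as None. Numeric columns need at least
    two non-missing values for the standard deviation and quartiles.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    summary = {"size": n, "missing": (n - len(present)) / n, "unique": len(set(present))}
    if present and all(isinstance(v, (int, float)) for v in present):
        q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
        summary.update({
            "mean": statistics.mean(present),
            "std": statistics.stdev(present),  # sample standard deviation
            "min": min(present), "25%": q1, "50%": q2, "75%": q3,
            "max": max(present),
        })
    else:
        # categorical: most frequent values with their share of non-missing rows
        counts = Counter(present).most_common(5)
        summary["top"] = [(value, count / len(present)) for value, count in counts]
    return summary


print(summarize_column([1, 2, 3, 4, 5]))
print(summarize_column(["good", "good", "bad", None]))
```

The `inclusive` quantile method interpolates linearly between order statistics, which is the same convention pandas uses by default for `describe()`.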

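# 1.3 Output the mean, std, min, three quantiles, and max of the continuous variables

| | duration.in.month | credit.amount | installment.rate.in.percentage.of.disposable.income | present.residence.since | age.in.years | number.of.existing.credits.at.this.bank | number.of.people.being.liable.to.provide.maintenance.for |
|---|---|---|---|---|---|---|---|
| mean | 20.903 | 3271.258 | 2.973 | 2.845 | 35.546 | 1.407 | 1.155 |
| std | 12.05881445 | 2822.736876 | 1.118714674 | 1.103717896 | 11.37546857 | 0.577654468 | 0.362085772 |
| min | 4 | 250 | 1 | 1 | 19 | 1 | 1 |
| 25% | 12 | 1365.5 | 2 | 2 | 27 | 1 | 1 |
| 50% | 18 | 2319.5 | 3 | 3 | 33 | 1 | 1 |
| 75% | 24 | 3972.25 | 4 | 4 | 42 | 2 | 1 |
| max | 72 | 18424 | 4 | 4 | 75 | 4 | 2 |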

# 2. Data preprocessing

# 2.1 Map the categorical target variable to a numeric variable
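No code is shown for this step. Judging by the `creditability_map` column in the WOE table in 2.6 (0 on `good` rows, 1 on `bad` rows), bad is encoded as the positive class. A minimal sketch of that mapping in plain Python; with pandas the same step is typically `df["creditability"].map({"good": 0, "bad": 1})`.

```python
TARGET_MAP = {"good": 0, "bad": 1}  # bad = 1 is the positive (event) class


def map_target(labels):
    """Map 'good'/'bad' labels to 0/1; raises KeyError on any unexpected label."""
    return [TARGET_MAP[label] for label in labels]


print(map_target(["good", "bad", "good"]))  # [0, 1, 0]
```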

# 2.2 Analyze each feature's IV, Gini coefficient (gini), entropy, unique count, etc.

| | iv | gini | entropy | unique |
|---|---|---|---|---|
| creditability | 12.22649152 | 0 | 0 | 2 |
| status.of.existing.checking.account | 0.666011503 | 0.368037204 | 0.545196341 | 4 |
| duration.in.month | 0.354783574 | 0.406755043 | 0.609659161 | 33 |
| credit.amount | 0.351454966 | 0.408679834 | 0.610864302 | 921 |
| credit.history | 0.293233547 | 0.394089613 | 0.580630747 | 5 |
| age.in.years | 0.21119662 | 0.41433928 | 0.610863206 | 53 |
| savings.account.and.bonds | 0.196009557 | 0.40483845 | 0.591376694 | 5 |
| purpose | 0.169195066 | 0.405990292 | 0.593609415 | 10 |
| property | 0.112638262 | 0.410037788 | 0.599091068 | 4 |
| present.employment.since | 0.086433631 | 0.412285325 | 0.601782464 | 5 |
| housing | 0.083293434 | 0.412356067 | 0.602024467 | 3 |
| other.installment.plans | 0.057614542 | 0.414607541 | 0.604712572 | 3 |
| foreign.worker | 0.043877412 | 0.417170441 | 0.606828112 | 2 |
| other.debtors.or.guarantors | 0.032019322 | 0.417208946 | 0.607539261 | 3 |
| installment.rate.in.percentage.of.disposable.income | 0.02632209 | 0.417699747 | 0.60811103 | 4 |
| number.of.existing.credits.at.this.bank | 0.013266524 | 0.418878097 | 0.609493027 | 4 |
| personal.status.and.sex | 0.008839919 | 0.419238171 | 0.609944287 | 4 |
| job | 0.008762766 | 0.419208234 | 0.609937317 | 4 |
| telephone | 0.006377605 | 0.419441491 | 0.610196344 | 2 |
| present.residence.since | 0.003588773 | 0.419685295 | 0.610488269 | 4 |
| number.of.people.being.liable.to.provide.maintenance.for | 4.34E-05 | 0.419996182 | 0.61085975 | 2 |
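A table like the one above is what toad's `toad.quality` produces. To show how the three numbers relate for one categorical feature, here is a standard-library sketch: IV sums (bad share minus good share) times ln(bad share / good share) over the bins, while gini and entropy are the target's impurity inside each bin, weighted by the bin's share of samples (natural log for entropy). The per-category bad/good counts below are our assumption for illustration; they agree with the 27.4% / 26.9% / 6.3% / 39.4% shares in 1.2.

```python
import math


def feature_quality(counts):
    """IV, conditional Gini and conditional entropy of the target given one feature.

    counts maps each category/bin to (bads, goods). Assumes every bin contains at
    least one bad and one good; real implementations smooth zero counts.
    """
    total_bad = sum(b for b, _ in counts.values())
    total_good = sum(g for _, g in counts.values())
    n = total_bad + total_good
    iv = gini = entropy = 0.0
    for bads, goods in counts.values():
        bad_prop, good_prop = bads / total_bad, goods / total_good
        iv += (bad_prop - good_prop) * math.log(bad_prop / good_prop)
        weight = (bads + goods) / n    # share of all samples falling in this bin
        r = bads / (bads + goods)      # bad rate inside the bin
        gini += weight * (1 - r ** 2 - (1 - r) ** 2)
        entropy += weight * -(r * math.log(r) + (1 - r) * math.log(1 - r))
    return iv, gini, entropy


# Assumed (bads, goods) per category of status.of.existing.checking.account
counts = {
    "... < 0 DM": (135, 139),
    "0 <= ... < 200 DM": (105, 164),
    "... >= 200 DM / salary assignments for at least 1 year": (14, 49),
    "no checking account": (46, 348),
}
iv, gini, entropy = feature_quality(counts)
print(round(iv, 4), round(gini, 4), round(entropy, 4))
```

With these counts the three values agree with the `status.of.existing.checking.account` row above (IV about 0.666, gini about 0.368, entropy about 0.545).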

# 2.3 Screen features by IV, empty (missing rate), and corr (correlation)

    drop_cols: 
     {'empty': array([], dtype=float64), 'iv': array(['personal.status.and.sex', 'present.residence.since',
           'number.of.existing.credits.at.this.bank', 'job',
           'number.of.people.being.liable.to.provide.maintenance.for',
           'telephone'], dtype=object), 'corr': array([], dtype=object)}
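The `drop_cols` dict above has one list per rule. A minimal sketch of that kind of rule-based screening, assuming these rules: drop a feature whose missing rate exceeds `empty_max`, drop one whose IV is below `iv_min`, and for each highly correlated pair drop the feature with the lower IV. The thresholds actually used are not shown; `iv_min=0.02` is an assumption, but with the IV values from the 2.2 table it reproduces the six-feature `iv` list above. `select_features` is a hypothetical helper, not toad's `toad.selection.select`.

```python
def select_features(iv, missing_rate, corr, iv_min=0.02, empty_max=0.9, corr_max=0.7):
    """Return the features dropped by each rule (hypothetical helper).

    iv: {feature: IV}; missing_rate: {feature: share of missing values};
    corr: {(feature_a, feature_b): correlation}. For a highly correlated pair,
    the feature with the lower IV is dropped.
    """
    drop = {"empty": [f for f in iv if missing_rate.get(f, 0.0) > empty_max]}
    kept = [f for f in iv if f not in drop["empty"]]
    drop["iv"] = [f for f in kept if iv[f] < iv_min]
    kept = [f for f in kept if f not in drop["iv"]]
    drop["corr"] = []
    for (a, b), r in corr.items():
        if a in kept and b in kept and abs(r) > corr_max:
            loser = a if iv[a] < iv[b] else b
            kept.remove(loser)
            drop["corr"].append(loser)
    return drop


# IV values copied (rounded) from the table in 2.2; the target is excluded.
IV = {
    "status.of.existing.checking.account": 0.6660,
    "duration.in.month": 0.3548,
    "credit.amount": 0.3515,
    "credit.history": 0.2932,
    "age.in.years": 0.2112,
    "savings.account.and.bonds": 0.1960,
    "purpose": 0.1692,
    "property": 0.1126,
    "present.employment.since": 0.0864,
    "housing": 0.0833,
    "other.installment.plans": 0.0576,
    "foreign.worker": 0.0439,
    "other.debtors.or.guarantors": 0.0320,
    "installment.rate.in.percentage.of.disposable.income": 0.0263,
    "number.of.existing.credits.at.this.bank": 0.0133,
    "personal.status.and.sex": 0.0088,
    "job": 0.0088,
    "telephone": 0.0064,
    "present.residence.since": 0.0036,
    "number.of.people.being.liable.to.provide.maintenance.for": 0.0000434,
}
MISSING = {f: 0.0 for f in IV}  # 1.2 reports 0.00% missing for every column
drop = select_features(IV, MISSING, corr={})  # correlation matrix not shown; assumed empty
print(drop)
```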

# 2.4 Binning

Bin both the numeric and the categorical variables. Supported binning methods include chi-square (chi), decision tree, percentile, equal-frequency, and equal-width binning. The resulting split points (numeric variables) and category groups (categorical variables) are:

data_df_s2bins_dict:
{'status.of.existing.checking.account': [['no checking account'], ['... >= 200 DM / salary assignments for at least 1 year'], ['0 <= ... < 200 DM'], ['... < 0 DM']], 'duration.in.month': [9, 12, 13, 16, 36, 45], 'credit.history': [['critical account/ other credits existing (not at this bank)'], ['delay in paying off in the past', 'existing credits paid back duly till now'], ['all credits at this bank paid back duly', 'no credits taken/ all credits paid back duly']], 'purpose': [['retraining', 'car (used)'], ['radio/television'], ['furniture/equipment'], ['domestic appliances', 'business', 'repairs'], ['car (new)'], ['others', 'education']], 'credit.amount': [3556], 'savings.account.and.bonds': [['... >= 1000 DM', '500 <= ... < 1000 DM', 'unknown/ no savings account'], ['100 <= ... < 500 DM'], ['... < 100 DM']], 'present.employment.since': [['4 <= ... < 7 years'], ['... >= 7 years'], ['1 <= ... < 4 years'], ['unemployed'], ['... < 1 year']], 'installment.rate.in.percentage.of.disposable.income': [2, 3, 4], 'other.debtors.or.guarantors': [['guarantor', 'none', 'co-applicant']], 'property': [['real estate'], ['building society savings agreement/ life insurance'], ['car or other, not in attribute Savings account/bonds'], ['unknown / no property']], 'age.in.years': [26, 35, 37, 49], 'other.installment.plans': [['none'], ['stores', 'bank']], 'housing': [['own'], ['rent'], ['for free']], 'foreign.worker': [['no', 'yes']], 'creditability': [['good'], ['bad']]}
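For numeric variables the lists above are cut points: `duration.in.month: [9, 12, 13, 16, 36, 45]` has six cut points and therefore seven bins. The sketch below shows the two simplest methods named above (equal width and equal frequency) plus how cut points map values to bin indices. It is not toad's `Combiner`; the left-closed convention `[s_i, s_(i+1))` is an assumption, though it is consistent with the WOE values in 2.6, where a duration of 36 lands in the same bin as 42. Chi-square and decision-tree binning pick cut points using the target and are not reproduced here.

```python
from bisect import bisect_right


def equal_width_splits(values, n_bins):
    """Interior cut points dividing [min, max] into n_bins bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def equal_freq_splits(values, n_bins):
    """Interior cut points at sample quantiles, so bins hold roughly equal counts."""
    ordered = sorted(values)
    cuts = {ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)}
    return sorted(cuts)  # tied values can collapse cut points, giving fewer bins


def assign_bins(values, splits):
    """Bin index per value, assuming left-closed bins [s_i, s_(i+1))."""
    return [bisect_right(splits, v) for v in values]


duration_splits = [9, 12, 13, 16, 36, 45]  # from data_df_s2bins_dict above
print(assign_bins([6, 12, 15, 24, 36, 48], duration_splits))
```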

# 2.5 Refine the bins using bad-rate plots

# 2.5.1 Example of custom bin adjustment

# 2.5.2 Plot each bin's share as a bar chart, with the corresponding bad rate as a line
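The chart for this step is not reproduced here. The two series it plots are easy to compute: each bin's share of all samples (the bars) and each bin's bad rate (the line). A minimal sketch of that computation; `bin_stats` is a hypothetical helper, and the plotting itself (toad ships helpers such as `toad.plot.bin_plot`, or you can use matplotlib bars plus a `twinx` line) is left out.

```python
def bin_stats(bins, labels):
    """Share of samples and bad rate per bin (labels: 1 = bad, 0 = good).

    'share' is what the bar chart shows; 'bad_rate' is the line.
    """
    totals, bads = {}, {}
    for b, y in zip(bins, labels):
        totals[b] = totals.get(b, 0) + 1
        bads[b] = bads.get(b, 0) + y
    n = len(labels)
    return {b: {"share": totals[b] / n, "bad_rate": bads[b] / totals[b]}
            for b in sorted(totals)}


# toy data: bin index per sample and its 0/1 label
stats = bin_stats([0, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1])
for b, s in stats.items():
    print(b, f"share={s['share']:.2f}", f"bad_rate={s['bad_rate']:.2f}")
```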

     

# 2.5.3 Adjust the bins so that bad_rate follows a broadly monotonic trend
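The article's own adjustment code is not shown. One common way to enforce a monotonic bad rate, assumed here, is to merge adjacent bins whenever the bad rate moves the wrong way and to repeat until the sequence is monotonic (the pool-adjacent-violators idea). `merge_to_monotonic` is a hypothetical helper; the direction is a parameter.

```python
def merge_to_monotonic(bads, totals, increasing=True):
    """Merge adjacent bins until bad rates are monotonic.

    bads/totals: per-bin counts in bin order. Returns groups of original bin
    indices, e.g. [[0], [1, 2], [3]] means bins 1 and 2 were merged.
    """
    groups = [[i] for i in range(len(bads))]
    b, t = list(bads), list(totals)
    i = 0
    while i < len(b) - 1:
        r1, r2 = b[i] / t[i], b[i + 1] / t[i + 1]
        violates = r1 > r2 if increasing else r1 < r2
        if violates:
            b[i:i + 2] = [b[i] + b[i + 1]]
            t[i:i + 2] = [t[i] + t[i + 1]]
            groups[i:i + 2] = [groups[i] + groups[i + 1]]
            i = max(i - 1, 0)  # a merge can create a new violation on the left
        else:
            i += 1
    return groups


# bad rates 0.10, 0.30, 0.20, 0.50: bins 1 and 2 break the increasing trend
print(merge_to_monotonic([10, 30, 20, 50], [100, 100, 100, 100]))
```

For numeric features, merging bins `i` and `i + 1` amounts to deleting the cut point between them from the split list.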

# 2.6 Apply the WOE transformation to the binned data

| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | other.debtors.or.guarantors | property | age.in.years | other.installment.plans | housing | foreign.worker | creditability | creditability_map |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.818098706 | -1.280933845 | -0.733740578 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | 0.157300289 | 0 | -0.461034959 | -0.194156014 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 1 | 0.401391783 | 1.134979933 | 0.087868755 | -0.410062817 | 0.31563815 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | -0.461034959 | 0.48083491 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 2 | -1.176263223 | -0.128416292 | -0.733740578 | 0.587786665 | -0.153492135 | 0.271357844 | -0.394415272 | -0.155466469 | 0 | -0.461034959 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 3 | 0.818098706 | 0.524524468 | 0.087868755 | 0.095556516 | 0.31563815 | 0.271357844 | -0.394415272 | -0.155466469 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | 0.472604411 | 0 | -5.703782475 | 0 |
| 4 | 0.818098706 | 0.108688306 | 0.087868755 | 0.359200488 | 0.31563815 | 0.271357844 | 0.032103245 | -0.064538521 | 0 | 0.586082361 | -0.266352306 | -0.121178625 | 0.472604411 | 0 | 6.551080335 | 1 |
| 5 | -1.176263223 | 0.524524468 | 0.087868755 | 0.587786665 | 0.31563815 | -0.762140052 | 0.032103245 | -0.155466469 | 0 | 0.586082361 | -0.044353168 | -0.121178625 | 0.472604411 | 0 | -5.703782475 | 0 |
| 6 | -1.176263223 | 0.108688306 | 0.087868755 | 0.095556516 | -0.153492135 | -0.762140052 | -0.235566071 | -0.064538521 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 7 | 0.401391783 | 0.524524468 | 0.087868755 | -0.805625164 | 0.31563815 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | -5.703782475 | 0 |
| 8 | -1.176263223 | -0.128416292 | 0.087868755 | -0.410062817 | -0.153492135 | -0.762140052 | -0.394415272 | -0.155466469 | 0 | -0.461034959 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 9 | 0.401391783 | 0.108688306 | -0.733740578 | 0.359200488 | 0.31563815 | 0.271357844 | 0.31923043 | 0.157300289 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 10 | 0.401391783 | -0.128416292 | 0.087868755 | 0.359200488 | -0.153492135 | 0.271357844 | 0.470820289 | -0.064538521 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | 6.551080335 | 1 |
| 11 | 0.818098706 | 1.134979933 | 0.087868755 | 0.233288 | 0.31563815 | 0.271357844 | 0.470820289 | -0.064538521 | 0 | 0.028573372 | 0.48083491 | -0.121178625 | 0.40444522 | 0 | 6.551080335 | 1 |
| 12 | 0.401391783 | -0.128416292 | 0.087868755 | -0.410062817 | -0.153492135 | 0.271357844 | 0.032103245 | -0.251314428 | 0 | 0.034191365 | 0.48083491 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 13 | 0.818098706 | 0.108688306 | -0.733740578 | 0.359200488 | -0.153492135 | 0.271357844 | -0.235566071 | 0.157300289 | 0 | 0.034191365 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 14 | 0.818098706 | -0.665290226 | 0.087868755 | 0.359200488 | -0.153492135 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | -5.703782475 | 0 |
| 15 | 0.818098706 | 0.108688306 | 0.087868755 | -0.410062817 | -0.153492135 | 0.13955188 | 0.032103245 | 0.157300289 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 16 | -1.176263223 | 0.108688306 | -0.733740578 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | 0.157300289 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 17 | 0.818098706 | 0.108688306 | 1.234070835 | 0.233288 | 0.31563815 | -0.762140052 | 0.470820289 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | 0.477550835 | -0.194156014 | 0 | -5.703782475 | 0 |
| 18 | 0.401391783 | 0.108688306 | 0.087868755 | -0.805625164 | 0.31563815 | 0.271357844 | -0.235566071 | 0.157300289 | 0 | 0.586082361 | -0.044353168 | -0.121178625 | 0.472604411 | 0 | 6.551080335 | 1 |
| 19 | -1.176263223 | 0.108688306 | 0.087868755 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | -0.064538521 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
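The WOE values above follow the convention WOE = ln(bad share / good share), where "bad share" is the bin's fraction of all bad samples: positive WOE marks a riskier-than-average bin. A minimal sketch of building the WOE lookup and applying it; `woe_table` and `woe_transform` are hypothetical helpers, not toad's `WOETransformer`. The per-category counts are our assumption, consistent with the shares reported in 1.2.

```python
import math


def woe_table(counts):
    """WOE per bin: ln(share of all bads in the bin / share of all goods in the bin).

    counts maps bin -> (bads, goods). Positive WOE = riskier than average.
    Assumes no bin has zero bads or zero goods.
    """
    total_bad = sum(b for b, _ in counts.values())
    total_good = sum(g for _, g in counts.values())
    return {k: math.log((b / total_bad) / (g / total_good))
            for k, (b, g) in counts.items()}


def woe_transform(values, table):
    """Replace each bin label with its WOE value."""
    return [table[v] for v in values]


# Assumed (bads, goods) per category of status.of.existing.checking.account
status_counts = {
    "... < 0 DM": (135, 139),
    "0 <= ... < 200 DM": (105, 164),
    "... >= 200 DM / salary assignments for at least 1 year": (14, 49),
    "no checking account": (46, 348),
}
table = woe_table(status_counts)
print(woe_transform(["... < 0 DM", "0 <= ... < 200 DM", "no checking account"], table))
```

With these counts the three values come out at about 0.818, 0.401 and -1.176, matching rows 0, 1 and 2 of the status column above. A feature kept as a single bin (here `other.debtors.or.guarantors` and `foreign.worker`) has a bad share and good share of 1 each, so its WOE is ln(1) = 0, which is why those two columns are all zeros.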

# 2.7 Feature selection

# Select features with forward, backward, or bidirectional (stepwise) selection, using AIC, BIC, KS, or AUC as the selection criterion

final_data: 
 (1000, 3)
final_data: 
 Index(['status.of.existing.checking.account', 'creditability',
       'creditability_map'],
      dtype='object')
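toad's `toad.selection.stepwise` covers the directions and criteria listed above. To show the mechanics of the forward direction with AUC as the criterion, here is a simplified sketch: at each step it adds the candidate that most improves AUC and stops when the gain falls below `min_gain`. As a plainly named simplification, each candidate set is scored by the sum of its WOE values instead of refitting a logistic regression at every step. One caution: the candidate list must not contain the label. In the output above, `creditability` (the original label, WOE-encoded in 2.6) survived selection, which amounts to target leakage; it is also consistent with its IV of about 12.2 in 2.2.

```python
def auc(scores, labels):
    """AUC by pairwise comparison (ties count half); labels 1 = bad."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def forward_select(rows, labels, candidates, min_gain=1e-4):
    """Greedy forward selection that maximizes AUC.

    rows: list of {feature: WOE value}. As a simplification, each candidate set
    is scored by the plain sum of its WOE values instead of refitting a
    logistic regression at every step.
    """
    selected, best = [], 0.5  # 0.5 = AUC of an uninformative score
    while True:
        trials = {}
        for f in candidates:
            if f in selected:
                continue
            feats = selected + [f]
            trials[f] = auc([sum(r[x] for x in feats) for r in rows], labels)
        if not trials:
            break
        f, score = max(trials.items(), key=lambda kv: kv[1])
        if score - best < min_gain:
            break
        selected.append(f)
        best = score
    return selected, best


rows = [{"a": 1.0, "b": 0.3}, {"a": 0.8, "b": -0.2},
        {"a": -0.5, "b": 0.1}, {"a": -1.0, "b": -0.4}]
labels = [1, 1, 0, 0]
print(forward_select(rows, labels, candidates=["a", "b"]))
```

Backward selection is the mirror image (start from all candidates and drop the feature whose removal hurts least), and "both" alternates an add step with a drop step.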

# 3. Model building, training, and evaluation

# 3.1 Split into training and test sets
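The split call is not shown. The training-set bucket table in 4.2 sums to 750 samples (292 + 125 + 106 + 48 + 78 + 101), which is consistent with a 75/25 split of the 1000 rows. A minimal dependency-free sketch with a fixed seed; `train_test_split_rows`, the 0.25 test share, and the seed are assumptions. In practice `sklearn.model_selection.train_test_split(..., stratify=y)` is a common choice because it keeps the 70/30 good/bad ratio equal in both parts, which this unstratified sketch does not guarantee.

```python
import random


def train_test_split_rows(rows, test_size=0.25, seed=0):
    """Shuffle row indices with a fixed seed and cut off a test share.

    test_size=0.25 is an assumption; it yields 750 training rows for 1000 inputs.
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_size)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


train, test = train_test_split_rows(list(range(1000)))
print(len(train), len(test))
```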

# 3.2 Model training
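The training code is not shown in this excerpt. Scorecards built with toad are commonly fit with scikit-learn's `LogisticRegression` on the WOE columns. The dependency-free sketch below only illustrates what such a fit does (batch gradient descent on the mean log-loss, no regularization); `fit_logistic` and `predict_proba` are hypothetical names. Because WOE here is ln(bad share / good share) and the label is 1 for bad, the fitted coefficients on WOE features are typically positive.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss (no regularization).

    X: list of feature rows (e.g. WOE values); y: 0/1 labels with 1 = bad.
    Returns (intercept, coefficients).
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    """Predicted probability of bad for one feature row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# toy WOE feature: positive values lean bad, but the classes overlap
X = [[0.8], [0.4], [-0.5], [-1.2], [0.8], [-0.5]]
y = [1, 1, 0, 0, 0, 1]
b, w = fit_logistic(X, y)
print(b, w, predict_proba(b, w, [0.8]))
```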

# 3.3 Model evaluation: F1, KS, AUC
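toad provides its own versions of these metrics in `toad.metrics`. The standard-library sketch below spells out what each one measures on predicted P(bad): AUC is the probability that a random bad scores above a random good, KS is the largest gap between the cumulative bad and good distributions across score thresholds, and F1 is computed for the bad class at a decision threshold (0.5 here is an assumption; F1 changes with the threshold while KS and AUC do not).

```python
def auc(scores, labels):
    """Probability that a random bad outscores a random good (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks_stat(scores, labels):
    """Max gap between the cumulative bad and good distributions over thresholds."""
    pairs = sorted(zip(scores, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    best = 0.0
    for i, (s, y) in enumerate(pairs):
        cum_bad += y
        cum_good += 1 - y
        # evaluate only after the last of any tied scores
        if i + 1 == len(pairs) or pairs[i + 1][0] != s:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best


def f1_at(scores, labels, threshold=0.5):
    """F1 for the 'bad' class when predicting bad at score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


scores = [0.9, 0.8, 0.3, 0.2]  # toy predicted P(bad)
labels = [1, 0, 1, 0]
print(auc(scores, labels), ks_stat(scores, labels), f1_at(scores, labels))
```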

# 4. Pre-deployment evaluation and credit score calculation

# 4.1 Evaluate variable stability with PSI: training set vs. test set

    cal PSI 0.012897491574571578
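PSI compares the binned distribution of a variable (or score) in two samples: PSI = Σ (A_i - E_i) · ln(A_i / E_i), where E_i and A_i are the shares of bin i in the expected (e.g. training) and actual (e.g. test) sample. A widely used rule of thumb reads PSI < 0.1 as stable, 0.1 to 0.25 as a moderate shift, and > 0.25 as a significant shift, so the value of about 0.013 above indicates a stable variable. A minimal sketch; the `eps` floor for empty bins is an assumed smoothing choice.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected/actual: counts per bin in the same bin order (e.g. train vs test).
    eps floors empty-bin shares so the log stays finite (assumed smoothing).
    """
    e_total, a_total = sum(expected), sum(actual)
    value = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value


print(psi([50, 50], [50, 50]))  # 0.0
print(psi([50, 50], [40, 60]))  # a 10-point shift between two bins
```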

# 4.2 Equal-frequency binning on the training set to compare the groups

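| | min | max | bads | goods | total | bad_rate | good_rate | odds | bad_prop | good_prop | total_prop | cum_bad_rate | cum_bad_rate_rev | cum_bads_prop | cum_bads_prop_rev | cum_goods_prop | cum_goods_prop_rev | cum_total_prop | cum_total_prop_rev | ks | lift |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000194976 | 0.000204106 | 0 | 292 | 292 | 0 | 1 | 0 | 0 | 0.5583174 | 0.389333333 | 0 | 0.302666667 | 0 | 1 | 0.5583174 | 1 | 0.389333333 | 1 | 0.5583174 | 1 |
| 1 | 0.000214122 | 0.000214122 | 0 | 125 | 125 | 0 | 1 | 0 | 0 | 0.239005736 | 0.166666667 | 0 | 0.495633188 | 0 | 1 | 0.797323136 | 0.4416826 | 0.556 | 0.610666667 | 0.797323136 | 1.637554585 |
| 2 | 0.000219486 | 0.000219486 | 0 | 106 | 106 | 0 | 1 | 0 | 0 | 0.202676864 | 0.141333333 | 0 | 0.681681682 | 0 | 1 | 1 | 0.202676864 | 0.697333333 | 0.444 | 1 | 2.252252252 |
| 3 | 0.999484936 | 0.999507974 | 48 | 0 | 48 | 1 | 0 | inf | 0.211453744 | 0 | 0.064 | 0.084063047 | 1 | 0.211453744 | 1 | 1 | 0 | 0.761333333 | 0.302666667 | 0.788546256 | 3.303964758 |
| 4 | 0.99953098 | 0.99953098 | 78 | 0 | 78 | 1 | 0 | inf | 0.343612335 | 0 | 0.104 | 0.194144838 | 1 | 0.555066079 | 0.788546256 | 1 | 0 | 0.865333333 | 0.238666667 | 0.444933921 | 3.303964758 |
| 5 | 0.999542439 | 0.999542439 | 101 | 0 | 101 | 1 | 0 | inf | 0.444933921 | 0 | 0.134666667 | 0.302666667 | 1 | 1 | 0.444933921 | 1 | 0 | 1 | 0.134666667 | 0 | 3.303964758 |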
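Each row of the table above is one bucket of predicted scores, and its `ks` column is |cum_bads_prop - cum_goods_prop|; for example row 3 gives |0.211453744 - 1| = 0.788546256. The column set appears to come from toad's `KS_bucket`. A simplified sketch of the core columns, splitting the sorted scores into equally sized buckets. In this sketch tied scores may be split across buckets; the uneven bucket sizes in the table (292, 125, 106, ...) suggest that ties were kept together there.

```python
def ks_bucket(scores, labels, n_buckets=5):
    """Equal-frequency buckets over sorted scores with cumulative KS per bucket.

    Assumes len(scores) >= n_buckets. Tied scores may be split across buckets.
    """
    pairs = sorted(zip(scores, labels))
    n, n_bad = len(pairs), sum(labels)
    n_good = n - n_bad
    rows, cum_bad, cum_good = [], 0, 0
    for k in range(n_buckets):
        chunk = pairs[k * n // n_buckets:(k + 1) * n // n_buckets]
        bads = sum(y for _, y in chunk)
        goods = len(chunk) - bads
        cum_bad += bads
        cum_good += goods
        rows.append({
            "min": chunk[0][0], "max": chunk[-1][0],
            "bads": bads, "goods": goods, "total": len(chunk),
            "bad_rate": bads / len(chunk),
            "ks": abs(cum_bad / n_bad - cum_good / n_good),
        })
    return rows


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # toy predicted P(bad)
labels = [0, 0, 0, 1, 0, 1, 1, 1]
for row in ks_bucket(scores, labels, n_buckets=4):
    print(row)
```

The model's overall KS is the largest value in the `ks` column.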

# 4.3 Scorecard score transformation

| | name | value | score |
|---|---|---|---|
| 0 | status.of.existing.checking.account | no checking account | 261.94 |
| 1 | status.of.existing.checking.account | ... >= 200 DM / salary assignments for at least 1 year | 258.87 |
| 2 | status.of.existing.checking.account | 0 <= ... < 200 DM | 255.66 |
| 3 | status.of.existing.checking.account | ... < 0 DM | 254.01 |
| 4 | creditability | good | 744.2 |
| 5 | creditability | bad | -302.01 |
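The usual scorecard transform maps the model's log-odds of bad to points: score = A - B · ln(odds), with B = PDO / ln 2 and A = base_score + B · ln(base_odds). Since ln(odds) = intercept + Σ coef_j · WOE_j, each bin of each feature contributes -B · coef_j · WOE_j points on top of a base of A - B · intercept. The calibration values below (600 points at bad:good odds of 1:50, 20 points to double the odds) are hypothetical; the values used for the card above are not shown. toad's `ScoreCard` is configured with similar parameters (`base_score`, `base_odds`, `pdo`), though its odds convention may differ from this sketch.

```python
import math


def scorecard_params(base_score=600.0, base_odds=1 / 50, pdo=20.0):
    """Offset A and factor B for score = A - B * ln(odds), odds = P(bad) / P(good).

    Hypothetical calibration: odds of base_odds (1 bad per 50 goods) score
    base_score, and every halving of the odds adds pdo points.
    """
    factor = pdo / math.log(2)
    offset = base_score + factor * math.log(base_odds)
    return offset, factor


def attribute_points(coef, woe, factor):
    """Points contributed by one bin of one feature."""
    return -factor * coef * woe


def total_score(intercept, coefs, woes, offset, factor):
    """Base points (offset - B * intercept) plus the attribute points."""
    base = offset - factor * intercept
    return base + sum(attribute_points(c, w, factor) for c, w in zip(coefs, woes))


offset, factor = scorecard_params()
# hypothetical model: intercept -0.85, one coefficient 0.9 on a WOE of 0.818
print(round(total_score(-0.85, [0.9], [0.818], offset, factor), 2))
```

Some scorecards fold the base points into the attribute points (for example by spreading them across features); this sketch keeps them separate so the two parts stay visible.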

Original article: https://blog.csdn.net/qq_41185868/article/details/125418213