ScoreCard Model using R

One often receives mailers on pre-approved loan based on creditworthiness. When one applies for credit, often it’s approved, sometimes it isn’t...

Have you ever wondered how exactly the decision to approve a loan is done?

Was the bank able to clearly articulate the rationale of their decision to decline?

To ensure fairness and transparency, there are compliance rules which mandates Banks to have such a mechanism. The Federal Trade Commission (FTC), the nation’s(US) consumer protection agency, enforces the Equal Credit Opportunity Act (ECOA), which prohibits credit discrimination on the basis of race, color, religion, national origin, sex, marital status,age, or because you get public assistance.

In BFS, often underwriter needs to decide the creditworthiness of an applicant. Among other things, a scorecard is typically in place and long been used. Now with data deluge and better processing power many Analytical models are also used.

Analytical Models though fairly strong in predictive aspects , sometimes are like black box to a business user and the Bank representative facing the customer may find it hard to clearly explain the reason of output from the model. Here is where scorecard like simplicity helps.To get the best of both worlds in this aspect, scorecard model comes handy.

A scorecard model is one that’s leverage underlying analytical model (In our case, logistic) & create a conventional looking scorecard that helps to determine creditability in a manner that can be easily explained to applicant.

A Scorecard represent a potent and commercially significant formulation of predictive models, used extensively in retail banking to estimate and rank-order consumer credit risk.Frequently, scorecard models are outfitted with reason codes, which provide a clear means to explain the calculated score,and may be offered to a consumer, or might better inform an operational decision under manual review. Usually, the lower the overall score produced by a scorecard, the higher the chances of it triggering an adverse decision, which usually involves the referral or denial of services.

The scorecard package in R makes the development of credit risk scorecard easier and efficient by providing functions for some common tasks, such as data partition, variable selection,woe binning, scorecard scaling, performance evaluation and report generation.These functions can also be used in the development of machine learning models.

In attached R snippet, standard dataset of German Credit is used, a score card model using Logistic Regression as base is developed and ScoreCard created for use. The model is also exported as PMML.

Hope you find this useful. Feel free to explore more..& Have a great day! Thanks

library(scorecard)
library(pmml)

Loading required package: XML

data("germancredit")
str(germancredit)

'data.frame':   1000 obs. of  21 variables:
 $ status.of.existing.checking.account                     : Factor w/ 4 levels "... < 0 DM","0 <= ... < 200 DM",..: 1 2 4 1 1 4 4 2 4 2 ...
 $ duration.in.month                                       : num  6 48 12 42 24 36 24 36 12 30 ...
 $ credit.history                                          : Factor w/ 5 levels "no credits taken/ all credits paid back duly",..: 5 3 5 3 4 3 3 3 3 5 ...
 $ purpose                                                 : chr  "radio/television" "radio/television" "education" "furniture/equipment" ...
 $ credit.amount                                           : num  1169 5951 2096 7882 4870 ...
 $ savings.account.and.bonds                               : Factor w/ 5 levels "... < 100 DM",..: 5 1 1 1 1 5 3 1 4 1 ...
 $ present.employment.since                                : Factor w/ 5 levels "unemployed","... < 1 year",..: 5 3 4 4 3 3 5 3 4 1 ...
 $ installment.rate.in.percentage.of.disposable.income     : num  4 2 2 2 3 2 3 2 2 4 ...
 $ personal.status.and.sex                                 : Factor w/ 5 levels "male : divorced/separated",..: 3 2 3 3 3 3 3 3 2 4 ...
 $ other.debtors.or.guarantors                             : Factor w/ 3 levels "none","co-applicant",..: 1 1 1 3 1 1 1 1 1 1 ...
 $ present.residence.since                                 : num  4 2 3 4 4 4 4 2 4 2 ...
 $ property                                                : Factor w/ 4 levels "real estate",..: 1 1 1 2 4 4 2 3 1 3 ...
 $ age.in.years                                            : num  67 22 49 45 53 35 53 35 61 28 ...
 $ other.installment.plans                                 : Factor w/ 3 levels "bank","stores",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ housing                                                 : Factor w/ 3 levels "rent","own","for free": 2 2 2 3 3 3 2 1 2 2 ...
 $ number.of.existing.credits.at.this.bank                 : num  2 1 1 1 2 1 1 1 1 2 ...
 $ job                                                     : Factor w/ 4 levels "unemployed/ unskilled - non-resident",..: 3 3 2 3 3 2 3 4 2 4 ...
 $ number.of.people.being.liable.to.provide.maintenance.for: num  1 1 2 2 2 2 1 1 1 1 ...
 $ telephone                                               : Factor w/ 2 levels "none","yes, registered under the customers name": 2 1 1 1 1 2 1 2 1 1 ...
 $ foreign.worker                                          : Factor w/ 2 levels "yes","no": 1 1 1 1 1 1 1 1 1 1 ...
 $ creditability                                           : Factor w/ 2 levels "bad","good": 2 1 2 2 1 2 2 2 2 1 ...

# filter variable via missing rate, iv, identical value rate. 
# "Creditability"" is the target variable//Good or Bad Creditability of the customer.."
dt_f = var_filter(germancredit, "creditability")

[INFO] filtering variables ...

# summary of germancredit
lapply(germancredit, summary)

$`status.of.existing.checking.account`
                                            ... < 0 DM 
                                                   274 
                                     0 <= ... < 200 DM 
                                                   269 
... >= 200 DM / salary assignments for at least 1 year 
                                                    63 
                                   no checking account 
                                                   394 

$duration.in.month
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    4.0    12.0    18.0    20.9    24.0    72.0 

$credit.history
               no credits taken/ all credits paid back duly 
                                                         40 
                    all credits at this bank paid back duly 
                                                         49 
                   existing credits paid back duly till now 
                                                        530 
                            delay in paying off in the past 
                                                         88 
critical account/ other credits existing (not at this bank) 
                                                        293 

$purpose
   Length     Class      Mode 
     1000 character character 

$credit.amount
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    250    1366    2320    3271    3972   18424 

$savings.account.and.bonds
               ... < 100 DM         100 <= ... < 500 DM        500 <= ... < 1000 DM 
                        603                         103                          63 
             ... >= 1000 DM unknown/ no savings account 
                         48                         183 

$present.employment.since
        unemployed       ... < 1 year 1 <= ... < 4 years 4 <= ... < 7 years 
                62                172                339                174 
    ... >= 7 years 
               253 

$installment.rate.in.percentage.of.disposable.income
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   2.973   4.000   4.000 

$personal.status.and.sex
          male : divorced/separated female : divorced/separated/married 
                                  0                                 360 
                      male : single              male : married/widowed 
                                548                                  92 
                    female : single 
                                  0 

$other.debtors.or.guarantors
        none co-applicant    guarantor 
         907           41           52 

$present.residence.since
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   2.845   4.000   4.000 

$property
                                         real estate 
                                                 282 
  building society savings agreement/ life insurance 
                                                 232 
car or other, not in attribute Savings account/bonds 
                                                 332 
                               unknown / no property 
                                                 154 

$age.in.years
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  19.00   27.00   33.00   35.55   42.00   75.00 

$other.installment.plans
  bank stores   none 
   139     47    814 

$housing
    rent      own for free 
     179      713      108 

$number.of.existing.credits.at.this.bank
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   1.407   2.000   4.000 

$job
                         unemployed/ unskilled - non-resident 
                                                           22 
                                         unskilled - resident 
                                                          200 
                                  skilled employee / official 
                                                          630 
management/ self-employed/ highly qualified employee/ officer 
                                                          148 

$number.of.people.being.liable.to.provide.maintenance.for
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   1.155   1.000   2.000 

$telephone
                                    none yes, registered under the customers name 
                                     596                                      404 

$foreign.worker
yes  no 
963  37 

$creditability
 bad good 
 300  700

# breaking dt into train and test
dt_list = split_df(dt_f, "creditability")
label_list = lapply(dt_list, function(x) x$creditability)

woebin generates optimal binning for numerical, factor and categorical variables using methods including tree-like segmentation or chi-square merge. woebin can also customizing breakpoints if the breaks_list was provided. The default woe is defined as ln(Bad_i/Good_i)

# woe binning ------
bins = woebin(dt_list$train, "creditability")

[INFO] creating woe binning ...

already exporting variable(s): dt, xs, y, breaks_list, special_values, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method

# converting train and test into woe values
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins)) #woebin_ply converts original input data into woe values based on the binning information generated from woebin.

[INFO] converting into woe values ...

already exporting variable(s): dt, bins, xs

[INFO] converting into woe values ...

already exporting variable(s): dt, bins, xs

head(dt_woe_list$train)

   creditability status.of.existing.checking.account_woe duration.in.month_woe
1:             0                               0.6047362           -1.33612457
2:             0                               0.6047362            0.45941562
3:             1                               0.6047362            0.03803506
4:             0                              -1.2159803            0.45941562
5:             0                              -1.2159803            0.03803506
6:             0                              -1.2159803           -0.34875597
   credit.history_woe purpose_woe credit.amount_woe savings.account.and.bonds_woe
1:        -0.77289516 -0.59640452       -0.04711262                    -0.6586620
2:         0.07262747  0.03448518        0.24163401                     0.2691864
3:         0.24934415  0.36437207        0.94249133                     0.2691864
4:         0.07262747  0.68825720        0.24163401                    -0.6586620
5:         0.07262747  0.03448518       -0.35231172                    -0.6586620
6:         0.07262747 -0.59640452       -0.35231172                    -0.6586620
   present.employment.since_woe installment.rate.in.percentage.of.disposable.income_woe
1:                   -0.3757496                                              0.03402543
2:                   -0.4681747                                             -0.04744960
3:                    0.1559497                                              0.03402543
4:                    0.1559497                                             -0.04744960
5:                   -0.3757496                                              0.03402543
6:                   -0.4681747                                             -0.04744960
   personal.status.and.sex_woe other.debtors.or.guarantors_woe property_woe
1:                  -0.1673360                      0.03988814  -0.36383508
2:                  -0.1673360                     -0.83156856   0.03147766
3:                  -0.1673360                      0.03988814   0.42956266
4:                  -0.1673360                      0.03988814   0.42956266
5:                  -0.1673360                      0.03988814   0.03147766
6:                   0.2608132                      0.03988814  -0.36383508
   age.in.years_woe other.installment.plans_woe housing_woe
1:      -0.73312849                 -0.09530775  -0.1698975
2:      -0.08154573                 -0.09530775   0.2976807
3:      -0.08154573                 -0.09530775   0.2976807
4:      -0.97466940                 -0.09530775   0.2976807
5:      -0.08154573                 -0.09530775  -0.1698975
6:      -0.73312849                 -0.09530775  -0.1698975

We use Logisitic Regression model as the base of our scorecard model

m1 = glm(creditability ~ ., family = binomial(), data = dt_woe_list$train)
summary(m1)


Call:
glm(formula = creditability ~ ., family = binomial(), data = dt_woe_list$train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2991  -0.6879  -0.3718   0.7266   2.6172  

Coefficients:
                                                        Estimate Std. Error z value
(Intercept)                                              -0.8546     0.1034  -8.266
status.of.existing.checking.account_woe                   0.8030     0.1334   6.019
duration.in.month_woe                                     0.7850     0.2139   3.670
credit.history_woe                                        0.6215     0.1876   3.313
purpose_woe                                               0.9935     0.2282   4.354
credit.amount_woe                                         0.8293     0.2416   3.432
savings.account.and.bonds_woe                             0.8492     0.2738   3.102
present.employment.since_woe                              0.6137     0.2969   2.067
installment.rate.in.percentage.of.disposable.income_woe   5.0813     2.3277   2.183
personal.status.and.sex_woe                               1.0782     0.5106   2.112
other.debtors.or.guarantors_woe                           1.1449     0.5737   1.996
property_woe                                              0.1011     0.4330   0.234
age.in.years_woe                                          0.8533     0.2512   3.397
other.installment.plans_woe                               0.7473     0.5329   1.402
housing_woe                                               0.3925     0.4090   0.960
                                                        Pr(>|z|)    
(Intercept)                                              < 2e-16 ***
status.of.existing.checking.account_woe                 1.75e-09 ***
duration.in.month_woe                                   0.000242 ***
credit.history_woe                                      0.000923 ***
purpose_woe                                             1.34e-05 ***
credit.amount_woe                                       0.000599 ***
savings.account.and.bonds_woe                           0.001924 ** 
present.employment.since_woe                            0.038731 *  
installment.rate.in.percentage.of.disposable.income_woe 0.029038 *  
personal.status.and.sex_woe                             0.034694 *  
other.debtors.or.guarantors_woe                         0.045982 *  
property_woe                                            0.815366    
age.in.years_woe                                        0.000681 ***
other.installment.plans_woe                             0.160818    
housing_woe                                             0.337167    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 833.18  on 680  degrees of freedom
Residual deviance: 611.92  on 666  degrees of freedom
AIC: 641.92

Number of Fisher Scoring iterations: 5

scorecard creates a scorecard based on the results from woebin and glm.

card = scorecard(bins, m1)
#card

Let’s look how score(points) for housing attribute of score card.Points are for each bin of the attribute value.

card$housing

   variable      bin count count_distr good bad   badprob        woe     bin_iv
1:  housing     rent   118   0.1732746   71  47 0.3983051  0.4298756 0.03443674
2:  housing      own   484   0.7107195  355 129 0.2665289 -0.1698975 0.01979816
3:  housing for free    79   0.1160059   50  29 0.3670886  0.2976807 0.01084195
     total_iv   breaks is_special_values points
1: 0.06507685     rent             FALSE    -12
2: 0.06507685      own             FALSE      5
3: 0.06507685 for free             FALSE     -8

saveXML(pmml(m1,card),"D:\\ScoreCardusingGLM.xml")

[1] "D:\\ScoreCardusingGLMItr.xml"

pmml(m1)

<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_3 http://www.dmg.org/pmml/v4-3/pmml-4-3.xsd">
 <Header copyright="Copyright (c) 2019 143696" description="Generalized Linear Regression Model">
  <Extension name="user" value="143696" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2019-02-19 13:58:08</Timestamp>
 </Header>
 <DataDictionary numberOfFields="15">
  <DataField name="creditability" optype="continuous" dataType="double"/>
  <DataField name="status.of.existing.checking.account_woe" optype="continuous" dataType="double"/>
  <DataField name="duration.in.month_woe" optype="continuous" dataType="double"/>
  <DataField name="credit.history_woe" optype="continuous" dataType="double"/>
  <DataField name="purpose_woe" optype="continuous" dataType="double"/>
  <DataField name="credit.amount_woe" optype="continuous" dataType="double"/>
  <DataField name="savings.account.and.bonds_woe" optype="continuous" dataType="double"/>
  <DataField name="present.employment.since_woe" optype="continuous" dataType="double"/>
  <DataField name="installment.rate.in.percentage.of.disposable.income_woe" optype="continuous" dataType="double"/>
  <DataField name="personal.status.and.sex_woe" optype="continuous" dataType="double"/>
  <DataField name="other.debtors.or.guarantors_woe" optype="continuous" dataType="double"/>
  <DataField name="property_woe" optype="continuous" dataType="double"/>
  <DataField name="age.in.years_woe" optype="continuous" dataType="double"/>
  <DataField name="other.installment.plans_woe" optype="continuous" dataType="double"/>
  <DataField name="housing_woe" optype="continuous" dataType="double"/>
 </DataDictionary>
 <GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="regression" algorithmName="glm" distribution="binomial" linkFunction="logit">
  <MiningSchema>
   <MiningField name="creditability" usageType="predicted" invalidValueTreatment="returnInvalid"/>
   <MiningField name="status.of.existing.checking.account_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="duration.in.month_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="credit.history_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="purpose_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="credit.amount_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="savings.account.and.bonds_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="present.employment.since_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="installment.rate.in.percentage.of.disposable.income_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="personal.status.and.sex_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="other.debtors.or.guarantors_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="property_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="age.in.years_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="other.installment.plans_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="housing_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
  </MiningSchema>
  <Output>
   <OutputField name="Predicted_creditability" feature="predictedValue"/>
  </Output>
  <ParameterList>
   <Parameter name="p0" label="(Intercept)"/>
   <Parameter name="p1" label="status.of.existing.checking.account_woe"/>
   <Parameter name="p2" label="duration.in.month_woe"/>
   <Parameter name="p3" label="credit.history_woe"/>
   <Parameter name="p4" label="purpose_woe"/>
   <Parameter name="p5" label="credit.amount_woe"/>
   <Parameter name="p6" label="savings.account.and.bonds_woe"/>
   <Parameter name="p7" label="present.employment.since_woe"/>
   <Parameter name="p8" label="installment.rate.in.percentage.of.disposable.income_woe"/>
   <Parameter name="p9" label="personal.status.and.sex_woe"/>
   <Parameter name="p10" label="other.debtors.or.guarantors_woe"/>
   <Parameter name="p11" label="property_woe"/>
   <Parameter name="p12" label="age.in.years_woe"/>
   <Parameter name="p13" label="other.installment.plans_woe"/>
   <Parameter name="p14" label="housing_woe"/>
  </ParameterList>
  <FactorList/>
  <CovariateList>
   <Predictor name="status.of.existing.checking.account_woe"/>
   <Predictor name="duration.in.month_woe"/>
   <Predictor name="credit.history_woe"/>
   <Predictor name="purpose_woe"/>
   <Predictor name="credit.amount_woe"/>
   <Predictor name="savings.account.and.bonds_woe"/>
   <Predictor name="present.employment.since_woe"/>
   <Predictor name="installment.rate.in.percentage.of.disposable.income_woe"/>
   <Predictor name="personal.status.and.sex_woe"/>
   <Predictor name="other.debtors.or.guarantors_woe"/>
   <Predictor name="property_woe"/>
   <Predictor name="age.in.years_woe"/>
   <Predictor name="other.installment.plans_woe"/>
   <Predictor name="housing_woe"/>
  </CovariateList>
  <PPMatrix>
   <PPCell value="1" predictorName="status.of.existing.checking.account_woe" parameterName="p1"/>
   <PPCell value="1" predictorName="duration.in.month_woe" parameterName="p2"/>
   <PPCell value="1" predictorName="credit.history_woe" parameterName="p3"/>
   <PPCell value="1" predictorName="purpose_woe" parameterName="p4"/>
   <PPCell value="1" predictorName="credit.amount_woe" parameterName="p5"/>
   <PPCell value="1" predictorName="savings.account.and.bonds_woe" parameterName="p6"/>
   <PPCell value="1" predictorName="present.employment.since_woe" parameterName="p7"/>
   <PPCell value="1" predictorName="installment.rate.in.percentage.of.disposable.income_woe" parameterName="p8"/>
   <PPCell value="1" predictorName="personal.status.and.sex_woe" parameterName="p9"/>
   <PPCell value="1" predictorName="other.debtors.or.guarantors_woe" parameterName="p10"/>
   <PPCell value="1" predictorName="property_woe" parameterName="p11"/>
   <PPCell value="1" predictorName="age.in.years_woe" parameterName="p12"/>
   <PPCell value="1" predictorName="other.installment.plans_woe" parameterName="p13"/>
   <PPCell value="1" predictorName="housing_woe" parameterName="p14"/>
  </PPMatrix>
  <ParamMatrix>
   <PCell parameterName="p0" df="1" beta="-0.854553078468998"/>
   <PCell parameterName="p1" df="1" beta="0.80299427276989"/>
   <PCell parameterName="p2" df="1" beta="0.785010463666152"/>
   <PCell parameterName="p3" df="1" beta="0.621469453939476"/>
   <PCell parameterName="p4" df="1" beta="0.993540391123836"/>
   <PCell parameterName="p5" df="1" beta="0.829276035447315"/>
   <PCell parameterName="p6" df="1" beta="0.849249460118593"/>
   <PCell parameterName="p7" df="1" beta="0.61367935345907"/>
   <PCell parameterName="p8" df="1" beta="5.08133557959595"/>
   <PCell parameterName="p9" df="1" beta="1.07824246187046"/>
   <PCell parameterName="p10" df="1" beta="1.14486037605188"/>
   <PCell parameterName="p11" df="1" beta="0.101118641189124"/>
   <PCell parameterName="p12" df="1" beta="0.853315960205122"/>
   <PCell parameterName="p13" df="1" beta="0.747258559988029"/>
   <PCell parameterName="p14" df="1" beta="0.392504480027512"/>
  </ParamMatrix>
 </GeneralRegressionModel>
</PMML>

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

Search This Blog

TechMusings (BigData,Hadoop,Pig,Hive,DataScience,IoT,EAI,SOA,J2EE)

ScoreCard Model using R

Example R Notebook for ScoreCard Model

Comments

Post a Comment

Popular posts from this blog

The auxService:mapreduce_shuffle does not exist

Zeppelin and Anaconda

Logistic Regression using German Credit Data