ScoreCard Model using R
Example R Notebook for ScoreCard Model
Have you ever wondered how exactly the decision to approve a loan is done?
Was the bank able to clearly articulate the rationale of their decision to decline?
To ensure fairness and transparency, there are compliance rules which mandates Banks to have such a mechanism. The Federal Trade Commission (FTC), the nation’s(US) consumer protection agency, enforces the Equal Credit Opportunity Act (ECOA), which prohibits credit discrimination on the basis of race, color, religion, national origin, sex, marital status,age, or because you get public assistance.
In BFS, often underwriter needs to decide the creditworthiness of an applicant. Among other things, a scorecard is typically in place and long been used. Now with data deluge and better processing power many Analytical models are also used.
Analytical Models though fairly strong in predictive aspects , sometimes are like black box to a business user and the Bank representative facing the customer may find it hard to clearly explain the reason of output from the model. Here is where scorecard like simplicity helps.To get the best of both worlds in this aspect, scorecard model comes handy.
A scorecard model is one that’s leverage underlying analytical model (In our case, logistic) & create a conventional looking scorecard that helps to determine creditability in a manner that can be easily explained to applicant.
A Scorecard represent a potent and commercially significant formulation of predictive models, used extensively in retail banking to estimate and rank-order consumer credit risk.Frequently, scorecard models are outfitted with reason codes, which provide a clear means to explain the calculated score,and may be offered to a consumer, or might better inform an operational decision under manual review. Usually, the lower the overall score produced by a scorecard, the higher the chances of it triggering an adverse decision, which usually involves the referral or denial of services.
The scorecard package in R makes the development of credit risk scorecard easier and efficient by providing functions for some common tasks, such as data partition, variable selection,woe binning, scorecard scaling, performance evaluation and report generation.These functions can also be used in the development of machine learning models.
In attached R snippet, standard dataset of German Credit is used, a score card model using Logistic Regression as base is developed and ScoreCard created for use. The model is also exported as PMML.
Hope you find this useful. Feel free to explore more..& Have a great day! Thanks
library(scorecard)
library(pmml)
Loading required package: XML
data("germancredit")
str(germancredit)
'data.frame': 1000 obs. of 21 variables:
$ status.of.existing.checking.account : Factor w/ 4 levels "... < 0 DM","0 <= ... < 200 DM",..: 1 2 4 1 1 4 4 2 4 2 ...
$ duration.in.month : num 6 48 12 42 24 36 24 36 12 30 ...
$ credit.history : Factor w/ 5 levels "no credits taken/ all credits paid back duly",..: 5 3 5 3 4 3 3 3 3 5 ...
$ purpose : chr "radio/television" "radio/television" "education" "furniture/equipment" ...
$ credit.amount : num 1169 5951 2096 7882 4870 ...
$ savings.account.and.bonds : Factor w/ 5 levels "... < 100 DM",..: 5 1 1 1 1 5 3 1 4 1 ...
$ present.employment.since : Factor w/ 5 levels "unemployed","... < 1 year",..: 5 3 4 4 3 3 5 3 4 1 ...
$ installment.rate.in.percentage.of.disposable.income : num 4 2 2 2 3 2 3 2 2 4 ...
$ personal.status.and.sex : Factor w/ 5 levels "male : divorced/separated",..: 3 2 3 3 3 3 3 3 2 4 ...
$ other.debtors.or.guarantors : Factor w/ 3 levels "none","co-applicant",..: 1 1 1 3 1 1 1 1 1 1 ...
$ present.residence.since : num 4 2 3 4 4 4 4 2 4 2 ...
$ property : Factor w/ 4 levels "real estate",..: 1 1 1 2 4 4 2 3 1 3 ...
$ age.in.years : num 67 22 49 45 53 35 53 35 61 28 ...
$ other.installment.plans : Factor w/ 3 levels "bank","stores",..: 3 3 3 3 3 3 3 3 3 3 ...
$ housing : Factor w/ 3 levels "rent","own","for free": 2 2 2 3 3 3 2 1 2 2 ...
$ number.of.existing.credits.at.this.bank : num 2 1 1 1 2 1 1 1 1 2 ...
$ job : Factor w/ 4 levels "unemployed/ unskilled - non-resident",..: 3 3 2 3 3 2 3 4 2 4 ...
$ number.of.people.being.liable.to.provide.maintenance.for: num 1 1 2 2 2 2 1 1 1 1 ...
$ telephone : Factor w/ 2 levels "none","yes, registered under the customers name": 2 1 1 1 1 2 1 2 1 1 ...
$ foreign.worker : Factor w/ 2 levels "yes","no": 1 1 1 1 1 1 1 1 1 1 ...
$ creditability : Factor w/ 2 levels "bad","good": 2 1 2 2 1 2 2 2 2 1 ...
# filter variable via missing rate, iv, identical value rate.
# "Creditability"" is the target variable//Good or Bad Creditability of the customer.."
dt_f = var_filter(germancredit, "creditability")
[INFO] filtering variables ...
# summary of germancredit
lapply(germancredit, summary)
$`status.of.existing.checking.account`
... < 0 DM
274
0 <= ... < 200 DM
269
... >= 200 DM / salary assignments for at least 1 year
63
no checking account
394
$duration.in.month
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.0 12.0 18.0 20.9 24.0 72.0
$credit.history
no credits taken/ all credits paid back duly
40
all credits at this bank paid back duly
49
existing credits paid back duly till now
530
delay in paying off in the past
88
critical account/ other credits existing (not at this bank)
293
$purpose
Length Class Mode
1000 character character
$credit.amount
Min. 1st Qu. Median Mean 3rd Qu. Max.
250 1366 2320 3271 3972 18424
$savings.account.and.bonds
... < 100 DM 100 <= ... < 500 DM 500 <= ... < 1000 DM
603 103 63
... >= 1000 DM unknown/ no savings account
48 183
$present.employment.since
unemployed ... < 1 year 1 <= ... < 4 years 4 <= ... < 7 years
62 172 339 174
... >= 7 years
253
$installment.rate.in.percentage.of.disposable.income
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 3.000 2.973 4.000 4.000
$personal.status.and.sex
male : divorced/separated female : divorced/separated/married
0 360
male : single male : married/widowed
548 92
female : single
0
$other.debtors.or.guarantors
none co-applicant guarantor
907 41 52
$present.residence.since
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 3.000 2.845 4.000 4.000
$property
real estate
282
building society savings agreement/ life insurance
232
car or other, not in attribute Savings account/bonds
332
unknown / no property
154
$age.in.years
Min. 1st Qu. Median Mean 3rd Qu. Max.
19.00 27.00 33.00 35.55 42.00 75.00
$other.installment.plans
bank stores none
139 47 814
$housing
rent own for free
179 713 108
$number.of.existing.credits.at.this.bank
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 1.000 1.407 2.000 4.000
$job
unemployed/ unskilled - non-resident
22
unskilled - resident
200
skilled employee / official
630
management/ self-employed/ highly qualified employee/ officer
148
$number.of.people.being.liable.to.provide.maintenance.for
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 1.000 1.155 1.000 2.000
$telephone
none yes, registered under the customers name
596 404
$foreign.worker
yes no
963 37
$creditability
bad good
300 700
# breaking dt into train and test
dt_list = split_df(dt_f, "creditability")
label_list = lapply(dt_list, function(x) x$creditability)
woebin generates optimal binning for numerical, factor and categorical variables using methods including tree-like segmentation or chi-square merge. woebin can also customizing breakpoints if the breaks_list was provided. The default woe is defined as ln(Bad_i/Good_i)
# woe binning ------
bins = woebin(dt_list$train, "creditability")
[INFO] creating woe binning ...
already exporting variable(s): dt, xs, y, breaks_list, special_values, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method
# converting train and test into woe values
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins)) #woebin_ply converts original input data into woe values based on the binning information generated from woebin.
[INFO] converting into woe values ...
already exporting variable(s): dt, bins, xs
[INFO] converting into woe values ...
already exporting variable(s): dt, bins, xs
head(dt_woe_list$train)
creditability status.of.existing.checking.account_woe duration.in.month_woe
1: 0 0.6047362 -1.33612457
2: 0 0.6047362 0.45941562
3: 1 0.6047362 0.03803506
4: 0 -1.2159803 0.45941562
5: 0 -1.2159803 0.03803506
6: 0 -1.2159803 -0.34875597
credit.history_woe purpose_woe credit.amount_woe savings.account.and.bonds_woe
1: -0.77289516 -0.59640452 -0.04711262 -0.6586620
2: 0.07262747 0.03448518 0.24163401 0.2691864
3: 0.24934415 0.36437207 0.94249133 0.2691864
4: 0.07262747 0.68825720 0.24163401 -0.6586620
5: 0.07262747 0.03448518 -0.35231172 -0.6586620
6: 0.07262747 -0.59640452 -0.35231172 -0.6586620
present.employment.since_woe installment.rate.in.percentage.of.disposable.income_woe
1: -0.3757496 0.03402543
2: -0.4681747 -0.04744960
3: 0.1559497 0.03402543
4: 0.1559497 -0.04744960
5: -0.3757496 0.03402543
6: -0.4681747 -0.04744960
personal.status.and.sex_woe other.debtors.or.guarantors_woe property_woe
1: -0.1673360 0.03988814 -0.36383508
2: -0.1673360 -0.83156856 0.03147766
3: -0.1673360 0.03988814 0.42956266
4: -0.1673360 0.03988814 0.42956266
5: -0.1673360 0.03988814 0.03147766
6: 0.2608132 0.03988814 -0.36383508
age.in.years_woe other.installment.plans_woe housing_woe
1: -0.73312849 -0.09530775 -0.1698975
2: -0.08154573 -0.09530775 0.2976807
3: -0.08154573 -0.09530775 0.2976807
4: -0.97466940 -0.09530775 0.2976807
5: -0.08154573 -0.09530775 -0.1698975
6: -0.73312849 -0.09530775 -0.1698975
We use Logisitic Regression model as the base of our scorecard model
m1 = glm(creditability ~ ., family = binomial(), data = dt_woe_list$train)
summary(m1)
Call:
glm(formula = creditability ~ ., family = binomial(), data = dt_woe_list$train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2991 -0.6879 -0.3718 0.7266 2.6172
Coefficients:
Estimate Std. Error z value
(Intercept) -0.8546 0.1034 -8.266
status.of.existing.checking.account_woe 0.8030 0.1334 6.019
duration.in.month_woe 0.7850 0.2139 3.670
credit.history_woe 0.6215 0.1876 3.313
purpose_woe 0.9935 0.2282 4.354
credit.amount_woe 0.8293 0.2416 3.432
savings.account.and.bonds_woe 0.8492 0.2738 3.102
present.employment.since_woe 0.6137 0.2969 2.067
installment.rate.in.percentage.of.disposable.income_woe 5.0813 2.3277 2.183
personal.status.and.sex_woe 1.0782 0.5106 2.112
other.debtors.or.guarantors_woe 1.1449 0.5737 1.996
property_woe 0.1011 0.4330 0.234
age.in.years_woe 0.8533 0.2512 3.397
other.installment.plans_woe 0.7473 0.5329 1.402
housing_woe 0.3925 0.4090 0.960
Pr(>|z|)
(Intercept) < 2e-16 ***
status.of.existing.checking.account_woe 1.75e-09 ***
duration.in.month_woe 0.000242 ***
credit.history_woe 0.000923 ***
purpose_woe 1.34e-05 ***
credit.amount_woe 0.000599 ***
savings.account.and.bonds_woe 0.001924 **
present.employment.since_woe 0.038731 *
installment.rate.in.percentage.of.disposable.income_woe 0.029038 *
personal.status.and.sex_woe 0.034694 *
other.debtors.or.guarantors_woe 0.045982 *
property_woe 0.815366
age.in.years_woe 0.000681 ***
other.installment.plans_woe 0.160818
housing_woe 0.337167
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 833.18 on 680 degrees of freedom
Residual deviance: 611.92 on 666 degrees of freedom
AIC: 641.92
Number of Fisher Scoring iterations: 5
scorecard creates a scorecard based on the results from woebin and glm.
card = scorecard(bins, m1)
#card
Let’s look how score(points) for housing attribute of score card.Points are for each bin of the attribute value.
card$housing
variable bin count count_distr good bad badprob woe bin_iv
1: housing rent 118 0.1732746 71 47 0.3983051 0.4298756 0.03443674
2: housing own 484 0.7107195 355 129 0.2665289 -0.1698975 0.01979816
3: housing for free 79 0.1160059 50 29 0.3670886 0.2976807 0.01084195
total_iv breaks is_special_values points
1: 0.06507685 rent FALSE -12
2: 0.06507685 own FALSE 5
3: 0.06507685 for free FALSE -8
saveXML(pmml(m1,card),"D:\\ScoreCardusingGLM.xml")
[1] "D:\\ScoreCardusingGLMItr.xml"
pmml(m1)
<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_3 http://www.dmg.org/pmml/v4-3/pmml-4-3.xsd">
<Header copyright="Copyright (c) 2019 143696" description="Generalized Linear Regression Model">
<Extension name="user" value="143696" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2019-02-19 13:58:08</Timestamp>
</Header>
<DataDictionary numberOfFields="15">
<DataField name="creditability" optype="continuous" dataType="double"/>
<DataField name="status.of.existing.checking.account_woe" optype="continuous" dataType="double"/>
<DataField name="duration.in.month_woe" optype="continuous" dataType="double"/>
<DataField name="credit.history_woe" optype="continuous" dataType="double"/>
<DataField name="purpose_woe" optype="continuous" dataType="double"/>
<DataField name="credit.amount_woe" optype="continuous" dataType="double"/>
<DataField name="savings.account.and.bonds_woe" optype="continuous" dataType="double"/>
<DataField name="present.employment.since_woe" optype="continuous" dataType="double"/>
<DataField name="installment.rate.in.percentage.of.disposable.income_woe" optype="continuous" dataType="double"/>
<DataField name="personal.status.and.sex_woe" optype="continuous" dataType="double"/>
<DataField name="other.debtors.or.guarantors_woe" optype="continuous" dataType="double"/>
<DataField name="property_woe" optype="continuous" dataType="double"/>
<DataField name="age.in.years_woe" optype="continuous" dataType="double"/>
<DataField name="other.installment.plans_woe" optype="continuous" dataType="double"/>
<DataField name="housing_woe" optype="continuous" dataType="double"/>
</DataDictionary>
<GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="regression" algorithmName="glm" distribution="binomial" linkFunction="logit">
<MiningSchema>
<MiningField name="creditability" usageType="predicted" invalidValueTreatment="returnInvalid"/>
<MiningField name="status.of.existing.checking.account_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="duration.in.month_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="credit.history_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="purpose_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="credit.amount_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="savings.account.and.bonds_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="present.employment.since_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="installment.rate.in.percentage.of.disposable.income_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="personal.status.and.sex_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="other.debtors.or.guarantors_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="property_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="age.in.years_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="other.installment.plans_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
<MiningField name="housing_woe" usageType="active" invalidValueTreatment="returnInvalid"/>
</MiningSchema>
<Output>
<OutputField name="Predicted_creditability" feature="predictedValue"/>
</Output>
<ParameterList>
<Parameter name="p0" label="(Intercept)"/>
<Parameter name="p1" label="status.of.existing.checking.account_woe"/>
<Parameter name="p2" label="duration.in.month_woe"/>
<Parameter name="p3" label="credit.history_woe"/>
<Parameter name="p4" label="purpose_woe"/>
<Parameter name="p5" label="credit.amount_woe"/>
<Parameter name="p6" label="savings.account.and.bonds_woe"/>
<Parameter name="p7" label="present.employment.since_woe"/>
<Parameter name="p8" label="installment.rate.in.percentage.of.disposable.income_woe"/>
<Parameter name="p9" label="personal.status.and.sex_woe"/>
<Parameter name="p10" label="other.debtors.or.guarantors_woe"/>
<Parameter name="p11" label="property_woe"/>
<Parameter name="p12" label="age.in.years_woe"/>
<Parameter name="p13" label="other.installment.plans_woe"/>
<Parameter name="p14" label="housing_woe"/>
</ParameterList>
<FactorList/>
<CovariateList>
<Predictor name="status.of.existing.checking.account_woe"/>
<Predictor name="duration.in.month_woe"/>
<Predictor name="credit.history_woe"/>
<Predictor name="purpose_woe"/>
<Predictor name="credit.amount_woe"/>
<Predictor name="savings.account.and.bonds_woe"/>
<Predictor name="present.employment.since_woe"/>
<Predictor name="installment.rate.in.percentage.of.disposable.income_woe"/>
<Predictor name="personal.status.and.sex_woe"/>
<Predictor name="other.debtors.or.guarantors_woe"/>
<Predictor name="property_woe"/>
<Predictor name="age.in.years_woe"/>
<Predictor name="other.installment.plans_woe"/>
<Predictor name="housing_woe"/>
</CovariateList>
<PPMatrix>
<PPCell value="1" predictorName="status.of.existing.checking.account_woe" parameterName="p1"/>
<PPCell value="1" predictorName="duration.in.month_woe" parameterName="p2"/>
<PPCell value="1" predictorName="credit.history_woe" parameterName="p3"/>
<PPCell value="1" predictorName="purpose_woe" parameterName="p4"/>
<PPCell value="1" predictorName="credit.amount_woe" parameterName="p5"/>
<PPCell value="1" predictorName="savings.account.and.bonds_woe" parameterName="p6"/>
<PPCell value="1" predictorName="present.employment.since_woe" parameterName="p7"/>
<PPCell value="1" predictorName="installment.rate.in.percentage.of.disposable.income_woe" parameterName="p8"/>
<PPCell value="1" predictorName="personal.status.and.sex_woe" parameterName="p9"/>
<PPCell value="1" predictorName="other.debtors.or.guarantors_woe" parameterName="p10"/>
<PPCell value="1" predictorName="property_woe" parameterName="p11"/>
<PPCell value="1" predictorName="age.in.years_woe" parameterName="p12"/>
<PPCell value="1" predictorName="other.installment.plans_woe" parameterName="p13"/>
<PPCell value="1" predictorName="housing_woe" parameterName="p14"/>
</PPMatrix>
<ParamMatrix>
<PCell parameterName="p0" df="1" beta="-0.854553078468998"/>
<PCell parameterName="p1" df="1" beta="0.80299427276989"/>
<PCell parameterName="p2" df="1" beta="0.785010463666152"/>
<PCell parameterName="p3" df="1" beta="0.621469453939476"/>
<PCell parameterName="p4" df="1" beta="0.993540391123836"/>
<PCell parameterName="p5" df="1" beta="0.829276035447315"/>
<PCell parameterName="p6" df="1" beta="0.849249460118593"/>
<PCell parameterName="p7" df="1" beta="0.61367935345907"/>
<PCell parameterName="p8" df="1" beta="5.08133557959595"/>
<PCell parameterName="p9" df="1" beta="1.07824246187046"/>
<PCell parameterName="p10" df="1" beta="1.14486037605188"/>
<PCell parameterName="p11" df="1" beta="0.101118641189124"/>
<PCell parameterName="p12" df="1" beta="0.853315960205122"/>
<PCell parameterName="p13" df="1" beta="0.747258559988029"/>
<PCell parameterName="p14" df="1" beta="0.392504480027512"/>
</ParamMatrix>
</GeneralRegressionModel>
</PMML>
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
Comments
Post a Comment