library(pomodoro)
This package was developed for Credit Access studies, but it can be used with any binary or multi-level factor outcome. First, inspect the structure of the bundled data with str(sample_data). Since the dataset is large, we take the first 500 rows and run the study on that subset.
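The inspection and subsetting steps described above look like this (using the package's bundled sample_data):

```r
library(pomodoro)

# Inspect the variables and their types
str(sample_data)

# Work on the first 500 rows to keep the example fast
Data <- sample_data[1:500, ]
```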
The following example fits a random forest model to the outcome variable yvar. The function handles the 80/20 train/test split and tunes the model with 10-fold cross-validation after centering and scaling the predictors.
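The workflow the wrapper performs corresponds to the following caret sketch. The exact pomodoro function signature may differ, and the outcome column name "Credit" is a placeholder; createDataPartition, trainControl, and train are standard caret functions.

```r
library(caret)
library(pomodoro)

set.seed(1)
yvar <- "Credit"               # hypothetical outcome column name
Data <- sample_data[1:500, ]

# 80/20 stratified train/test split
idx <- createDataPartition(Data[[yvar]], p = 0.8, list = FALSE)
train_set <- Data[idx, ]
test_set  <- Data[-idx, ]

# Center/scale predictors, tune a random forest with 10-fold CV
fit <- train(
  x = train_set[, setdiff(names(train_set), yvar)],
  y = train_set[[yvar]],
  method     = "rf",
  preProcess = c("center", "scale"),
  trControl  = trainControl(method = "cv", number = 10, verboseIter = TRUE)
)
```

The verboseIter = TRUE setting is what produces the per-fold `+ Fold01: mtry= 2` log lines shown below.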
#> Loading required package: ggplot2
#> Loading required package: lattice
#> + Fold01: mtry= 2
#> - Fold01: mtry= 2
#> + Fold01: mtry= 6
#> - Fold01: mtry= 6
#> + Fold01: mtry=10
#> - Fold01: mtry=10
#> ...
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#>
#> Call:
#> randomForest(formula = Data.sub.train[, yvar] ~ ., data = Data.sub.train)
#> Type of random forest: classification
#> Number of trees: 500
#> No. of variables tried at each split: 2
#>
#> OOB estimate of error rate: 23.44%
#> Confusion matrix:
#> No.Loan Formal Informal L.Both class.error
#> No.Loan 282 7 0 0 0.02422145
#> Formal 31 25 0 0 0.55357143
#> Informal 38 0 0 0 1.00000000
#> L.Both 14 4 0 0 1.00000000
#> Multi-class area under the curve: 0.7583
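The per-class error rates and the 23.44% OOB estimate can be reproduced from the printed confusion matrix with base R:

```r
# Rows = true class, columns = predicted class (counts from the output above)
cm <- matrix(c(282,  7, 0, 0,
                31, 25, 0, 0,
                38,  0, 0, 0,
                14,  4, 0, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(c("No.Loan", "Formal", "Informal", "L.Both"),
                             c("No.Loan", "Formal", "Informal", "L.Both")))

# Share of each true class that was misclassified
class_error <- 1 - diag(cm) / rowSums(cm)
round(class_error, 8)
#> 0.02422145 0.55357143 1.00000000 1.00000000

# Overall out-of-bag error: all off-diagonal counts over the total
oob <- 1 - sum(diag(cm)) / sum(cm)
round(100 * oob, 2)
#> 23.44
```

Note that Informal and L.Both are never predicted, which is why their class errors are 1.0 and why later warnings report missing response levels.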
The Estimate_Models function takes exog
and xadd
variables and estimates a set of models based on them. On the one hand, exog
removes the selected vector from the dataset and runs the model both on the full dataset and on each subset defined by the levels of exog
. On the other hand, xadd
adds the selected vectors to the model before estimation. dnames
holds the unique values of exog
and is used to save the model estimates under their names.
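A call of roughly this shape produces the estimates printed below. The argument names and the "d.indicator" column are assumptions about the wrapper's signature, inferred from the text above and from the result names (BchMk+networth, D.0+networth, D.1+networth):

```r
library(pomodoro)

# Hypothetical call: split by a binary indicator (exog) and add `networth`
# (xadd); a benchmark model plus one model per level of exog are estimated.
results <- Estimate_Models(
  Data   = Data,
  yvar   = yvar,
  exog   = "d.indicator",                # hypothetical column; levels D.0, D.1
  xadd   = "networth",
  dnames = unique(Data$d.indicator)      # names under which estimates are saved
)

names(results$EstMdl)
#> e.g. "BchMk+networth" "D.0+networth" "D.1+networth"
```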
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> + Fold01: mtry=2
#> - Fold01: mtry=2
#> + Fold01: mtry=5
#> - Fold01: mtry=5
#> + Fold01: mtry=9
#> - Fold01: mtry=9
#> ...
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> + Fold01: mtry=2
#> - Fold01: mtry=2
#> + Fold01: mtry=5
#> - Fold01: mtry=5
#> + Fold01: mtry=9
#> - Fold01: mtry=9
#> ...
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> + Fold01: mtry=2
#> - Fold01: mtry=2
#> + Fold01: mtry=5
#> - Fold01: mtry=5
#> + Fold01: mtry=9
#> - Fold01: mtry=9
#> ...
#> + Fold08: mtry=2
#> Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
#> 10, : These variables have zero variances: region
#> - Fold08: mtry=2
#> + Fold08: mtry=5
#> Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
#> 10, : These variables have zero variances: region
#> - Fold08: mtry=5
#> + Fold08: mtry=9
#> Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
#> 10, : These variables have zero variances: region
#> - Fold08: mtry=9
#> ...
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> No observation for response level(s): Informal
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> The following classes were not found in 'response': Informal.
#> $EstMdl
#> $EstMdl$`BchMk+networth`
#> Random Forest
#>
#> 401 samples
#> 9 predictor
#> 4 classes: 'No.Loan', 'Formal', 'Informal', 'L.Both'
#>
#> Pre-processing: centered (9), scaled (9)
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 361, 361, 361, 361, 360, 360, ...
#> Resampling results across tuning parameters:
#>
#> mtry Accuracy Kappa
#> 2 0.7582067 0.2929325
#> 5 0.7459506 0.2860607
#> 9 0.7407614 0.2848081
#>
#> Accuracy was used to select the optimal model using the largest value.
#> The final value used for the model was mtry = 2.
#>
#> $EstMdl$`D.0+networth`
#> Random Forest
#>
#> 299 samples
#> 9 predictor
#> 4 classes: 'No.Loan', 'Formal', 'Informal', 'L.Both'
#>
#> Pre-processing: centered (9), scaled (9)
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 270, 270, 268, 270, 269, 268, ...
#> Resampling results across tuning parameters:
#>
#> mtry Accuracy Kappa
#> 2 0.7554690 0.2311456
#> 5 0.7262291 0.2180392
#> 9 0.7198925 0.2112093
#>
#> Accuracy was used to select the optimal model using the largest value.
#> The final value used for the model was mtry = 2.
#>
#> $EstMdl$`D.1+networth`
#> Random Forest
#>
#> 103 samples
#> 9 predictor
#> 4 classes: 'No.Loan', 'Formal', 'Informal', 'L.Both'
#>
#> Pre-processing: centered (9), scaled (9)
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 93, 92, 93, 93, 93, 92, ...
#> Resampling results across tuning parameters:
#>
#> mtry Accuracy Kappa
#> 2 0.8263636 0.5809524
#> 5 0.7763636 0.4474490
#> 9 0.7472727 0.3894841
#>
#> Accuracy was used to select the optimal model using the largest value.
#> The final value used for the model was mtry = 2.
Estimate_Models returns results for each split of the exog
variable. Combined_Performance then prints the aggregate performance across these splits.
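Assuming the estimates from Estimate_Models are held in a list as above, the combined summary is obtained with a call of this shape (the argument name is an assumption about the function's signature):

```r
library(pomodoro)

# Aggregate the performance of the exog-split models into one summary;
# `results$EstMdl` is the list of fitted models from Estimate_Models.
Combined_Performance(Sub.models = results$EstMdl)
```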
#> + Fold01: mtry=2
#> - Fold01: mtry=2
#> + Fold01: mtry=5
#> - Fold01: mtry=5
#> + Fold01: mtry=9
#> - Fold01: mtry=9
#> ...
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> No observation for response level(s): Informal
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> The following classes were not found in 'response': Informal.
#> + Fold01: mtry=2
#> - Fold01: mtry=2
#> + Fold01: mtry=5
#> - Fold01: mtry=5
#> + Fold01: mtry=9
#> - Fold01: mtry=9
#> ...
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> Multi-class area under the curve: 0.7317