Pomodoro_Vignette

Seyma Kalay

2022-01-06

library(pomodoro)

This package was developed for credit-access studies, but it can be applied to any binary or multi-level factor outcome. First, let's inspect the structure of the sample data with str(sample_data). Since the dataset is large, we take the first 500 rows and run the study on those.
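The inspection and subsetting steps can be sketched as follows (`sample_data` ships with pomodoro; the 500-row cut-off follows the text, and the exact preparation code in the original chunk may differ):

```r
library(pomodoro)

# Inspect the variables and their types
str(sample_data)

# The full dataset is large, so restrict the study to the first 500 rows
sample_data <- sample_data[1:500, ]
```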

The following example models the multi-class outcome yvar; the output below comes from the random-forest variant. The function performs an 80/20 train/test split and tunes via 10-fold cross-validation after centering and scaling the data.
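A call along these lines is assumed to have produced the training log and fit shown below. The function and argument names here are illustrative assumptions, not the package's documented interface; consult the pomodoro help pages (e.g. ?Random_Forest) for the exact signature.

```r
# Hypothetical call -- Random_Forest handles the 80/20 split, the
# centering/scaling pre-processing, and the 10-fold CV tuning of mtry.
rf_fit <- Random_Forest(Data = sample_data, yvar = "yvar")
```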

#> Loading required package: ggplot2
#> Loading required package: lattice
#> + Fold01: mtry= 2 
#> - Fold01: mtry= 2 
#> + Fold01: mtry= 6 
#> - Fold01: mtry= 6 
#> + Fold01: mtry=10 
#> - Fold01: mtry=10 
#> ... (analogous lines for Fold02 through Fold10 omitted)
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> 
#> Call:
#>  randomForest(formula = Data.sub.train[, yvar] ~ ., data = Data.sub.train) 
#>                Type of random forest: classification
#>                      Number of trees: 500
#> No. of variables tried at each split: 2
#> 
#>         OOB estimate of  error rate: 23.44%
#> Confusion matrix:
#>          No.Loan Formal Informal L.Both class.error
#> No.Loan      282      7        0      0  0.02422145
#> Formal        31     25        0      0  0.55357143
#> Informal      38      0        0      0  1.00000000
#> L.Both        14      4        0      0  1.00000000
#> Multi-class area under the curve: 0.7583

Estimate_Models

The Estimate_Models function takes exog and xadd arguments and estimates multiple models based on them. The exog argument removes the selected variable from the dataset and runs the model both on the full dataset and on each subset defined by the levels of exog, while xadd adds the selected variables to the model before it is run. dnames holds the unique values of exog and is used to label the saved model estimates.
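Consistent with the output below, which names the fitted models `BchMk+networth`, `D.0+networth`, and `D.1+networth`, a call might look like the following sketch. The argument names are assumptions; check ?Estimate_Models for the actual interface.

```r
# Sketch: exog = "D" runs a benchmark model plus one model per level of D
# (here 0 and 1), xadd = "networth" adds net worth to each specification,
# and dnames labels the saved estimates by the levels of exog.
dnames <- c("0", "1")
est <- Estimate_Models(Data = sample_data, yvar = "yvar",
                       exog = "D", xadd = "networth", dnames = dnames)
names(est$EstMdl)
```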

#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> + Fold01: mtry=2 
#> - Fold01: mtry=2 
#> + Fold01: mtry=5 
#> - Fold01: mtry=5 
#> + Fold01: mtry=9 
#> - Fold01: mtry=9 
#> ... (analogous lines for Fold02 through Fold10 omitted)
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> + Fold01: mtry=2 
#> - Fold01: mtry=2 
#> + Fold01: mtry=5 
#> - Fold01: mtry=5 
#> + Fold01: mtry=9 
#> - Fold01: mtry=9 
#> ... (analogous lines for Fold02 through Fold10 omitted)
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> + Fold01: mtry=2 
#> - Fold01: mtry=2 
#> + Fold01: mtry=5 
#> - Fold01: mtry=5 
#> + Fold01: mtry=9 
#> - Fold01: mtry=9 
#> ... (analogous lines for Fold02 through Fold07 omitted)
#> + Fold08: mtry=2
#> Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
#> 10, : These variables have zero variances: region
#> - Fold08: mtry=2 
#> + Fold08: mtry=5
#> Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
#> 10, : These variables have zero variances: region
#> - Fold08: mtry=5 
#> + Fold08: mtry=9
#> Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
#> 10, : These variables have zero variances: region
#> - Fold08: mtry=9 
#> ... (analogous lines for Fold09 and Fold10 omitted)
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> No observation for response level(s): Informal
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> The following classes were not found in 'response': Informal.
#> $EstMdl
#> $EstMdl$`BchMk+networth`
#> Random Forest 
#> 
#> 401 samples
#>   9 predictor
#>   4 classes: 'No.Loan', 'Formal', 'Informal', 'L.Both' 
#> 
#> Pre-processing: centered (9), scaled (9) 
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 361, 361, 361, 361, 360, 360, ... 
#> Resampling results across tuning parameters:
#> 
#>   mtry  Accuracy   Kappa    
#>   2     0.7582067  0.2929325
#>   5     0.7459506  0.2860607
#>   9     0.7407614  0.2848081
#> 
#> Accuracy was used to select the optimal model using the largest value.
#> The final value used for the model was mtry = 2.
#> 
#> $EstMdl$`D.0+networth`
#> Random Forest 
#> 
#> 299 samples
#>   9 predictor
#>   4 classes: 'No.Loan', 'Formal', 'Informal', 'L.Both' 
#> 
#> Pre-processing: centered (9), scaled (9) 
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 270, 270, 268, 270, 269, 268, ... 
#> Resampling results across tuning parameters:
#> 
#>   mtry  Accuracy   Kappa    
#>   2     0.7554690  0.2311456
#>   5     0.7262291  0.2180392
#>   9     0.7198925  0.2112093
#> 
#> Accuracy was used to select the optimal model using the largest value.
#> The final value used for the model was mtry = 2.
#> 
#> $EstMdl$`D.1+networth`
#> Random Forest 
#> 
#> 103 samples
#>   9 predictor
#>   4 classes: 'No.Loan', 'Formal', 'Informal', 'L.Both' 
#> 
#> Pre-processing: centered (9), scaled (9) 
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 93, 92, 93, 93, 93, 92, ... 
#> Resampling results across tuning parameters:
#> 
#>   mtry  Accuracy   Kappa    
#>   2     0.8263636  0.5809524
#>   5     0.7763636  0.4474490
#>   9     0.7472727  0.3894841
#> 
#> Accuracy was used to select the optimal model using the largest value.
#> The final value used for the model was mtry = 2.

Combined_Performance

Estimate_Models reports results separately for each split of exog; Combined_Performance prints the overall performance across those splits.
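A sketch of the corresponding call, assuming Combined_Performance accepts the list of per-split models returned by Estimate_Models (the expected input format is an assumption; see ?Combined_Performance):

```r
# Pool the per-split model estimates into one overall performance summary.
Combined_Performance(est$EstMdl)
```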

#> + Fold01: mtry=2 
#> - Fold01: mtry=2 
#> + Fold01: mtry=5 
#> - Fold01: mtry=5 
#> + Fold01: mtry=9 
#> - Fold01: mtry=9 
#> ... (analogous lines for Fold02 through Fold10 omitted)
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> No observation for response level(s): Informal
#> Warning in multiclass.roc.multivariate(response, predictor, levels, percent, :
#> The following classes were not found in 'response': Informal.
#> + Fold01: mtry=2 
#> - Fold01: mtry=2 
#> + Fold01: mtry=5 
#> - Fold01: mtry=5 
#> + Fold01: mtry=9 
#> - Fold01: mtry=9 
#> ... (analogous lines for Fold02 through Fold10 omitted)
#> Aggregating results
#> Selecting tuning parameters
#> Fitting mtry = 2 on full training set
#> Multi-class area under the curve: 0.7317