The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
In this vignette a worked example on creating Precision-Recall and
Receiver Operator Characteristics curves are provided. Throughout this
vignette the Banknote
Authentication-dataset is used. The banknote
-dataset is
a part of {SLmetrics} which is list
of features and
targets, and can be called as follows:
The banknote
dataset classification tasks achieves
between 95% and 99% accuracy and therefore makes a bad case for
demonstrating Precision-Recall and Receiver Operator Characteristics
curves. To alleviate this, random noise will be injected to the original
dataset as follows:
# 1) set seed
set.seed(1903)
# 2) extract indices
# for shuffling
noise <- sample(
x = 1:nrow(banknote$features),
size = nrow(banknote$features) * 0.50
)
# 3) reshuffle
# features and target
noise <- cbind(
banknote$features[sample(noise),],
target = banknote$target[sample(noise)]
)
The data.frame
is constructed as follows:
# 1) convert to data.frame
# and head
head(
banknote <- cbind(
banknote$features,
target = banknote$target
)
)
#> variance skewness curtosis entropy target
#> 1 3.62160 8.6661 -2.8073 -0.44699 authentic
#> 2 4.54590 8.1674 -2.4586 -1.46210 authentic
#> 3 3.86600 -2.6383 1.9242 0.10645 authentic
#> 4 3.45660 9.5228 -4.0112 -3.59440 authentic
#> 5 0.32924 -4.4552 4.5718 -0.98880 authentic
#> 6 4.36840 9.6718 -3.9606 -3.16250 authentic
# 2) introduce random
# noise to the data
# NOTE: wrapped in `try()` in case
# noise is removed
try(
expr = {
banknote <- rbind(
banknote,
noise
)
},
silent = TRUE
)
# 3) convert target to binary
# value
banknote$target <- as.numeric(
banknote$target == "inauthentic"
)
To predict whether the banknotes are authentic or inauthentic a logistic regression will be trained on a training sample, and evaluated on a the test sample.
To train and test test the model a training/test split with 80% and 20%.
To evaluate the performance we will extract the response probabilities
# 1) extract class
# probabilites
class_probabilities <- predict(
object = model,
newdata = subset(test, select = -target),
type = "response"
)
# 2) calculate class
class_probabilities <- as.matrix(
cbind(
class_probabilities,
1 - class_probabilities
)
)
# 1) create actual
# value
actual <- factor(
x = test$target,
levels = c(1, 0),
labels = c("inauthentic", "authentic")
)
# 1) construct precision-recall
# object
print(
precision_recall <- prROC(
actual = actual,
response = class_probabilities
)
)
#> threshold level label recall precision
#> 1 Inf 1 inauthentic 0.00000 1.000
#> 2 0.919 1 inauthentic 0.00535 1.000
#> 3 0.917 1 inauthentic 0.01070 1.000
#> 4 0.909 1 inauthentic 0.01604 1.000
#> 5 0.906 1 inauthentic 0.02139 1.000
#> 6 0.903 1 inauthentic 0.02674 1.000
#> 7 0.901 1 inauthentic 0.03209 1.000
#> 8 0.898 1 inauthentic 0.03209 0.857
#> 9 0.898 1 inauthentic 0.03743 0.875
#> 10 0.895 1 inauthentic 0.04278 0.889
#> [ reached 'max' / getOption("max.print") -- omitted 816 rows ]
The Precision-Recall object can be visualized by using
plot()
# 1) construct Receiver Operator Characteristics
# object
print(
receiver_operator_characteristics <- ROC(
actual = actual,
response = class_probabilities
)
)
#> threshold level label tpr fpr
#> 1 Inf 1 inauthentic 0.00000 0.00000
#> 2 0.919 1 inauthentic 0.00535 0.00000
#> 3 0.917 1 inauthentic 0.01070 0.00000
#> 4 0.909 1 inauthentic 0.01604 0.00000
#> 5 0.906 1 inauthentic 0.02139 0.00000
#> 6 0.903 1 inauthentic 0.02674 0.00000
#> 7 0.901 1 inauthentic 0.03209 0.00000
#> 8 0.898 1 inauthentic 0.03209 0.00444
#> 9 0.898 1 inauthentic 0.03743 0.00444
#> 10 0.895 1 inauthentic 0.04278 0.00444
#> [ reached 'max' / getOption("max.print") -- omitted 816 rows ]
The Receiver Operator Characteristics object can be visualized by
using plot()
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.