The package includes several weighting schemes which can be parameterized, as well as custom configuration options. Furthermore, users can decide whether they wish to positively or negatively affect the accuracy score as a result of applying weights to the confusion matrix. “wconf” integrates well with the “caret” package, but it can also work standalone when provided data in matrix form.
Applying a weighting scheme to the confusion matrix can be useful in applications such as performance evaluation, where characteristics such as “underperforming”, “acceptable”, “overperforming” and “worker of the year” may represent gradations that are far apart and unevenly spaced. Similarly, where the objective is to classify geographic regions and proximity of the prediction to the actual region constitutes an advantage in terms of the model’s performance, applying a weighting scheme facilitates the model selection process.
Functions are included to calculate accuracy metrics for imbalanced data. Specifically, the package allows users to compute the Starovoitov-Golub sine-accuracy function, as well as the balanced accuracy function and the standard accuracy indicator.
wconf consists of the following functions:
The weightmatrix function lets users choose from different weighting schemes and experiment with parametrizations and custom configurations.
weightmatrix(n, weight.type, weight.penalty, standard.deviation, geometric.multiplier, interval.high, interval.low, custom.weights, plot.weights)
n – the number of classes contained in the confusion matrix.
weight.type – the weighting scheme to be used. Can be one of: “arithmetic” - a decreasing arithmetic progression weighting scheme; “geometric” - a decreasing geometric progression weighting scheme; “normal” - weights drawn from the right tail of a normal distribution; “interval” - weights contained on a user-defined interval; “custom” - custom weight vector defined by the user.
weight.penalty – determines whether the weights associated with non-diagonal elements generated by the “normal”, “arithmetic” and “geometric” weight types are positive or negative values. By default, the value is set to FALSE, which means that generated weights will be positive values.
standard.deviation – standard deviation of the normal distribution, if the normal distribution weighting scheme is used.
geometric.multiplier – the multiplier used to construct the geometric progression series, if the geometric progression weighting scheme is used.
interval.high – the upper bound of the weight interval, if the interval weighting scheme is used.
interval.low – the lower bound of the weight interval, if the interval weighting scheme is used.
custom.weights – the vector of custom weights to be applied, if the custom weighting scheme is selected. The length of the vector should be equal to “n”, but it can be larger, in which case excess values are ignored.
plot.weights – optional setting to enable plotting of the weight vector, which corresponds to the first column of the weight matrix.
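To make the structure of a weight matrix concrete, the following base R sketch builds one by hand rather than calling the package. It assumes a decreasing arithmetic progression of the form n/n, (n-1)/n, ..., 1/n; the exact spacing used by wconf may differ, and the variable names are illustrative only.

```r
# Illustrative sketch in base R, not the wconf implementation.
n <- 4

# Assumed decreasing arithmetic progression: 1.00, 0.75, 0.50, 0.25.
weight_vector <- (n:1) / n

# Build a symmetric matrix whose first column is the weight vector,
# so that the weight depends only on the distance from the diagonal.
weight_matrix <- stats::toeplitz(weight_vector)

print(weight_matrix)
```

The diagonal carries the largest weight, and each step away from the diagonal reduces the weight by the same amount.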
The wconfusionmatrix function calculates the weighted confusion matrix by multiplying a weight matrix, element by element, with a supplied confusion matrix object.
wconfusionmatrix(m, weight.type, weight.penalty, standard.deviation, geometric.multiplier, interval.high, interval.low, custom.weights, print.weighted.accuracy)
m – the caret confusion matrix object or simple matrix.
weight.type – the weighting scheme to be used. Can be one of: “arithmetic” - a decreasing arithmetic progression weighting scheme; “geometric” - a decreasing geometric progression weighting scheme; “normal” - weights drawn from the right tail of a normal distribution; “interval” - weights contained on a user-defined interval; “custom” - custom weight vector defined by the user.
weight.penalty – determines whether the weights associated with non-diagonal elements generated by the “normal”, “arithmetic” and “geometric” weight types are positive or negative values. By default, the value is set to FALSE, which means that generated weights will be positive values.
standard.deviation – standard deviation of the normal distribution, if the normal distribution weighting scheme is used.
geometric.multiplier – the multiplier used to construct the geometric progression series, if the geometric progression weighting scheme is used.
interval.high – the upper bound of the weight interval, if the interval weighting scheme is used.
interval.low – the lower bound of the weight interval, if the interval weighting scheme is used.
custom.weights – the vector of custom weights to be applied, if the custom weighting scheme is selected. The length of the vector should equal the number of classes in the confusion matrix, but it can be larger, in which case excess values are ignored.
print.weighted.accuracy – optional setting to print the weighted accuracy metric, which represents the sum of all weighted confusion matrix cells divided by the total number of observations.
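The element-by-element weighting and the weighted accuracy metric described above can be sketched in base R without the package. This is a minimal illustration, not the code used by wconf: the confusion matrix values and the weight vector c(1, 0.5, 0) are made up for the example.

```r
# Made-up 3-class confusion matrix (rows = predicted, columns = actual).
m <- matrix(c(50,  5,  2,
               4, 40,  6,
               1,  3, 30), nrow = 3, byrow = TRUE)

# Assumed weights by distance from the diagonal: 1 for the correct class,
# 0.5 for a neighbouring class, 0 for anything further away.
w <- stats::toeplitz(c(1, 0.5, 0))

# The element-by-element product gives the weighted confusion matrix.
weighted_m <- w * m

# Weighted accuracy: sum of all weighted cells over total observations.
weighted_accuracy <- sum(weighted_m) / sum(m)
standard_accuracy <- sum(diag(m)) / sum(m)

print(weighted_m)
cat("standard:", standard_accuracy, "weighted:", weighted_accuracy, "\n")
```

Because near misses receive partial credit here, the weighted score is higher than the standard accuracy for the same matrix.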
The rconfusionmatrix function calculates the redistributed confusion matrix by reallocating observations classified in the vicinity of the true category to the confusion matrix diagonal. A user-specified weighting scheme determines the proportion of observations to reassign.
rconfusionmatrix(m, custom.weights, print.weighted.accuracy)
m – the caret confusion matrix object or simple matrix.
custom.weights – the vector of custom weights to be applied. The length of the vector should equal the number of classes in the confusion matrix, but it can be larger; the first value and any excess values are ignored.
print.weighted.accuracy – optional setting to print the standard redistributed accuracy metric, which represents the sum of all observations on the diagonal divided by the total number of observations.
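One way to picture the redistribution idea is the base R sketch below. It is an interpretation under stated assumptions, not the wconf implementation: the helper redistribute() is hypothetical, the matrix is assumed to follow the caret layout (rows = predicted, columns = actual), and the weight at position d + 1 is taken as the share of observations at distance d from the diagonal that is moved to the diagonal cell of the true class.

```r
# Hypothetical helper, not part of wconf.
redistribute <- function(m, w) {
  n <- nrow(m)
  out <- m
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d <- abs(i - j)                    # distance from the diagonal
      if (d > 0) {
        moved <- m[i, j] * w[d + 1]      # w[1] is never used
        out[i, j] <- out[i, j] - moved   # take from the misclassified cell
        out[j, j] <- out[j, j] + moved   # give to the true class diagonal
      }
    }
  }
  out
}

# Made-up 3-class confusion matrix (rows = predicted, columns = actual).
m <- matrix(c(50,  5,  2,
               4, 40,  6,
               1,  3, 30), nrow = 3, byrow = TRUE)

# Move half of the adjacent-class errors to the diagonal, none of the rest.
r <- redistribute(m, c(0, 0.5, 0))
redistributed_accuracy <- sum(diag(r)) / sum(r)

print(r)
cat("redistributed accuracy:", redistributed_accuracy, "\n")
```

In this sketch the total number of observations is unchanged; only their position in the matrix moves.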
The balancedaccuracy function calculates classification accuracy scores using the sine-based formulas proposed by Starovoitov and Golub (2020). The advantage of this method over the standard balanced accuracy function is that it takes into account the class distribution of errors, which makes it useful for imbalanced data.
balancedaccuracy(m, print.scores)
The function takes as input:
m – the caret confusion matrix object or simple matrix.
print.scores – used to display the accuracy scores when set to TRUE.
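For orientation, the standard accuracy and the conventional balanced accuracy (the mean of per-class recall) can be computed in base R as sketched below. The sine-based Starovoitov-Golub score is not reproduced here. The matrix values are made up, and the caret layout (rows = predicted, columns = actual) is assumed.

```r
# Made-up imbalanced 2-class confusion matrix
# (rows = predicted, columns = actual): 100 cases of class 1, 10 of class 2.
m <- matrix(c(90, 5,
              10, 5), nrow = 2, byrow = TRUE)

# Standard accuracy: share of observations on the diagonal.
standard_accuracy <- sum(diag(m)) / sum(m)

# Per-class recall: correct predictions divided by actual class totals.
recall_per_class <- diag(m) / colSums(m)

# Balanced accuracy: unweighted mean of the per-class recalls.
balanced_accuracy <- mean(recall_per_class)

cat("standard:", standard_accuracy, "balanced:", balanced_accuracy, "\n")
```

The standard accuracy is dominated by the majority class, while the balanced score gives each class equal influence, which is why the two diverge on imbalanced data.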
With custom specifications, the weights are not bounded to any given interval, so, depending on the user configuration, it is possible to obtain negative accuracy scores.
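A small base R sketch shows how a negative score can arise. The matrix and the custom weights are made up for illustration, and the calculation follows the weighted accuracy definition given earlier (sum of weighted cells divided by total observations) rather than calling the package.

```r
# Made-up 2-class confusion matrix with mostly wrong predictions.
m <- matrix(c(1, 5,
              5, 1), nrow = 2, byrow = TRUE)

# Assumed custom weights: +1 on the diagonal, -1 for every error.
w <- stats::toeplitz(c(1, -1))

# Weighted accuracy: sum of weighted cells over total observations.
weighted_accuracy <- sum(w * m) / sum(m)

cat("weighted accuracy:", weighted_accuracy, "\n")
```

Here the penalised errors outweigh the correct predictions, so the score falls below zero.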
You can download wconf directly from GitHub. To do so, you need to have the devtools package installed and loaded. Once you are in R, run the following commands:
install.packages("devtools")
library("devtools")
install_github("alexandrumonahov/wconf")
You may encounter download errors from GitHub if you are behind a firewall or if HTTPS downloads are restricted. In that case, try setting one of the following download methods before installing (the second is available on Windows only):
options(download.file.method = "libcurl")
options(download.file.method = "wininet")
Once the package is installed, you can load it with the library(wconf) command.
Alexandru Monahov, 2024