imputeTestbench : Test bench for Missing Data Imputing Models/Methods Comparison

Neeraj Bokde (http://www.neerajbokde.com/cran/imputetestbench)

2016-07-28

This document introduces the R package ‘imputeTestbench’, a test bench for comparing missing-data imputation models and methods. It compares imputation methods in terms of RMSE, MAE or MAPE. New methods can be added to the test bench and compared with the existing ones: the function append_method() adds further methods to those already available in the test bench.
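As a rough illustration of the three error measures named above, the following base R sketch computes them for a small made-up example. The helper names rmse, mae and mape and the sample vectors are assumptions for illustration only; the package may compute these measures differently (for example, MAPE is expressed here as a percentage).

```r
# Error measures between the true values and the imputed values.
rmse <- function(actual, imputed) sqrt(mean((actual - imputed)^2))
mae  <- function(actual, imputed) mean(abs(actual - imputed))
# MAPE is undefined when 'actual' contains zeros.
mape <- function(actual, imputed) mean(abs((actual - imputed) / actual)) * 100

actual  <- c(1, 2, 3, 4, 5)
imputed <- c(1, 2, 4, 4, 5)   # one value imputed incorrectly (3 became 4)

rmse(actual, imputed)
mae(actual, imputed)
mape(actual, imputed)
```

RMSE penalises large individual errors more heavily than MAE, while MAPE scales each error by the true value.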

The following example describes how the package works:

Consider the sample data datax, defined as follows:

datax <- c(1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5)

Load the imputeTestbench package as follows:

library(imputeTestbench)

The function impute_errors() compares imputation methods in terms of RMSE, MAE or MAPE. The syntax of impute_errors() is shown below:

impute_errors(dataIn, missPercentFrom, missPercentTo, interval, repetition, errorParameter, MethodPath, MethodName)

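where, going by the calls shown in this document, dataIn is the input data, missPercentFrom, missPercentTo and interval set the range and step of the missing percentages to test (10, 80 and 10 in the later examples), errorParameter selects RMSE, MAE or MAPE, and MethodPath and MethodName give the source location and name of a user-supplied method.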

In its simplest form, impute_errors() can be used as:

q <- impute_errors(datax)
q
## $Parameter
## [1] "RMSE Plot"
## 
## $Missing_Percent
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
## 
## $Historic_Mean
## [1] 0.4789879 0.6250889 0.8108440 0.9018024 0.9856108 1.1087825 1.1952286
## [8] 1.2724180
## 
## $Interpolation
## [1] 0.6220167 0.7748639 0.8716673 1.3633658 1.2714936 1.3627703 1.2976507
## [8] 1.8725297
# By default, the bar plot is used to show the comparison
plot_errors(q)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]  1.5  4.5  7.5 10.5 13.5 16.5 19.5 22.5
## [2,]  2.5  5.5  8.5 11.5 14.5 17.5 20.5 23.5
# The comparison can also be drawn as a line plot:
plot_errors(dataIn = q, plotType = 2)
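The kind of comparison printed above can be approximated in base R. The sketch below is not the package's implementation: the series x, the helpers impute_mean and impute_linear, and the choice to compute RMSE over the whole series are all assumptions made for illustration.

```r
set.seed(1)                       # make the random removals reproducible
x <- rep(1:5, 20)                 # hypothetical periodic series (not 'datax')
rmse <- function(a, b) sqrt(mean((a - b)^2))

# Replace missing values with the mean of the observed values.
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Fill missing values by linear interpolation between observed points;
# rule = 2 carries the nearest observed value outward at either end.
impute_linear <- function(v) {
  ok <- !is.na(v)
  approx(which(ok), v[ok], xout = seq_along(v), rule = 2)$y
}

fractions <- seq(0.1, 0.8, by = 0.1)
errors <- sapply(fractions, function(p) {
  gappy <- x
  gappy[sample(length(x), round(p * length(x)))] <- NA   # remove values at random
  c(mean = rmse(x, impute_mean(gappy)),
    linear = rmse(x, impute_linear(gappy)))
})
colnames(errors) <- fractions
round(errors, 3)
```

Each column of errors corresponds to one missing fraction and each row to one method, which mirrors the layout of the list returned by impute_errors().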

By default, this function compares two basic imputation methods: historic mean and interpolation. The plot_errors() function plots the comparison between the methods. The test bench allows further imputation methods to be added and compared with the existing ones. The only requirement is that the new method is written as a function that returns the imputed data. Suppose the following function is the method to add to the test bench.

===============================

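inter <- function(outs)
{
  # Impute missing values by random sampling, using the imputeTS package
  library(imputeTS)
  outs <- ts(outs)         # convert the input to a time series
  d <- na.random(outs)     # replace each NA with a random value (renamed na_random() in later imputeTS releases)
  return(d)                # return the imputed series
}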

===============================
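A function with the same shape as the one above can be written without any extra package. The sketch below assumes that the test bench passes in a numeric vector containing NA values and expects a vector of the same length back; median_impute is a hypothetical name, and median imputation is chosen only because it is simple.

```r
# Hypothetical user-defined method: replace each NA with the median of the
# observed values. Input: numeric vector with NAs; output: same length, no NAs.
median_impute <- function(outs) {
  outs <- as.numeric(outs)
  outs[is.na(outs)] <- median(outs, na.rm = TRUE)
  outs
}

# Quick check before handing the function to the test bench
test_in  <- c(1, NA, 3, 4, NA, 5)
test_out <- median_impute(test_in)
stopifnot(length(test_out) == length(test_in), !anyNA(test_out))
test_out
```

Checking the length and the absence of NA values in this way is a cheap safeguard before a new method is compared with the others.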

Save this function in a new R script file, note its source location in the form "source('~/imputeTestbench/R/inter.R')", and call the append_method() function as:

#aa <- append_method(existing_method = q,dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/inter.R')", MethodName = "Random")

#aa
#plot_errors(aa)

The code above is commented out because it depends on an external function file and its location, which are not included in this package.

If the user wishes to add more than one imputation method to the test bench, the function append_method() is used as:

#bb <- append_method(existing_method = aa, dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/PSFimpute.R')", MethodName = "PSFimpute")

#bb
#plot_errors(bb)

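where existing_method is the result of an earlier call, such as q from impute_errors() or aa from append_method().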

Similarly, the user can remove an imputation method from the test bench with the following function:

#cc <- remove_method(existing_method = bb, method_number = 1)
#cc
#plot_errors(cc)

The random parameter controls where the missing values are placed. When random = 1, the package itself introduces missing values at completely random places, whereas when random = 0, the user can introduce missing patches at desired locations, as shown in the following code.

dd <- impute_errors(random = 0, startPoint = c(10, 20, 30), patchLength = c(3, 4, 5))
dd
## $Parameter
## [1] "RMSE Plot"
## 
## $Missing_Percent
## [1] 0.12
## 
## $Historic_Mean
## [1] 0.5746791
## 
## $Interpolation
## [1] 0.7843964
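To make the meaning of startPoint and patchLength concrete, the base R sketch below marks patches of missing values in a vector. It assumes that each patch begins at startPoint[i] and covers patchLength[i] consecutive values; add_patches is a hypothetical helper and the series x is made up, so this shows the idea rather than the package's own code.

```r
# Assumption: a patch starts at startPoint[i] and covers patchLength[i]
# consecutive values. 'add_patches' is a hypothetical helper.
add_patches <- function(x, startPoint, patchLength) {
  stopifnot(length(startPoint) == length(patchLength))
  idx <- unlist(Map(function(s, n) seq(s, length.out = n), startPoint, patchLength))
  idx <- idx[idx <= length(x)]      # ignore positions past the end of the series
  x[idx] <- NA
  x
}

x <- rep(1:5, 20)                   # hypothetical series of length 100
gappy <- add_patches(x, startPoint = c(10, 20, 30), patchLength = c(3, 4, 5))
which(is.na(gappy))                 # positions 10-12, 20-23 and 30-34
mean(is.na(gappy))                  # 12 of 100 values missing: 0.12
```

Under these assumptions the three patches remove 3 + 4 + 5 = 12 values, so a series of length 100 has a missing fraction of 0.12.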