Exporting objects and functions from the workspace

Phil Chalmers

2017-10-29

Including fixed objects

R resolves object names inside functions by searching outward. If an object can’t be found inside a function, R looks just outside the function to see if the object can be located there, and if not, looks one level higher, and so on, until it searches for the object in the user workspace. This behavior is unfamiliar to many programmers coming from other languages, and it can cause unexpected problems when writing simulations. This tutorial demonstrates how to make sure all required user-defined objects are visible to SimDesign.

Scoping

To demonstrate the issue, let’s define two objects and a function which uses these objects.

obj1 <- 10
obj2 <- 20

When evaluated, these objects are visible to the user and can be listed by typing ls() in the R console. Functions that do not define objects with the same names will also be able to locate these values.

myfun <- function(x) obj1 + obj2
myfun(1)
## [1] 30
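The lookup order can be illustrated with a small sketch. The function names shadow_fun(), outer_fun(), and inner_fun() are hypothetical and used only for illustration: a local definition masks the workspace object of the same name, and a nested function searches its enclosing function before the workspace.

```r
obj1 <- 10
obj2 <- 20

# A local definition takes precedence over the workspace object of the same name
shadow_fun <- function() {
    obj1 <- 1          # local obj1 masks the workspace obj1
    obj1 + obj2        # obj2 is not defined locally, so it is found one level up
}

# Nested functions search outward one level at a time
outer_fun <- function() {
    obj2 <- 200        # inner_fun() finds this obj2 before the workspace obj2
    inner_fun <- function() obj1 + obj2
    inner_fun()
}

shadow_val <- shadow_fun()   # 1 + 20
nested_val <- outer_fun()    # 10 + 200
```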

This behavior is indeed a bit strange, but it’s one of R’s quirks. Unfortunately, when running code in parallel across different cores these objects will not be visible, and therefore must be exported using other methods (e.g., in the parallel package this is done with clusterExport()).

library(parallel)
cl <- makeCluster(2)
res <- try(parSapply(cl=cl, 1:4, myfun))
res
## Error in checkForRemoteErrors(val) : 
##   2 nodes produced errors; first error: object 'obj1' not found

Exporting the objects to the cluster fixes the issue.

clusterExport(cl=cl, c('obj1', 'obj2'))
parSapply(cl=cl, 1:4, myfun)
## [1] 30 30 30 30
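As a possible alternative to exporting, the values can be passed to the function as explicit arguments, so nothing needs to be looked up in the workspace on the worker nodes. The sketch below assumes a hypothetical variant of the function, myfun_args(), and relies on parSapply() forwarding extra named arguments to the function.

```r
library(parallel)

obj1 <- 10
obj2 <- 20

# Hypothetical variant of myfun() that receives the objects as arguments
myfun_args <- function(x, obj1, obj2) obj1 + obj2

cl2 <- makeCluster(2)
# Extra named arguments are forwarded to myfun_args() on each worker
res_args <- parSapply(cl = cl2, 1:4, myfun_args, obj1 = obj1, obj2 = obj2)
stopCluster(cl2)   # release the worker processes when finished
res_args
```

This keeps the function self-contained, at the cost of a longer argument list when many objects are involved.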

The same reasoning applies to functions defined in the R workspace, as well as to functions defined within external R packages. Hence, in order to use functions from other packages they must either be explicitly loaded with require() or library() within the distributed code, or referenced via their namespace with the :: operator (e.g., mvtnorm::rmvnorm()).
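Both options can be sketched with the parallel package. The example uses file_ext() from the tools package purely as a stand-in for a function from a package that is not attached by default, and uses clusterEvalQ() to load the package on each worker; the file names are made up for illustration.

```r
library(parallel)

cl_pkg <- makeCluster(2)
files <- c("a.txt", "b.csv")   # hypothetical file names

# Option 1: reference the function through its namespace with ::
ext1 <- parSapply(cl = cl_pkg, files, function(f) tools::file_ext(f))

# Option 2: attach the package on every worker first, then call it directly
invisible(clusterEvalQ(cl_pkg, library(tools)))
ext2 <- parSapply(cl = cl_pkg, files, function(f) file_ext(f))

stopCluster(cl_pkg)   # release the worker processes when finished
```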

Exporting objects example

In order to make objects safely visible in SimDesign the strategy is very simple: wrap all desired objects into a named list, and pass this to the fixed_objects argument. From here, elements can be indexed using the $ operator or the with() function, or whatever other method may be convenient. However, this is only required for defined objects, not functions: SimDesign automatically makes user-defined functions available across all nodes.
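The two access styles can be sketched in plain R, independent of SimDesign. The element names SD and mu are hypothetical and chosen only for illustration.

```r
# Hypothetical fixed objects wrapped in a named list
fixed_objects <- list(SD = 2, mu = 5)

# Index a single element directly with $
sd_dollar <- fixed_objects$SD

# Evaluate an expression with the list elements visible by name
z_with <- with(fixed_objects, (7 - mu) / SD)
```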

Note: An alternative approach is simply to define or source() the objects within the respective SimDesign functions, so that they are clearly visible at runtime. The fixed_objects approach demonstrated in this section is really only useful when the defined objects contain a large amount of code.
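For contrast with the fixed_objects example in this section, the alternative described in the note might look like the following minimal sketch. The name Generate_local() is hypothetical, and the one-row data frame merely stands in for a single row of the Design object; the function is called directly here rather than through runSimulation().

```r
# Sketch of a generate function that defines SD itself, so no fixed_objects are needed
Generate_local <- function(condition, fixed_objects = NULL) {
    SD <- 2                        # defined inside the function, so always visible at runtime
    rnorm(condition$N, sd = SD)
}

# 'condition' stands in for one row of the Design object
dat_local <- Generate_local(data.frame(N = 10))
length(dat_local)
```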

library(SimDesign)
#SimFunctions(comments = FALSE)

### Define design conditions and number of replications
Design <- expand.grid(N = c(10, 20, 30))
replications <- 1000

# define custom functions and objects (or use source() to read these in from an external file)
SD <- 2
my_gen_fun <- function(n, sd) rnorm(n, sd = sd)
my_analyse_fun <- function(x) c(p = t.test(x)$p.value)
fixed_objects <- list(SD=SD)

#---------------------------------------------------------------------------

Generate <- function(condition, fixed_objects = NULL) {
    Attach(condition) # make condition names available (e.g., N)
    
    # further, can use with() to use 'SD' directly instead of 'fixed_objects$SD'
    ret <- with(fixed_objects, my_gen_fun(N, sd=SD))
    ret
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    ret <- my_analyse_fun(dat)
    ret
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    ret <- EDR(results, alpha = .05)
    ret
}

#---------------------------------------------------------------------------

### Run the simulation
results <- runSimulation(Design, replications, verbose=FALSE, fixed_objects=fixed_objects,
                         generate=Generate, analyse=Analyse, summarise=Summarise, edit='none')
results
##    N     p REPLICATIONS SIM_TIME                COMPLETED
## 1 10 0.041         1000    0.24s Sun Oct 29 19:05:23 2017
## 2 20 0.052         1000    0.24s Sun Oct 29 19:05:23 2017
## 3 30 0.061         1000    0.24s Sun Oct 29 19:05:24 2017

By placing objects in a list and passing this to fixed_objects, the objects are safely made available to all relevant functions. Furthermore, running this code in parallel will also be valid as a consequence (see below). Remember, this is only required for R objects, NOT user-defined functions!

results <- runSimulation(Design, replications, verbose=FALSE, fixed_objects=fixed_objects,
                         generate=Generate, analyse=Analyse, summarise=Summarise, edit='none',
                         parallel = TRUE)