The ‘cholera’ package supports parallelization of certain functions
using the ‘parallel’ package, which is included in R’s base
distribution. On macOS and Unix, this is done using
parallel::mclapply(); on Windows, this is done using
parallel::parLapply(). For reasons discussed below, parallelization is
off by default. For functions that support it, you need to set
“multi.core = TRUE”; this will use all of your machine’s logical cores.
You can also pass the number of logical cores you want to use. To check
the number of available cores, use parallel::detectCores().
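The platform split described above can be sketched roughly as follows. This is only an illustration of the general pattern, not the code 'cholera' actually uses: the helper name parallel_apply() and its arguments are hypothetical, and only the 'parallel' functions it calls are real.

```r
library(parallel)

# How many logical cores does this machine report? (May be NA on some systems.)
available <- detectCores()
print(available)

# Hypothetical helper (not part of 'cholera'): apply FUN to each element of X
# in parallel, choosing the backend by platform as described above.
parallel_apply <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # Windows cannot fork, so start a socket cluster and always shut it down.
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, X, FUN)
  } else {
    # macOS and Unix: fork the current R process.
    mclapply(X, FUN, mc.cores = cores)
  }
}

squares <- parallel_apply(1:4, function(i) i^2, cores = 2L)
print(unlist(squares))
```

With 'cholera' loaded, the corresponding request would be something like neighborhoodWalking(multi.core = TRUE) to use all logical cores or, assuming the core count is passed through the same argument, neighborhoodWalking(multi.core = 4).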
To avoid the performance penalties of paging to disk, you should have
adequate RAM. A conservative estimate is that each task can take up to
500 MB. So if you’re running jobs on 8 cores, you’ll need at least
4 GB of available RAM.
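The arithmetic behind that estimate is simple enough to write down. The two helper functions below are hypothetical (they are not part of 'cholera'), and they assume one task per core and the 500 MB per task ceiling quoted above.

```r
# Rule of thumb from the text: up to 500 MB per task, one task per core.
# Both helper names are hypothetical and exist only for this illustration.
ram_needed_gb <- function(cores, mb_per_task = 500) {
  cores * mb_per_task / 1000
}

# The inverse question: how many cores can a given amount of free RAM support?
max_cores_for_ram <- function(available_gb, mb_per_task = 500) {
  floor(available_gb * 1000 / mb_per_task)
}

ram_needed_gb(8)      # 4, matching the 8-core example above
max_cores_for_ram(2)  # 4
```

If your machine has less free RAM than the first function suggests, passing a smaller number of cores is probably the safer choice.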
Parallelization is off by default because the ‘parallel’ package’s documentation goes to great lengths to discourage using these functions interactively.
Note that although some precautions are taken in R.app on macOS, the developers of the ‘parallel’ package, which neighborhoodWalking() uses, strongly discourage using parallelization within a GUI or embedded environment. That said, with more recent versions of ‘parallel’, I only rarely experience crashes. As an experiment, I’ve set “multi.core = TRUE”.
That said, with more recent versions of ‘parallel’, I have not experienced crashes either in the R application or in RStudio.
The timings below (in seconds) were measured on a 2.3 GHz Intel Core i7 using the ‘microbenchmark’ package with R version 3.6.1 on macOS 10.14.6. They include timings for parallel::parLapply(), which is the function used to support parallelization on Windows.
The first three tables report timings for parallel::mclapply(); the last three, which carry the notes about Windows, report timings for parallel::parLapply().
neighborhoodWalking() | 1 logical core | 8 logical cores |
---|---|---|
plot.walking() | 4.5 | 3.8 |
plot.walking(case.set = "expected", type = "road") | 26 | 10 |
plot.walking(case.set = "expected", type = "area.points") | 26 | 11 |
plot.walking(case.set = "expected", type = "area.polygons") | 52 | 19 |
neighborhoodEuclidean() | 1 logical core | 8 logical cores |
---|---|---|
plot.euclidean() | 3.6 | 1.3 |
plot.euclidean(case.set = "expected", type = "road") | 109 | 28 |
plot.euclidean(case.set = "expected", type = "area.points") | 109 | 28 |
plot.euclidean(case.set = "expected", type = "area.polygons") | 126 | 46 |
function | 1 logical core | 8 logical cores |
---|---|---|
nearestPump() | 2.4 | 1.8 |
nearestPump(metric = "euclidean") | 3.1 | 1.0 |
nearestPump(case.set = "expected") | 348 | 93 |
nearestPump(metric = "euclidean", case.set = "expected") | 106 | 26 |
simulateFatalities() | 5280 | 1228 |
unstackFatalities() | 163 | 40 |
simulateWalkingDistance() | 204 | 58 |
neighborhoodWalking() | 1 logical core | 8 logical cores |
---|---|---|
plot.walking() | 5.6 | 11.6 |
plot.walking(case.set = "expected", type = "road") | 30 | 36 |
plot.walking(case.set = "expected", type = "area.points") | 30 | 36 |
plot.walking(case.set = "expected", type = "area.polygons") | 56 | 48 |
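Note that because the parallel version can be slower than the serial one here (for example, 11.6 versus 5.6 seconds for plot.walking()), parallelization is not automatically enabled on Windows for neighborhoodWalking(). If you want to use it, you need to set dev.mode = TRUE.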
neighborhoodEuclidean() | 1 logical core | 8 logical cores |
---|---|---|
plot.euclidean() | 4.2 | 3.8 |
plot.euclidean(case.set = "expected", type = "road") | 108 | 32 |
plot.euclidean(case.set = "expected", type = "area.points") | 107 | 31 |
plot.euclidean(case.set = "expected", type = "area.polygons") | 124 | 48 |
function | 1 logical core | 8 logical cores |
---|---|---|
nearestPump() | 3.6 | 9.8 |
nearestPump(metric = "euclidean") | 3.8 | 3.4 |
nearestPump(case.set = "expected") | 345 | 94 |
nearestPump(metric = "euclidean", case.set = "expected") | 106 | 29 |
simulateFatalities() | 5094 | 1268 |
unstackFatalities() | 163 | 50 |
simulateWalkingDistance() | 200 | 72 |
Note that because the parallel version is slower than the serial one here (9.8 versus 3.6 seconds for nearestPump()), parallelization is not automatically enabled on Windows for nearestPump(metric = "walking", case.set = "observed"). If you want to use it, you need to set dev.mode = TRUE.
My understanding is that, because parLapply() carries greater overhead, mclapply() generally outperforms it. In terms of writing code, I’ve found that even when applied to fine-grained tasks (smaller chunks of code), I was more likely to see benefits from mclapply() than from parLapply(). With the latter, you’re more easily penalized: some jobs take longer to run in parallel than in serial.
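One way to see this overhead for yourself is to time a deliberately tiny task under each approach. The sketch below is an assumption-laden illustration rather than the benchmark behind the tables above: it uses base R's system.time() instead of ‘microbenchmark’, and tiny_task() is a made-up stand-in for a fine-grained job. The actual timings will vary by machine, so none are shown.

```r
library(parallel)

# A deliberately tiny, made-up task: the work per element is negligible,
# so most of any extra time comes from parallel overhead.
tiny_task <- function(i) sqrt(i)
x <- 1:1000
cores <- 2L

# Serial baseline.
serial_time <- system.time(serial <- lapply(x, tiny_task))[["elapsed"]]

# Socket cluster (the approach available on Windows). Starting the cluster
# is an additional cost that is not included in this timing.
cl <- makeCluster(cores)
psock_time <- system.time(psock <- parLapply(cl, x, tiny_task))[["elapsed"]]
stopCluster(cl)

timings <- c(serial = serial_time, parLapply = psock_time)

# Forking is not available on Windows, so only time mclapply() elsewhere.
if (.Platform$OS.type != "windows") {
  fork_time <- system.time(
    forked <- mclapply(x, tiny_task, mc.cores = cores)
  )[["elapsed"]]
  timings <- c(timings, mclapply = fork_time)
}

print(timings)
```

For a task this small, the serial version will often be the fastest of the three, which is the penalty described above.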