doFuture 1.0.0 has been on CRAN since March 2023. It introduces a new foreach operator, %dofuture%, which makes it even easier to use foreach() to parallelize via the future ecosystem. This new operator is designed as an alternative to the existing %dopar% operator for foreach() - an alternative that works in a similar way, but better. If you already use foreach() together with futures, or plan on doing so, I recommend using %dofuture% instead of %dopar%.
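As a minimal sketch of what this looks like in practice (the backend choice and the toy computation are mine, not from the post):

```r
library(doFuture)   # also attaches foreach and future
plan(multisession)  # parallelize on background R sessions on this machine

## %dofuture% is used where %dopar% would go, and needs no adaptor registration
y <- foreach(x = 1:10, .combine = c) %dofuture% {
  sqrt(x)
}

plan(sequential)    # shut down the parallel workers
```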
Parallel ‘Digital Rain’ by Jahobr
After two-and-a-half months, future 1.19.1 is now on CRAN. As usual, there are some bug fixes and minor improvements here and there (NEWS), including things needed by the next version of furrr. For those of you who use Slurm or LSF/OpenLava as a scheduler on your high-performance compute (HPC) cluster, future::availableCores() will now do a better job of respecting the CPU resources that those schedulers allocate for your R jobs.
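As a small, hedged illustration (the sbatch flag shown is just one example of how such an allocation might be requested):

```r
## Inside an R job that Slurm launched with, say, 'sbatch --cpus-per-task=16 ...',
## this reflects the 16 allocated CPUs rather than the total number of cores
## on the compute node
ncores <- future::availableCores()
ncores
```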
There are new versions of future and future.apply - your friends in the parallelization business - on CRAN. These updates are mostly maintenance updates with bug fixes, some improvements, and preparations for upcoming changes. It’s been some time since I blogged about these packages, so here is a summary of the main updates so far since early 2020:
future:
values() for lists and other containers was renamed to value() to simplify the API [future 1.
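A minimal sketch of what this looks like after the renaming (the toy futures are mine):

```r
library(future)
plan(multisession)

## A list of futures ...
fs <- lapply(1:3, function(x) future(x^2))

## ... can now be collected with value(), where values() was used before
value(fs)   # list(1, 4, 9)
```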
No dogs were harmed while making this release
future 1.15.0 is now on CRAN, accompanied by a recent, related update of future.callr 0.5.0. The main update is a change to the Future API:
resolved() will now also launch lazy futures
Although this change may not look like much to the world, I’d like to think of it as part of a young person slowly finding themselves. This change in behavior helps us in cases where we create lazy futures upfront;
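A minimal sketch of the new behavior (the toy expression is mine):

```r
library(future)
plan(multisession)

## A lazy future is not launched when it is created ...
f <- future({ Sys.sleep(2); 42 }, lazy = TRUE)

## ... but polling it with resolved() now launches it, so it can run in the
## background while we keep working in the main R session
resolved(f)   # FALSE: just launched, not yet finished
value(f)      # 42: blocks until the result is ready
```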
A commonly asked question in the R community is:
How can I parallelize the following for-loop?
The answer almost always involves rewriting the for (...) { ... } loop into something that looks like a y <- lapply(...) call. If you can achieve that, you can parallelize it via, for instance, y <- future.apply::future_lapply(...) or y <- foreach::foreach() %dopar% { ... }.
For some for-loops it is straightforward to rewrite the code to make use of lapply() instead, whereas in other cases it can be a bit more complicated, especially if the for-loop updates multiple variables in each iteration.
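As a hedged illustration of that rewrite (the toy computation is mine, not from the post):

```r
## A sequential for-loop ...
y <- list()
for (i in 1:10) {
  y[[i]] <- sqrt(i)
}

## ... rewritten as an lapply() call ...
y <- lapply(1:10, function(i) {
  sqrt(i)
})

## ... which can then be parallelized as-is
library(future.apply)
plan(multisession)
y <- future_lapply(1:10, function(i) {
  sqrt(i)
})
```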
The future package defines the Future API, which is a unified, generic, friendly API for parallel processing. The Future API follows the principle of write code once and run anywhere - the developer chooses what to parallelize and the user how and where.
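A minimal sketch of that separation of concerns (the backend choices and toy expression are mine):

```r
library(future)

## The user decides how and where the code runs, e.g. one of:
plan(sequential)                          # in the current R session
plan(multisession)                        # on background R sessions on this machine
# plan(cluster, workers = c("n1", "n2")) # on remote machines ('n1', 'n2' are hypothetical hosts)

## The developer's code stays the same regardless of the plan
f <- future({
  sum(sqrt(1:1e6))
})
v <- value(f)
```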
The nature of a future is such that it lends itself to be used with several of the existing map-reduce frameworks already available in R. In this post, I’ll give an example of how to apply a function over a set of elements concurrently using plain sequential R, the parallel package, the future package alone, as well as future in combination with the foreach, plyr, and purrr packages.
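A hedged sketch of the idea, using a toy function of my own in a few of those frameworks:

```r
X <- 1:10
slow_sqrt <- function(x) { Sys.sleep(0.1); sqrt(x) }

## Plain sequential R
y0 <- lapply(X, slow_sqrt)

## The parallel package
cl <- parallel::makeCluster(2L)
y1 <- parallel::parLapply(cl, X, slow_sqrt)
parallel::stopCluster(cl)

## The future package alone: one future per element, then collect
library(future)
plan(multisession)
fs <- lapply(X, function(x) future(slow_sqrt(x)))
y2 <- lapply(fs, value)

## ... and similarly with purrr's map()
library(purrr)
fs <- map(X, ~ future(slow_sqrt(.x)))
y3 <- map(fs, value)
```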
doFuture 0.4.0 is available on CRAN. The doFuture package provides a universal foreach adaptor enabling any future backend to be used with the foreach() %dopar% { ... } construct. As shown below, this will allow foreach() to parallelize not only on multiple cores, multiple background R sessions, and ad-hoc clusters, but also on cloud-based clusters and high-performance compute (HPC) environments.
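A minimal sketch of the construct (the backend choice and toy expression are mine):

```r
library(doFuture)
registerDoFuture()   # tell foreach's %dopar% to dispatch via futures
plan(multisession)   # or a cluster / HPC backend, e.g. via future.batchtools

y <- foreach(i = 1:10, .combine = c) %dopar% {
  sqrt(i)
}
```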
1,300+ R packages on CRAN and Bioconductor depend, directly or indirectly, on foreach for their parallel processing.