The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

z - Advanced topic: Reproducible Analytical Pipelines with Nix

Introduction

Isolated environments are great to run pipelines in a safe and reproducible manner. This vignette details how to build a reproducible analytical pipeline using an environment built with Nix that contains the right version of R and packages.

An example of a reproducible analytical pipeline using Nix

Suppose that you’ve used {targets} to build a pipeline for a project and that you did so using a tailor-made Nix environment. Here is the call to rix() that you could have used to build that environment:

path_default_nix <- tempdir()

rix(
  r_ver = "4.2.2",
  r_pkgs = c("targets", "tarchetypes", "rmarkdown"),
  system_pkgs = NULL,
  git_pkgs = list(
    package_name = "housing",
    repo_url = "https://github.com/rap4all/housing/",
    commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"
  ),
  ide = "other",
  project_path = path_default_nix,
  overwrite = TRUE
)

This call to rix() generates the following default.nix file:

let
 pkgs = import (fetchTarball "https://github.com/rstats-on-nix/nixpkgs/archive/2023-02-13.tar.gz") {};
 
  rpkgs = builtins.attrValues {
    inherit (pkgs.rPackages) 
      rmarkdown
      tarchetypes
      targets;
  };
 
    housing = (pkgs.rPackages.buildRPackage {
      name = "housing";
      src = pkgs.fetchgit {
        url = "https://github.com/rap4all/housing/";
        rev = "1c860959310b80e67c41f7bbdc3e84cef00df18e";
        sha256 = "sha256-s4KGtfKQ7hL0sfDhGb4BpBpspfefBN6hf+XlslqyEn4=";
      };
      propagatedBuildInputs = builtins.attrValues {
        inherit (pkgs.rPackages) 
          dplyr
          ggplot2
          janitor
          purrr
          readxl
          rlang
          rvest
          stringr
          tidyr;
      };
    });
    
  system_packages = builtins.attrValues {
    inherit (pkgs) 
      glibcLocales
      nix
      R;
  };
  
in

pkgs.mkShell {
  LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
  LANG = "en_US.UTF-8";
   LC_ALL = "en_US.UTF-8";
   LC_TIME = "en_US.UTF-8";
   LC_MONETARY = "en_US.UTF-8";
   LC_PAPER = "en_US.UTF-8";
   LC_MEASUREMENT = "en_US.UTF-8";

  buildInputs = [ housing rpkgs  system_packages   ];
  
}

The environment that gets built from this default.nix file contains R version 4.2.2, the {targets} and {tarchetypes} packages, as well as the {housing} packages, which is a package that is hosted on GitHub only with some data and useful functions for the project. Because it is on GitHub, it gets installed using the buildRPackage function from Nix. You can use this environment to work on you project, or to launch a {targets} pipeline. This GitHub repository contains the finalized project.

On your local machine, you could execute the pipeline in the environment by running this in a terminal:

cd /absolute/path/to/housing/ && nix-shell default.nix --run "Rscript -e 'targets::tar_make()'"

If you wish to run the pipeline whenever you drop into the Nix shell, you could add a Shell-hook to the generated default.nix file:

path_default_nix <- tempdir()

rix(
  r_ver = "4.2.2",
  r_pkgs = c("targets", "tarchetypes", "rmarkdown"),
  system_pkgs = NULL,
  git_pkgs = list(
    package_name = "housing",
    repo_url = "https://github.com/rap4all/housing/",
    commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"
  ),
  ide = "other",
  shell_hook = "Rscript -e 'targets::tar_make()'",
  project_path = path_default_nix,
  overwrite = TRUE
)

Now, each time you drop into the Nix shell for that project using nix-shell, the pipeline gets automatically executed. {rix} also features a function called tar_nix_ga() that adds a GitHub Actions workflow file to make the pipeline run automatically on GitHub Actions. The GitHub repository linked above has such a file, so each time changes get pushed, the pipeline runs on GitHub Actions and the results are automatically pushed to a branch called targets-runs. See the workflow file here. This feature is very heavily inspired and adapted from the targets::github_actions() function.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.