
The Require approach, comparing pak and renv

Eliot McIntire

Require is a single package that combines features of base::install.packages, base::library, base::require, pak::pkg_install, remotes::install_github, and versions::install_version, plus the snapshotting capabilities of renv. It takes its name from base::require: the idea is that one line, much like a require call, loads a package, but here it also installs the package first if necessary. Set it and forget it. The line keeps working even if a dependency is later removed from CRAN (“archived”). Because everything fits in one line, it is easy to share, which helps, for example, when making reprexes for debugging. This package can be a key part of a reproducible workflow.

Principles used in Require

Require is designed with features that facilitate running R code that is part of a continuous reproducible workflow, from data to decisions. For this to work, every function a user calls should do its heavy work the first time it is called and be fast enough on later calls that the user is not tempted to skip lines when re-running code. This property is called “rerun-tolerance” or “idempotency”: the line can be rerun under identical conditions and very quickly return the original result. The reproducible package has a function, Cache, that can give many function calls this property. It does not work well for functions whose purpose is a side effect, such as installing and loading packages. Require fills this gap.

How it works – Version priority

Three rules describe Require’s behaviour completely:

  1. Version-number requirements drive updates. If the installed version already satisfies the constraint, no update happens.
  2. No version requirement, package present → no install.
  3. Multiple, apparently incompatible requests for the same package don’t error.

Require therefore treats a version statement as its top priority. A request without a version statement installs the package only if it is not already installed; otherwise it installs nothing. Examples:

Require::Require("data.table") # installs if missing, otherwise calls require

The next line installs data.table if missing, otherwise checks the locally installed version, installs update if needed to satisfy version statement, then calls require:

Require::Require("data.table (>=1.18.0)") 

This version priority behaviour matches the default install.packages behaviour in base R, when a package declares a version dependency. Require extends this to a user-specified statement.
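The decision rule described above can be sketched in a few lines of base R. This is only an illustration of the rule, not Require's internal code: parse_request and needs_install are hypothetical helper names, the installed version is passed in by hand rather than read from the library, and special forms such as (HEAD) are not handled.

```r
# Hypothetical helper: split "pkg (>= 1.2.3)" into name, operator and version.
parse_request <- function(x) {
  name <- trimws(sub("\\(.*$", "", x))
  if (!grepl("\\(", x)) {
    return(list(name = name, op = NA_character_, version = NA_character_))
  }
  inside <- trimws(sub("^[^(]*\\(([^)]*)\\).*$", "\\1", x))  # e.g. ">= 1.18.0"
  list(name = name,
       op = sub("^(>=|<=|==|>|<).*$", "\\1", inside),
       version = trimws(sub("^(>=|<=|==|>|<)", "", inside)))
}

# Hypothetical helper: TRUE when an install or update would be needed.
# `installed_version` is NULL when the package is not installed.
needs_install <- function(request, installed_version = NULL) {
  req <- parse_request(request)
  if (is.null(installed_version)) return(TRUE)   # missing: always install
  if (is.na(req$op)) return(FALSE)               # present, no constraint: do nothing
  cmp <- utils::compareVersion(installed_version, req$version)
  satisfied <- switch(req$op,
                      ">=" = cmp >= 0, "<=" = cmp <= 0, "==" = cmp == 0,
                      ">"  = cmp > 0,  "<"  = cmp < 0)
  !satisfied
}

needs_install("data.table", "1.14.0")               # FALSE: already installed
needs_install("data.table (>=1.18.0)", "1.17.8")    # TRUE: constraint not met
```

The point of the sketch is that the version statement, not the availability of a newer release, is what triggers an update.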

See below for more detailed examples.

Apparent package conflicts

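When there are apparent package conflicts, Require uses, in this order:

  1. the version requirement, if any;
  2. CRAN in preference to other sources;
  3. the order in which the packages were requested.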

See these examples:

# No version specifications — CRAN version installed, or nothing if already installed
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible"))

# `HEAD` after the GitHub ref forces the tip of the development branch
Require::Install(c("PredictiveEcology/reproducible@development (HEAD)", "reproducible"))

# Same: `HEAD` after the package name (of either form) forces the tip
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible (HEAD)"))

# No conflict: version requirement is satisfiable by the named branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
                   "PredictiveEcology/reproducible (>= 2.0.10)"))

# Even if a branch doesn't exist, no error if a later requirement names a different branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
                   "PredictiveEcology/reproducible@validityTest (>= 2.0.9)"))

New default as of version 2.0.0

Both the internal package dependency algorithm and the package installation mechanism now use pak, replacing a custom package dependency function plus install.packages. This allows a user to mix and match pak-based manual installs with Require-based code. I highlight below the differences between using pak and Require, with these new default internals. The old, native Require approach still works if the user prefers it: options(Require.usePak = FALSE).

Key features (when usePak = TRUE)

Features include:

  1. Fast, parallel installs and downloads (delegated to pak).
  2. Installs CRAN and CRAN-alike packages even if they have been archived.
  3. Installs GitHub packages.
  4. Can load packages after installing, if using Require::Require.
  5. User can specify which version to install using the standard R-version approach (e.g., ==3.5.0 or >=3.5.0).
  6. Local package caching (see below) for fast (re-)installs.
  7. Manages (several types of) conflicting package requests, i.e., different GitHub branches.
  8. Finds specific versions of packages even when an incomplete CRAN-like repository (such as r-universe.dev) does not have that version, provided the version is available on the main CRAN mirrors.

Rerun-tolerance

To be functionally reproducible, code must be regularly run and tested on many operating systems and computers. When this does not happen, a user/developer does not know that certain code chunks no longer work until they try to run it later. In other words, code gets stale because underlying algorithms and data change. To be rerun-tolerant, a function must:

  1. return the same result or outcome every time it is run (first, second or more times later);
  2. be very fast after the first time; when it is not fast, users will skip running it “because we don’t need to run it again and it is slow”

Require does both of these. See “Why is it fast?” below.

Why these features help teams

It is common during code development to work in teams and to be updating package code along the way. Require’s features help whether the team is very tight, all working on exactly the same project, or looser, sharing only certain components across diverse projects.

All working on same project

If the whole team is working on the same “whole” project, then it may be useful to use a “package snapshot” approach, as is used with the renv package. Require offers similar functionality with the function pkgSnapshot(). Using this approach provides a mechanism for each team member to update code, then snapshot the project, commit the snapshot and push to the cloud for the team to share.

Diverse projects

However, if a team is more diversified and its members share the new code but not the whole project, then project snapshots will be very inefficient, and package management must happen package by package, not for the whole project. In other words, the code developer can work on their package, and each team member has two options: keep at the bleeding edge, or update only if necessary for dependencies. More likely, they will want a mixture of these strategies, i.e., bleeding edge for some packages but only-if-necessary for others. Require offers programmatic control for this. For example,

Require::Install(
  c("PredictiveEcology/reproducible@development (HEAD)",
    "PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)"))

will keep the project at the bleeding edge of the development branch of reproducible, but will only update if necessary (based on the version needed, expressed by the inequality) for the development branch of SpaDES.core. The user does not have to make decisions at run time as to whether an update should be made, and for which packages.

Differences between pak and Require

How Require differs from pak in philosophy

By default, as of version 2.0.0, Require uses pak to calculate package dependency trees and installations. However, Require applies a different philosophy to package management. The two tools answer the same question — “what should be installed?” — in different ways.

Require is therefore not an alternative to pak. It is a complementary wrapper that applies a different policy on top of pak. The differences described in this vignette are differences in policy, not in installation machinery. For example: when you call pak::pkg_install("data.table"), pak will offer to upgrade data.table if a newer version is on CRAN. When you call Require::Install("data.table"), Require first checks whether the installed version already satisfies your request; if it does, nothing happens at all. The actual install, when one is needed, is done by pak either way.

Practically speaking, this means that a user can write their list of packages they need in their code, and leave it there, without concern that their packages may unexpectedly be updated – using time and possibly changing functionality at an inopportune moment.

Stability vs. Most-recent

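The biggest difference is what each tool does when a package is already installed: pak offers to upgrade it whenever a newer version is on CRAN, whereas Require leaves it alone unless a version constraint demands a change.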

This is what makes Require “set-and-forget”. You can put a Require::Install(...) line near the top of a script, run that script every day for a year, and your packages will not silently change underneath you. They only change when you change your version requirement.

| Package state | pak::pkg_install("data.table") | Require::Install("data.table") |
|---|---|---|
| Not installed | Installs latest | Installs latest |
| Installed, latest | No change | No change |
| Installed, but newer on CRAN | Asks user whether to upgrade | No change |
| Installed, version < (>= X) | User cannot specify in this way | Upgrades to satisfy |

Require exposes the upgrade policy through the version constraints in your code. If you want the latest, ask for it (e.g. data.table (>= 1.16) or data.table (HEAD)); if you want stability, leave the constraint off.

GitHub branches: exact pin vs. version minimum

The same stability-first policy shows up clearly when you install from a GitHub branch. pak reads the DESCRIPTION on that branch and enforces every dependency exactly as written — even with upgrade = FALSE, it will downgrade (or upgrade) installed dependencies to match the pin. Require reads the same DESCRIPTION but treats each line as a minimum: if an installed dependency already satisfies the constraint, it is left alone.

Here LandR@development lists reproducible (>= 3.0.0.9001) in its DESCRIPTION, while the user already has reproducible 3.0.0.9083 installed:

> pak::pak("PredictiveEcology/LandR@development", upgrade = FALSE)

→ Will install 1 package.
→ Will update 1 package.
→ All 2 packages (0 B) are cached.
+ LandR                     1.1.5.9101 [bld][cmp] (GitHub: c5e771d)
+ reproducible 3.0.0.9083 → 3.0.0.9001 [bld][cmp] (GitHub: ffffec4)
✔ All system requirements are already installed.

? Do you want to continue (Y/n) n
Error: Aborted.

> Require::Install("PredictiveEcology/LandR@development", upgrade = FALSE)
Require/pak skipping new package dependency identification: using cache (103 packages, 0.6h old)
All requested packages are in the pak download cache; installing from cache (no metadata refresh, no network)
offline mode: installing 1 package(s) from pak cache: LandR

→ Will install 1 package.
→ The package (0 B) is cached.
+ LandR   1.1.5.9101 [bld][cmp] (GitHub: c5e771d)

ℹ No downloads are needed, 1 pkg is cached
ℹ Building LandR 1.1.5.9101
✔ Built LandR 1.1.5.9101 (24.8s)
✔ Installed LandR 1.1.5.9101 (github::PredictiveEcology/LandR@c5e771d) (37ms)
✔ 1 pkg: added 1 [25.9s]
Installed 1 packages in 26.3 secs

pak insists on replacing reproducible 3.0.0.9083 with the exact pin from the branch (3.0.0.9001), even though the installed version is newer. Require keeps the newer copy because it still satisfies LandR’s constraint. The two behaviours have different use cases: pak’s exact-pin enforcement is what you want when you need to reproduce the dependency graph the branch author committed to; Require’s version-minimum policy is what you want for “set-and-forget” scripts where any version meeting the minimum is acceptable.
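The two policies can be contrasted for a single dependency with a small base R sketch. The function names exact_pin_action and minimum_action are hypothetical and the returned strings are purely illustrative; neither tool exposes functions like these.

```r
# Exact-pin policy: anything other than the pinned version is replaced.
exact_pin_action <- function(installed, pinned) {
  if (utils::compareVersion(installed, pinned) == 0) {
    "keep"
  } else {
    paste("replace with", pinned)
  }
}

# Version-minimum policy: any installed version at or above the minimum is kept.
minimum_action <- function(installed, minimum) {
  if (utils::compareVersion(installed, minimum) >= 0) {
    "keep"
  } else {
    paste("upgrade to at least", minimum)
  }
}

# The reproducible example from above: 3.0.0.9083 installed, 3.0.0.9001 named by the branch
exact_pin_action("3.0.0.9083", "3.0.0.9001")  # "replace with 3.0.0.9001"
minimum_action("3.0.0.9083", "3.0.0.9001")    # "keep"
```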

Installs and loads in one line

pak installs packages. To use them, you still need a separate library() call.

Require::Require() does both: it installs (if needed) and then loads. The whole package-management story for a script can fit on one line:

Require(c("data.table (>= 1.16)", "lme4", "PredictiveEcology/SpaDES.core@development"))

Version constraints in the package name

pak accepts exact version pins via pkg@1.2.3. It does not accept ranges like >= or <= directly — you would have to either pin a specific version yourself or put the constraint in a DESCRIPTION file:

# Won't work — pak does not parse this
try(pak::pak("data.table (>= 1.8.0)"))

# What you have to write instead — pick an exact version yourself
pak::pak("data.table@1.8.0")

Consistent with the version requirements that can be specified in a package DESCRIPTION file, Require accepts the full set of R-style constraints right in the call, mixed freely:

Require::Install(c("data.table (>= 1.16)",
                   "stringfish (<= 0.15.8)",
                   "qs (== 0.27.3)"))

This matters because the constraint is what tells Require “stop, don’t install” or “yes, please upgrade”. The constraint is the policy.

Conflicts: resolved vs. raised as errors

When two of your dependencies (or sub-dependencies) point to different sources or different branches of the same package, pak reports a conflict and stops. The user is expected to fix it — usually by adding any:: prefixes or removing one of the requests.

Require resolves the conflict for you, using the priority documented above (version requirement, then CRAN, then order requested).

# pak: errors out — both branches of LandR are requested
try(pak::pak(c("PredictiveEcology/LandR@development",
               "PredictiveEcology/LandR@main")))

# Require: takes them in order — main wins
Require::Install(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development"))

# Require: takes by version requirement — development wins because it satisfies the constraint
Require::Install(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development (>= 1.1.5)"))

The same conflict-resolution applies to mismatches between a CRAN package and a GitHub Remotes field deep inside someone else’s package: Require picks something and explains why, rather than asking you to untangle it.
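The stated priority (version requirement, then CRAN, then order requested) can be approximated with a simple ordering in base R. This is a rough sketch under simplifying assumptions, not Require's actual algorithm: resolve_conflict is a hypothetical name, a request is treated as "CRAN" whenever it contains no slash, and the sketch neither checks whether a version requirement is satisfiable nor handles (HEAD).

```r
# Hypothetical helper: pick one request among several for the same package.
resolve_conflict <- function(requests) {
  has_version <- grepl("\\((>=|<=|==|>|<)", requests)  # carries a version requirement
  is_cran <- !grepl("/", requests)                      # assumption: no "owner/repo" means CRAN
  # FALSE sorts before TRUE, so negate to put the preferred requests first;
  # ties fall back to the order in which the requests were given.
  ranking <- order(!has_version, !is_cran, seq_along(requests))
  requests[ranking[1]]
}

resolve_conflict(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development"))
resolve_conflict(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development (>= 1.1.5)"))
```

Under these assumptions the first call returns the main request (order requested) and the second returns the development request (it carries the version requirement), matching the Require examples above.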

Archived packages: automatic vs. manual

When a package is removed from CRAN (“archived”), pak cannot install it from a plain name — you need to give it the explicit URL of the archive tarball (url::https://...). And if the archived package is a sub-dependency of something else, even that workaround doesn’t always help.

Require retrieves the most recent archived copy automatically and continues. This means a workflow that worked yesterday continues to work today, even if a CRAN package has been archived overnight.

# pak: fails — `knn` is archived
try(pak::pkg_install("knn"))

# Require: succeeds — fetches the most recent archived copy
Require::Install("knn")

Installing from a snapshot

A snapshot is a flat list of exact pins (CRAN versions and GitHub SHAs). On paper, that’s the easiest possible install, since every version is already chosen. In practice, handing the same list to pak::pkg_install() runs into trouble that doesn’t apply to a “just install the latest” workflow.

Why snapshot install needs Require’s plumbing on top of pak

It is tempting to assume pak’s own cache would handle a snapshot install end-to-end — hand pak a list of cran::pkg@version refs and let pkgcache deduplicate. We measured this directly. Result, on a 379-package snapshot:

| Strategy | Cache-warm time | Outcome |
|---|---|---|
| Require snapshot installer (local:: source + install.packages(type = "binary")) | ~60 s | All pins installed at the snapshot’s exact version |
| pak::pkg_install(c("cran::pkgA@verA", "cran::pkgB@verB", …)) | ~1240 s (≈18×) | All packages eventually installed, but several pins bumped away from the snapshot version (forced source recompile) |

The reason is structural, not a bug in pak. CRAN only builds binaries for the current version of each package; older versions live in src/contrib/Archive/ as source only. So when pak’s resolver sees pkgA@<archived-version>, it constructs the source-Archive URL — it never tries a binary URL, because no binary URL exists. Even if a binary for that exact pin is sitting in pkgcache (because we built it on a previous run), pak rebuilds from source. Snapshot installs are dominated by archived-version pins, so this is the common case, not the edge case.

Require’s snapshot installer works around this with several mechanisms that pak does not expose:

| Layer | What Require does | What pak alone would do |
|---|---|---|
| Cached binary for an archived pin | install.packages(type = "binary", repos = NULL) against the cached .tgz/.zip (skips compile entirely) | Rebuild from source archive (no binary URL exists for non-current versions) |
| Source tarball for an archived pin | pak::pkg_install("local::<file>"): bypasses pak’s resolver, installs the on-disk file directly | Re-download from src/contrib/Archive/... even if the file is in pkgcache under a different URL key |
| Binary local:: ref | n/a: Require routes binaries through install.packages instead | Refuses with “Platform mismatch”: pak’s local:: is source-only |
| GitHub @SHA pin | Built once, then cached as a binary tarball under a synthetic require-snapshot-bin:// URL so subsequent runs unpack instead of rebuilding | Rebuild on every run (the synthetic URL key is not part of pak’s resolver vocabulary) |
| Bump-and-retry for a pin that won’t compile | Walks the CRAN Archive listing for newer versions, tries each, records the substitution in the diagnostic report | All-or-nothing: one unsolvable pin aborts the whole install |

So Require is using pak for everything pak is good at — parallel downloads, the install subprocess, the pkgcache cache layout — and adding the orchestration layer that turns “snapshot of mostly-archived pinned versions” into a workflow that actually finishes in a minute instead of twenty.

Snapshot installs are the default path of Require::Install() when an inst/snapshot.txt-style file is supplied; the behaviour above is what makes “snapshot from one machine, restore on another a year later” actually work.

Working offline

Require can install and load packages with no internet, as long as they (or compatible builds of them) were downloaded once before. This is useful, for example, on a high-performance computing cluster whose compute nodes have no internet access. Set:

options(Require.offlineMode = TRUE)
Require::Require("dplyr")

Require looks for each package in the local pak cache and lets pak install from there. With a warm cache, installs are near-instant — no rebuild, no download.

Require does two things here that calling pak directly does not:

  1. You don’t have to set Require.offlineMode yourself. If Require tries an online install and any package fails because the network is unreachable, it probes for connectivity (~2 seconds) and, if there really is no internet, automatically retries from the cache. On the happy path you pay nothing extra; the probe only runs when an install fails.
  2. When a package isn’t in the cache at all, Require warns “not in pak cache”, a separate message from “tarball was in cache but install failed”, so the cause is unambiguous.
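The automatic fallback described above follows a common try-then-probe pattern, sketched here in base R. Everything in the sketch is hypothetical: install_with_fallback and its three function arguments are stand-ins supplied by the caller, not part of Require or pak, and the real implementation may differ in how it detects network failures.

```r
# Hypothetical sketch: try online first; probe connectivity only after a failure.
install_with_fallback <- function(pkgs, install_online, install_from_cache, has_internet) {
  tryCatch(
    install_online(pkgs),
    error = function(e) {
      # The probe runs only on failure, so the happy path pays nothing extra.
      if (has_internet()) stop(e)  # network is fine: a genuine install error
      message("No internet detected; retrying from the local cache")
      install_from_cache(pkgs)
    }
  )
}

# Stand-in functions simulating an unreachable network
result <- install_with_fallback(
  "dplyr",
  install_online     = function(p) stop("network unreachable"),
  install_from_cache = function(p) paste("installed from cache:", p),
  has_internet       = function() FALSE
)
result  # "installed from cache: dplyr"
```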

Summary of differences

| What | Require | pak (called directly) |
|---|---|---|
| Installs an already-installed package | Only if version constraint demands it | Will offer to upgrade if a newer version exists |
| Loads packages after install | Yes (Require()) | No, install only |
| Version constraints in package name | Pkg (>= X), (== X), (<= X), (HEAD) | Exact pin only via Pkg@X |
| Multiple branches/sources for same package | Resolves by priority | Errors as a conflict |
| Archived CRAN package (direct) | Automatic | Needs explicit url::... |
| Archived CRAN package (as a dependency) | Automatic | Often fails even with workarounds |
| Additional_repositories in DESCRIPTION | Honoured | Not honoured |
| User-controlled override per package | (HEAD) to force latest | Not exposed |
| Snapshot creation | pkgSnapshot() / pkgSnapshot2() | None (use renv separately) |
| Snapshot install (per-row tolerant) | Yes: bad row removed, rest installs | No: one unsolvable pin aborts the whole install |
| Substitute archived version when pin is gone | Yes (nearest available) | No (fails) |
| Honour snapshot row’s Repository column | Yes | No (only options(repos)) |
| Auto-fill missing transitive deps in snapshot | Yes, with diagnostic | No (errors) |
| Per-package failure diagnostic | Status / reason / fix per package | ! error in pak subprocess |

The “installation engine” rows that used to appear here (parallel downloads, parallel installs, local cache) are no longer differences: Require uses pak for those.

Set it and forget it speed

Because a major objective of Require is to be set-it-and-forget-it, it must not consume meaningful human time. When all packages are already installed, rerunning a Require line is 2x-10x faster than the equivalent pak::pak line:

> system.time(pak::pak(c("devtools", "testthat", "roxygen2")))
                                                                             
ℹ No downloads are needed
✔ 3 pkgs + 90 deps: kept 92 [1.4s]
   user  system elapsed 
   0.00    0.00    1.47 
   
> system.time(Require::Install(c("devtools", "testthat", "roxygen2")))
Require/pak skipping new package dependency identification: using memory cache (93 packages)
No packages to install/update
   user  system elapsed 
   0.04    0.00    0.20 

Why is it fast?

pak is already fast due to parallel downloads and package caching. Require adds a few other features for speed.

Extra from Require

If the packages supplied to a Require/Install call are identical to those of a previous call (commonly the case for ongoing projects), the package dependency tree is not recalculated, because it is stored on disk and in memory (so in-session re-runs are very fast). Since resolving the tree is slow for more than 200 packages, users will instead see near-instant package assessments.
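The in-memory half of this idea is ordinary memoisation, which can be sketched in base R as follows. The names .dep_cache and cached_deps are hypothetical, the key (the sorted, de-duplicated package names) is an assumption made for the sketch, and the on-disk layer that Require also uses is omitted.

```r
# In-memory cache keyed by the requested package set (hypothetical layout).
.dep_cache <- new.env(parent = emptyenv())

# `resolve` is whatever slow function computes the dependency tree.
cached_deps <- function(pkgs, resolve) {
  pkgs <- sort(unique(pkgs))
  key <- paste(pkgs, collapse = ",")
  if (!exists(key, envir = .dep_cache, inherits = FALSE)) {
    # Slow path: only taken the first time this exact set is requested.
    assign(key, resolve(pkgs), envir = .dep_cache)
  }
  get(key, envir = .dep_cache, inherits = FALSE)
}

# A stand-in resolver that just pretends to be slow
slow_resolver <- function(pkgs) {
  Sys.sleep(0.2)
  paste0(pkgs, "-deps")
}
cached_deps(c("devtools", "testthat"), slow_resolver)  # slow the first time
cached_deps(c("devtools", "testthat"), slow_resolver)  # served from the cache
```

Keying on the package set rather than on the call itself is what lets an unchanged Require line return almost immediately on a rerun.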

renv and Require

Managing projects during development

renv has a concept of a lockfile. This lockfile records a specific version of a package. If the current installed version of a package is different from the lockfile (e.g., I am the developer and I increment the local version), renv will attempt to revert the local changes (with prompt to confirm) unless the local package is installed from a cloud repository (e.g., GitHub), and a snapshot is taken. This sequence is largely incompatible with pkgload::load_all() or devtools::install(), as these do not record “where” to get the current version from. Thus, the renv sequence can be quite time consuming (1-2 minutes, instead of 1 second with pkgload::load_all()).

Require does not attempt to update anything unless required by a package. Thus, this issue never comes up. If and when it is important to “snapshot”, then pkgSnapshot or pkgSnapshot2 can be used.
