The Require approach, comparing pak and renv

Eliot McIntire

Require is a single package that combines features of utils::install.packages, base::library, base::require, as well as pak::pkg_install, remotes::install_github, and versions::install_version, plus the snapshotting capabilities of renv. It takes its name from the base R require function: a user can write a single line that loads a package, but here that line also installs the package if necessary. Set it and forget it. This means that even if a user has a dependency that is removed from CRAN (“archived”), the line will still work. Because it fits on one line, it is easy to share, which facilitates, for example, making reprexes for debugging. This package can be a key part of a reproducible workflow.

Principles used in `Require`

Require is designed with features that facilitate running R code that is part of a continuous reproducible workflow, from data to decisions. For this to work, every function called by a user should do its heavy work on the first call, and be fast enough on subsequent calls that the user is not tempted to skip lines when re-running code. This is called “rerun-tolerance” or “idempotency”, i.e., the line can be rerun under identical conditions and very quickly return the original result. The reproducible package has a function, Cache, which can give many function calls this property. It does not work well for functions whose objectives are side-effects, like installing and loading packages. Require fills this gap.

How it works – Version priority

Three rules describe Require’s behaviour completely:

Version-number requirements drive updates. If the installed version already satisfies the constraint, no update happens.
No version requirement, package present → no install.
Multiple, apparently incompatible requests for the same package don’t error.

Therefore, Require treats the version statement as the top-level priority. A request without a version statement installs the package only if it is not already installed; otherwise it installs nothing. Examples:

Require::Require("data.table") # installs if missing, then calls require

The next line installs data.table if missing, otherwise checks the locally installed version, installs update if needed to satisfy version statement, then calls require:

Require::Require("data.table (>=1.18.0)")

This version priority behaviour matches the default install.packages behaviour in base R, when a package declares a version dependency. Require extends this to a user-specified statement.
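
The rule above can be sketched in a few lines of base R. This is only an illustration of the decision, not Require's internal code: needsInstall is a hypothetical helper, it handles only the comparison operators (not HEAD), and it is given the installed version as a plain string rather than looking it up.

```r
# Hypothetical helper (not part of Require): does this request need an install?
needsInstall <- function(spec, installedVersion = NULL) {
  # Package missing: always install.
  if (is.null(installedVersion)) return(TRUE)
  # Package present and no version statement: never install.
  if (!grepl("(", spec, fixed = TRUE)) return(FALSE)
  # Pull out the text between the parentheses, e.g. ">=1.18.0".
  inside <- trimws(gsub("^.*\\(|\\).*$", "", spec))
  op <- sub("^([<>=]+).*$", "\\1", inside)
  required <- trimws(sub("^[<>=]+", "", inside))
  # compareVersion() returns -1, 0 or 1.
  cmp <- utils::compareVersion(installedVersion, required)
  satisfied <- switch(op,
    ">=" = cmp >= 0,
    "<=" = cmp <= 0,
    "==" = cmp == 0,
    ">"  = cmp > 0,
    "<"  = cmp < 0,
    stop("unsupported version statement: ", inside)
  )
  !satisfied
}

needsInstall("data.table")                          # missing, so install
needsInstall("data.table", "1.14.0")                # present, no constraint
needsInstall("data.table (>=1.18.0)", "1.17.0")     # too old, so install
needsInstall("data.table (>=1.18.0)", "1.18.2")     # already satisfied
```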

See below for more detailed examples.

Apparent package conflicts

When there are apparent package conflicts, Require uses, in this order:

version requirement;
CRAN priority;
order requested
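
As a rough sketch of that ordering, assuming a simplified model in which a request is just a character string: pickRequest is a hypothetical helper, it treats any ref without a "/" as a CRAN request, and it ignores both (HEAD) and the question of whether a version requirement can actually be satisfied.

```r
# Hypothetical helper (not part of Require): choose one of several requests
# for the same package using the three priorities listed above.
pickRequest <- function(refs) {
  hasVersionReq <- grepl("\\([[:space:]]*(>=|<=|==|>|<)", refs)
  isCRAN <- !grepl("/", refs, fixed = TRUE)
  # FALSE sorts before TRUE, so negate the flags we want to come first;
  # the final key keeps the order in which the requests were written.
  winner <- order(!hasVersionReq, !isCRAN, seq_along(refs))[1]
  refs[winner]
}

# CRAN priority beats order when no version is stated
pickRequest(c("PredictiveEcology/reproducible@development", "reproducible"))

# A version requirement beats everything else
pickRequest(c("reproducible",
              "PredictiveEcology/reproducible@development (>= 2.0.10)"))
```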

See these examples:

# No version specifications — CRAN version installed, or nothing if already installed
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible"))

# `HEAD` after the GitHub ref forces the tip of the development branch
Require::Install(c("PredictiveEcology/reproducible@development (HEAD)", "reproducible"))

# Same: `HEAD` after the package name (of either form) forces the tip
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible (HEAD)"))

# No conflict: version requirement is satisfiable by the named branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
                   "PredictiveEcology/reproducible (>= 2.0.10)"))

# Even if a branch doesn't exist, no error if a later requirement names a different branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
                   "PredictiveEcology/reproducible@validityTest (>= 2.0.9)"))

New default as of version 2.0.0

The internal package dependency algorithm and the package installation mechanism now both use pak, instead of a custom package dependency function plus install.packages. This allows a user to mix and match pak-based manual installs with Require-based code. I highlight below the differences between using pak and Require with these new default internals. The old, native Require approach still works if the user wants it: options(Require.usePak = FALSE).

Key features (when `usePak = TRUE`)

Features include:

Fast, parallel installs and downloads (delegated to pak).
Installs CRAN and CRAN-alike packages even if they have been archived.
Installs GitHub packages.
Can load packages after installing, if using Require::Require.
User can specify which version to install using the standard R-version approach (e.g., ==3.5.0 or >=3.5.0).
Local package caching (see below) for fast (re-)installs.
Manages (several types of) conflicting package requests, i.e., different GitHub branches.
Finds specific versions of packages requested from an incomplete CRAN-like repository (such as r-universe.dev), even when the version is not available there but is available on the main CRAN mirrors.

Rerun-tolerance

To be functionally reproducible, code must be regularly run and tested on many operating systems and computers. When this does not happen, a user/developer does not know that certain code chunks no longer work until they try to run them later. In other words, code gets stale because underlying algorithms and data change. To be rerun-tolerant, a function must:

return the same result or outcome every time it is run (first, second or more times later);
be very fast after the first time; when it is not fast, users will skip running it “because we don’t need to run it again and it is slow”

Require does both of these. See below “why is it fast”.
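
The same two properties can be seen in a much smaller side-effect function. The sketch below is an analogy in base R, not Require code: ensureDir is a hypothetical helper that creates a directory only when it is missing, so the first call does the work and every later call returns the same value almost instantly.

```r
# Hypothetical helper: a rerun-tolerant side-effect function.
ensureDir <- function(path) {
  if (!dir.exists(path)) {
    message("creating ", path)          # only happens on the first call
    dir.create(path, recursive = TRUE)
  }
  path                                  # same return value on every call
}

target <- file.path(tempdir(), "rerun-tolerance-demo")
unlink(target, recursive = TRUE)        # start from a clean state

first  <- ensureDir(target)             # does the work
second <- ensureDir(target)             # nothing left to do
identical(first, second)
```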

Why these features help teams

It is common during code development to work in teams while package code is being updated. The features above help whether the team is tightly knit, all working on exactly the same project, or looser, sharing only certain components across diverse projects.

All working on same project

If the whole team is working on the same “whole” project, then it may be useful to use a “package snapshot” approach, as is used with the renv package. Require offers similar functionality with the function pkgSnapshot(). Using this approach provides a mechanism for each team member to update code, then snapshot the project, commit the snapshot and push to the cloud for the team to share.
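
For readers new to the idea, a snapshot is essentially a table of package names and exact versions. The base-R sketch below only illustrates that idea; it is not how pkgSnapshot() is implemented, and the file name is an arbitrary choice for the example.

```r
# Record what is installed right now as a flat table.
ip <- utils::installed.packages()
snapshot <- data.frame(
  Package = ip[, "Package"],
  Version = ip[, "Version"],
  LibPath = ip[, "LibPath"],
  stringsAsFactors = FALSE,
  row.names = NULL
)

# Write it to a file that could be committed and shared with the team.
snapshotFile <- file.path(tempdir(), "pkgSnapshotSketch.csv")
utils::write.csv(snapshot, snapshotFile, row.names = FALSE)

# A team member reads the same file back to see which versions to restore.
restored <- utils::read.csv(snapshotFile, colClasses = "character")
head(restored[, c("Package", "Version")])
```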

Diverse projects

However, if a team is more diversified and its members share new code but not the whole project, then project snapshots will be very inefficient, and package management must happen on a package-by-package basis rather than for the whole project. In other words, the code developer can work on their package, and each team member has two options: stay at the bleeding edge, or update only when a dependency requires it. More likely, they will want a mixture of these strategies, i.e., bleeding edge for some packages but update-only-if-necessary for others. Require offers programmatic control for this. For example:

Require::Install(
  c("PredictiveEcology/reproducible@development (HEAD)",
    "PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)"))

will keep the project at the bleeding edge of the development branch of reproducible, but will only update if necessary (based on the version needed, expressed by the inequality) for the development branch of SpaDES.core. The user does not have to make decisions at run time as to whether an update should be made, and for which packages.

Differences between `pak` and `Require`

How `Require` differs from `pak` in philosophy

By default, as of version 2.0.0, Require uses pak to calculate package dependency trees and installations. However, Require applies a different philosophy to package management. The two tools answer the same question — “what should be installed?” — in different ways.

Require is therefore not an alternative to pak. It is a complementary wrapper that applies a different policy on top of pak. The differences described in this vignette are differences in policy, not in installation machinery. For example: when you call pak::pkg_install("data.table"), pak will offer to upgrade data.table if a newer version is on CRAN. When you call Require::Install("data.table"), Require first checks whether the installed version already satisfies your request; if it does, nothing happens at all. The actual install, when one is needed, is done by pak either way.

Practically speaking, this means that a user can write their list of packages they need in their code, and leave it there, without concern that their packages may unexpectedly be updated – using time and possibly changing functionality at an inopportune moment.

Stability vs. Most-recent

The biggest difference is what each tool does when a package is already installed.

pak is current-first. If you ask pak to install a package that is already there, it will check for a newer version and offer to upgrade.
Require is stability-first. If the installed version satisfies your request, Require does not install. It will only install or upgrade when the version constraint you wrote actually requires it.

This is what makes Require “set-and-forget”. You can put a Require::Install(...) line near the top of a script, run that script every day for a year, and your packages will not silently change underneath you. They only change when you change your version requirement.

The package state	`pak::pkg_install("data.table")`	`Require::Install("data.table")`
Not installed	Installs latest	Installs latest
Installed, latest	No change	No change
Installed, but newer on CRAN	Asks user whether to upgrade	No change
Installed, but older than a requested `(>= X)`	Cannot be expressed in the call	Upgrades to satisfy

Require exposes the upgrade policy through the version constraints in your code. If you want the latest, ask for it (e.g. data.table (>= 1.16) or data.table (HEAD)); if you want stability, leave the constraint off.

GitHub branches: exact pin vs. version minimum

The same stability-first policy shows up clearly when you install from a GitHub branch. pak reads the DESCRIPTION on that branch and enforces every dependency exactly as written — even with upgrade = FALSE, it will downgrade (or upgrade) installed dependencies to match the pin. Require reads the same DESCRIPTION but treats each line as a minimum: if an installed dependency already satisfies the constraint, it is left alone.

Here LandR@development lists reproducible (>= 3.0.0.9001) in its DESCRIPTION, while the user already has reproducible 3.0.0.9083 installed:

> pak::pak("PredictiveEcology/LandR@development", upgrade = FALSE)

→ Will install 1 package.
→ Will update 1 package.
→ All 2 packages (0 B) are cached.
+ LandR                     1.1.5.9101 [bld][cmp] (GitHub: c5e771d)
+ reproducible 3.0.0.9083 → 3.0.0.9001 [bld][cmp] (GitHub: ffffec4)
✔ All system requirements are already installed.

? Do you want to continue (Y/n) n
Error: Aborted.

> Require::Install("PredictiveEcology/LandR@development", upgrade = FALSE)
Require/pak skipping new package dependency identification: using cache (103 packages, 0.6h old)
All requested packages are in the pak download cache; installing from cache (no metadata refresh, no network)
offline mode: installing 1 package(s) from pak cache: LandR

→ Will install 1 package.
→ The package (0 B) is cached.
+ LandR   1.1.5.9101 [bld][cmp] (GitHub: c5e771d)

ℹ No downloads are needed, 1 pkg is cached
ℹ Building LandR 1.1.5.9101
✔ Built LandR 1.1.5.9101 (24.8s)
✔ Installed LandR 1.1.5.9101 (github::PredictiveEcology/LandR@c5e771d) (37ms)
✔ 1 pkg: added 1 [25.9s]
Installed 1 packages in 26.3 secs

pak insists on replacing reproducible 3.0.0.9083 with the exact pin from the branch (3.0.0.9001), even though the installed version is newer. Require keeps the newer copy because it still satisfies LandR’s constraint. The two behaviours have different use cases: pak’s exact-pin enforcement is what you want when you need to reproduce the dependency graph the branch author committed to; Require’s version-minimum policy is what you want for “set-and-forget” scripts where any version meeting the minimum is acceptable.
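
The two policies come down to two different comparisons on the same pair of version numbers, which can be checked directly in base R with the versions from the example above:

```r
installed <- package_version("3.0.0.9083")  # what the user already has
declared  <- package_version("3.0.0.9001")  # what the branch's DESCRIPTION names

# Version-minimum policy: is the installed copy at least the declared version?
minimumSatisfied <- installed >= declared   # TRUE, so nothing needs to change

# Exact-pin policy: is the installed copy exactly the declared version?
exactPinMatches <- installed == declared    # FALSE, so the pin forces a change
```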

Installs and loads in one line

pak installs packages. To use them, you still need a separate library() call.

Require::Require() does both: it installs (if needed) and then loads. The whole package-management story for a script can fit on one line:

Require::Require(c("data.table (>= 1.16)", "lme4", "PredictiveEcology/SpaDES.core@development"))

Version constraints in the package name

pak accepts exact version pins via pkg@1.2.3. It does not accept ranges like >= or <= directly — you would have to either pin a specific version yourself or put the constraint in a DESCRIPTION file:

# Won't work — pak does not parse this
try(pak::pak("data.table (>= 1.8.0)"))

# What you have to write instead — pick an exact version yourself
pak::pak("data.table@1.8.0")

Consistent with the version requirements that can be specified in a package DESCRIPTION file, Require accepts the full set of R-style constraints right in the call, mixed freely:

Require::Install(c("data.table (>= 1.16)",
                   "stringfish (<= 0.15.8)",
                   "qs (== 0.27.3)"))

This matters because the constraint is what tells Require “stop, don’t install” or “yes, please upgrade”. The constraint is the policy.

Conflicts: resolved vs. raised as errors

When two of your dependencies (or sub-dependencies) point to different sources or different branches of the same package, pak reports a conflict and stops. The user is expected to fix it — usually by adding any:: prefixes or removing one of the requests.

Require resolves the conflict for you, using the priority documented above (version requirement, then CRAN, then order requested).

# pak: errors out — both branches of LandR are requested
try(pak::pak(c("PredictiveEcology/LandR@development",
               "PredictiveEcology/LandR@main")))

# Require: takes them in order — main wins
Require::Install(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development"))

# Require: takes by version requirement — development wins because it satisfies the constraint
Require::Install(c("PredictiveEcology/LandR@main",
                   "PredictiveEcology/LandR@development (>= 1.1.5)"))

The same conflict-resolution applies to mismatches between a CRAN package and a GitHub Remotes field deep inside someone else’s package: Require picks something and explains why, rather than asking you to untangle it.

Archived packages: automatic vs. manual

When a package is removed from CRAN (“archived”), pak cannot install it from a plain name — you need to give it the explicit URL of the archive tarball (url::https://...). And if the archived package is a sub-dependency of something else, even that workaround doesn’t always help.

Require retrieves the most recent archived copy automatically and continues. This means a workflow that worked yesterday continues to work today, even if a CRAN package has been archived overnight.

# pak: fails — `knn` is archived
try(pak::pkg_install("knn"))

# Require: succeeds — fetches the most recent archived copy
Require::Install("knn")
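
For context, archived source tarballs sit under src/contrib/Archive/ on CRAN, one folder per package. The sketch below builds the explicit url:: ref that the manual pak route needs; archiveUrl is a hypothetical helper, and the package name and version are placeholders, not a real archived release.

```r
# Hypothetical helper: URL of an archived source tarball on a CRAN mirror.
archiveUrl <- function(pkg, version, cran = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", cran, pkg, pkg, version)
}

# Placeholder package and version, for illustration only.
tarball <- archiveUrl("somePkg", "1.2.3")
pakRef  <- paste0("url::", tarball)   # the form pak expects for a direct URL
pakRef
```

With Require the user never has to look up the version or build this URL by hand.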

Installing from a snapshot

A snapshot is a flat list of exact pins (CRAN versions and GitHub SHAs). On paper, that’s the easiest possible install — every version is already chosen. In practice, handing the same list to pak::pkg_install() runs into trouble that doesn’t apply to a “just install the latest” workflow:

All-or-nothing solving. pak’s resolver evaluates every pin together. If one pin is unsolvable (an archived version, a sub-dep that contradicts another pin), it refuses to install anything. Require’s snapshot installer goes pin-by-pin with install.packages(dependencies = NA) against a synthesized local repo, so a bad row removes one package, not all of them.
Archived / disappeared versions. Snapshots routinely pin versions that have since left CRAN. pak 404s. Require substitutes the nearest available archived version and reports the substitution.
Non-CRAN homes. Rows that came from r-universe, RSPM, or another CRAN-alike carry a Repository URL. pak’s resolver only consults options(repos), so those rows fail to resolve. Require honours each row’s Repository column.
Incomplete snapshots. A snapshot built from a session that already had a transitive dep loaded from another libPath can be missing that dep. pak errors with dependency 'X' is not available. Require auto-fills the missing dep from CRAN/PPM and flags it so the user can add it to the snapshot for full reproducibility.
Opaque failures. When pak does fail, the user sees ! error in pak subprocess. Require keeps per-package install logs and prints a structured report: status (download-failed / version-conflict / missing-dep / compile-failed / cascade / substituted / auto-filled), the reason, and a concrete fix.
Speed on Linux/macOS. Tarballs are fetched in parallel via libcurl multi, with PPM binaries preferred (and the right User-Agent set so PPM actually serves binaries). The cache is pkgcache — the same cache pak uses — so anything downloaded here is reusable by pak next time, and vice versa.
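
The difference between all-or-nothing and pin-by-pin can be sketched with a stand-in installer. Everything here is hypothetical: installRows and fakeInstall are illustration-only names, the snapshot rows are made up, and no real installation takes place. The point is only that wrapping each row in tryCatch lets one bad row fail without stopping the rest.

```r
# Hypothetical per-row installer: each pin succeeds or fails on its own.
installRows <- function(snapshot, installOne) {
  snapshot$status <- vapply(seq_len(nrow(snapshot)), function(i) {
    tryCatch({
      installOne(snapshot$Package[i], snapshot$Version[i])
      "installed"
    }, error = function(e) paste("failed:", conditionMessage(e)))
  }, character(1))
  snapshot
}

# Stand-in for a real install call; pretends one pin has disappeared.
fakeInstall <- function(pkg, version) {
  if (pkg == "pkgB") stop("version not available")
  invisible(TRUE)
}

snapshot <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Version = c("1.0.0", "0.9.1", "2.3.4"),
  stringsAsFactors = FALSE
)

report <- installRows(snapshot, fakeInstall)
report
```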

Why snapshot install needs Require’s plumbing on top of `pak`

It is tempting to assume pak’s own cache would handle a snapshot install end-to-end — hand pak a list of cran::pkg@version refs and let pkgcache deduplicate. We measured this directly. Result, on a 379-package snapshot:

Strategy	Cache-warm time	Outcome
`Require` snapshot installer (`local::` source + `install.packages(type = "binary")`)	~60 s	All pins installed at the snapshot’s exact version
`pak::pkg_install(c("cran::pkgA@verA", "cran::pkgB@verB", …))`	~1240 s (≈18×)	All packages eventually installed, but several pins bumped away from the snapshot version (forced source recompile)

The reason is structural, not a bug in pak. CRAN only builds binaries for the current version of each package; older versions live in src/contrib/Archive/ as source only. So when pak’s resolver sees pkgA@<archived-version>, it constructs the source-Archive URL — it never tries a binary URL, because no binary URL exists. Even if a binary for that exact pin is sitting in pkgcache (because we built it on a previous run), pak rebuilds from source. Snapshot installs are dominated by archived-version pins, so this is the common case, not the edge case.

Require’s snapshot installer works around this with two mechanisms pak does not expose:

Layer	What `Require` does	What `pak` alone would do
Cached binary for an archived pin	`install.packages(type = "binary", repos = NULL)` against the cached `.tgz`/`.zip` (skips compile entirely)	Rebuild from source archive (no binary URL exists for non-current versions)
Source tarball for an archived pin	`pak::pkg_install("local::<file>")` — bypasses `pak`’s resolver, installs the on-disk file directly	Re-download from `src/contrib/Archive/...` even if the file is in `pkgcache` under a different URL key
Binary `local::` ref	n/a — Require routes binaries through `install.packages` instead	Refuses with “Platform mismatch” — `pak`’s `local::` is source-only
GitHub `@SHA` pin	Built once, then cached as a binary tarball under a synthetic `require-snapshot-bin://` URL so subsequent runs unpack instead of rebuilding	Rebuild on every run (the synthetic URL key is not part of `pak`’s resolver vocabulary)
Bump-and-retry for a pin that won’t compile	Walks the CRAN Archive listing for newer versions, tries each, records the substitution in the diagnostic report	All-or-nothing: one unsolvable pin aborts the whole install

So Require is using pak for everything pak is good at — parallel downloads, the install subprocess, the pkgcache cache layout — and adding the orchestration layer that turns “snapshot of mostly-archived pinned versions” into a workflow that actually finishes in a minute instead of twenty.

Snapshot installs are the default path of Require::Install() when an inst/snapshot.txt-style file is supplied; the behaviour above is what makes “snapshot from one machine, restore on another a year later” actually work.

Working offline

Require can install and load packages with no internet, as long as they (or compatible builds of them) were downloaded once before. This is useful, for example, on a high-performance computing cluster whose compute nodes have no internet access. Set:

options(Require.offlineMode = TRUE)
Require::Require("dplyr")

Require looks for each package in the local pak cache and lets pak install from there. With a warm cache, installs are near-instant — no rebuild, no download.

Two things make this work that calling pak directly does not:

Network probes are suppressed. pak normally fetches bioconductor.org/config.yaml and refreshes its repo metadata at startup, even when nothing remote is needed. Require sets the right environment variables so those calls are skipped for the duration of the install.
Refs are translated. Require’s internal pkg (>= X.Y.Z) constraint form is rewritten to the bare package name before pak sees it (pak rejects the parenthetical form).

You don’t have to set Require.offlineMode yourself. If Require tries an online install and any package fails because the network is unreachable, it probes for connectivity (~2 seconds) and, if there really is no internet, automatically retries from the cache. On the happy path you pay nothing extra; the probe only runs when an install fails.

When a package isn’t in the cache at all, Require warns “not in pak cache” — a separate message from “tarball was in cache but install failed”, so the cause is unambiguous.
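
If offline mode should apply to one block of code only, the option can be set temporarily and restored afterwards. withOfflineMode below is a hypothetical convenience wrapper written in base R, not a function exported by Require; only the option name Require.offlineMode comes from this vignette.

```r
# Hypothetical wrapper: turn offline mode on for one expression only.
withOfflineMode <- function(code) {
  old <- options(Require.offlineMode = TRUE)  # returns the previous value
  on.exit(options(old), add = TRUE)           # restored even if `code` errors
  force(code)                                 # evaluated while the option is set
}

before <- getOption("Require.offlineMode")
inside <- withOfflineMode(getOption("Require.offlineMode"))
after  <- getOption("Require.offlineMode")

# Intended use, assuming Require is installed and the cache is warm:
# withOfflineMode(Require::Require("dplyr"))
```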

Summary of differences

What	`Require`	`pak` (called directly)
Installs an already-installed package	Only if version constraint demands it	Will offer to upgrade if a newer version exists
Loads packages after install	Yes (`Require()`)	No, install only
Version constraints in package name	`Pkg (>= X)`, `(== X)`, `(<= X)`, `(HEAD)`	Exact pin only via `Pkg@X`
Multiple branches/sources for same package	Resolves by priority	Errors as a conflict
Archived CRAN package (direct)	Automatic	Needs explicit `url::...`
Archived CRAN package (as a dependency)	Automatic	Often fails even with workarounds
`Additional_repositories` in `DESCRIPTION`	Honoured	Not honoured
User-controlled override per package	`(HEAD)` to force latest	Not exposed
Snapshot creation	`pkgSnapshot()` / `pkgSnapshot2()`	None (use `renv` separately)
Snapshot install (per-row tolerant)	Yes — bad row removed, rest installs	No — one unsolvable pin aborts the whole install
Substitute archived version when pin is gone	Yes (nearest available)	No (fails)
Honour snapshot row’s `Repository` column	Yes	No (only `options(repos)`)
Auto-fill missing transitive deps in snapshot	Yes, with diagnostic	No (errors)
Per-package failure diagnostic	Status / reason / fix per package	`! error in pak subprocess`

The “installation engine” rows that used to appear here (parallel downloads, parallel installs, local cache) are no longer differences: Require uses pak for those.

Set it and forget it speed

Because a major objective for Require is to be “set it and forget it”, rerunning it must not cost meaningful human time. When all packages are already installed, rerunning a Require line is 2x-10x faster than the equivalent pak::pak line:

> system.time(pak::pak(c("devtools", "testthat", "roxygen2")))
                                                                             
ℹ No downloads are needed
✔ 3 pkgs + 90 deps: kept 92 [1.4s]
   user  system elapsed 
   0.00    0.00    1.47 
   
> system.time(Require::Install(c("devtools", "testthat", "roxygen2")))
Require/pak skipping new package dependency identification: using memory cache (93 packages)
No packages to install/update
   user  system elapsed 
   0.04    0.00    0.20

Why is it fast?

pak is already fast due to parallel downloads and package caching. Require adds a few other features for speed.

Extra from `Require`

If the packages supplied to a Require/Install call are identical to those in a previous call (commonly the case for ongoing projects), the package dependency tree is not re-calculated, because it is stored on disk and in memory (so in-session re-runs are very fast). Since calculating the tree is slow for >200 packages, users will see near-instant package assessments on re-runs.
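
The in-memory half of that idea can be sketched as a small cache keyed by the exact request. This is a simplified illustration, not Require's code: cachedDeps and slowResolve are hypothetical names, the "dependency tree" is a stand-in list, and the key is simply the requested package vector pasted together (assumed non-empty).

```r
depCache <- new.env(parent = emptyenv())   # lives for the R session
counter  <- new.env()
counter$n <- 0

# Hypothetical helper: resolve only when this exact request is new.
cachedDeps <- function(pkgs, resolve) {
  key <- paste(pkgs, collapse = ",")
  if (!exists(key, envir = depCache, inherits = FALSE)) {
    assign(key, resolve(pkgs), envir = depCache)
  }
  get(key, envir = depCache, inherits = FALSE)
}

# Stand-in for the slow dependency calculation; counts how often it runs.
slowResolve <- function(pkgs) {
  counter$n <- counter$n + 1
  list(requested = pkgs)
}

pkgs <- c("devtools", "testthat", "roxygen2")
tree1 <- cachedDeps(pkgs, slowResolve)     # computed
tree2 <- cachedDeps(pkgs, slowResolve)     # served from memory
counter$n
```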

`renv` and `Require`

Managing projects during development

renv has a concept of a lockfile. This lockfile records a specific version of a package. If the current installed version of a package is different from the lockfile (e.g., I am the developer and I increment the local version), renv will attempt to revert the local changes (with prompt to confirm) unless the local package is installed from a cloud repository (e.g., GitHub), and a snapshot is taken. This sequence is largely incompatible with pkgload::load_all() or devtools::install(), as these do not record “where” to get the current version from. Thus, the renv sequence can be quite time consuming (1-2 minutes, instead of 1 second with pkgload::load_all()).

Require does not attempt to update anything unless required by a package. Thus, this issue never comes up. If and when it is important to “snapshot”, then pkgSnapshot or pkgSnapshot2 can be used.
