Require approach, comparing pak and renv
Require is a single package that combines features of
utils::install.packages, base::library,
base::require, as well as pak::pkg_install,
remotes::install_github, and
versions::install_version, plus the snapshotting
capabilities of renv. It takes its name from the idea that
a user could simply have one line named from the require
function that would load a package, but in this case it will also
install the package if necessary. Set it and forget it. This means that
even if a user has a dependency that is removed from CRAN (“archived”),
the line will still work. Because it can be done in one line, it becomes
relatively easy to share, which facilitates, for example, making
reprexes for debugging. This package can be a key part of a reproducible
workflow.
Require
Require is designed with features that facilitate
running R code that is part of a continuous reproducible workflow, from
data-to-decisions. For this to work, all functions called by a user
should have a property whereby the initial time they are called does the
heavy work, and the subsequent times are sufficiently fast that the user
is not forced to skip over lines of code when re-running code. This is
called “rerun-tolerance” or “idempotency”, i.e., the line can be rerun
under identical conditions and very quickly return the original result.
The package, reproducible, has a function
Cache which can convert many function calls to have this
property. It does not work well for functions whose objectives are
side-effects, like installing and loading packages. Require
fills this gap.
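As a rough illustration of rerun-tolerance for a side-effect function, here is a minimal base R sketch. The helper name `ensure_loaded` is hypothetical and is not part of Require; it only shows the pattern of doing the heavy work once and returning quickly afterwards.

```r
# Hypothetical helper (not Require's API): install only when missing, then load.
ensure_loaded <- function(pkg, installer = utils::install.packages) {
  installed_now <- FALSE
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)            # heavy work happens only when the package is missing
    installed_now <- TRUE
  }
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  invisible(installed_now)
}

# "tools" ships with R, so neither call triggers an install
first  <- ensure_loaded("tools")
second <- ensure_loaded("tools")
```

Both calls leave the session in the same state, which is the property the text calls idempotency.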
Three rules describe Require’s behaviour completely:
Therefore, Require treats the version statement
as the top-level priority. A request to install a package without a
version statement installs the package only if it is not already installed.
Otherwise, it installs nothing. Examples:
Require::Require("data.table") # installs if missing, otherwise calls require
The next line installs data.table if missing, otherwise
checks the locally installed version, installs update if needed to
satisfy version statement, then calls require:
Require::Require("data.table (>=1.18.0)")
This version priority behaviour matches the default
install.packages behaviour in base R, when a package
declares a version dependency. Require extends this to a
user-specified statement.
See below for more detailed examples.
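The version-priority rule can be sketched in base R. This is an illustration under stated assumptions, not Require's implementation: `needs_install` is a hypothetical helper, and it handles only the simple `pkg (op version)` form.

```r
# Hypothetical helper, not part of Require.
# spec:      e.g. "data.table" or "data.table (>=1.18.0)"
# installed: installed version as a string, or NA if the package is missing
needs_install <- function(spec, installed) {
  if (is.na(installed)) return(TRUE)          # missing: always install
  m <- regmatches(spec, regexec("\\((>=|<=|==|>|<)\\s*([0-9.-]+)\\)", spec))[[1]]
  if (length(m) == 0) return(FALSE)           # no version statement: leave it alone
  cmp <- utils::compareVersion(installed, m[3])
  satisfied <- switch(m[2],
    ">=" = cmp >= 0, "<=" = cmp <= 0, "==" = cmp == 0,
    ">"  = cmp > 0,  "<"  = cmp < 0)
  !satisfied                                  # install only if the constraint is unmet
}

needs_install("data.table", NA)                      # missing package
needs_install("data.table", "1.14.0")                # installed, no constraint
needs_install("data.table (>=1.18.0)", "1.14.0")     # installed, too old
needs_install("data.table (>=1.18.0)", "1.18.2")     # installed, satisfies
```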
When there are apparent package conflicts, Require uses,
in this order: the version requirement, then CRAN, then the order requested.
See these examples:
# No version specifications — CRAN version installed, or nothing if already installed
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible"))
# `HEAD` after the GitHub ref forces the tip of the development branch
Require::Install(c("PredictiveEcology/reproducible@development (HEAD)", "reproducible"))
# Same: `HEAD` after the package name (of either form) forces the tip
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible (HEAD)"))
# No conflict: version requirement is satisfiable by the named branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
"PredictiveEcology/reproducible (>= 2.0.10)"))
# Even if a branch doesn't exist, no error if a later requirement names a different branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>= 2.0.10.9010)",
"PredictiveEcology/reproducible@validityTest (>= 2.0.9)"))
The internal package dependency algorithm and package installation
mechanism now both use pak, instead of a custom package
dependency function plus install.packages. This allows a
user to mix and match pak-based manual installs with
Require-based code. I highlight below the differences
between using pak and Require, with these new
default internals. The old, native Require approach still
works, if the user desires to use it:
options(Require.usePak = FALSE).
usePak = TRUE)
Features include:
pak).
Require::Require.
==3.5.0 or >=3.5.0).
To be functionally reproducible, code must be regularly run and tested on many operating systems and computers. When this does not happen, a user/developer does not know that certain code chunks no longer work until they try to run it later. In other words, code gets stale because underlying algorithms and data change. To be rerun-tolerant, a function must:
Require does both of these. See below “why is it
fast”.
It is common during code development to work in teams, and to be updating package code. This is beneficial whether the team is very tight, all working on exactly the same project, or looser where they only share certain components across diverse projects.
If the whole team is working on the same “whole” project, then it may
be useful to use a “package snapshot” approach, as is used with the
renv package. Require offers similar
functionality with the function pkgSnapshot(). Using this
approach provides a mechanism for each team member to update code, then
snapshot the project, commit the snapshot and push to the cloud for the
team to share.
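To make the snapshot idea concrete, here is a base R sketch that records package names and versions to a file and reads them back. It does not use pkgSnapshot() and makes no claim about the file format Require writes; the file name is a placeholder.

```r
# Record package names and versions from the current library paths.
ip <- utils::installed.packages()
snap <- data.frame(
  Package = unname(ip[, "Package"]),
  Version = unname(ip[, "Version"]),
  stringsAsFactors = FALSE
)

# Hypothetical file name; a team would commit a file like this to version control.
snapshot_file <- file.path(tempdir(), "packages-snapshot.csv")
utils::write.csv(snap, snapshot_file, row.names = FALSE)

# A teammate reads the file back and can compare it with their own library.
restored <- utils::read.csv(snapshot_file, colClasses = "character")
```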
However, if a team is more diversified and they are actually sharing
the new code, but not the whole project, then project snapshots will be
very inefficient and package management must be on a package-by-package
basis, not the whole project. In other words, the code developer can work
on their package, and the various team members will have two options for
what they might want to do: keep at the bleeding edge or update only if
necessary for dependencies. More likely, they will want to have a
mixture of these strategies, i.e., bleeding edge with some code, but
only if necessary with others. Thus, Require offers
programmatic control for this. For example,
Require::Install(
  c("PredictiveEcology/reproducible@development (HEAD)",
    "PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)"))
will keep the project at the bleeding edge of the development branch
of reproducible, but will only update if necessary (based
on the version needed, expressed by the inequality) for the development
branch of SpaDES.core. The user does not have to make
decisions at run time as to whether an update should be made, and for
which packages.
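One way to read the two package strings above is as a reference plus an update policy. The base R sketch below splits a string into those parts; `parse_spec` and the policy labels are hypothetical and only mirror the wording of this section.

```r
# Hypothetical parser for the package strings used in this section.
parse_spec <- function(spec) {
  has_paren  <- grepl("\\(.*\\)", spec)
  constraint <- if (has_paren) trimws(sub(".*\\((.*)\\).*", "\\1", spec)) else NA_character_
  ref        <- trimws(sub("\\(.*\\)", "", spec))
  policy <- if (is.na(constraint)) {
    "install only if missing"
  } else if (constraint == "HEAD") {
    "always track the branch tip"
  } else {
    "update only if the constraint is not met"
  }
  list(ref = ref, constraint = constraint, policy = policy)
}

edge   <- parse_spec("PredictiveEcology/reproducible@development (HEAD)")
stable <- parse_spec("PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)")
```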
pak and Require
Require differs from pak in philosophy
By default, as of version 2.0.0, Require uses
pak to calculate package dependency trees and
installations. However, Require applies a different
philosophy to package management. The two tools answer the same question
— “what should be installed?” — in different ways.
Require is therefore not an alternative to
pak. It is a complementary wrapper that applies a
different policy on top of pak. The differences described
in this vignette are differences in policy, not in
installation machinery. For example: when you call
pak::pkg_install("data.table"), pak will offer
to upgrade data.table if a newer version is on CRAN. When
you call Require::Install("data.table"),
Require first checks whether the installed version already
satisfies your request; if it does, nothing happens at all. The actual
install, when one is needed, is done by pak either way.
Practically speaking, this means that a user can write their list of packages they need in their code, and leave it there, without concern that their packages may unexpectedly be updated – using time and possibly changing functionality at an inopportune moment.
The biggest difference is what each tool does when a package is already installed.
pak is current-first. If you ask
pak to install a package that is already there, it will
check for a newer version and offer to upgrade.
Require is stability-first. If the
installed version satisfies your request, Require does not
install. It will only install or upgrade when the version constraint you
wrote actually requires it.
This is what makes Require “set-and-forget”. You can put
a Require::Install(...) line near the top of a script, run
that script every day for a year, and your packages will not silently
change underneath you. They only change when you change your version
requirement.
| The package state | pak::pkg_install("data.table") | Require::Install("data.table") |
|---|---|---|
| Not installed | Installs latest | Installs latest |
| Installed, latest | No change | No change |
| Installed, but newer on CRAN | Asks user whether to upgrade | No change |
| Installed, version < (>= X) | User cannot specify in this way | Upgrades to satisfy |
Require exposes the upgrade policy through the version
constraints in your code. If you want the latest, ask for it
(e.g. data.table (>= 1.16) or
data.table (HEAD)); if you want stability, leave the
constraint off.
The same stability-first policy shows up clearly when you install
from a GitHub branch. pak reads the
DESCRIPTION on that branch and enforces every dependency
exactly as written — even with upgrade = FALSE, it
will downgrade (or upgrade) installed dependencies to match the pin.
Require reads the same DESCRIPTION but treats
each line as a minimum: if an installed dependency already
satisfies the constraint, it is left alone.
Here LandR@development lists
reproducible (>= 3.0.0.9001) in its
DESCRIPTION, while the user already has
reproducible 3.0.0.9083 installed:
> pak::pak("PredictiveEcology/LandR@development", upgrade = FALSE)
→ Will install 1 package.
→ Will update 1 package.
→ All 2 packages (0 B) are cached.
+ LandR 1.1.5.9101 [bld][cmp] (GitHub: c5e771d)
+ reproducible 3.0.0.9083 → 3.0.0.9001 [bld][cmp] (GitHub: ffffec4)
✔ All system requirements are already installed.
? Do you want to continue (Y/n) n
Error: Aborted.
> Require::Install("PredictiveEcology/LandR@development", upgrade = FALSE)
Require/pak skipping new package dependency identification: using cache (103 packages, 0.6h old)
All requested packages are in the pak download cache; installing from cache (no metadata refresh, no network)
offline mode: installing 1 package(s) from pak cache: LandR
→ Will install 1 package.
→ The package (0 B) is cached.
+ LandR 1.1.5.9101 [bld][cmp] (GitHub: c5e771d)
ℹ No downloads are needed, 1 pkg is cached
ℹ Building LandR 1.1.5.9101
✔ Built LandR 1.1.5.9101 (24.8s)
✔ Installed LandR 1.1.5.9101 (github::PredictiveEcology/LandR@c5e771d) (37ms)
✔ 1 pkg: added 1 [25.9s]
Installed 1 packages in 26.3 secs
pak insists on replacing
reproducible 3.0.0.9083 with the exact pin from the branch
(3.0.0.9001), even though the installed version is newer.
Require keeps the newer copy because it still satisfies
LandR’s constraint. The two behaviours have different use cases:
pak’s exact-pin enforcement is what you want when you need
to reproduce the dependency graph the branch author committed to;
Require’s version-minimum policy is what you want for
“set-and-forget” scripts where any version meeting the minimum is
acceptable.
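The difference between the two readings of the same DESCRIPTION line comes down to which comparison is applied. A minimal base R check, using the two version numbers from the example above:

```r
installed <- package_version("3.0.0.9083")  # version already in the library
declared  <- package_version("3.0.0.9001")  # version named in LandR's DESCRIPTION

# Minimum reading: the installed copy satisfies (>= 3.0.0.9001), so leave it alone
keep_under_minimum_policy <- installed >= declared

# Exact-pin reading: the installed copy is not the pinned version, so replace it
keep_under_exact_policy <- installed == declared
```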
pak installs packages. To use them, you still need a
separate library() call.
Require::Require() does both: it installs (if needed)
and then loads. The whole package-management story for a script can fit
on one line, such as the Require::Require("data.table") call shown earlier.
pak accepts exact version pins via
pkg@1.2.3. It does not accept ranges like
>= or <= directly — you would have to
either pin a specific version yourself or put the constraint in a
DESCRIPTION file:
# Won't work — pak does not parse this
try(pak::pak("data.table (>= 1.8.0)"))
# What you have to write instead — pick an exact version yourself
pak::pak("data.table@1.8.0")
Consistent with the version requirements that can be specified in a
package DESCRIPTION file, Require accepts the full set of
R-style constraints right in the call, mixed freely, for example
"data.table (>=1.18.0)" alongside "reproducible (HEAD)".
This matters because the constraint is what tells
Require “stop, don’t install” or “yes, please upgrade”. The
constraint is the policy.
When two of your dependencies (or sub-dependencies) point to
different sources or different branches of the same package,
pak reports a conflict and stops. The user is expected to
fix it — usually by adding any:: prefixes or removing one
of the requests.
Require resolves the conflict for you, using the
priority documented above (version requirement, then CRAN, then order
requested).
# pak: errors out — both branches of LandR are requested
try(pak::pak(c("PredictiveEcology/LandR@development",
"PredictiveEcology/LandR@main")))
# Require: takes them in order — main wins
Require::Install(c("PredictiveEcology/LandR@main",
"PredictiveEcology/LandR@development"))
# Require: takes by version requirement — development wins because it satisfies the constraint
Require::Install(c("PredictiveEcology/LandR@main",
"PredictiveEcology/LandR@development (>= 1.1.5)"))
The same conflict-resolution applies to mismatches between a CRAN
package and a GitHub Remotes field deep inside someone
else’s package: Require picks something and explains why,
rather than asking you to untangle it.
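The stated priority (version requirement, then CRAN, then order requested) can be sketched as a small selection function. This is a simplification and not Require's resolver: `pick_ref` is hypothetical, and it does not check whether a version requirement can actually be satisfied by the named source.

```r
# Hypothetical sketch of the priority order; not Require's actual resolver.
pick_ref <- function(refs) {
  has_version <- grepl("\\(\\s*(>=|<=|==|>|<)", refs)
  is_cran     <- !grepl("/", refs, fixed = TRUE)   # no "account/repo" means a CRAN name
  if (any(has_version)) return(refs[which(has_version)[1]])  # 1. version requirement
  if (any(is_cran))     return(refs[which(is_cran)[1]])      # 2. CRAN
  refs[1]                                                    # 3. order requested
}

by_order   <- pick_ref(c("PredictiveEcology/LandR@main",
                         "PredictiveEcology/LandR@development"))
by_version <- pick_ref(c("PredictiveEcology/LandR@main",
                         "PredictiveEcology/LandR@development (>= 1.1.5)"))
by_cran    <- pick_ref(c("PredictiveEcology/reproducible@development",
                         "reproducible"))
```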
When a package is removed from CRAN (“archived”), pak
cannot install it from a plain name — you need to give it the explicit
URL of the archive tarball (url::https://...). And if the
archived package is a sub-dependency of something else, even
that workaround doesn’t always help.
Require retrieves the most recent archived copy
automatically and continues. This means a workflow that worked yesterday
continues to work today, even if a CRAN package has been archived
overnight.
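For context on the manual workaround, CRAN keeps older source tarballs under `src/contrib/Archive/<pkg>/`. The sketch below builds such a URL and the matching `url::` reference for pak. The helper `archive_url`, the package name, and the version are placeholders chosen for illustration.

```r
# Build the URL of an archived source tarball on a CRAN mirror.
archive_url <- function(pkg, version, repo = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", repo, pkg, pkg, version)
}

u <- archive_url("examplePkg", "1.0.0")   # hypothetical package and version
pak_ref <- paste0("url::", u)             # the form pak needs for an explicit tarball
```

With Require, this lookup is the step the user does not have to do by hand.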
A snapshot is a flat list of exact pins (CRAN versions and GitHub
SHAs). On paper, that’s the easiest possible install — every version is
already chosen. In practice, handing the same list to
pak::pkg_install() runs into trouble that doesn’t apply to
a “just install the latest” workflow:
pak’s resolver evaluates every pin together. If one pin is unsolvable (an
archived version, a sub-dep that contradicts another pin), it refuses to
install anything. Require’s snapshot installer goes
pin-by-pin with install.packages(dependencies = NA) against
a synthesized local repo, so a bad row removes one package, not all of
them.
When a pinned version is no longer available, pak 404s.
Require substitutes the nearest available archived version
and reports the substitution.
Some snapshot rows name their own Repository URL.
pak’s resolver only consults options(repos),
so those rows fail to resolve. Require honours each row’s
Repository column.
A snapshot’s libPath can be missing a transitive dep. pak errors
with dependency 'X' is not available. Require
auto-fills the missing dep from CRAN/PPM and flags it so the user can
add it to the snapshot for full reproducibility.
When pak does fail,
the user sees ! error in pak subprocess.
Require keeps per-package install logs and prints a
structured report: status (download-failed /
version-conflict / missing-dep /
compile-failed / cascade /
substituted / auto-filled), the reason, and a
concrete fix.
Downloads use libcurl multi, with PPM binaries preferred
(and the right User-Agent set so PPM actually serves
binaries). The cache is pkgcache, the same cache
pak uses, so anything downloaded here is reusable by
pak next time, and vice versa.
pak
It is tempting to assume pak’s own cache would handle a
snapshot install end-to-end — hand pak a list of
cran::pkg@version refs and let pkgcache
deduplicate. We measured this directly. Result, on a 379-package
snapshot:
| Strategy | Cache-warm time | Outcome |
|---|---|---|
| Require snapshot installer (local:: source + install.packages(type = "binary")) | ~60 s | All pins installed at the snapshot’s exact version |
| pak::pkg_install(c("cran::pkgA@verA", "cran::pkgB@verB", …)) | ~1240 s (≈18×) | All packages eventually installed, but several pins bumped away from the snapshot version (forced source recompile) |
The reason is structural, not a bug in pak. CRAN only
builds binaries for the current version of each
package; older versions live in src/contrib/Archive/ as
source only. So when pak’s resolver sees
pkgA@<archived-version>, it constructs the
source-Archive URL — it never tries a binary URL, because no binary URL
exists. Even if a binary for that exact pin is sitting in
pkgcache (because we built it on a previous run),
pak rebuilds from source. Snapshot installs are dominated
by archived-version pins, so this is the common case, not the edge
case.
Require’s snapshot installer works around this with two
mechanisms pak does not expose:
| Layer | What Require does | What pak alone would do |
|---|---|---|
| Cached binary for an archived pin | install.packages(type = "binary", repos = NULL) against the cached .tgz/.zip (skips compile entirely) | Rebuild from source archive (no binary URL exists for non-current versions) |
| Source tarball for an archived pin | pak::pkg_install("local::<file>"), which bypasses pak’s resolver and installs the on-disk file directly | Re-download from src/contrib/Archive/... even if the file is in pkgcache under a different URL key |
| Binary local:: ref | n/a: Require routes binaries through install.packages instead | Refuses with “Platform mismatch”; pak’s local:: is source-only |
| GitHub @SHA pin | Built once, then cached as a binary tarball under a synthetic require-snapshot-bin:// URL so subsequent runs unpack instead of rebuilding | Rebuild on every run (the synthetic URL key is not part of pak’s resolver vocabulary) |
| Bump-and-retry for a pin that won’t compile | Walks the CRAN Archive listing for newer versions, tries each, records the substitution in the diagnostic report | All-or-nothing: one unsolvable pin aborts the whole install |
So Require is using pak for everything
pak is good at — parallel downloads, the install
subprocess, the pkgcache cache layout — and adding the
orchestration layer that turns “snapshot of mostly-archived pinned
versions” into a workflow that actually finishes in a minute instead of
twenty.
Snapshot installs are the default path of
Require::Install() when an
inst/snapshot.txt-style file is supplied; the behaviour
above is what makes “snapshot from one machine, restore on another a
year later” actually work.
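The per-row tolerance described in this section amounts to catching failures one row at a time and recording a status for each. A minimal sketch with a stub installer follows; `fake_install` and the snapshot rows are invented for illustration and do not reflect Require's internals.

```r
# Hypothetical installer stub: fails for one row to stand in for an unsolvable pin.
fake_install <- function(pkg, version) {
  if (pkg == "badPkg") stop("version not available")
  invisible(TRUE)
}

snapshot <- data.frame(
  Package = c("pkgA", "badPkg", "pkgB"),
  Version = c("1.0.0", "0.1.0", "2.3.1"),
  stringsAsFactors = FALSE
)

# Go row by row so one failure does not abort the remaining installs.
status <- vapply(seq_len(nrow(snapshot)), function(i) {
  tryCatch({
    fake_install(snapshot$Package[i], snapshot$Version[i])
    "installed"
  }, error = function(e) paste("failed:", conditionMessage(e)))
}, character(1))

report <- cbind(snapshot, Status = status)
```

The resulting `report` has one status per row, which is the shape a per-package diagnostic needs.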
Require can install and load packages
with no internet, as long as they (or compatible builds of them) were
downloaded once before. This is useful in some settings, for example
a high performance computer cluster that has no internet access on the
compute nodes. To enable it, set the Require.offlineMode option.
Require looks for each package in the local
pak cache and lets pak install from there.
With a warm cache, installs are near-instant — no rebuild, no
download.
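The option named later in this section, Require.offlineMode, can be set like any other R option. The sketch below assumes that TRUE is the value that enables offline mode; that value is an assumption, not something this text states.

```r
# Assumption: TRUE enables offline mode; the option name comes from this section.
old <- options(Require.offlineMode = TRUE)
enabled <- getOption("Require.offlineMode")

# ... Require::Install(...) or Require::Require(...) calls would go here ...

options(old)  # restore whatever was set before
```

Saving the return value of `options()` and restoring it afterwards keeps the change local to the block of installs.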
Two things make this work that calling pak directly does
not:
pak normally fetches bioconductor.org/config.yaml and refreshes
its repo metadata at startup, even when nothing remote is needed.
Require sets the right environment variables so those calls
are skipped for the duration of the install.
Require’s internal pkg (>= X.Y.Z) constraint form is rewritten to
the bare package name before pak sees it (pak
rejects the parenthetical form).
You don’t have to set Require.offlineMode yourself. If
Require tries an online install and any package fails
because the network is unreachable, it probes for connectivity (~2
seconds) and, if there really is no internet, automatically retries from
the cache. On the happy path you pay nothing extra; the probe only runs
when an install fails.
When a package isn’t in the cache at all, Require warns
“not in pak cache” — a separate message from “tarball was in cache but
install failed”, so the cause is unambiguous.
| What | Require | pak (called directly) |
|---|---|---|
| Installs an already-installed package | Only if version constraint demands it | Will offer to upgrade if a newer version exists |
| Loads packages after install | Yes (Require()) | No, install only |
| Version constraints in package name | Pkg (>= X), (== X), (<= X), (HEAD) | Exact pin only via Pkg@X |
| Multiple branches/sources for same package | Resolves by priority | Errors as a conflict |
| Archived CRAN package (direct) | Automatic | Needs explicit url::... |
| Archived CRAN package (as a dependency) | Automatic | Often fails even with workarounds |
| Additional_repositories in DESCRIPTION | Honoured | Not honoured |
| User-controlled override per package | (HEAD) to force latest | Not exposed |
| Snapshot creation | pkgSnapshot() / pkgSnapshot2() | None (use renv separately) |
| Snapshot install (per-row tolerant) | Yes: bad row removed, rest installs | No: one unsolvable pin aborts the whole install |
| Substitute archived version when pin is gone | Yes (nearest available) | No (fails) |
| Honour snapshot row’s Repository column | Yes | No (only options(repos)) |
| Auto-fill missing transitive deps in snapshot | Yes, with diagnostic | No (errors) |
| Per-package failure diagnostic | Status / reason / fix per package | ! error in pak subprocess |
The “installation engine” rows that used to appear here (parallel
downloads, parallel installs, local cache) are no longer differences:
Require uses pak for those.
Because a major objective for Require is to be “set it
and forget it”, it cannot consume meaningful human time. Thus, when all
packages are installed, rerunning Require lines is 2x-10x
faster than the equivalent pak::pak line:
> system.time(pak::pak(c("devtools", "testthat", "roxygen2")))
ℹ No downloads are needed
✔ 3 pkgs + 90 deps: kept 92 [1.4s]
user system elapsed
0.00 0.00 1.47
> system.time(Require::Install(c("devtools", "testthat", "roxygen2")))
Require/pak skipping new package dependency identification: using memory cache (93 packages)
No packages to install/update
user system elapsed
0.04 0.00 0.20
pak is already fast due to parallel downloads and
package caching. Require adds a few other features for
speed.
Require
If the packages supplied to a Require/Install call are
identical to those in a previous call (commonly the case for ongoing projects),
the package dependency tree is not recalculated, because it is stored on disk
and in memory (so in-session re-runs are very fast). Since this is a
slow process for >200 packages, users will see near-instant package
assessments.
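The in-memory half of this idea can be sketched with an environment keyed on the requested package set. Everything here is illustrative: `slow_deps` stands in for the real dependency calculation, and sorting the names to build the key is a choice made for this sketch, not a description of how Require builds its cache key.

```r
dep_cache  <- new.env(parent = emptyenv())
n_computed <- 0L

# Hypothetical stand-in for the slow dependency calculation.
slow_deps <- function(pkgs) {
  n_computed <<- n_computed + 1L   # count how often the slow path runs
  sort(unique(pkgs))
}

cached_deps <- function(pkgs) {
  key <- paste(sort(unique(pkgs)), collapse = ",")
  if (is.null(dep_cache[[key]])) {
    dep_cache[[key]] <- slow_deps(pkgs)   # computed once per distinct package set
  }
  dep_cache[[key]]
}

a <- cached_deps(c("devtools", "testthat", "roxygen2"))
b <- cached_deps(c("testthat", "roxygen2", "devtools"))  # served from the cache
```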
renv and Require
renv has a concept of a lockfile. This lockfile records
a specific version of a package. If the current installed version of a
package is different from the lockfile (e.g., I am the developer and I
increment the local version), renv will attempt to revert
the local changes (with a prompt to confirm) unless the local
package is installed from a cloud repository (e.g., GitHub), and a
snapshot is taken. This sequence is largely incompatible
with pkgload::load_all() or
devtools::install(), as these do not record “where” to get
the current version from. Thus, the renv sequence can be
quite time consuming (1-2 minutes, instead of 1 second with
pkgload::load_all()).
Require does not attempt to update anything unless
required by a package. Thus, this issue never comes up. If and when it
is important to “snapshot”, then pkgSnapshot or
pkgSnapshot2 can be used.