Chapter 12: The nmathopencl R API — Distribution Functions on the GPU

Kjell Nygren

2026-06-11

Overview

nmathopencl exports a family of R functions that mirror R’s stats:: distribution functions but dispatch computation to an OpenCL device when one is available. The calling convention is intentionally close to the base-R equivalents so that substitution is straightforward.

The exported distribution wrappers share these common arguments:

Argument	Type	Default	Meaning
`fallback`	logical	`FALSE`	If `TRUE` while `nmathopencl_has_opencl()` is `TRUE`, return the `stats::` analogue when the OpenCL call fails; ignored when OpenCL is unavailable (the CPU path is used anyway)
`verbose`	logical	`FALSE`	Print extra diagnostics, including a note when the CPU path is chosen because no OpenCL runtime is available
`log` / `log.p` / `lower.tail`	logical	distribution-specific	Follow the `stats::` convention for the same function (`log` for densities; `lower.tail` and `log.p` for CDFs and quantiles)

Checking availability

library(nmathopencl)

# TRUE if this nmathopencl build was compiled with USE_OPENCL
nmathopencl_has_opencl()

# GPU device names on the host (system inventory via the opencltools package)
opencltools::gpu_names()

If OpenCL is not available (nmathopencl_has_opencl() returns FALSE), calls use the CPU equivalent unconditionally so examples and CI keep working.

By default, fallback = FALSE: when OpenCL is available but a dispatch fails, the error propagates, which is useful while debugging kernels. Pass fallback = TRUE only when you deliberately want GPU failures masked by the CPU analogue.
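For scripts that must also run on machines where nmathopencl is not installed at all, one option is a thin guard around the wrapper. This is a sketch, not part of the package: `dnorm_auto` is a hypothetical name, and it relies only on `nmathopencl_has_opencl()` and the `dnorm_opencl(x, mean, sd, log)` call shown in this chapter.

```r
# Hypothetical helper (not part of nmathopencl): use the OpenCL wrapper when
# the package is installed and reports OpenCL support; otherwise call
# stats::dnorm() directly. fallback keeps its default (FALSE), so GPU errors
# still surface.
dnorm_auto <- function(x, mean = 0, sd = 1, log = FALSE) {
  if (requireNamespace("nmathopencl", quietly = TRUE) &&
      isTRUE(nmathopencl::nmathopencl_has_opencl())) {
    return(nmathopencl::dnorm_opencl(x, mean = mean, sd = sd, log = log))
  }
  stats::dnorm(x, mean = mean, sd = sd, log = log)
}

x <- c(-1, 0, 1)
dnorm_auto(x)
```

Because the wrapper already switches to the CPU when OpenCL is unavailable, the guard only adds value where the package itself may be missing.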

Normal distribution

x  <- rnorm(1e6)

# Density
d  <- dnorm_opencl(x, mean = 0, sd = 1, log = FALSE)

# CDF: same core arguments as stats::pnorm(q, mean, sd, lower.tail, log.p),
# plus opencl_parallel, fallback, and verbose. Output length follows the usual recycling rule.
p <- pnorm_opencl(
  rep(1.96, 1e6),
  mean = 0,
  sd = 1,
  lower.tail = TRUE,
  log.p = FALSE
)

# Quantile: unlike stats::qnorm(), which takes a vector p, qnorm_opencl() takes a leading `n` and a scalar `p`
q  <- qnorm_opencl(n = 1e6, p = 0.975, mean = 0, sd = 1)

# Random draws
r  <- rnorm_opencl(n = 1e6, mean = 0, sd = 1)

Note the signature difference between dnorm_opencl, which takes a vector x, and qnorm_opencl / rnorm_opencl, which keep a leading n argument giving the output length: n replicated quantiles of a scalar p for qnorm_opencl, or n draws for rnorm_opencl.

For the CDF, pnorm_opencl parallels stats::pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) and adds only opencl_parallel, fallback, and verbose. Its output length is the usual recycled length (max(length(q), length(mean), length(sd))), as in stats. When the GPU path is used, pnorm_kernel currently runs once per output element, in sequence, so overhead is high until a batched kernel is added.
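A CPU reference for these two conventions can help when checking GPU output. This is a minimal sketch assuming that `qnorm_opencl(n, p, ...)` returns `n` copies of the quantile of a scalar `p`; `qnorm_reference` and `recycled_length` are hypothetical helper names.

```r
# Hypothetical CPU reference for qnorm_opencl(n, p, mean, sd), assuming it
# returns n copies of the quantile of a scalar p.
qnorm_reference <- function(n, p, mean = 0, sd = 1) {
  stopifnot(length(p) == 1L)
  rep(stats::qnorm(p, mean = mean, sd = sd), n)
}

# Output length under the stats recycling rule that pnorm_opencl() follows:
# the longest argument wins, and a zero-length argument gives length 0.
recycled_length <- function(...) {
  lens <- lengths(list(...))
  if (any(lens == 0L)) 0L else max(lens)
}

qnorm_reference(3, 0.975)                      # three copies of about 1.959964
recycled_length(c(-1, 0, 1), 0, c(1, 2, 3))    # 3
```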

Distribution families

Gamma

dgamma_opencl(n, x, shape = 2, scale = 1)
pgamma_opencl(q, shape = 2, rate = 1)
rgamma_opencl(n, shape = 2, scale = 1)
# qgamma: use stats::qgamma(); the OpenCL kernel is internal-only
# (device compile failure, see inst/OPENCL_KERNEL_KNOWN_FAILURES.md)
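As listed, `dgamma_opencl()` and `rgamma_opencl()` take `scale` while `pgamma_opencl()` takes `rate`, so mixing them requires the conversion rate = 1/scale. The sketch below shows that conversion using `stats::` functions only (including `stats::qgamma()`, since the OpenCL quantile is internal-only); `gamma_rate` is a hypothetical helper.

```r
# rate and scale are reciprocal parameterizations of the gamma distribution.
gamma_rate <- function(scale) 1 / scale

shape <- 2
scale <- 0.5
q <- c(0.1, 1, 3)

p_scale <- stats::pgamma(q, shape = shape, scale = scale)
p_rate  <- stats::pgamma(q, shape = shape, rate = gamma_rate(scale))
all.equal(p_scale, p_rate)   # TRUE

# Quantiles come from stats::qgamma(); the round trip recovers q.
stats::qgamma(p_rate, shape = shape, rate = gamma_rate(scale))
```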

Binomial

dbinom_opencl(x, size = 10, prob = 0.3)
pbinom_opencl(q, size = 10, prob = 0.3)
qbinom_opencl(p, size = 10, prob = 0.3)
rbinom_opencl(n, size = 10, prob = 0.3)
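For discrete families, a useful property check on quantile output is the definition stats uses: the p-quantile is the smallest x with P(X <= x) >= p. The sketch below (hypothetical helper `is_discrete_quantile`) tests that definition against `stats::qbinom()`; the same check could be applied to `qbinom_opencl()` output.

```r
# Hypothetical property check: x is the p-quantile of a discrete
# distribution if cdf(x) >= p and cdf(x - 1) < p. A small tolerance allows
# for rounding in the CDF.
is_discrete_quantile <- function(x, p, cdf, tol = 1e-12) {
  all(cdf(x) >= p - tol & cdf(x - 1) < p)
}

p <- c(0.05, 0.5, 0.95)
binom_cdf <- function(x) stats::pbinom(x, size = 10, prob = 0.3)
x <- stats::qbinom(p, size = 10, prob = 0.3)
is_discrete_quantile(x, p, binom_cdf)   # TRUE
```

With a device present, `x <- qbinom_opencl(p, size = 10, prob = 0.3)` could be passed through the same check.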

Poisson

dpois_opencl(x, lambda = 3)
ppois_opencl(q, lambda = 3)
qpois_opencl(p, lambda = 3)
rpois_opencl(n, lambda = 3)
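Because the Poisson CDF is the running sum of the probability mass function, `ppois` output can be cross-checked against `dpois` output. A minimal sketch with `stats::` functions; the same comparison applies to `dpois_opencl()` and `ppois_opencl()` output, assuming they accept the vector arguments shown above.

```r
lambda <- 3
k <- 0:15

pmf <- stats::dpois(k, lambda = lambda)
cdf <- stats::ppois(k, lambda = lambda)

# P(X <= k) should equal the cumulative sum of P(X = j) for j = 0..k,
# up to rounding error.
max(abs(cumsum(pmf) - cdf))
```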

Beta

dbeta_opencl(x, shape1 = 2, shape2 = 5)
pbeta_opencl(q, shape1 = 2, shape2 = 5)
rbeta_opencl(n, shape1 = 2, shape2 = 5)
# qbeta: use stats::qbeta(); the OpenCL kernel is internal-only
# (device link failure, see inst/OPENCL_KERNEL_KNOWN_FAILURES.md)

Additional families

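Most families below follow the same four-function pattern (d*, p*, q*, r*) with the same fallback / verbose arguments. Note that stats itself provides only dmultinom() and rmultinom() for the multinomial, and only ptukey() and qtukey() for Tukey's studentized range distribution, so those two families lack a full set of stats:: analogues.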

Family	R file
Cauchy	`cauchy_opencl.R`
Chi-squared	`chisq_opencl.R`
Exponential	`exponential_opencl.R`
F	`f_opencl.R`
Geometric	`geometric_opencl.R`
Hypergeometric	`hypergeometric_opencl.R`
Log-normal	`lnorm_opencl.R`
Logistic	`logistic_opencl.R`
Negative binomial	`negative_binomial_opencl.R`
t	`t_opencl.R`
Uniform	`uniform_opencl.R`
Weibull	`weibull_opencl.R`
Multinomial (multivariate discrete)	`multinomial_opencl.R`
Tukey	`tukey_opencl.R`
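To confirm that a wrapper agrees with its `stats::` analogue on a given device, a small comparison helper is enough. The sketch below is hypothetical (`max_rel_error` is not part of nmathopencl): it treats identical values, including matching infinities and paired NA/NaN, as exact agreement and otherwise reports the largest relative error.

```r
max_rel_error <- function(gpu, cpu) {
  stopifnot(length(gpu) == length(cpu))
  # Exact matches (including equal infinities) and paired NA/NaN agree.
  same <- (gpu == cpu) %in% TRUE
  both_na <- is.na(gpu) & is.na(cpu)
  keep <- !(same | both_na)
  if (!any(keep)) return(0)
  err <- abs(gpu[keep] - cpu[keep]) /
    pmax(abs(cpu[keep]), .Machine$double.xmin)
  # NA/NaN on only one side (or Inf against Inf of opposite sign) is a mismatch.
  err[is.na(err)] <- Inf
  max(err)
}

max_rel_error(c(0.1, 0.5, Inf), c(0.1, 0.5, Inf))   # 0
```

On a machine with a device, a call such as `max_rel_error(pbinom_opencl(0:10, size = 10, prob = 0.3), stats::pbinom(0:10, size = 10, prob = 0.3))` gives a single number to compare against whatever tolerance suits the kernel's precision.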

The Wilcoxon rank-sum (rwilcox_opencl.R), Wilcoxon signed-rank (signrank_opencl.R), and Bessel (bessel_opencl.R) wrappers are internal-only: their OpenCL ports have documented allocation/runtime failures (see inst/OPENCL_KERNEL_KNOWN_FAILURES.md and the notes in inst/cl/). Use the stats/base equivalents instead.
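For reference, the stats/base replacements for these internal-only ports are ordinary R functions: `stats::dwilcox()`/`pwilcox()`/`qwilcox()`/`rwilcox()`, `stats::dsignrank()`/`psignrank()`/`qsignrank()`/`rsignrank()`, and `base::besselI()`/`besselJ()`/`besselK()`/`besselY()`. A short sketch of each:

```r
# CPU replacements for the internal-only OpenCL ports (base R only).
m <- 4
n <- 6

# Wilcoxon rank-sum: the null distribution of W sums to 1 over 0..m*n.
sum(stats::dwilcox(0:(m * n), m, n))

# Wilcoxon signed-rank: P(V <= 10) for sample size 8.
stats::psignrank(10, n = 8)

# Bessel functions live in base R.
besselJ(0, nu = 0)                        # J_0(0) = 1
besselI(1, nu = 0, expon.scaled = TRUE)   # exp(-1) * I_0(1)
```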

Noncentral distributions

Noncentral variants (dnt_opencl, dnchisq_opencl, dnf_opencl, dnbeta_opencl, and their p*, q* counterparts) are available with the noncentrality parameter ncp.
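When validating the noncentral wrappers, the `stats::` t, chi-squared, F, and beta functions accept the same `ncp` argument and can serve as CPU references. The argument order of the `_opencl` versions is not listed in this chapter, so the sketch below uses `stats::` calls only.

```r
# stats:: analogues of the noncentral wrappers take the same ncp argument.
x <- c(0.5, 1, 2, 4)
stats::dt(x, df = 5, ncp = 1)
stats::dchisq(x, df = 3, ncp = 2)
stats::df(x, df1 = 3, df2 = 7, ncp = 2)
stats::dbeta(c(0.2, 0.5, 0.8), shape1 = 2, shape2 = 5, ncp = 1)

# Sanity check: ncp = 0 reduces to the central distribution.
all.equal(stats::dchisq(x, df = 3, ncp = 0), stats::dchisq(x, df = 3))  # TRUE
```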

Special functions and math support

# Log-gamma
lgammafn_opencl(n, x)
lgammafn_sign_opencl(n, x)

# Gamma function
gammafn_opencl(n, x)

# Log-gamma at x+1
lgamma1p_opencl(n, x)

# Digamma / polygamma
digamma_opencl(n, x)
trigamma_opencl(n, x)
psigamma_opencl(n, x, deriv)

# Beta and log-beta
beta_special_opencl(n, a, b)
lbeta_special_opencl(n, a, b)

# Binomial coefficients
choose_special_opencl(n_out, n, k)
lchoose_special_opencl(n_out, n, k)

# Logspace arithmetic
logspace_add_opencl(n, logx, logy)
logspace_sub_opencl(n, logx, logy)
logspace_sum_opencl(n, logx, logy)

# Math support
fmax2_opencl(n, x, y)
fmin2_opencl(n, x, y)

These are exported via special_opencl.R and math_support_opencl.R.
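The logspace helpers combine values stored as logarithms without overflowing. The reference formulas below follow our reading of R's nmath C code (logspace_add(lx, ly) = max(lx, ly) + log1p(exp(-|lx - ly|)), and logspace_sum factors out the maximum before exponentiating); the `_ref` functions are hypothetical CPU references for checking device output, not package functions.

```r
# log(exp(logx) + exp(logy)), elementwise, without forming exp(logx).
logspace_add_ref <- function(logx, logy) {
  pmax(logx, logy) + log1p(exp(-abs(logx - logy)))
}

# log(sum(exp(logx))): factor out the maximum before exponentiating.
logspace_sum_ref <- function(logx) {
  if (length(logx) == 0L) return(-Inf)
  m <- max(logx)
  if (!is.finite(m)) return(m)  # all -Inf gives -Inf; any +Inf gives +Inf
  m + log(sum(exp(logx - m)))
}

logspace_add_ref(log(2), log(3))   # log(5)
logspace_add_ref(1000, 1000)       # 1000 + log(2), although exp(1000) overflows
logspace_sum_ref(log(1:4))         # log(10)
```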

R extension utilities (`R_ext`)

# R-compatible power
r_pow_opencl(n, x, y)
r_pow_di_opencl(n, x, i)

# Miscellaneous math helpers
log1pmx_opencl(n, x)
log1pexp_opencl(n, x)
log1mexp_opencl(n, x)
pow1p_opencl(n, x, y)

These are exported via rext_utils_opencl.R.
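Two of these helpers have simple definitions that make convenient CPU references when checking device output: log1pexp(x) is log(1 + exp(x)) and log1mexp(x) is log(1 - exp(-x)) for x >= 0. The sketch below reproduces the branch points used in R's nmath C code as we understand it; the `_ref` names are hypothetical and not exported by nmathopencl.

```r
# log(1 + exp(x)) with nmath-style branches: log1p(exp(x)) for x <= 18,
# x + exp(-x) for 18 < x <= 33.3, and x itself beyond that.
log1pexp_ref <- function(x) {
  ifelse(x <= 18, log1p(exp(x)), ifelse(x > 33.3, x, x + exp(-x)))
}

# log(1 - exp(-x)) for x >= 0, switching formulas at log(2) to limit
# cancellation near 0 and precision loss for large x.
log1mexp_ref <- function(x) {
  ifelse(x < log(2), log(-expm1(-x)), log1p(-exp(-x)))
}

log1pexp_ref(c(-5, 0, 20, 800))   # the last entry is 800; exp(800) would overflow
log1mexp_ref(c(1e-10, 1, 50))
```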

RNG core

# Base RNG draws
norm_rand_opencl(n)
unif_rand_opencl(n)
exp_rand_opencl(n)
r_unif_index_opencl(n, dt)

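These generate random samples using device-side implementations of the Kinderman-Ramage and Ahrens-Dieter algorithms found in R's nmath RNG code. R's default normal.kind is "Inversion" (Kinderman-Ramage and Ahrens-Dieter are selectable alternatives via RNGkind()), so device draws should not be expected to reproduce stats::rnorm() output under default settings. These functions are exported via rng_core_opencl.R.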
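To produce CPU draws with the same normal algorithm for a distributional comparison (matching individual draws is not expected), base R's `RNGkind()` can select Kinderman-Ramage explicitly. A sketch using base R only; the previous generator settings are restored afterwards.

```r
# Remember the current settings: kind, normal.kind, sample.kind.
old <- RNGkind()

# Select Kinderman-Ramage for normal draws.
RNGkind(normal.kind = "Kinderman-Ramage")
set.seed(1)
z <- stats::rnorm(1e5)

# Restore the previous settings.
RNGkind(kind = old[1], normal.kind = old[2], sample.kind = old[3])

c(mean = mean(z), sd = stats::sd(z))   # close to 0 and 1
```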

Fallback behavior in detail

nmathopencl_has_opencl() is FALSE (no runtime device, or not compiled with OpenCL): the wrapper always uses the CPU stats::/base analogue. The fallback argument does not gate this branch and is ignored.
nmathopencl_has_opencl() is TRUE, but OpenCL enqueue/compile/bind fails inside the .Call: the outer R wrapper catches the failure. With fallback = FALSE (the default), the error propagates; with fallback = TRUE, the CPU analogue is returned instead (optional masking for fragile OpenCL stacks).
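The two cases above can be summarized in a few lines of R. This is a hypothetical sketch of the described behavior, not the package's source: `dispatch_with_fallback`, `gpu_impl`, and `cpu_impl` are illustrative names standing in for the internal `.Call` and the `stats::` analogue.

```r
dispatch_with_fallback <- function(gpu_available, gpu_impl, cpu_impl,
                                   fallback = FALSE, verbose = FALSE) {
  # No OpenCL: always use the CPU path; `fallback` is not consulted.
  if (!gpu_available) {
    if (verbose) message("OpenCL unavailable; using the CPU analogue")
    return(cpu_impl())
  }
  tryCatch(
    gpu_impl(),
    error = function(e) {
      # Default: re-signal the OpenCL error unchanged.
      if (!fallback) stop(e)
      if (verbose) message("OpenCL call failed (", conditionMessage(e),
                           "); using the CPU analogue")
      cpu_impl()
    }
  )
}

# Stand-ins for demonstration only.
gpu_ok   <- function() "gpu"
gpu_fail <- function() stop("simulated OpenCL failure")
cpu      <- function() "cpu"

dispatch_with_fallback(TRUE, gpu_fail, cpu, fallback = TRUE)   # "cpu"
```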

Runnable examples under inst/examples/ mostly use fallback = FALSE so R CMD check surfaces real OpenCL breakage when a device is present.

# Default (fallback = FALSE): surface GPU/build errors during development or local checks with OpenCL enabled
dnorm_opencl(x)

# Permit CPU masking while OpenCL is flaky (not the development default):
dnorm_opencl(x, fallback = TRUE, verbose = TRUE)

Performance notes

GPU acceleration is most beneficial for large vectors (roughly 10,000 elements or more) and for functions with non-trivial per-element computation (e.g., pgamma, pt, qnorm). For very small inputs, per-call overhead (kernel setup and launch, plus JIT compilation on the first call) and host-device data transfer latency dominate.

The first call in an R session may be slightly slower because OpenCL JIT-compiles the kernel source for the specific device. Subsequent calls with the same kernel reuse the compiled program from the driver's cache.
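Because the crossover point depends on the device, driver, and function, it is worth measuring on the target machine. A minimal sketch in base R; `bench_pair` is a hypothetical helper, and its untimed warm-up call keeps the first-call JIT cost out of the timings.

```r
# Hypothetical benchmarking helper: times two implementations on the same
# input. One untimed warm-up call per function excludes one-off costs such
# as kernel JIT compilation.
bench_pair <- function(n, gpu_fun, cpu_fun, reps = 5L) {
  x <- stats::rnorm(n)
  gpu_fun(x)
  cpu_fun(x)
  median_elapsed <- function(f) {
    times <- vapply(seq_len(reps),
                    function(i) system.time(f(x))[["elapsed"]],
                    numeric(1))
    stats::median(times)
  }
  c(n = n, gpu = median_elapsed(gpu_fun), cpu = median_elapsed(cpu_fun))
}
```

With a device available, `sapply(c(1e3, 1e4, 1e5, 1e6), bench_pair, gpu_fun = function(x) pnorm_opencl(x), cpu_fun = stats::pnorm)` returns one column of timings per input size.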
