ggmlR includes a built-in zero-dependency ONNX loader (hand-written protobuf parser in C). Load any compatible ONNX model and run inference on CPU or Vulkan GPU — no Python, no TensorFlow, no ONNX Runtime required.
Note: The examples below require a valid .onnx model file. Replace "path/to/model.onnx" with the actual path on your system.
model <- onnx_load("path/to/model.onnx")
# Model summary (layers, ops, parameters)
onnx_summary(model)
# Input tensor info (name, shape, dtype)
onnx_inputs(model)

Inputs are named R arrays in NCHW order (matching the ONNX model’s expected layout).
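As a pure base-R illustration of the NCHW layout (using only aperm() and dim(), no ggmlR calls), an H x W x C image array — the shape returned by typical R image readers — can be rearranged into a 1 x C x H x W batch like this:

```r
# H x W x C image, as returned by e.g. png::readPNG() (random values here)
img <- array(runif(224 * 224 * 3), dim = c(224L, 224L, 3L))

# Permute to C x H x W, then prepend the batch dimension: 1 x C x H x W
nchw <- aperm(img, c(3L, 1L, 2L))
dim(nchw) <- c(1L, dim(nchw))

stopifnot(identical(dim(nchw), c(1L, 3L, 224L, 224L)))
```

aperm() copies the data into the new dimension order, so nchw[1, c, h, w] equals img[h, w, c].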
# Random image batch — replace with real data
input <- array(runif(1 * 3 * 224 * 224), dim = c(1L, 3L, 224L, 224L))
result <- onnx_run(model, list(input_name = input))
cat("Output shape:", paste(dim(result[[1]]), collapse = " x "), "\n")

For models with multiple inputs, pass a named list:
result <- onnx_run(model, list(
input_ids = array(as.integer(tokens), dim = c(1L, length(tokens))),
attention_mask = array(1L, dim = c(1L, length(tokens)))
))

By default ggmlR tries Vulkan first and falls back to CPU automatically. To force a specific backend:
# Check what's available
if (ggml_vulkan_available()) {
cat("Vulkan GPU ready\n")
ggml_vulkan_status()
}
# Load with explicit device
model_gpu <- onnx_load("path/to/model.onnx", device = "vulkan")
model_cpu <- onnx_load("path/to/model.onnx", device = "cpu")

Weights are transferred to the GPU once at load time. Repeated calls to onnx_run() do not re-transfer weights.
Some models accept variable-length inputs. Override shapes at load time:
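The shape-override example itself appears to be missing here. The sketch below shows only the general shape such a call would take; the input_shapes argument name is an assumption, not a documented parameter — consult the package help (?onnx_load) for the actual interface.

```r
# HYPOTHETICAL: the 'input_shapes' argument name is an assumption --
# check ?onnx_load in your installed version for the real parameter.
model <- onnx_load("path/to/model.onnx",
                   input_shapes = list(input_ids = c(1L, 128L)))
```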
Run in half-precision for faster GPU inference:
model_fp16 <- onnx_load("path/to/model.onnx", dtype = "f16")
result <- onnx_run(model_fp16, list(input = input))

ggmlR supports 50+ ONNX operators, including:
Custom fused ops: RelPosBias2D (BoTNet).
For full working examples with real ONNX Zoo models see:
# GPU vs CPU benchmark across multiple models
# inst/examples/benchmark_onnx.R
# FP16 inference benchmark
# inst/examples/benchmark_onnx_fp16.R
# Run all supported ONNX Zoo models
# inst/examples/test_all_onnx.R
# BERT sentence similarity
# inst/examples/bert_similarity.R

If a model fails to load or produces wrong results:
Check operator support — print the model’s op list with Python’s onnx package and compare against the table above.
Verify protobuf field numbers — the built-in parser is hand-written; an unexpected field can cause silent mis-parsing.
NaN tracing — use the eval callback for per-node inspection rather than a post-compute scan (which aliases buffers and gives false readings).
Repeated-run aliasing — ggml_backend_sched aliases intermediate buffers over weight buffers. ggmlR calls sched_alloc_and_load() before each compute to reset allocation. If you see correct results on the first run but garbage on subsequent runs, this is the cause.
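To make the "silent mis-parsing" point above concrete: protobuf encodes field tags and lengths as base-128 varints, so a single misread continuation bit shifts every subsequent field. A minimal base-R varint decoder (illustrative only, not the package's actual C parser) looks like this:

```r
# Decode one protobuf varint from a raw vector, starting at byte `pos`.
# Each byte carries 7 payload bits; the high bit flags continuation.
decode_varint <- function(bytes, pos = 1L) {
  value <- 0
  shift <- 0
  repeat {
    b <- as.integer(bytes[pos])
    pos <- pos + 1L
    value <- value + bitwAnd(b, 0x7FL) * 2^shift
    if (bitwAnd(b, 0x80L) == 0L) break
    shift <- shift + 7
  }
  list(value = value, pos = pos)
}

# 300 is encoded as the two bytes 0xAC 0x02
decode_varint(as.raw(c(0xAC, 0x02)))$value  # 300
```

A field header is itself a varint, tag = field_number * 8 + wire_type, so for example field 1 with wire type 2 (length-delimited) is the single byte 0x0A — which is why one unexpected field number can derail everything that follows.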