
Benchmark Performance

The purpose of this vignette is to demonstrate the performance of onnxruntime inference through nativeORT, including its CoreML support. It shows that nativeORT is capable of real-time inference, i.e. staying under the roughly 33.4 ms per-frame budget of 29.97 fps video.
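For reference, the per-frame budget implied by that frame rate can be computed directly (a quick arithmetic check, not part of the benchmark itself):

```r
# Per-frame time budget for 29.97 fps (NTSC) video:
frame_budget_ms <- 1000 / 29.97
frame_budget_ms  # ~33.37 ms; inference must finish within this to keep up
```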

The benchmark runs 100 inferences on a 256x256 input array on an Apple M1 machine, simulating an incoming video stream.

nativeORT CPU & CoreML

# simulated RGB 256x256 image in NCHW layout
input <- array(
  runif(1 * 3 * 256 * 256),
  dim=c(1L, 3L, 256L, 256L)
)

session <- nativeORT::ort_session(model_path,
                                  threads=0L,
                                  opt_level=99L)

times_cpu <- numeric(100)  # per-run latency in ms
for (i in 1:100){
  times_cpu[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}

# CoreML
dir.create(path.expand("~/.nativeORT/cache"),
           recursive = TRUE, showWarnings = FALSE
           )
session <- nativeORT::ort_session(model_path,
                                  provider='coreml',
                                  cache_dir=path.expand("~/.nativeORT/cache"),
                                  threads=0L,
                                  opt_level=99L
           )

times_coreml <- numeric(100)
for (i in 1:100){
  times_coreml[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}
results <- data.frame(
  run=rep(1:length(times_cpu), 2),
  provider=c(
    rep("CPU (nativeORT)", length(times_cpu)),
    rep("CoreML (nativeORT)", length(times_coreml))
  ),
  latency_ms=c(times_cpu, times_coreml)
)

library(ggplot2)

ggplot(results, aes(x=run, y=latency_ms, color=provider)) +
  geom_line() + 
  geom_hline(yintercept=33.3, linetype="dashed", color="red") +
  annotate("text", x=85, y=40, label="29.97 fps threshold") +
  labs(
    title="Inference Latency Across Inference Engines",
    subtitle="YOLOv11n, 256x256 Images, Apple M1",
    x="Run",
    y="Latency (ms)"
  ) +
  theme_minimal()
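Alongside the plot, a numeric summary makes the comparison concrete. The snippet below is a sketch that uses simulated latencies in place of the measured `times_cpu` and `times_coreml` vectors; the values are placeholders, not benchmark results:

```r
# Placeholder latencies standing in for the measured vectors above (ms)
set.seed(42)
times_cpu    <- runif(100, min = 7, max = 12)
times_coreml <- c(60, runif(99, min = 6, max = 9))  # first run: CoreML warmup

# Median, 95th percentile, and spread per provider
summarise_latency <- function(x) c(
  median_ms = median(x),
  p95_ms    = quantile(x, 0.95, names = FALSE),
  sd_ms     = sd(x)
)
rbind(
  CPU    = summarise_latency(times_cpu),
  CoreML = summarise_latency(times_coreml)
)
```

Reporting the 95th percentile alongside the median is useful here because real-time pipelines care about worst-case frames, not just the typical one.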

Results

Notably, nativeORT runs substantially faster than real time requires. Thanks to optimization in the C++ bindings, CPU and CoreML latency are near parity; however, the CoreML runs are notably more stable, since they execute on dedicated hardware, whereas the CPU path is subject to slowdowns when other processes compete for cores.

CoreML does require a warmup (visible as the initial latency spike), but after one or two inferences it reaches real-time performance. At a median latency of 7-8 milliseconds on an Apple M1, there is ample headroom to run post-processing and still stay under the target latency.
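When reporting steady-state latency, it helps to discard the warmup runs before taking the median. A minimal sketch, again with placeholder values standing in for the measured `times_coreml`:

```r
# Placeholder CoreML latencies (ms); the first run includes model compilation
set.seed(1)
times_coreml <- c(60, runif(99, min = 6, max = 9))

warmup <- 2                              # runs to discard as warmup
steady <- times_coreml[-seq_len(warmup)]
median(steady)                           # steady-state median latency (ms)
```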
