Breaking changes in backend (transparent to R users):
- Migrated from the llama_kv_self_* API to the llama_memory_* API
- Supports heterogeneous model architectures:
  - Standard Transformers (LLaMA, Qwen, Mistral, etc.)
  - Mamba/RWKV (State Space Models)
  - Hybrid models (Jamba, LFM2)
  - Sliding Window Attention (Qwen2-MLA)
Key improvements:
- Better memory management and automatic defragmentation
- Enhanced support for parallel inference with shared prefixes
- Improved reproducibility of generation results
- More efficient batch processing
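Since these backend changes are transparent to R users, the same high-level calls shown throughout these notes drive any supported architecture. A minimal sketch (the GGUF file names below are placeholders, not files shipped with the package):

library(localLLM)
backend_init()

# Standard transformer (e.g., a LLaMA-family model)
model_tf <- model_load("llama-model.gguf")       # placeholder path
ctx_tf <- context_create(model_tf, n_ctx = 512)
generate(ctx_tf, "Hello", max_tokens = 10)

# Hybrid or state-space model (e.g., Jamba/Mamba), handled by the new memory API
model_ssm <- model_load("hybrid-model.gguf")     # placeholder path
ctx_ssm <- context_create(model_ssm, n_ctx = 512)
generate(ctx_ssm, "Hello", max_tokens = 10)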
Batch API changes:
- Migrated from llama_batch_get_one() to llama_batch_init() + common_batch_add() + llama_batch_free()
- Each generate() call starts from a clean state
- New n_threads_batch parameter for batch processing

No changes to R-level API - all existing R code continues to work without modification:
library(localLLM)
backend_init()
model <- model_load("model.gguf")
ctx <- context_create(model, n_ctx = 512)
result <- generate(ctx, "Hello", max_tokens = 10)
# All existing code works exactly the same
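As noted above, each generate() call now starts from a clean state, so repeated calls on the same context do not inherit leftover cache or state from earlier calls. Continuing the example above:

result_a <- generate(ctx, "Hello", max_tokens = 10)
result_b <- generate(ctx, "Hello", max_tokens = 10)
# result_b is produced from the same clean starting state as result_a,
# not from whatever the first call left behind in the context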
Build script: backend/llama.cpp/build_localllm.sh

Updated files:
- custom_files/localllm_capi.cpp (10 locations modified):
  - Memory API migration (8 locations)
  - Batch API modernization (2 locations)
  - Error handling improvements
  - Thread configuration updates
Unchanged:
- custom_files/localllm_capi.h (C API interface)
- All R layer code (R/*.R)
- Proxy layer (src/proxy.cpp)
- Test suite (tests/testthat/*.R)
- Documentation
install.packages("localLLM_1.2.0.tar.gz", repos = NULL, type = "source")
library(localLLM)
install_localLLM() # Will download the new b7825 backend

Upgrading from a previous version:
remove.packages("localLLM")
install.packages("localLLM_1.2.0.tar.gz", repos = NULL, type = "source")
library(localLLM)
install_localLLM(force = TRUE) # Force reinstall backend

New technical documentation:
- UPGRADE_COMPLETE.md - Complete upgrade report
- CRITICAL_CHANGES_REQUIRED.md - Detailed change checklist
- MIGRATION_ANALYSIS_b5421_to_b7785.md - Full migration analysis
- Architecture deep-dive in planning documents
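After either installation path, a quick smoke test confirms the new backend loads and generates (the model path is a placeholder for any local GGUF file):

library(localLLM)
backend_init()                      # loads the freshly installed backend
model <- model_load("model.gguf")   # placeholder path
ctx <- context_create(model, n_ctx = 512)
generate(ctx, "Hello", max_tokens = 10)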
Potential optimizations for future releases:
- Flash Attention support for improved performance
- Unified Buffer optimization for multi-sequence inference
- SWA (Sliding Window Attention) for ultra-long contexts (128K+)
For more information about llama.cpp, see:
- llama.cpp releases
- llama.cpp documentation
Previous release notes (if any) would go here…