
localLLM 1.2.0

Major Changes

Backend Upgrade: llama.cpp b5421 → b7825

Core Architecture Migration: KV Cache → Unified Memory API

Breaking changes in the backend (transparent to R users):

- Migrated from the llama_kv_self_* API to the llama_memory_* API
- Supports heterogeneous model architectures:
  - Standard Transformers (LLaMA, Qwen, Mistral, etc.)
  - Mamba/RWKV (state space models)
  - Hybrid models (Jamba, LFM2)
  - Sliding Window Attention (Qwen2-MLA)
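At the R level the migration is invisible: the same loading call covers every supported architecture, and the backend selects the appropriate memory layout. A minimal sketch, with placeholder GGUF file names (they are illustrative, not files shipped with the package):

library(localLLM)

backend_init()

# The unified memory API chooses the right cache type per architecture;
# the R call is identical in every case. File names are illustrative only.
transformer <- model_load("llama-3-8b-instruct.Q4_K_M.gguf")  # standard Transformer
ssm         <- model_load("mamba-2.8b.Q4_K_M.gguf")           # state space model
hybrid      <- model_load("jamba-mini.Q4_K_M.gguf")           # hybrid Transformer/SSM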

Key improvements:

- Better memory management and automatic defragmentation
- Enhanced support for parallel inference with shared prefixes
- Improved reproducibility of generation results
- More efficient batch processing
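The shared-prefix improvement matters when many prompts begin with the same system text. Below is a hedged sketch using only the R calls documented in these notes; the prefix reuse happens inside the upgraded backend, not in R code:

library(localLLM)

backend_init()
model <- model_load("model.gguf")
ctx <- context_create(model, n_ctx = 2048)

# Prompts that share a common prefix; the b7825 backend can reuse the
# cached state for the shared portion across requests.
prefix <- "You are a concise assistant. Answer in one sentence.\n"
questions <- c("What is a GGUF file?", "What does n_ctx control?")

answers <- vapply(
  questions,
  function(q) generate(ctx, paste0(prefix, q), max_tokens = 32),
  character(1)
)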

Batch API Modernization

Improvements

Memory Management

Error Handling

Performance

API Compatibility

The R-level API is unchanged, and all existing R code continues to work without modification:

library(localLLM)

backend_init()                                    # initialize the llama.cpp backend
model <- model_load("model.gguf")                 # load a local GGUF model
ctx <- context_create(model, n_ctx = 512)         # create an inference context
result <- generate(ctx, "Hello", max_tokens = 10) # generate up to 10 tokens
# All existing code works exactly the same

Backend Library Changes

Compilation

File Modifications

Updated files:

- custom_files/localllm_capi.cpp (10 locations modified)
  - Memory API migration (8 locations)
  - Batch API modernization (2 locations)
  - Error handling improvements
  - Thread configuration updates

Unchanged:

- custom_files/localllm_capi.h (C API interface)
- All R layer code (R/*.R)
- Proxy layer (src/proxy.cpp)
- Test suite (tests/testthat/*.R)
- Documentation

Testing

Installation Notes

First-time Installation

install.packages("localLLM_1.2.0.tar.gz", repos = NULL, type = "source")
library(localLLM)
install_localLLM()  # Will download the new b7825 backend

Upgrading from 1.1.0

remove.packages("localLLM")
install.packages("localLLM_1.2.0.tar.gz", repos = NULL, type = "source")
library(localLLM)
install_localLLM(force = TRUE)  # Force reinstall backend
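After reinstalling, a quick smoke test confirms the new backend loads and generates; the model path below is a placeholder for any local GGUF file:

library(localLLM)

backend_init()
model <- model_load("model.gguf")          # any local GGUF model
ctx <- context_create(model, n_ctx = 512)
generate(ctx, "Hello", max_tokens = 5)     # should return text without errors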

Documentation

New technical documentation:

- UPGRADE_COMPLETE.md - Complete upgrade report
- CRITICAL_CHANGES_REQUIRED.md - Detailed change checklist
- MIGRATION_ANALYSIS_b5421_to_b7785.md - Full migration analysis
- Architecture deep-dive in planning documents

Known Issues

Future Enhancements

Potential optimizations for future releases:

- Flash Attention support for improved performance
- Unified Buffer optimization for multi-sequence inference
- SWA (Sliding Window Attention) for ultra-long contexts (128K+)

Contributors


For more information about llama.cpp, see:

- llama.cpp releases
- llama.cpp documentation

localLLM 1.1.0

Previous release notes (if any) would go here…
