Open source repositories tagged with #kv-cache, ranked by health score.
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.
Minimal, zero-dependency LLM inference in pure C11. CPU-first with NEON/AVX2 SIMD. Flash MoE (pread + LRU expert cache). TurboQuant 3-bit KV compression (8.9x less memory per session). 20+ GGUF quant formats. Compiles to WASM.