Agentic AI

Tetris: Efficient and Predictive KV Cache Offloading for Agentic and Reasoning Workloads

We present a predictive KV cache offloading mechanism that supports the ultra-long decoding phases of reasoning and agentic workloads.