Ziji's Homepage
Systems
Tetris: Efficient and Predictive KV Cache Offloading for Agentic and Reasoning Workloads
We present a predictive KV cache offloading mechanism that supports the ultra-long decoding phases of reasoning and agentic workloads.
Ziji Shi, Chaoyi Ruan, Penghui Qi, Guangxing Huang, Xinyi Wan, Min Lin, Jialin Li