Publications

(2025). Tetris: Efficient and Predictive KV Cache Offloading for Agentic and Reasoning Workloads. In SOSP'25 SAA Workshop.

(2025). TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks. In 54th International Conference on Parallel Processing (ICPP 2025).

(2024). ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks. In Proceedings of the 2024 ACM Symposium on Cloud Computing.

(2023). TAP: Efficient Derivation of Tensor Parallel Plans for Large Neural Networks. In ISCA'23 ASSYST Workshop.

(2022). Whale: Efficient Giant Model Training over Heterogeneous GPUs. In USENIX ATC'22.

(2022). Go Wider Instead of Deeper. In AAAI'22.