I am a Ph.D. student at the National University of Singapore and a member of the Alibaba Platform for AI (PAI) team, jointly advised by Prof. Jialin Li and Wei Lin. My primary research interest is developing highly efficient distributed infrastructure for machine learning.
During my undergraduate studies, I had the privilege (and fun) of spending four years with the NTU HPC club, where we won the Overall Championship at the SC'17 Student Cluster Competition and set the competition's LINPACK world record.
Outside of my academic pursuits, I enjoy cooking, jogging, and skateboarding. I have even developed my own menu. My Erdős number is 5.
Download my résumé.
Doctor of Philosophy in Computer Science, 2021 - present
National University of Singapore
Bachelor of Engineering in Computer Science, 2015 - 2019
Nanyang Technological University
Visiting Student, Fall 2016
New York University
Scaling up deep neural networks has proven effective in improving model quality, but it also brings ever-growing training challenges, including training efficiency, programmability, and resource adaptability. We present Whale, a general and efficient distributed training framework for giant models. Whale generalizes the programming interface to support various parallel strategies and their hybrids by defining two new primitives in the form of model annotations, through which users can provide hints. The Whale runtime utilizes these annotations and performs graph optimizations to transform a local deep learning DAG for distributed multi-GPU execution. Whale further introduces a novel hardware-aware parallel strategy that allows giant models to be trained on heterogeneous GPUs in a balanced way. Deployed in a production cluster with 512 GPUs, Whale successfully trains M6, an industry-scale multimodal model with over ten trillion parameters, demonstrating great scalability and efficiency.
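As a rough illustration of the annotation idea, the sketch below shows how two hypothetical primitives, `replicate` and `split`, could mark different parts of a model for data parallelism versus sharding, leaving a runtime to rewrite the graph accordingly. The primitive names, the recording mechanism, and the op names are assumptions for illustration only, not Whale's actual API.

```python
# A minimal, hypothetical sketch of annotation-based hybrid parallelism.
# The primitive names (replicate/split) and this recording mechanism are
# illustrative assumptions; they are not Whale's real interface.
from contextlib import contextmanager

ANNOTATIONS = []       # (strategy, device_count, ops) records for a "runtime"
_in_scope = False      # whether an annotation scope is currently active

@contextmanager
def _scope(strategy, device_count):
    global _in_scope
    ANNOTATIONS.append((strategy, device_count, []))
    _in_scope = True
    try:
        yield
    finally:
        _in_scope = False

def replicate(device_count):
    """Mark enclosed ops for data parallelism on device_count GPUs."""
    return _scope("replicate", device_count)

def split(device_count):
    """Mark enclosed ops for sharding across device_count GPUs."""
    return _scope("split", device_count)

def op(name):
    """Stand-in for defining a model op; records it under the active scope."""
    if _in_scope:
        ANNOTATIONS[-1][2].append(name)
    return name

# Usage: annotate a two-stage model; a runtime could then transform the
# local DAG for multi-GPU execution based on ANNOTATIONS.
with replicate(2):       # small dense stage: plain data parallelism
    op("resnet_backbone")
with split(4):           # huge classification layer: shard across 4 GPUs
    op("wide_softmax")

for strategy, n, ops in ANNOTATIONS:
    print(f"{strategy} on {n} GPU(s): {ops}")
```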
We propose an efficient parameter sharing strategy for the Transformer architecture: the feed-forward network (FFN) is replaced with a Mixture-of-Experts (MoE) layer, and the trainable parameters are shared across layers except for the normalization layers. This achieves competitive performance across CV and NLP tasks with up to a 6x reduction in the number of unique parameters.
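A minimal PyTorch sketch of this sharing scheme follows. One set of attention and MoE expert weights is reused at every depth, while each depth keeps its own LayerNorms. The class name, sizes, and the soft (dense) routing are my own simplifications for illustration, not the paper's implementation.

```python
# Hypothetical sketch of cross-layer parameter sharing with an MoE FFN.
import torch
import torch.nn as nn

class SharedMoETransformer(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_experts=4, depth=6):
        super().__init__()
        # Shared across all depths: attention, router, and expert FFNs.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])
        # NOT shared: one pair of LayerNorms per depth, per the abstract.
        self.norms1 = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(depth)])
        self.norms2 = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(depth)])
        self.depth = depth

    def moe(self, x):
        # Soft routing (dense mixture), simplified for illustration.
        weights = self.router(x).softmax(dim=-1)               # (B, T, E)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (B, T, D, E)
        return (outs * weights.unsqueeze(2)).sum(-1)           # (B, T, D)

    def forward(self, x):
        # The same attn/moe weights are applied at every depth;
        # only the LayerNorms differ per layer.
        for i in range(self.depth):
            h, _ = self.attn(x, x, x)
            x = self.norms1[i](x + h)
            x = self.norms2[i](x + self.moe(x))
        return x

model = SharedMoETransformer()
x = torch.randn(2, 16, 256)   # (batch, tokens, d_model)
print(model(x).shape)         # torch.Size([2, 16, 256])
```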