
TAP: Efficient Derivation of Tensor Parallel Plans for Large Neural Networks

We present a framework that drastically accelerates the derivation of tensor parallel plans for large neural networks.