A (missing) User Guide for Lingvo -- Google's Model Parallel Framework

This blog is a WIP.

It has been a year since I last wrote a blog post, but I did not waste the time :) I wrote a tutorial on setting up your own VPN server on a cloud provider, but as the GFW is getting harder to bypass and I am afraid my blog would be banned in China, I did not publish it.

Back to the main topic. As datasets and models get larger in the DL community, there is a greater need for more advanced parallelization techniques. Google's Lingvo framework is a well-known one for model-parallel training of large sequence models.

Lately, I have been using Lingvo to reproduce BERT. It has been quite a learning journey, and I would like to share my thoughts and takeaways.

Motivation for Model Parallelism (and using Lingvo)

  • Larger models usually perform better. This has been shown empirically for machine translation tasks.
  • Larger models require more memory, and memory (rather than compute power) is the limiting factor, since memory capacity does not grow at the pace of Moore’s Law.

A Brief Review of Model Parallelism Techniques

A common belief is that one should attempt Data Parallelism -> Pipeline Parallelism -> Tensor Model Parallelism, in that order, escalating only when the simpler technique no longer fits the model in memory.
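To make the last (and most invasive) technique concrete, here is a minimal sketch of tensor model parallelism using NumPy. The two "devices" are emulated in-process, and all names are illustrative, not Lingvo's API: the key idea is that a single weight matrix is sharded across devices, each device computes a partial result, and a collective (here, a concatenation standing in for an all-gather) reassembles the full output.

```python
import numpy as np

# Emulated tensor model parallelism for a linear layer y = x @ W.
# The weight matrix W is sharded column-wise across two "devices".
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of 4, hidden size 8
W = rng.standard_normal((8, 6))   # full weight matrix

# Shard W into two column blocks, one per device.
W0, W1 = np.split(W, 2, axis=1)

# Each device computes its partial output independently...
y0 = x @ W0
y1 = x @ W1

# ...and an all-gather-style concatenation recovers the full output.
y_parallel = np.concatenate([y0, y1], axis=1)

assert np.allclose(y_parallel, x @ W)
```

Note that each device only stores half of W, which is exactly why this technique helps when memory, not compute, is the bottleneck.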

Tour of Lingvo Framework

Typical Lingvo tasks consist of the following:

  • builder, which connects sub-layers to form the computational graph
  • model, which defines the layers
  • input generator, which handles pre-processing and batching of inputs

Among them, the builder class never appears in the documentation.
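Since the builder is undocumented, here is a minimal sketch of the pattern in plain Python; the class and method names are hypothetical and are not Lingvo's actual API. The point is simply that a builder is a factory that wires sub-layers together into a graph, separate from the layers' own definitions.

```python
# Hypothetical builder-pattern sketch (illustrative names, not Lingvo's API).

class Layer:
    """A sub-layer: a named function applied to its input."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)

class Builder:
    """Connects sub-layers into a sequential computational graph."""
    def sequential(self, *layers):
        def forward(x):
            for layer in layers:
                x = layer(x)
            return x
        return Layer("sequential", forward)

# Usage: compose a tiny two-stage "model".
b = Builder()
model = b.sequential(
    Layer("scale", lambda x: x * 2),
    Layer("shift", lambda x: x + 1),
)
print(model(3))  # -> 7
```

Keeping graph construction in a builder, rather than inside the layers themselves, is what lets the same sub-layers be rewired into different topologies (e.g., pipelined or sharded variants) without touching their definitions.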

Thoughts on Future Model Parallelism Work

  • Sparsity-powered models (e.g., Mixture-of-Experts)
Ziji Shi(史子骥)
Doctoral Student

My research interests include distributed machine learning and high-performance computing.