A (missing) User Guide for Lingvo -- Google's Model Parallel Framework
This blog is a WIP.
It has been a year since I last wrote a blog post, but I did not waste the time :) I wrote a tutorial on setting up your own VPN server hosted by a cloud provider, but since the GFW is getting harder to bypass and I am afraid my blog will be banned in China, I did not publish it.
Back to the main topic. As datasets and models grow larger in the DL community, there is a greater need for more advanced parallelization techniques. Google's Lingvo framework is a well-known option for building large, model-parallel networks.
Lately, I have been using Lingvo to reproduce BERT. It has been quite a learning journey, and I would like to share my thoughts and takeaways.
Motivation for Model Parallelism (and using Lingvo)
- Larger models usually perform better. This has been shown empirically for machine translation tasks.
- Larger models require more memory, and memory (rather than compute power) is the limiting factor, since memory capacity has not kept pace with Moore's Law.
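To make the memory pressure concrete, here is a back-of-envelope calculation (my own illustration, not from Lingvo) of the training-state memory for a model trained with Adam in full fp32, ignoring activations entirely:

```python
# Back-of-envelope memory cost of training state with Adam in fp32.
# Assumption (mine, for illustration): weights, gradients, and both
# Adam moment buffers are all kept in fp32; activations are excluded.
def training_bytes_per_param():
    weights = 4   # fp32 weight
    grads = 4     # fp32 gradient
    adam_m = 4    # Adam first-moment buffer
    adam_v = 4    # Adam second-moment buffer
    return weights + grads + adam_m + adam_v

def training_gib(num_params):
    return num_params * training_bytes_per_param() / 2**30

# A 1B-parameter model needs ~15 GiB before activations are even
# counted, which already exceeds many single-GPU memory budgets.
print(f"{training_gib(1_000_000_000):.1f} GiB")
```

This is why parameter count, not FLOPs, is often the first wall you hit on a single accelerator.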
A Brief Review of Model Parallelism Techniques
A common belief is that one should attempt these techniques in order: Data Parallelism -> Pipeline Parallelism -> Tensor Model Parallelism.
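The core idea behind tensor model parallelism can be shown in a few lines. The sketch below (my own toy illustration with NumPy "shards" standing in for devices, not Lingvo code) splits a weight matrix column-wise, lets each shard compute a partial output, and concatenates the results:

```python
import numpy as np

# Toy sketch of tensor model parallelism: shard a weight matrix
# column-wise across "devices" (plain array shards here), compute
# each partial output locally, then concatenate along the feature
# dimension. All names are illustrative, not Lingvo API.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))     # a batch of activations
w = rng.normal(size=(32, 64))    # the full weight matrix

num_devices = 4
w_shards = np.split(w, num_devices, axis=1)   # each "device" holds 32x16

# Each device computes its slice of the output independently;
# no communication is needed until the outputs are gathered.
partial_outputs = [x @ shard for shard in w_shards]
y_parallel = np.concatenate(partial_outputs, axis=1)

# The sharded computation matches the unsharded matmul.
assert np.allclose(y_parallel, x @ w)
```

Data parallelism instead replicates `w` on every device and splits `x` along the batch dimension, which is why it is usually tried first: it needs no change to the model itself.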
Tour of Lingvo Framework
A typical Lingvo task consists of the following:
- builder, which is in charge of connecting sub-layers to form the computational graph
- model, which contains the definitions of the layers
- input generator, which is in charge of pre-processing and batching inputs
The builder class, however, is never mentioned in the documentation.
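Since the builder is undocumented, a schematic helps. The following is a hypothetical plain-Python illustration of the three roles above (none of these class or function names are actual Lingvo API): a builder that wires sub-layers into a graph, layers that define the computation, and an input generator that yields batches.

```python
# Hypothetical sketch of the builder / model / input-generator split.
# These names are illustrative only and do NOT match Lingvo's API.
class ScaleLayer:
    """A trivial 'layer': the model side defines computations like this."""
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, x):
        return x * self.factor

class SequentialBuilder:
    """Connects sub-layers in order to form a tiny computational graph."""
    def __init__(self):
        self.layers = []
    def add(self, layer):
        self.layers.append(layer)
        return self          # chainable, in the usual builder style
    def build(self):
        def network(x):
            for layer in self.layers:
                x = layer(x)
            return x
        return network

def input_generator(raw_examples):
    """Stands in for pre-processing and batching of inputs."""
    for example in raw_examples:
        yield example

net = SequentialBuilder().add(ScaleLayer(2)).add(ScaleLayer(5)).build()
outputs = [net(batch) for batch in input_generator([1, 2, 3])]
print(outputs)   # [10, 20, 30]
```

The real builder composes far richer sub-layers, but the division of labor — builder wires, model defines, input generator feeds — is the mental model that got me through the missing docs.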
Thoughts on Future Model Parallelism Work