NFS and Lustre

I’ve seen both NFS and Lustre in many clusters. To understand how they differ, I decided to dig into them. Both can:

  • share a local file system name space across the network;
  • be robust and resilient to network failures.

NFS

  • stateless by design, which makes it more robust to network failures (state was introduced in NFSv4)
  • ubiquitous: built for everything
  • easy to set up and troubleshoot
  • I/O performance does not scale well: it uses an in-band protocol, where control messages are sent together with the payload, which limits the payload size. Read operations scale better than write operations.
  • supports POSIX
  • open standard: originally developed by Sun Microsystems and now specified in IETF RFCs

Lustre

  • stateful by design: clients maintain a connection to the servers
  • I/O performance scales: it uses third-party transfer. Requests are made to the metadata server, and I/O then moves directly between the client and the affected storage component(s). Adding more storage servers reduces contention for resources, which makes it more scalable.
  • supports POSIX
  • open source: released under the GPLv2
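
To picture the scaling difference, here is a toy model in Python. It is only a sketch under strong assumptions, not a benchmark: the function names and numbers are made up for illustration, every client is assumed to want the same bandwidth, files are assumed to be spread evenly over the Lustre storage servers, and metadata traffic is ignored.

```python
def nfs_aggregate_bandwidth(num_clients, client_demand, server_bandwidth):
    """Idealized NFS: all data passes through one server, which caps throughput."""
    return min(num_clients * client_demand, server_bandwidth)


def lustre_aggregate_bandwidth(num_clients, client_demand,
                               num_storage_servers, server_bandwidth):
    """Idealized Lustre: data moves directly between clients and storage
    servers, so the cap grows with the number of storage servers."""
    return min(num_clients * client_demand,
               num_storage_servers * server_bandwidth)


if __name__ == "__main__":
    # Hypothetical units: GB/s per client and per server.
    for clients in (1, 4, 16):
        nfs = nfs_aggregate_bandwidth(clients, 1.0, 2.0)
        lustre = lustre_aggregate_bandwidth(clients, 1.0, 4, 2.0)
        print(f"{clients:2d} clients: NFS {nfs:.1f}, Lustre (4 servers) {lustre:.1f}")
```

In this model the NFS curve flattens as soon as the single server is saturated, while the Lustre curve keeps rising until the combined bandwidth of all storage servers is reached. Real systems are messier (striping layout, network topology, and metadata load all matter), so treat this only as intuition.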

Choice between NFS and Lustre

  • NFS: if file operations are mostly reads and the workload does not require sustained read or write bandwidth
  • Lustre: if I/O bandwidth is a concern
Ziji SHI(史子骥)
Ph.D. candidate

My research interests include distributed machine learning and high-performance computing.