NFS and Lustre
I’ve seen both NFS and Lustre in many clusters, so I decided to dig into how they differ. Both of them can:
- share a local file system name space across the network;
- be robust and resilient to network failures.
NFS
- stateless by design: the original protocol keeps no per-client state on the server, which makes it more robust to network failures (statefulness was introduced in NFSv4)
- ubiquitous: clients and servers are available on virtually every operating system, and it is designed for general-purpose file sharing
- easy to set up and troubleshoot
- I/O performance does not scale well: it uses an in-band protocol, so control messages travel on the same channel as the data payload, which limits the payload size. Read operations scale better than write operations (see the bandwidth sketch after this list).
- supports POSIX
- commonly deployed through commercial appliances (e.g., NetApp filers), although the protocol itself is an open standard
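To get a feel for the bandwidth difference in practice, here is a minimal Python sketch that measures sequential write bandwidth to a mounted path. The mount points /mnt/nfs and /mnt/lustre are hypothetical; substitute whatever your cluster actually uses.

```python
import os
import time

def write_bandwidth(path, total_mb=1024, block_kb=1024):
    """Write total_mb of data to `path` in block_kb chunks and return MB/s."""
    block = b"\0" * (block_kb * 1024)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data to the server, not just the local page cache
    elapsed = time.monotonic() - start
    os.remove(path)
    return total_mb / elapsed

# Hypothetical mount points; replace with real ones on your cluster.
print("NFS   :", write_bandwidth("/mnt/nfs/bench.tmp"), "MB/s")
print("Lustre:", write_bandwidth("/mnt/lustre/bench.tmp"), "MB/s")
```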
Lustre
- stateful by design: clients and servers maintain connections and per-client state (locks, recovery information), so failures can be detected and recovered from
- I/O performance scales: it uses third-party transfer. Clients ask the metadata server where a file lives, and the I/O itself then moves directly between the client and the affected storage server(s). Adding more storage servers reduces contention for any single resource, so aggregate throughput grows with the system (see the striping sketch after this list).
- supports POSIX
- open source (GPL-licensed)
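Striping is what makes the third-party transfer scale: a file is split across several object storage targets (OSTs), and clients talk to those OSTs directly once the metadata server has told them where the stripes live. A small sketch, assuming a Lustre client with the standard lfs utility and a hypothetical directory /mnt/lustre/stripe_demo:

```python
import os
import subprocess

# Hypothetical Lustre directory; `lfs` is the standard Lustre client utility.
lustre_dir = "/mnt/lustre/stripe_demo"
os.makedirs(lustre_dir, exist_ok=True)

# Stripe new files in this directory across 4 OSTs in 1 MiB chunks, so a single
# large file is served by several storage servers instead of one.
subprocess.run(["lfs", "setstripe", "-c", "4", "-S", "1M", lustre_dir], check=True)

# Show the resulting layout; reads and writes of each stripe go straight to its OST.
subprocess.run(["lfs", "getstripe", lustre_dir], check=True)
```

With a stripe count of 4, one large file is served by four storage servers, so adding OSTs adds bandwidth rather than just capacity.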
Choice between NFS and Lustre
- NFS: a good fit when file operations are mostly reads and the workload does not require sustained read or write bandwidth
- Lustre: the better choice when aggregate I/O bandwidth is a concern (see the concurrent-writer sketch below)
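To see which side of that trade-off a workload falls on, a rough test is to run several writers concurrently against each mount and compare aggregate throughput; Lustre should keep scaling as writers are added, while a single NFS server tends to flatten out. A minimal sketch, again assuming hypothetical /mnt/nfs and /mnt/lustre mount points:

```python
import os
import time
from multiprocessing import Pool

def writer(args):
    """Write `mb` MiB to `path` in 1 MiB blocks, fsync, then clean up."""
    path, mb = args
    block = b"\0" * (1 << 20)
    with open(path, "wb") as f:
        for _ in range(mb):
            f.write(block)
        os.fsync(f.fileno())
    os.remove(path)
    return mb

def aggregate_bandwidth(mount, workers=8, mb_each=256):
    """Aggregate MB/s when `workers` processes write concurrently under `mount`."""
    jobs = [(os.path.join(mount, f"bench_{i}.tmp"), mb_each) for i in range(workers)]
    start = time.monotonic()
    with Pool(workers) as pool:
        total = sum(pool.map(writer, jobs))
    return total / (time.monotonic() - start)

if __name__ == "__main__":
    # Hypothetical mount points; replace with the shared directories on your cluster.
    for mount in ("/mnt/nfs", "/mnt/lustre"):
        print(mount, aggregate_bandwidth(mount), "MB/s")
```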