Linux System Monitoring Tools
Disk I/O
Please see this blog.
- iostat: collects disk statistics, waits for the given interval, collects them again, and displays the difference.
- iotop: a top-like utility that displays real-time disk activity per process.
- dstat: a combined replacement for vmstat, iostat, and ifstat that reports system-wide resource statistics. With plugins such as --top-io it can also show which process is doing the most I/O.
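The "collect, wait, collect again, display the difference" approach described for iostat can be sketched in a few lines of Python. This is only an illustration of the idea, not how iostat itself is implemented. It assumes a Linux system where /proc/diskstats is available, and the function names (`parse_diskstats`, `io_rates`, `sample`) are made up for this example.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # fields[2] is the device name, fields[5] sectors read, fields[9] sectors written
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def io_rates(before, after, interval):
    """Return device -> (read_bytes_per_s, write_bytes_per_s) between two samples."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue  # device appeared between the two samples
        r0, w0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / interval,
            (w1 - w0) * SECTOR_BYTES / interval,
        )
    return rates


def sample(interval=1.0, path="/proc/diskstats"):
    """Collect, wait, collect again, and return the per-device difference."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return io_rates(before, after, interval)
```

On a Linux machine, `sample(1.0)` returns a dictionary of per-device read and write rates in bytes per second, which is roughly the kind of figure iostat prints for each interval.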
Network
iperf
iperf uses a client-server architecture. To run it, install it on both nodes, then start it in server mode on one end and in client mode on the other. See this post for more info.
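To make the server and client roles concrete, here is a minimal Python sketch of the same measurement idea: one side listens and counts the bytes it receives, the other side connects and sends data as fast as it can. It is not iperf and does not use iperf's protocol. Both ends run on one machine over the loopback interface purely for illustration, and the names (`measure_throughput`, `_serve`) are hypothetical.

```python
import socket
import threading
import time


def _serve(listener, result):
    """Server side: accept one client and count bytes until it disconnects."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break  # client closed the connection
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(total_bytes=8 * 1024 * 1024, host="127.0.0.1"):
    """Send total_bytes to a local server thread.

    Returns (bytes_received, bits_per_second).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    result = {}
    server = threading.Thread(target=_serve, args=(listener, result))
    server.start()

    payload = b"\0" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(len(payload), total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    server.join()  # wait until the server has read everything
    elapsed = time.perf_counter() - start
    listener.close()
    return result["bytes"], result["bytes"] * 8 / elapsed
```

A loopback number only says how fast the local network stack is. For a real link, the two roles have to run on the two nodes, which is exactly what iperf's server and client modes are for.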
Memory
Monitor available memory
Please refer to my previous post on free.
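For scripted checks, the same numbers that free prints can be read from /proc/meminfo. The sketch below assumes a Linux kernel recent enough to expose the MemAvailable field. The helper names are made up for this example.

```python
def parse_meminfo(text):
    """Map field name -> integer value from /proc/meminfo text.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts without a unit.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def available_fraction(info):
    """Fraction of total memory that is available without swapping."""
    return info["MemAvailable"] / info["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling `available_fraction(read_meminfo())` gives a single number between 0 and 1 that can be compared against an alert threshold.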
CPU
- top: displays and continuously updates sorted information about processes.
- htop: an interactive process viewer (a prettier version of top).
Nvidia GPU
Use nvidia-smi to monitor GPU usage. However, there are some caveats.
GPU Utilization
It is worth noting that GPU utilization as reported by nvidia-smi does not reflect how busy the GPU is. It measures the percentage of time over the past sample period (usually 1 second) during which at least one kernel was executing on any of the SMs (streaming multiprocessors).
If the GPU is idle, this percentage will be near 0%. If a kernel is running for the whole sample period, it will be near 100%, no matter how many SMs the kernel occupies or whether it does any useful work. For example, a kernel that keeps a single SM spinning without producing a result reports the same 100% as a kernel that saturates every SM. The metric therefore tells you how much of the time the GPU was in use, not how much of the GPU was in use.
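One way to read the definition of this counter is as the share of the sample period during which at least one kernel was running. The following Python sketch models that reading. It is an assumption-based illustration, not the driver's actual sampling logic, and `gpu_utilization` is a hypothetical name. Note that the number of SMs a kernel occupies never enters the calculation.

```python
def gpu_utilization(kernel_intervals, period=1.0):
    """Percent of [0, period) during which at least one kernel was running.

    kernel_intervals is a list of (start, end) times in seconds.
    Overlapping kernels are merged, so concurrency does not raise the value.
    """
    busy = 0.0
    cur_start = cur_end = None
    for start, end in sorted(kernel_intervals):
        # Clip each kernel to the sample period.
        start, end = max(start, 0.0), min(end, period)
        if end <= start:
            continue  # kernel lies entirely outside the period
        if cur_end is None or start > cur_end:
            # Gap before this kernel: close the previous busy span.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current busy span: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / period
```

Under this model, `gpu_utilization([(0.0, 1.0)])` is 100.0 whether that one kernel uses a single SM or all of them, which is why the counter says little about how well the hardware is used.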
GPU Memory Usage
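The memory utilization figure reported by nvidia-smi is the percent of time over the past sample period (usually 1 second) during which global (device) memory was being read or written. It describes memory activity, not how much memory is allocated; the allocated amount is reported separately as used memory.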
You can keep the utilization counts near 100% by simply running a kernel on a single SM and transferring 1 byte over PCI-E back and forth. Utilization is not a “how well you’re using the resources” statistic but an “if you’re using the resources” one. To get SM-level information, consider using the NVIDIA profilers.
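To log these counters from a script, nvidia-smi can print them as CSV through its `--query-gpu` and `--format` options. The Python sketch below parses that output. It assumes nvidia-smi is on the PATH when `query_gpus` is called, and the function names are made up for this example.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits" the memory values are in MiB.
QUERY_FIELDS = ["utilization.gpu", "utilization.memory", "memory.used", "memory.total"]


def _to_int(value):
    """Convert a CSV cell to int, or None for cells like "[N/A]"."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_query_output(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        rows.append(dict(zip(QUERY_FIELDS, (_to_int(v) for v in values))))
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed counters for every GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_query_output(out)
```

Keeping `utilization.memory` and `memory.used` side by side in the output makes the difference visible: the first is a share of time, the second an amount of allocated memory.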