# Multi-GPU Monitoring
gpulse is built for systems with multiple GPUs. This guide covers the views and workflows for monitoring two or more devices simultaneously.
## Grid View Overview
Grid view is the default and the best starting point for any multi-GPU system. Press g to switch to it from any other view.
Each GPU tile shows:
- GPU index and model name
- Memory bar: used / total with percentage
- Utilization bar: current SM occupancy
- Temperature and power draw
- A colour-coded health indicator
### Colour-Coded Health
| Colour | Meaning |
|---|---|
| Green | All metrics within normal ranges |
| Yellow | At least one metric in the warning range (e.g., temperature 70-85 °C, memory > 80%) |
| Red | At least one metric critical (e.g., temperature > 85 °C, memory > 95%, uncorrected ECC error) |
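The mapping from raw metrics to a tile colour can be sketched as a small predicate. This is an illustrative sketch using the example thresholds from the table above; gpulse's actual rules and thresholds may differ:

```python
def health_colour(temp_c, mem_used_pct, ecc_uncorrected=0):
    """Classify a GPU tile as green/yellow/red from its current metrics.

    Thresholds follow the example values in the table; they are
    assumptions for illustration, not gpulse's exact configuration.
    """
    # Critical conditions always win, regardless of other metrics.
    if temp_c > 85 or mem_used_pct > 95 or ecc_uncorrected > 0:
        return "red"
    # Warning range: hot but not critical, or memory pressure building.
    if temp_c >= 70 or mem_used_pct > 80:
        return "yellow"
    return "green"

print(health_colour(62, 45))        # green: everything nominal
print(health_colour(78, 45))        # yellow: temperature in warning range
print(health_colour(62, 97))        # red: memory critical
```

Because red conditions are checked first, a GPU with an uncorrected ECC error shows red even if its temperature and memory are fine.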
Scanning the grid top-to-bottom lets you spot an outlier GPU at a glance without reading every number.
## Sorting
In Grid and List views, press o to cycle through sort orders:
| Sort Order | Description |
|---|---|
| Index | GPU 0, 1, 2... (default) |
| Memory Used | Highest memory consumer first |
| Utilization | Highest compute load first |
| Temperature | Hottest GPU first |
| Name | Alphabetical by model name |
## Detail Deep-Dive
To investigate a single GPU:
- In Grid or List view, use Up / Down to highlight the GPU
- Press Enter to select it, then d for Detail view
Detail view divides the screen into four quadrants:
| Quadrant | Contents |
|---|---|
| Top-left | Memory utilization timeline (last N seconds of history) |
| Top-right | GPU compute utilization timeline |
| Bottom-left | Temperature and power readings with sparklines |
| Bottom-right | Live process table: PID, name, memory, and user |
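The timelines in the top quadrants only need a fixed window of recent samples, which a bounded deque captures naturally. A minimal sketch (the window length and rendering are assumptions, not gpulse internals):

```python
from collections import deque

WINDOW_SECONDS = 60          # assumed history length ("last N seconds")

class MetricTimeline:
    """Fixed-window history for one metric, e.g. memory or SM utilization."""

    def __init__(self, window=WINDOW_SECONDS):
        # Once full, each append silently drops the oldest sample.
        self.samples = deque(maxlen=window)

    def push(self, value):
        self.samples.append(value)

    def sparkline(self):
        """Render the window with eight block characters, scaled min..max."""
        blocks = "▁▂▃▄▅▆▇█"
        lo, hi = min(self.samples), max(self.samples)
        span = (hi - lo) or 1    # avoid division by zero on a flat line
        return "".join(blocks[int((v - lo) / span * 7)] for v in self.samples)

tl = MetricTimeline(window=8)
for v in [10, 20, 40, 80, 60, 30, 20, 10]:
    tl.push(v)
print(tl.sparkline())            # one glyph per sample, peak in the middle
```

Because `deque(maxlen=...)` evicts from the left automatically, the update path stays O(1) per sample per metric, which matters when history is kept for every GPU at once.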
Press g or v to return to the multi-GPU overview.
## Compare View
Compare view places two or more GPUs side-by-side with matching metric rows so you can spot imbalances in a distributed training job. Press c to open it.
Typical use cases:
- Verifying all GPUs in a data-parallel training run consume similar memory and compute
- Identifying a "slow GPU" causing others to block at synchronisation barriers
- Checking tensor-parallel model layer splits across devices
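The "slow GPU" case above reduces to comparing per-device timings against the fastest peer. A minimal sketch with invented step times and an assumed 25% tolerance:

```python
# Hypothetical per-GPU step times (seconds) from a data-parallel run;
# in a balanced job these should be nearly identical.
step_times = {0: 0.41, 1: 0.42, 2: 0.40, 3: 0.71}

fastest = min(step_times.values())
stragglers = {
    gpu: t for gpu, t in step_times.items()
    if t > fastest * 1.25     # assumed tolerance before flagging a straggler
}
print(stragglers)             # GPU 3 blocks the others at every sync barrier
```

In a data-parallel run the whole job moves at the pace of the slowest device, so even one straggler caps throughput for all GPUs in the group.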
Use Left / Right to change the comparison target.
## Topology View
Press t to open Topology view. It renders a diagram of the physical interconnect between GPUs, including:
- PCIe links: bandwidth class (x8, x16) and CPU socket attachment
- NVLink bridges: direct GPU-to-GPU links and negotiated bandwidth
Two GPUs connected via NVLink can exchange tensors at 600 GB/s (NVLink 4.0), while GPUs on opposite NUMA nodes over PCIe may see 10-20x lower effective bandwidth. If distributed training is unexpectedly slow, check Topology view for the interconnect path.
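The bandwidth gap is easy to feel with back-of-envelope arithmetic. The 600 GB/s figure matches the NVLink 4.0 number above; the PCIe figure is an assumed effective rate for a cross-NUMA path, roughly in the 10-20x-slower range the text describes:

```python
# Estimated transfer time for a 4 GiB gradient bucket over each link.
tensor_bytes = 4 * 2**30
nvlink_bps = 600e9            # NVLink 4.0, per the text
pcie_effective_bps = 40e9     # assumption: ~15x slower cross-NUMA PCIe path

nvlink_ms = tensor_bytes / nvlink_bps * 1000
pcie_ms = tensor_bytes / pcie_effective_bps * 1000
print(f"NVLink: {nvlink_ms:.1f} ms, cross-NUMA PCIe: {pcie_ms:.1f} ms")
```

A ~7 ms exchange versus a ~107 ms one, repeated every training step, is the kind of gap that shows up as mysteriously low GPU utilization rather than an obvious error.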
## 16+ GPU Systems (Pagination)
On systems with more than 8 GPUs, Grid view paginates automatically.
| Key | Action |
|---|---|
| PgDn | Next page of GPUs |
| PgUp | Previous page of GPUs |
The status bar shows the current page (e.g., GPUs 9-16 of 64). All metrics continue updating for off-screen GPUs. For 16+ GPU systems, consider List view (v) as a denser alternative.
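The page arithmetic behind the status-bar range can be sketched directly; the 8-tile page size comes from the text above, and the helper name is invented for illustration:

```python
import math

PAGE_SIZE = 8                 # Grid view paginates past 8 GPUs, per the text

def page_bounds(page, total_gpus, page_size=PAGE_SIZE):
    """1-based GPU range for a 0-based page, as in 'GPUs 9-16 of 64'."""
    first = page * page_size + 1
    last = min((page + 1) * page_size, total_gpus)
    return first, last

total = 64
pages = math.ceil(total / PAGE_SIZE)      # 8 pages for a 64-GPU system
print(page_bounds(1, total))              # second page shows GPUs 9-16
```

The `min` on the last page handles GPU counts that are not a multiple of the page size, so a 60-GPU system ends its final page at GPU 60.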