Blog

GPU Monitoring Insights

Technical articles on GPU monitoring, memory leak detection, and MLOps best practices.

Release gpulse

Introducing gpulse v2.2: Onboarding, Config System, and Security

What's new in gpulse v2.2: an onboarding wizard, the gpulse config command, secure file permissions, and atomic writes.

3 min read

Comparison Tools

GPU Monitoring Tools Compared: gpulse vs nvitop vs btop vs Datadog

An honest side-by-side comparison of GPU monitoring tools: features, vendor support, leak detection, and pricing.

4 min read

Technical Leak Detection

How We Detect GPU Memory Leaks Before They Crash Your Run

A technical deep dive into gpulse's three leak detection algorithms: linear regression, spike detection, and composite scoring.

4 min read

MLOps Fleet

Monitoring GPUs Across Your Training Cluster

How to get a unified view of GPU health across multiple machines using SSH-based fleet monitoring.

3 min read

NVIDIA DevOps

nvidia-smi Isn't Enough: Why You Need a GPU Dashboard

nvidia-smi gives you a snapshot. Here's why real-time GPU dashboards with history and alerts are the better approach.

3 min read

Apple Silicon macOS

Apple Silicon GPU Monitoring: What macOS Doesn't Tell You

Activity Monitor barely scratches the surface. Here's how to get real GPU metrics from your M1/M2/M3/M4 Mac.

3 min read

ML Memory Leaks

The Hidden Cost of GPU Memory Leaks

Slow VRAM creep kills overnight training runs. Learn how leak detection algorithms catch problems before an out-of-memory (OOM) crash.

3 min read

ML Monitoring

Why GPU Monitoring Matters for ML Training

Lost training runs and wasted compute dollars are preventable. Real-time GPU monitoring is the first line of defense.

3 min read