Gemma-4 31B + vLLM on RTX 6000 PRO: 1.17k tokens/sec and still asking for more
Throughput, latency, and queue depth for Gemma-4 31B served on vLLM under progressive load, at concurrency levels from 12 to 24. The numbers that matter: 1.17k tok/s peak, ~0.7s median TTFT, and tail latency as the one thing to watch.
Jun 29, 2026 · 5 min read

