LLM-GPU Benchmarks
Benchmark matrix for models and GPUs.
Each public cell shows one locked headline metric only: max stable aggregate generation tok/s for the selected preset. Click any GPU, model, or cell to open the detailed view.
Warmed runs only
Load time excluded from public metric
Demo placeholder data
Preset
Max stable aggregate generation tok/s
Visible table cells show only the headline metric for Gameplay Dialogue (1024 prompt / 256 output).
| GPU / Model | | |
|---|---|---|
Benchmark Detail
Atlas G24 × Orion Chat 14B
Gameplay Dialogue · 1024 prompt / 256 output
Headline metric
114.8 tok/s
Max stable aggregate generation throughput after warmup.
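As a rough illustration of what "aggregate generation throughput" can mean, the sketch below sums the output tokens of all concurrent requests and divides by the wall-clock span over which they were generated. The page does not publish its exact formula, so the `RequestResult` type, its field names, and the choice of measuring from the first generated token to the last are assumptions. Only post-warmup requests should be passed in, and model load time never enters the calculation.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    # Hypothetical record of one completed, post-warmup request.
    first_token_s: float  # wall-clock time the first output token arrived
    last_token_s: float   # wall-clock time the last output token arrived
    output_tokens: int    # number of generated tokens


def aggregate_generation_tok_s(results: list[RequestResult]) -> float:
    """Total generated tokens divided by the wall-clock generation span."""
    if not results:
        return 0.0
    start = min(r.first_token_s for r in results)
    end = max(r.last_token_s for r in results)
    span = end - start
    total_tokens = sum(r.output_tokens for r in results)
    return total_tokens / span if span > 0 else 0.0


# Illustrative numbers only, not taken from the benchmark.
demo = [RequestResult(0.0, 4.0, 256), RequestResult(0.5, 3.5, 256)]
demo_tok_s = aggregate_generation_tok_s(demo)
```

With these two made-up requests, 512 tokens are generated over a 4-second span, so the aggregate figure is higher than either request's individual rate.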
Concurrency curve
The stable maximum is the last blue point before instability.
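One way to pick that point from a concurrency sweep is sketched here. The page does not say how instability is detected, so the two criteria used (error rate above a limit, or throughput falling noticeably below the best value seen so far) and their default thresholds are hypothetical, as are the `CurvePoint` type and the sample numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CurvePoint:
    concurrency: int   # number of simultaneous requests
    tok_s: float       # aggregate generation throughput at this level
    error_rate: float  # fraction of failed requests (0.01 == 1%)


def max_stable_point(
    points: list[CurvePoint],
    max_error_rate: float = 0.01,  # assumed threshold
    max_drop: float = 0.05,        # assumed tolerated throughput regression
) -> Optional[CurvePoint]:
    """Return the last point before the first unstable one, or None."""
    stable = None
    best = 0.0
    for p in sorted(points, key=lambda p: p.concurrency):
        regressed = p.tok_s < best * (1.0 - max_drop)
        if p.error_rate > max_error_rate or regressed:
            break  # everything from here on counts as unstable
        stable = p
        best = max(best, p.tok_s)
    return stable


# Illustrative sweep, not real benchmark data.
sweep = [
    CurvePoint(1, 40.0, 0.0),
    CurvePoint(2, 75.0, 0.0),
    CurvePoint(4, 110.0, 0.001),
    CurvePoint(8, 90.0, 0.05),
]
chosen = max_stable_point(sweep)
```

In this made-up sweep the point at concurrency 8 fails the error-rate check, so the point at concurrency 4 is reported as the stable maximum.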
p50 / p95 TTFT
352 ms / 488 ms
p50 / p95 TPOT
17.6 ms / 24.3 ms
p50 / p95 ITL
41 ms / 62 ms
Error rate
0.18%
Quantization / Precision
AWQ 4-bit · FP16 KV
Runtime
0.8.4 · Ubuntu 22.04 · CUDA 12.4
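The latency figures above (TTFT, TPOT, ITL) can be derived from per-token arrival timestamps. The sketch below uses one common set of definitions, which may differ from the ones this benchmark applies: TTFT is the delay from request start to the first token, ITL is each gap between consecutive tokens, and TPOT is the per-request mean time per token after the first. All function names are illustrative.

```python
import statistics
from typing import Optional


def ttft_ms(request_start_s: float, token_times_s: list[float]) -> float:
    """Time to first token, in milliseconds."""
    return (token_times_s[0] - request_start_s) * 1000.0


def itl_ms(token_times_s: list[float]) -> list[float]:
    """Inter-token latencies: one sample per gap between consecutive tokens."""
    return [(b - a) * 1000.0 for a, b in zip(token_times_s, token_times_s[1:])]


def tpot_ms(token_times_s: list[float]) -> Optional[float]:
    """Mean time per output token after the first, for a single request."""
    n = len(token_times_s)
    if n < 2:
        return None  # undefined for single-token responses
    return (token_times_s[-1] - token_times_s[0]) * 1000.0 / (n - 1)


def p50_p95(samples: list[float]) -> tuple[float, float]:
    """Return the 50th and 95th percentiles (needs at least two samples)."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Illustrative timestamps in seconds, not real benchmark data.
times = [10.25, 10.5, 10.75]
first = ttft_ms(10.0, times)
gaps = itl_ms(times)
per_token = tpot_ms(times)
```

Under these definitions, TTFT and TPOT contribute one sample per request while ITL contributes one sample per token gap, so their percentiles are taken over differently sized populations.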