LLM-GPU Benchmarks

Benchmark matrix for models and GPUs.

Each public cell shows a single locked headline metric: the maximum stable aggregate generation throughput (tok/s) for the selected preset. Click any GPU, model, or cell to open the detailed view.

Warmed runs only

Load time excluded from public metric

Demo placeholder data

Preset

Max stable aggregate generation tok/s

Visible table cells show only the headline metric for the Gameplay Dialogue preset (1024 prompt tokens / 256 output tokens).

GPU / Model

Benchmark Detail

Atlas G24 × Orion Chat 14B

Gameplay Dialogue · 1024 prompt / 256 output

Headline metric

114.8 tok/s

Max stable aggregate generation throughput after warmup.
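As a rough illustration of what "aggregate generation throughput" could mean, the sketch below sums generated tokens across all concurrent requests and divides by the wall-clock window they span. The `RequestResult` record and the choice of window (first request sent to last token received) are assumptions made for this example; the page does not state how the harness actually measures it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestResult:
    """Hypothetical per-request record from a warmed benchmark run."""
    start_s: float       # time the request was sent
    end_s: float         # time the last output token arrived
    output_tokens: int   # generated tokens only (prompt tokens excluded)


def aggregate_generation_tok_s(results: Sequence[RequestResult]) -> float:
    """Total generated tokens divided by the wall-clock window of the run.

    Assumption: the window runs from the earliest request start to the
    latest request end, so overlapping requests are counted together.
    """
    if not results:
        return 0.0
    window_s = max(r.end_s for r in results) - min(r.start_s for r in results)
    if window_s <= 0:
        return 0.0
    return sum(r.output_tokens for r in results) / window_s
```

Under this definition, load time and warmup requests would simply be left out of `results`, which matches the "warmed runs only" and "load time excluded" notes above.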

Concurrency curve

The stable max is marked by the last blue point before instability.
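One way to pick the "last point before instability" from a concurrency sweep is sketched below. The stability rule used here (error rate at or below a threshold) and the names `CurvePoint`, `stable_max`, and `max_error_rate` are assumptions for illustration only; the page does not define its own instability criterion.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CurvePoint:
    """Hypothetical point on the concurrency curve."""
    concurrency: int    # number of simultaneous requests
    tok_s: float        # aggregate generation throughput at this level
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def stable_max(points: Iterable[CurvePoint],
               max_error_rate: float = 0.01) -> Optional[CurvePoint]:
    """Return the last stable point before the first unstable one.

    Assumption: a point counts as unstable when its error rate exceeds
    max_error_rate. Points are scanned in order of increasing concurrency,
    and scanning stops at the first unstable point.
    """
    last_stable = None
    for point in sorted(points, key=lambda p: p.concurrency):
        if point.error_rate > max_error_rate:
            break
        last_stable = point
    return last_stable
```

Stopping at the first unstable point, rather than taking the highest throughput anywhere on the curve, keeps a later lucky run at higher concurrency from being reported as the headline number.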

p50 / p95 TTFT (time to first token)

352 ms / 488 ms

p50 / p95 TPOT (time per output token)

17.6 ms / 24.3 ms

p50 / p95 ITL (inter-token latency)

41 ms / 62 ms
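The latency figures above can be derived from per-token arrival timestamps. The sketch below uses commonly cited definitions (TTFT as first token minus send time, ITL as the gap between consecutive tokens, TPOT as the mean gap after the first token) and a nearest-rank percentile. These definitions and the helper names are assumptions; the page does not say which variants or which percentile method the harness uses.

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (one of several common conventions)."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def request_latencies(sent_s: float,
                      token_times_s: Sequence[float]
                      ) -> Tuple[float, float, List[float]]:
    """Return (ttft, tpot, itl_gaps) in seconds for one request.

    Assumed definitions:
      ttft = arrival of first token minus request send time
      itl  = gaps between consecutive token arrivals
      tpot = mean gap after the first token (0.0 for a single token)
    """
    if not token_times_s:
        raise ValueError("request produced no tokens")
    ttft = token_times_s[0] - sent_s
    gaps = [b - a for a, b in zip(token_times_s, token_times_s[1:])]
    tpot = (token_times_s[-1] - token_times_s[0]) / len(gaps) if gaps else 0.0
    return ttft, tpot, gaps
```

To report p50 / p95 across a run, one would collect `ttft` and `tpot` per request and pool all `itl_gaps`, then call `percentile(values, 50)` and `percentile(values, 95)` on each list.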

Error rate

0.18%

Quantization / Precision

AWQ 4-bit · FP16 KV

Runtime

0.8.4 · Ubuntu 22.04 · CUDA 12.4