This docs folder is the human entry point for the B70 optimization work. The executable install recipes live under ../repro/; docs link to those folders instead of duplicating every script.
The community build guide includes example B70 photos and explains what details future contributors should capture: card spacing, airflow, power, slot order, risers, cooling, and visible diagnostics.

- docs/: narrative guides, FAQ, community-facing summaries, comparison notes.
- repro/: runnable install/build/benchmark/serve recipes and pinned artifacts.
- notes/: lab notebook entries, including negative results.
- data/: structured benchmark records, payloads, and LocalMaxxing responses.
- patches/: patch records and source-level optimization deltas.
- scripts/: shared harnesses used by repro folders and lab runs.

The current clean “start from Ubuntu 24 and serve on the LAN” baseline is:

- ../repro/minimax-m27-b70-110tps-ubuntu24-20260523/
- Lasimeri/MiniMax-M2.7-int4-AutoRound
- 0.0.0.0:8000
- 32768 tokens by default
- 110.90 total tok/s, 83.17 output tok/s for p512/n1536
- 83.8 output tok/s and 1.7k-1.8k prompt/prefill tok/s
- b02ad184553a5ef4e3946a94b8e6124980bc369f

The served endpoint was also validated at prompt 32,408 / output 64 without OOM,
and warm short decode stayed near 84.1 output tok/s afterward. The older strict
speed record and newer constrained structured-output lane remain useful
references, but they were measured under different conditions. The current host’s
PCIe4 fabric measured about half the old large-message allreduce bandwidth,
which is a plausible reason this fresh deployment lands at 83 output tok/s
instead of the older 89-93 output tok/s class. See
../notes/2026-05-23-current-host-pcie4-prefill-check.md for the math.
The 32k context promotion is documented in
../notes/2026-05-23-b70-display-disable-32768-context.md.
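
As a quick sanity check on the baseline figures above, the sketch below relates the total and output throughput for the p512/n1536 run and sizes the gap to the older 89-93 class. It assumes that "total tok/s" counts prompt plus output tokens over the same wall-clock time used for "output tok/s"; that definition is an assumption, not something stated in the benchmark records. The function names are illustrative and are not part of the repro scripts.

```python
# Figures quoted in this README for the p512/n1536 baseline run.
PROMPT_TOKENS = 512
OUTPUT_TOKENS = 1536
TOTAL_TOK_S = 110.90
REPORTED_OUTPUT_TOK_S = 83.17


def output_rate_from_total(total_tok_s, prompt_tokens, output_tokens):
    """Output tok/s implied by a total tok/s figure.

    Assumption: both rates share the same wall-clock denominator, so the
    output rate is the total rate scaled by the output share of all tokens.
    """
    return total_tok_s * output_tokens / (prompt_tokens + output_tokens)


def relative_gap(current, reference):
    """Fraction by which `current` falls short of `reference`."""
    return (reference - current) / reference


implied = output_rate_from_total(TOTAL_TOK_S, PROMPT_TOKENS, OUTPUT_TOKENS)
print(f"implied output tok/s: {implied:.3f} (reported {REPORTED_OUTPUT_TOK_S})")

# Gap between the fresh deployment (about 83) and the older 89-93 class.
for older in (89, 93):
    print(f"vs {older} tok/s: {relative_gap(83, older):.1%} lower")
```

Under that assumption the two reported rates agree to within rounding, and the fresh deployment sits roughly 7% to 11% below the older class. The linked note covers how much of that gap the halved allreduce bandwidth can explain.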