B70 Optimization Lab Docs

This docs folder is the human entry point for the B70 optimization work. The executable install recipes live under ../repro/; docs link to those folders instead of duplicating every script.

Start Here

Build Photos

The community build guide includes example B70 photos and explains what details future contributors should capture: card spacing, airflow, power, slot order, risers, cooling, and visible diagnostics.

Repository Layout

Current Deployable Baseline

The current clean “start from Ubuntu 24 and serve on the LAN” baseline is:

The served endpoint was also validated at a 32,408-token prompt with 64 output tokens without OOM, and warm short decode stayed near 84.1 output tok/s afterward. The older strict speed record and the newer constrained structured-output lane remain useful references, but they were measured under different conditions. The current host’s PCIe4 fabric measured about half the old large-message allreduce bandwidth, which is a plausible reason this fresh deployment lands at about 83 output tok/s instead of the older 89-93 tok/s class. See ../notes/2026-05-23-current-host-pcie4-prefill-check.md for the math.
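As a rough sanity check on that explanation, the sketch below asks how large the communication share of per-token time would have to be for a halved allreduce bandwidth to account for the drop from the 89-93 tok/s class to about 83 tok/s. It is not the calculation in the linked note. The function name `implied_comm_fraction` and the two-term model (per-token time = compute + communication, with communication scaling inversely with bandwidth) are assumptions made for illustration. Decode-time allreduce messages are small and may be latency-bound rather than bandwidth-bound, so treat the result as an upper-level plausibility check only.

```python
def implied_comm_fraction(old_tok_s: float, new_tok_s: float,
                          bandwidth_ratio: float = 0.5) -> float:
    """Share of the OLD per-token time that must be communication for a
    bandwidth change alone to explain the throughput drop.

    Assumed model (hypothetical, not from the repo notes):
        t_old = compute + comm
        t_new = compute + comm / bandwidth_ratio
    """
    t_old = 1.0 / old_tok_s  # seconds per output token on the old host
    t_new = 1.0 / new_tok_s  # seconds per output token on the current host
    # Subtracting the two model equations isolates the communication term:
    #   t_new - t_old = comm * (1 / bandwidth_ratio - 1)
    comm = (t_new - t_old) / (1.0 / bandwidth_ratio - 1.0)
    return comm / t_old


if __name__ == "__main__":
    # 89 and 93 tok/s bracket the older class; 83 tok/s is the current figure.
    for old in (89.0, 93.0):
        share = implied_comm_fraction(old, 83.0)
        print(f"old {old:.0f} tok/s: implied communication share {share:.1%}")
```

Under these assumptions the implied share comes out at roughly 7% to 12% of the old per-token time. If a profile of the old host showed bandwidth-bound communication well below that range, the fabric alone would not explain the gap.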

The 32k context promotion is documented in ../notes/2026-05-23-b70-display-disable-32768-context.md.