b70-optimization-lab

Model Recipes

This page is the community-facing index for reproducible model deployments. Each recipe should make clear what it proves: installation, quality, benchmark speed, serving, or all of the above.

How To Read A Recipe

Every complete recipe should include:

Do not compare two results unless their model, quantization, prompt length, output length, context length, batch/concurrency, and quality gate are clear.

Current Recipes

Recipe Status What It Is For
../repro/minimax-m27-b70-110tps-ubuntu24-20260523/ Deployable baseline Fresh Ubuntu 24.04 setup for 4x B70, MiniMax M2.7 INT4 AutoRound, vLLM OpenAI-compatible endpoint on 0.0.0.0:8000.
../repro/minimax-m27-b70-89tps-20260520/ Strict speed baseline Older strict quality-passed MiniMax M2.7 INT4 lane with higher output-token throughput. Useful for optimization comparisons.

MiniMax M2.7 INT4 AutoRound

Start with:

cd repro/minimax-m27-b70-110tps-ubuntu24-20260523
sudo bash scripts/00-install-system-deps.sh
sudo reboot

After reboot:

sudo bash scripts/01-prepare-storage.sh
bash scripts/02-download-model.sh
bash scripts/03-build-stack.sh
bash scripts/04-verify-runtime.sh
bash scripts/05-run-quality-and-benchmark.sh
bash scripts/06-serve-openai-compatible.sh

Then in another terminal:

bash scripts/07-smoke-test-endpoint.sh

See the full deployment guide for explanation and troubleshooting.

Future Recipe Slots

These are useful community targets to add as separate repro folders:

Suggested Recipe Folder Template

Use a name that includes model, hardware, headline result, OS/date, and avoid spaces:

repro/<model>-<hardware>-<headline>-<os>-<yyyymmdd>/
  README.md
  configs/
  scripts/
  patches/
  results/
  notes/

For example:

repro/minimax-m27-b70-110tps-ubuntu24-20260523/

Publishing Results

For a result to be useful to other people, include: