b70-optimization-lab

Feedback For Intel

This is a discussion entry point. The detailed technical note is Notes To Intel: Making 4x B70 vLLM/MiniMax Deployment Easier.

Short Version

The B70 hardware is promising for local AI because 32 GB of VRAM, priced below workstation GPUs, opens up models that are awkward to run on 16 GB and 24 GB cards. The main problem is not the card but the software path, which is still too fragile for normal users.

The working MiniMax M2.7 INT4 deployment required a narrow combination of:

That is reasonable for a lab. It is not reasonable for a broad community install story.

What Would Help Most

  1. Publish a tested B70 vLLM compatibility matrix.
  2. Publish ABI-matched vllm-xpu-kernels wheels for supported PyTorch XPU releases.
  3. Provide a single “B70 local LLM doctor” command that checks drivers, Level Zero, PyTorch XPU, oneCCL, vLLM, native extension ABI, and cache permissions.
  4. Make oneAPI compiler version selection explicit and warn on mixed header/library stacks.
  5. Fix or make actionable the ocloc/IGC internal compiler error seen during Triton/Inductor compilation.
  6. Upstream or register Intel/XPU/MiniMax vLLM environment flags so logs do not call important flags “unknown.”
  7. Ship small deterministic quality canaries for XPU examples so optimization work does not silently corrupt output.
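
To make item 3 concrete, here is a minimal sketch of what a "doctor" command could check. It is an illustration, not an Intel or vLLM tool. All function names are hypothetical. It also rests on several assumptions: a Linux host where GPUs appear as `/dev/dri/renderD*`, a Level Zero loader library named `ze_loader`, oneCCL bindings importable as `oneccl_bindings_for_pytorch`, and `~/.cache` as the cache location. It does not attempt the native extension ABI check, which would need knowledge of the specific wheel builds.

```python
import ctypes.util
import glob
import importlib
import importlib.util
import os
from typing import List, NamedTuple


class Check(NamedTuple):
    name: str
    ok: bool
    detail: str


def check_module(module: str) -> Check:
    """Report whether a top-level Python module can be located (without importing it)."""
    found = importlib.util.find_spec(module) is not None
    return Check(f"python module {module}", found, "found" if found else "not installed")


def check_render_nodes(pattern: str = "/dev/dri/renderD*") -> Check:
    """Assumption: on Linux, a loaded GPU kernel driver exposes DRM render nodes."""
    nodes = sorted(glob.glob(pattern))
    return Check("GPU render nodes", bool(nodes), ", ".join(nodes) or "none found")


def check_level_zero() -> Check:
    """Assumption: the Level Zero loader is installed as a library named ze_loader."""
    path = ctypes.util.find_library("ze_loader")
    return Check("Level Zero loader", path is not None, path or "ze_loader not found")


def check_torch_xpu() -> Check:
    """Import torch only if it is installed, then ask whether an XPU device is usable."""
    if importlib.util.find_spec("torch") is None:
        return Check("PyTorch XPU", False, "torch not installed")
    try:
        torch = importlib.import_module("torch")
        xpu = getattr(torch, "xpu", None)  # absent on older or non-XPU builds
        ok = bool(xpu is not None and xpu.is_available())
        count = xpu.device_count() if ok else 0
        return Check("PyTorch XPU", ok, f"torch {torch.__version__}, {count} XPU device(s)")
    except Exception as exc:  # a broken install can fail in many ways at import time
        return Check("PyTorch XPU", False, f"import failed: {exc}")


def check_cache_dir(path: str) -> Check:
    """Compiled-kernel caches need a directory the current user can write to."""
    ok = os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
    return Check(f"cache dir {path}", ok, "writable" if ok else "missing or not writable")


def run_checks(cache_dir: str) -> List[Check]:
    return [
        check_render_nodes(),
        check_level_zero(),
        check_torch_xpu(),
        check_module("oneccl_bindings_for_pytorch"),  # assumed module name
        check_module("vllm"),
        check_cache_dir(cache_dir),
    ]


def format_report(checks: List[Check]) -> str:
    return "\n".join(f"[{'OK' if c.ok else 'FAIL'}] {c.name}: {c.detail}" for c in checks)


if __name__ == "__main__":
    report = run_checks(os.path.expanduser("~/.cache"))  # assumed cache location
    print(format_report(report))
    raise SystemExit(0 if all(c.ok for c in report) else 1)
```

Each check returns a name, a pass/fail flag, and a one-line detail, so the report can be pasted into an issue as-is, and the exit code makes the script usable in install recipes. A real tool would also need to verify version compatibility, not just presence.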

Community Angle

The B70 can become useful community hardware if users can share recipes, not just screenshots. A good Intel-supported path would make it easy to publish:

This repository is structured around that idea: docs/ for humans, repro/ for install recipes, notes/ and data/ for detailed lab evidence.

Messaging That Would Land Better

Community users are trying to run current models, not only old reference demos. Intel examples and launch content should include modern local-AI targets that enthusiasts actually care about, such as current Qwen, MiniMax, Kimi-class, GLM-class, and other frontier-adjacent open-weight models where licensing allows.

Useful public examples would include:

The community conversation is not only “what is the top tok/s?” It is also: