This is a discussion entry point. The detailed technical note is Notes To Intel: Making 4x B70 vLLM/MiniMax Deployment Easier.
The B70 hardware is promising for local AI because 32 GB of VRAM at below-workstation-GPU pricing makes room for models that are awkward on 16 GB and 24 GB cards. The main problem is not the card. The main problem is that the software path is still too fragile for normal users.
The working MiniMax M2.7 INT4 deployment required a narrow combination of:
2.11.0+xpu.
vllm-xpu-kernels.

That is reasonable for a lab. It is not reasonable for a broad community install story.
vllm-xpu-kernels wheels for supported PyTorch XPU releases.
ocloc/IGC internal compiler error seen during Triton/Inductor compilation.

The B70 can become useful community hardware if users can share recipes, not just screenshots. A good Intel-supported path would make it easy to publish:
This repository is structured around that idea: docs/ for humans, repro/ for install recipes, notes/ and data/ for detailed lab evidence.
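As one illustration of what a shareable recipe could carry alongside its install steps, the sketch below records the installed versions of a few packages as JSON. It is a minimal example, not part of this repository: the function names and the distribution names in `DEFAULT_PACKAGES` (in particular `vllm-xpu-kernels`) are assumptions and may not match the names of the actual wheels.

```python
import json
import platform
import sys
from importlib import metadata

# Assumed distribution names; adjust them to match the wheels a recipe installs.
DEFAULT_PACKAGES = ("torch", "vllm", "vllm-xpu-kernels")


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def capture_environment(packages=DEFAULT_PACKAGES):
    """Collect interpreter, OS, and package versions into a JSON-friendly dict."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {name: installed_version(name) for name in packages},
    }


if __name__ == "__main__":
    # Print the snapshot so it can be saved next to a recipe or pasted into a report.
    json.dump(capture_environment(), sys.stdout, indent=2, sort_keys=True)
    print()
```

Saved under a hypothetical name such as `capture_env.py`, the script could be run inside the working environment and its output stored with the recipe, so that someone reproducing the setup can compare their versions against the known-good combination. Packages that are not installed are reported as `null` rather than raising an error.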
Community users are trying to run current models, not only old reference demos. Intel examples and launch content should include modern local-AI targets that enthusiasts actually care about, such as current Qwen, MiniMax, Kimi-class, GLM-class, and other frontier-adjacent open-weight models where licensing allows.
Useful public examples would include:
The community conversation is not only “what is the top tok/s?” It is also: