b70-optimization-lab

Unofficial Intel XPU Community Lab

Community setup guides, benchmark recipes, troubleshooting notes, and patches for Intel XPU local AI work.

Start Here

What This Is

This repository is meant to become a stable community hub for Intel XPU local AI:

Quick Paths

I want to… Go here
Ask for setup help Discussions
Read community-maintained notes Wiki
Deploy MiniMax M2.7 INT4 on 4x B70 MiniMax Ubuntu 24 guide
Find model-specific recipes Model recipes
Share a benchmark Community results guide
Compare GPUs GPU comparison
Send Intel feedback Feedback for Intel

Current Practical Baseline

The best documented fresh install today is:

This is a deployable baseline, not the final speed ceiling. The strict benchmark/quality lane remains p512/n1536 at context 2048 for comparability; the served OpenAI-compatible endpoint now defaults to 32768 and validated a 32,408-token prompt plus 64 generated tokens without OOM.

How To Contribute

Open a discussion with:

Good categories for discussion:

Deep Lab Notes Below

The rest of this README is dense historical lab context. New users should start with the links above.

Current B70 Findings

Layout

Notes

The strongest quality-preserving paths are now Q4_0 GGUF TP3 with root-residual disabled and static FP8 TP4 with verified n-gram speculative decoding. The INT4 AutoRound path remains interesting for maximum speed, but it should be treated separately because it changes quantization quality more aggressively.