b70-optimization-lab

Unofficial Intel XPU Community Lab

Community setup guides, benchmark recipes, troubleshooting notes, and patches for Intel XPU local AI work.

Start Here

What This Is

This repository is meant to become a stable community hub for Intel XPU local AI:

Quick Paths

| I want to… | Go here |
| --- | --- |
| Ask for setup help | Discussions |
| Read community-maintained notes | Wiki |
| Reproduce the current work | Current reproducibility map |
| Deploy MiniMax M2.7 INT4 on 4x B70 | MiniMax Ubuntu 24 guide |
| Run the endpoint as a service | Production c1 service |
| Find model-specific recipes | Model recipes |
| Share a benchmark | Community results guide |
| Compare GPUs | GPU comparison |
| Send Intel feedback | Feedback for Intel |

Current Practical Baseline

The best documented fresh install today is:

This is a deployable baseline, not the final speed ceiling. The strict benchmark/quality lane remains p512/n1536 at context 2048 so that results stay comparable across runs. The served OpenAI-compatible endpoint now defaults to a context of 32768 and has been validated with a 32,408-token prompt plus 64 generated tokens without running out of memory (OOM).
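The token budgets quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes p512/n1536 means a 512-token prompt followed by 1536 generated tokens (the usual benchmark shorthand); `fits_context` is a hypothetical helper written for this note and is not part of the repository.

```python
def fits_context(prompt_tokens: int, generated_tokens: int, context_limit: int) -> bool:
    """Return True when prompt plus generation fits inside the context window."""
    return prompt_tokens + generated_tokens <= context_limit


# Benchmark lane (assumed reading of p512/n1536): prompt 512, generate 1536.
benchmark_total = 512 + 1536

# Served endpoint: the validated long-prompt request against the 32768 default.
served_total = 32408 + 64
served_headroom = 32768 - served_total

print(benchmark_total, fits_context(512, 1536, 2048))              # 2048 True
print(served_total, served_headroom, fits_context(32408, 64, 32768))  # 32472 296 True
```

Under that reading, the benchmark lane fills its 2048-token context exactly, while the validated served request leaves 296 tokens of headroom below the 32768 default.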

Experimental RAM-backed session-cache and TurboQuant work is tracked separately under experiments/minimax_xpu_kv_offload. The current known-good session-cache profile is c2, which holds two parked 32768-token window sessions. A smaller 22.5K live smoke test is documented as an operations canary, not as the desired context limit. c4/c8 and TurboQuant remain research modes, not production defaults.
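To get a feel for why parked sessions are kept in host RAM, the standard key/value cache size formula can be applied to the c2 profile. This is only a rough sketch: `kv_cache_bytes` is a hypothetical helper, and the layer count, KV head count, head dimension, and element size below are placeholder values chosen for illustration, not the real MiniMax configuration or the cache layout used in the experiments directory.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_element: int) -> int:
    """Estimate KV cache size for one session with plain (unquantized) attention caches."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_element


# HYPOTHETICAL model shape, for illustration only (not the real model config).
HYPOTHETICAL_SHAPE = dict(layers=60, kv_heads=8, head_dim=128, bytes_per_element=2)

per_session = kv_cache_bytes(tokens=32768, **HYPOTHETICAL_SHAPE)
c2_total = 2 * per_session  # c2: two parked 32768-token sessions

print(f"{per_session / 2**30:.2f} GiB per session, {c2_total / 2**30:.2f} GiB for c2")
```

With these placeholder numbers the estimate is 7.50 GiB per parked session and 15.00 GiB for c2. Substituting the real model shape and cache precision would give the figure that matters for a given machine.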

How To Contribute

Open a discussion with:

Good categories for discussion:

Deep Lab Notes Below

The rest of this README is dense historical lab context. New users should start with the links above.

Current B70 Findings

Layout

Notes

The strongest quality-preserving paths are now Q4_0 GGUF TP3 with root-residual disabled and static FP8 TP4 with verified n-gram speculative decoding. The INT4 AutoRound path remains interesting for maximum speed, but it should be evaluated separately because its more aggressive quantization can change output quality.
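For readers unfamiliar with verified n-gram speculative decoding, the idea can be sketched in a toy form: draft tokens are copied from an earlier occurrence of the most recent n-gram, and the target model then accepts only the prefix it would have generated itself, so greedy output is unchanged. The code below is an illustrative sketch, not the implementation used in this lab; `propose_ngram_draft`, `verify_draft`, and `toy_next` are hypothetical names, and real engines verify the whole draft in one batched forward pass rather than token by token.

```python
from typing import Callable, List, Sequence


def propose_ngram_draft(tokens: Sequence[int], ngram: int, max_draft: int) -> List[int]:
    """Copy the tokens that followed the most recent earlier match of the last `ngram` tokens."""
    if len(tokens) < ngram:
        return []
    tail = list(tokens[-ngram:])
    # Scan backwards over earlier positions, skipping the tail itself.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if list(tokens[start:start + ngram]) == tail:
            follow = start + ngram
            return list(tokens[follow:follow + max_draft])
    return []


def verify_draft(tokens: Sequence[int], draft: Sequence[int],
                 next_token: Callable[[List[int]], int]) -> List[int]:
    """Accept draft tokens while they match the target model; on a mismatch keep the target's token."""
    context = list(tokens)
    accepted: List[int] = []
    for candidate in draft:
        expected = next_token(context)
        if candidate != expected:
            accepted.append(expected)  # correction from the target model
            return accepted
        accepted.append(candidate)
        context.append(candidate)
    return accepted


def toy_next(context: List[int]) -> int:
    """Stand-in for a greedy target model: counts upward modulo 5."""
    return (context[-1] + 1) % 5


history = [0, 1, 2, 3, 4, 0, 1, 2]
draft = propose_ngram_draft(history, ngram=2, max_draft=4)
print(draft)                                   # [3, 4, 0, 1]
print(verify_draft(history, draft, toy_next))  # [3, 4, 0, 1]
```

Because every accepted token is one the target model would have produced anyway, this kind of speculation trades extra verification work for speed without altering greedy output, which is why it sits on the quality-preserving side of the comparison.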