A Guy Built a Custom C Engine That Runs a 397-Billion-Parameter AI Model on a Regular MacBook: Here Is How Flash-Moe Actually Works
Flash-Moe is a pure C/Metal inference engine that runs Qwen3.5-397B on a MacBook Pro with 48 GB of RAM at 4.4 tokens per second by streaming expert weights from the SSD. No Python and no frameworks, just C and Metal.
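To make "streaming expert weights from SSD" concrete, here is a minimal sketch in C of the general idea: keep the experts in a file and read only the one that is needed, at the moment it is needed. This is not Flash-Moe's actual code or file format. The `ExpertFile` struct, the `load_expert` and `write_demo_file` functions, and the back-to-back fixed-size layout are all assumptions made up for illustration; the only real API used is POSIX `pread`.

```c
/* Hypothetical sketch: NOT Flash-Moe's real code or on-disk format. */
#define _XOPEN_SOURCE 700 /* expose pread() under strict C modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed layout: experts stored back to back, each expert_bytes long. */
typedef struct {
    int fd;              /* open, read-only weights file */
    size_t expert_bytes; /* size of one expert's weights in bytes */
    int num_experts;     /* number of experts in the file */
} ExpertFile;

/* Read exactly one expert's weights from disk into dst.
 * Returns 0 on success, -1 on a bad id, a read error, or a short file. */
int load_expert(const ExpertFile *ef, int expert_id, void *dst) {
    assert(ef != NULL && dst != NULL);
    if (expert_id < 0 || expert_id >= ef->num_experts) return -1;

    off_t offset = (off_t)expert_id * (off_t)ef->expert_bytes;
    size_t done = 0;
    while (done < ef->expert_bytes) {
        /* pread takes an explicit offset, so no shared file cursor is moved. */
        ssize_t n = pread(ef->fd, (char *)dst + done,
                          ef->expert_bytes - done, offset + (off_t)done);
        if (n <= 0) return -1; /* error or unexpected end of file */
        done += (size_t)n;
    }
    return 0;
}

/* Demo helper (illustrative only): writes a tiny file in which every byte
 * of expert e has the value e, so a load can be checked by eye. */
int write_demo_file(const char *path, int num_experts, size_t expert_bytes) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    for (int e = 0; e < num_experts; e++) {
        for (size_t i = 0; i < expert_bytes; i++) {
            if (fputc(e, f) == EOF) { fclose(f); return -1; }
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

In a mixture-of-experts model only a few experts are active for each token, so a loader along these lines would call `load_expert` just for the experts the router selects and leave the rest on disk. That is what lets a model far larger than RAM run at all, at the cost of SSD read time on every token.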