400B-Parameter Inference on a Single-Board Computer
A Radxa Orion O6 with 64 GB of unified memory retails for less than $900 USD today. Add a decent NVMe drive and you’re still only looking at ~$1000 USD. Qwen 3.5, a 397B-parameter model, runs on this little inference rig. The Qwen 3.5 model itself is 240 GB (using unsloth’s Q4_K_M quantization)...
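To put those numbers side by side, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above and assumes decimal gigabytes (1 GB = 10^9 bytes); the function names are made up for illustration, and the memory fraction is an upper bound that ignores the OS, the KV cache, and everything else competing for RAM.

```python
# Figures quoted in the text; decimal gigabytes (10**9 bytes) are an assumption.
PARAMS = 397e9        # parameter count of the model
MODEL_BYTES = 240e9   # size of the Q4_K_M quantized weights
RAM_BYTES = 64e9      # unified memory on the board


def bits_per_weight(model_bytes: float, params: float) -> float:
    """Average storage cost of one parameter, in bits."""
    return model_bytes * 8 / params


def fraction_resident(ram_bytes: float, model_bytes: float) -> float:
    """Upper bound on the share of the weights that fits in memory at once."""
    return min(1.0, ram_bytes / model_bytes)


if __name__ == "__main__":
    print(f"{bits_per_weight(MODEL_BYTES, PARAMS):.2f} bits per weight")
    print(f"{fraction_resident(RAM_BYTES, MODEL_BYTES):.0%} of the weights fit in memory")
```

Under these assumptions the quantized file works out to a little under 5 bits per weight, and at most roughly a quarter of it can sit in memory at any one time.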