
Cross-platform 1-bit LLM inference harness: one config, three platforms, fully local.

Use cases:
- On-device assistants: Interactive AI on laptops with low latency
- Edge robotics: Compact deployment on devices with thermal/memory constraints
- Cost-sensitive GPU serving: Higher throughput and lower energy/token on RTX-class GPUs
- Enterprise private inference: Local inference for data residency requirements
- Mobile deployment: Runs on phones due to low memory footprint
- ESP32 edge nodes: Tiny model fallback with federation to desktop peers
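The low memory footprint cited above comes from 1-bit-class weight quantization. As a generic illustration (not the harness's actual kernels), a minimal sketch of ternary "1.58-bit" quantization in the style of BitNet b1.58: weights are rounded to {-1, 0, +1} with a single per-tensor scale, so each weight packs into under two bits and matrix multiplies reduce to additions and subtractions of activations. The function names here are hypothetical.

```python
import numpy as np

def quantize_ternary(w: np.ndarray):
    """Quantize weights to {-1, 0, +1} using a per-tensor absmean scale,
    as in ternary (1.58-bit) schemes. Returns int8 codes plus the scale."""
    scale = float(np.mean(np.abs(w))) + 1e-8
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def matmul_ternary(x: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate x @ w using the ternary codes. Real kernels replace the
    multiply with adds/subtracts; the float scale is applied once per output."""
    return (x @ q.astype(np.float32)) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # full-precision weights
q, s = quantize_ternary(w)                        # ~1.58 bits/weight when packed
x = rng.normal(size=(1, 64)).astype(np.float32)   # one activation row
y = matmul_ternary(x, q, s)                       # approximate x @ w
```

A 7B-parameter model stored this way needs roughly 1.4 GB for weights instead of 14 GB at fp16, which is what makes the phone and ESP32 targets plausible.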
Repository: https://github.com/r13xr13/bonsai-harness
