u/mishaurus

This is Bimo walking completely standalone: no data cable, no external compute, just a battery and an RP2040 (custom board) running the walking policy natively at ~5.2ms inference time.

The main walking model trains on thousands of parallel environments in Isaac Lab. That policy gets distilled down to a tiny student network and compiled directly into the MCU firmware.
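For anyone new to distillation, the idea is that the student is trained to reproduce the teacher's actions on the same observations. The post doesn't say which loss is used, so the sketch below assumes a plain mean-squared-error objective on the action vectors. The names `distill_mse` and `distill_mse_grad` are made up for illustration, and real training would happen in a Python framework rather than in C; this only shows the math.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed objective: MSE between teacher and student actions for one
 * observation. Hypothetical helper, not taken from the Bimo code. */
float distill_mse(const float *teacher_act, const float *student_act, size_t n)
{
    assert(n > 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = student_act[i] - teacher_act[i];
        sum += d * d;
    }
    return sum / (float)n;
}

/* Gradient of the loss above with respect to each student output:
 * dL/ds_i = 2 * (s_i - t_i) / n. This is what would be backpropagated
 * through the student network. */
void distill_mse_grad(const float *teacher_act, const float *student_act,
                      float *grad, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; ++i) {
        grad[i] = 2.0f * (student_act[i] - teacher_act[i]) / (float)n;
    }
}
```

The teacher's weights stay frozen during this step; only the student is updated.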

Here's the pipeline:

1. Train a standard 256×128×64 teacher model in Isaac Lab (~5min on an RTX 4080)
2. Distill it into a 64×32 student network (~30s, yep, I was surprised too)
3. Export as pure C using onnx2c
4. Compile into the RP2040 firmware via Arduino IDE
5. Inference runs at 5.0-5.2ms, comfortably within the 50ms control loop
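To give a feel for what the exported student boils down to on the MCU, here is a hand-written sketch of a 64×32 MLP forward pass in plain C. This is not onnx2c output and not the Bimo firmware. The observation size (`OBS_DIM` = 24), the action size (`ACT_DIM` = 8), and the ReLU activation are all assumptions, since the post only gives the hidden layer widths.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes: only the hidden widths (64 and 32) come from the post. */
#define OBS_DIM 24 /* hypothetical observation size */
#define H1_DIM  64
#define H2_DIM  32
#define ACT_DIM 8  /* hypothetical number of joint targets */

/* Weights plus biases for the three dense layers. */
#define PARAM_COUNT ((OBS_DIM * H1_DIM + H1_DIM) + \
                     (H1_DIM * H2_DIM + H2_DIM) + \
                     (H2_DIM * ACT_DIM + ACT_DIM))

typedef struct {
    float w1[H1_DIM][OBS_DIM]; float b1[H1_DIM];
    float w2[H2_DIM][H1_DIM];  float b2[H2_DIM];
    float w3[ACT_DIM][H2_DIM]; float b3[ACT_DIM];
} StudentNet;

/* y = W x + b with W stored row-major (rows x cols).
 * If activate is nonzero, ReLU is applied (assumed activation). */
static void dense(const float *w, const float *b, const float *x, float *y,
                  size_t rows, size_t cols, int activate)
{
    for (size_t r = 0; r < rows; ++r) {
        float acc = b[r];
        for (size_t c = 0; c < cols; ++c) {
            acc += w[r * cols + c] * x[c];
        }
        y[r] = (activate && acc < 0.0f) ? 0.0f : acc;
    }
}

/* Observation in, action out. The output layer is left linear. */
void student_forward(const StudentNet *net, const float *obs, float *act)
{
    assert(net && obs && act);
    float h1[H1_DIM];
    float h2[H2_DIM];
    dense(&net->w1[0][0], net->b1, obs, h1, H1_DIM, OBS_DIM, 1);
    dense(&net->w2[0][0], net->b2, h1, h2, H2_DIM, H1_DIM, 1);
    dense(&net->w3[0][0], net->b3, h2, act, ACT_DIM, H2_DIM, 0);
}
```

With these assumed sizes the network has 3,944 parameters, roughly 15.4 KiB as float32, so the weights fit easily in flash as a `const` table. The scratch buffers are just 96 floats on the stack.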

The full distillation pipeline, the standalone MCU inference code, and the Bimo API ported to ROS2 nodes are all coming in the next update (v1.1). ROS2 was a direct request from the last Reddit post, so that's in.

Has anyone else run RL locomotion policies natively on an MCU? How small did you get the student network before performance degraded significantly?

If you want to follow the development, join the Discord server; all updates go there first. The v1.1 code update will be available on GitHub soon.

Bimo’s walking model now runs natively on a Raspberry Pi Pico at 5ms inference time!