Hi there! I just open-sourced FlashRT, a high-performance inference engine focused on local and real-time workloads. Benchmark: Qwen3.6 27B (NVFP4) on FlashRT.
Would love for people to try it out and share feedback! https://github.com/LiangSu8899/FlashRT
🚀 I’ve implemented the RL pipeline introduced in the π0.6 RECAP paper and brought VLA RL fully onto the π0.5 stack.
Our current pipeline now supports:
• End-to-end VLA RL training & inference
• RECAP-style advantage-conditioned policy training
• QLoRA fine-tuning optimization
• Unified PyTorch + JAX execution paths
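The RECAP-style advantage conditioning above can be sketched in a few lines. This is an illustrative reconstruction, not code from the repo: the function names `td_advantages` and `annotate_acp`, the one-step TD estimator, and the binary "improve" token convention are all assumptions based on the general idea of advantage-conditioned policy training.

```python
def td_advantages(rewards, values, gamma=0.99):
    """One-step TD advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` carries one extra entry for the terminal state. Illustrative
    only; the real pipeline may use a different advantage estimator.
    """
    return [
        r + gamma * values[t + 1] - values[t]
        for t, r in enumerate(rewards)
    ]

def annotate_acp(advantages):
    """ACP-style annotation: tag each step with a binary 'improve' token.

    The policy is then trained conditioned on this token, so at inference
    time it can be steered toward positive-advantage behavior.
    """
    return [1 if a > 0 else 0 for a in advantages]
```

For example, with `rewards=[1.0, 0.0]` and `values=[0.5, 0.6, 0.0]`, the first step gets a positive advantage (token 1) and the second a negative one (token 0).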
On the systems side, I also optimized the full RL runtime stack:
⚡ Up to 5× faster RL inference
⚡ Up to 2.2× faster QLoRA fine-tuning
⚡ Full pipeline running in only ~10GB VRAM
This includes:
• value function training
• ACP annotation
• RL policy fine-tuning
• CFG-guided inference
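The CFG-guided inference step follows the standard classifier-free guidance recipe: run the policy with and without the conditioning signal, then extrapolate. A minimal sketch, assuming the policy exposes both a conditioned and an unconditioned action prediction (the function name and `scale` default are mine, not the repo's):

```python
def cfg_combine(uncond, cond, scale=3.0):
    """Classifier-free guidance over action predictions.

    Extrapolates from the unconditional prediction toward the conditional
    one: scale = 0 gives the unconditional output, scale = 1 the
    conditional, and scale > 1 pushes further toward the conditioning.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

E.g. `cfg_combine([0.0, 1.0], [1.0, 1.0], scale=2.0)` overshoots the first coordinate to `2.0` while leaving the already-agreed second coordinate at `1.0`.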
This makes real VLA RL experimentation practical on consumer GPUs instead of requiring multi-H100 setups.
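The QLoRA fine-tuning path boils down to a frozen, quantized base weight plus a trainable low-rank update, which is what keeps the VRAM footprint small. A toy sketch of the forward pass in plain Python, using uniform signed 4-bit quantization as a stand-in for NF4 (nothing here is FlashRT code; shapes and names are illustrative):

```python
def quantize4(w, scale):
    """Uniform signed 4-bit quantization: clamp round(w / scale) to [-8, 7]."""
    q = round(w / scale)
    return max(-8, min(7, q))

def qlora_forward(x, W_q, scale, A, B):
    """y = x @ (dequant(W_q) + A @ B): frozen 4-bit base + low-rank adapter.

    x: input vector; W_q: int4 base weights (d_in x d_out); scale: dequant
    scale; A: d_in x r; B: r x d_out. In training, only A and B would
    receive gradients, so optimizer state stays tiny.
    """
    d_in, d_out, r = len(W_q), len(W_q[0]), len(B)
    y = [0.0] * d_out
    for j in range(d_out):
        for i in range(d_in):
            # effective weight = dequantized base + low-rank delta
            delta = sum(A[i][k] * B[k][j] for k in range(r))
            y[j] += x[i] * (W_q[i][j] * scale + delta)
    return y
```

The memory win comes from storing the base weights in 4 bits and training only the tiny `A`/`B` factors.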
Would love for more people in the VLA / robotics community to try it out and give feedback.
Hi everyone,
I’m an independent developer with a background in algorithms, HPC, and robotics infrastructure. Recently I’ve been working on a lightweight inference engine built around hand-written CUDA kernels, focusing on small-batch and real-time performance (especially for VLA and robotics workloads).
Here are some recent results on Thor and Blackwell:
The focus is on pushing true real-time inference under small-batch settings, which tends to be underserved by typical large-batch optimized stacks.
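The small-batch point is easy to see with a toy cost model: a stack tuned for large batches amortizes fixed per-launch overhead across many samples, so at batch size 1 that overhead dominates per-sample latency. The constants below are made up for illustration, not FlashRT measurements:

```python
def per_sample_latency_us(batch, fixed_overhead_us=50.0, per_sample_us=5.0):
    """Per-sample latency = (fixed overhead + batch * per-sample work) / batch.

    Large-batch stacks amortize the fixed cost (kernel launches, setup);
    at batch = 1 it dominates, which is why real-time robotics workloads
    need kernels tuned for that regime rather than for throughput.
    """
    return (fixed_overhead_us + batch * per_sample_us) / batch
```

With these numbers, batch 1 costs 55 µs per sample while batch 64 costs about 5.8 µs per sample: the large-batch stack looks great on throughput benchmarks but pays the full fixed cost on every real-time request.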
Still early, but happy to share more details or discuss if anyone is working on similar workloads 🙂
Feedback welcome! https://github.com/LiangSu8899/FlashRT