u/1filipis on Reddit

I've always wanted to fine-tune models to my own preferences to make them a bit more personalized. A LoRA can teach a model a certain character or style. This lets you steer model outputs directly, without any references at all, or even fine-tune an existing LoRA. In a way, this is what Midjourney does when it shows you two pictures to vote on and then builds your own slightly customized version of their model.

The PR is open here:

https://github.com/ostris/ai-toolkit/pull/808

The default parameters seem well tuned for quick results within a few iterations. The only difference between this implementation and the original Flow-GRPO: rewards are binary instead of coming from a ranking model.
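If you're wondering how a simple yes/no vote becomes a training signal, GRPO-style methods score each sample relative to the other samples generated for the same prompt: subtract the group mean and divide by the group standard deviation. Below is a minimal sketch of that group normalization applied to binary votes. It is not the PR's code; `group_advantages` and the `eps` parameter are hypothetical names for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples generated from the same prompt.

    Hypothetical helper for illustration, not the ai-toolkit implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample in the group got the same vote
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical voting round: 4 samples for one prompt, 1 = upvoted, 0 = not
votes = [1, 0, 0, 1]
print([round(a, 3) for a in group_advantages(votes)])
```

One side effect is worth knowing: if every sample in a group gets the same vote, all advantages come out as 0. That group then contributes no learning signal, so mixed votes within a round are what actually move the model.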

There's a new job type in the dropdown for creating Flow-GRPO tasks, and a GRPO job has a voting interface that lets you generate samples and vote on them.
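After the votes have been turned into per-sample advantages, GRPO-family methods update the model with a PPO-style clipped objective. The probability ratio between the current and the sampling-time policy is clipped, so a single round of votes can't push the model too far. The sketch below is a generic version of that loss, not the PR's code. It treats each sample as having a single log-probability and leaves out the KL penalty. In Flow-GRPO, the log-probabilities come from the denoising steps of a stochastic sampler. `grpo_clipped_loss` and `clip_range` are hypothetical names.

```python
import math


def grpo_clipped_loss(
    logp_new: list[float],
    logp_old: list[float],
    advantages: list[float],
    clip_range: float = 0.2,
) -> float:
    """Generic PPO/GRPO-style clipped surrogate loss (to be minimized).

    Hypothetical sketch: one log-prob per sample, no KL term.
    logp_new / logp_old: log-probabilities under the current and the
    sampling-time policy; advantages: group-normalized rewards.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Keep the pessimistic (smaller) objective, then negate it to get a loss
        terms.append(-min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# Before any update the two policies match, so the ratio is 1 everywhere
print(grpo_clipped_loss([0.0] * 4, [0.0] * 4, [1.0, -1.0, -1.0, 1.0]))
```

The clipping is asymmetric in practice. Upvoted samples stop gaining once their ratio exceeds `1 + clip_range`. Downvoted samples keep the unclipped (larger) penalty, which keeps updates conservative.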

Stuff yet to do:

- Manual checkpoints
- Reduce memory usage (Z-Image takes 40+ GB) and improve speed
- UI polishing and bug fixing
- Keep testing the algorithm on all models

That's why I call it a POC. I'll keep pushing updates to my own branch, but I doubt it will ever be merged into AI-Toolkit itself, so clone it and have fun!