
How do I train a LoRA that reproduces a complex character with 85% to 95% accuracy?
Character (synthetic IG persona, fully-locked identity):
~20yo athletic white European woman, platinum-blonde hair with mint-green tips
2 facial piercings (vertical L-brow barbell + horizontal bridge barbell)
Blackwork tattoos: tree-branch on neck/chest + cracked-pattern full sleeves both arms
5 silver rings (consistent count), matte-black nails
Edgy / punk / skate vibe
Setup I'm using at the moment:
Qwen-Image (20B) via ai-toolkit (Ostris), uint3 quantized + accuracy-recovery adapter, on a 24GB 3090
87 training images, all generated via ChatGPT Images 2 for cross-image consistency (no real photos exist):
74 bare-arm (tattoos + rings visible)
13 covered-outfit (jackets / sleeves / gloves) with num_repeats: 2 → 26 of 100 effective samples (~26%), to teach conditional coverage so that prompting "wearing a leather jacket" actually hides the tattoos
Captions: JoyCaption Beta One → manual cleaning → 2 multi-agent verification rounds (38 corrections total)
Caption strategy: omit invariant identity features (hair color, piercings, eye color) so they bind to the trigger word; caption everything that varies (pose, framing, hair state, coverage status, rings-visible vs no-rings, gloves vs no-gloves)
Hyperparams: rank 32 / alpha 16, LR 1e-4, 3000 steps, adamw8bit, flowmatch scheduler, multi-res [512, 768, 1024], gradient checkpointing, no text-encoder training, caption dropout 0.05
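For anyone checking the numbers, here is a minimal sketch of the arithmetic behind this setup. The image counts, repeats, rank, alpha, and step count come from the list above. The batch size is not stated anywhere in this post, so `batch_size = 1` is an assumption, and the `alpha / rank` scale is the usual LoRA convention rather than something confirmed for this exact trainer.

```python
# Values taken from the setup above.
bare_arm_images = 74
covered_images = 13
covered_repeats = 2          # num_repeats: 2 on the covered-outfit set
rank, alpha = 32, 16
total_steps = 3000
checkpoint_step = 1750

# ASSUMPTION: batch size is not given in the post; 1 is a guess.
batch_size = 1

# Effective samples seen in one pass over the dataset.
samples_per_epoch = bare_arm_images + covered_images * covered_repeats

# Share of each pass that shows the covered-outfit images.
covered_share = covered_images * covered_repeats / samples_per_epoch

# Usual LoRA convention: the learned update is multiplied by alpha / rank.
lora_scale = alpha / rank

# Number of passes over the dataset at the checkpoint and at the end.
epochs_at_checkpoint = checkpoint_step * batch_size / samples_per_epoch
epochs_total = total_steps * batch_size / samples_per_epoch

print(f"samples per epoch:     {samples_per_epoch}")
print(f"covered-outfit share:  {covered_share:.0%}")
print(f"LoRA scale (alpha/r):  {lora_scale}")
print(f"epochs at step {checkpoint_step}:   {epochs_at_checkpoint}")
print(f"epochs at step {total_steps}:   {epochs_total}")
```

Under that batch-size assumption, each image has been seen roughly 17 to 18 times at the step-1750 checkpoint; a larger batch size would scale that count up proportionally.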
Mid-training (step 1750 / 3000) results:
✅ Tattoos lock fast and consistently across all prompts
✅ Trigger binding clean: prompts without the trigger generate a random woman, not her
⚠️ Face identity inconsistent — best when the prompt has contextual anchors (jacket + backwards cap); drifts on plain "tank top + grey studio"
❌ Piercings often missing or distorted (the main worry)
⚠️ Mild hair-color leak to non-trigger prompts (cosmetic only — face does NOT leak)
Questions:
Is "leave invariant fine details uncaptioned" actually the wrong call for piercings? Should I caption them explicitly even if it costs the auto-trigger-binding?
Is uint3 quantization the bottleneck on fine details like piercings? Worth retraining at fp8 with CPU offload despite the speed hit?
Is 87 images enough for a character this feature-loaded, or do you really need 150+?
Higher rank (64+) for fine-detail capture, or does that just overfit at this dataset size?
Hard-coupled features (tattoos + rings + piercings always present together) — is one LoRA correct, or would stacked / decomposed LoRAs work better here?
Better captioner than JoyCaption Beta One for this kind of fine detail?
Anything obvious I'm doing wrong?
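Related to the first question: a minimal sketch of how the "omit invariant features" caption rule could be checked mechanically, so that captioning the piercings later would be a deliberate change rather than an accident. The trigger word `ohwx_char`, the term list, and the sample captions are hypothetical placeholders, not taken from the actual dataset; eye-color terms would need to be added to the list by hand.

```python
import re

# HYPOTHETICAL trigger word and term list; replace with the real ones.
TRIGGER = "ohwx_char"
INVARIANT_TERMS = ["platinum", "blonde", "mint-green", "piercing", "barbell"]


def audit_caption(caption: str, trigger: str = TRIGGER) -> list[str]:
    """Return a list of rule violations for one caption (empty list = clean)."""
    problems = []
    lower = caption.lower()
    if trigger.lower() not in lower:
        problems.append("missing trigger")
    for term in INVARIANT_TERMS:
        # \b at the start so "piercing" also matches "piercings".
        if re.search(rf"\b{re.escape(term)}", lower):
            problems.append(f"invariant term captioned: {term}")
    return problems


# HYPOTHETICAL sample captions, only to illustrate the check.
captions = {
    "img_001.txt": "ohwx_char, full-body shot, skating, hair tied back, rings visible, bare arms",
    "img_002.txt": "ohwx_char, close-up portrait, platinum blonde hair, eyebrow piercing",
    "img_003.txt": "woman in a leather jacket, gloves, no rings visible",
}

for name, text in captions.items():
    issues = audit_caption(text)
    print(name, "OK" if not issues else issues)
```

Variable features such as "hair tied back" or "rings visible" pass untouched, because only hair-color and piercing terms are on the list.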
Thanks in advance, guys :)
(All the images I'm uploading are consistent and come from ChatGPT Images 2.)