u/ren_cross

Character (synthetic IG persona, fully-locked identity):

~20yo athletic white European woman, platinum-blonde hair with mint-green tips

2 facial piercings (vertical L-brow barbell + horizontal bridge barbell)

Blackwork tattoos: tree-branch on neck/chest + cracked-pattern full sleeves both arms

5 silver rings (consistent count), matte-black nails

Edgy / punk / skate vibe

Setup I'm using at the moment:

Qwen-Image (20B) via ai-toolkit (Ostris), uint3 quantized + accuracy-recovery adapter, on a 24GB 3090

87 training images, all generated via ChatGPT Images 2 for cross-image consistency (no real photos exist):

74 bare-arm (tattoos + rings visible)

13 covered-outfit (jackets / sleeves / gloves) with num_repeats: 2 → ~26% effective, to teach conditional coverage so prompting "wearing a leather jacket" actually hides the tattoos
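A quick sanity check of the ~26% figure, as a minimal sketch. The counts come from the numbers above; the variable names are only illustrative and are not ai-toolkit config keys.

```python
# Effective dataset mix when one subset is repeated (counts from the post).
bare_arm = 74          # bare-arm images, seen once per epoch
covered = 13           # covered-outfit images
covered_repeats = 2    # num_repeats applied to the covered-outfit subset

covered_effective = covered * covered_repeats        # samples per epoch from that subset
total_effective = bare_arm + covered_effective       # all samples per epoch
covered_share = covered_effective / total_effective  # fraction of each epoch that is covered-outfit

print(f"covered share: {covered_share:.0%} of {total_effective} samples per epoch")
```

Without the repeat, the covered subset would be 13 of 87 images, roughly 15%.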

Captions: JoyCaption Beta One → manual cleaning → 2 multi-agent verification rounds (38 corrections total)

Caption strategy: omit invariant identity features (hair color, piercings, eye color) so they bind to the trigger word; caption everything that varies (pose, framing, hair state, coverage status, rings-visible vs no-rings, gloves vs no-gloves)

Hyperparams: rank 32 / alpha 16, LR 1e-4, 3000 steps, adamw8bit, flowmatch, multi-res [512, 768, 1024], grad checkpointing, no TE training, caption dropout 0.05
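The caption strategy above (trigger word present, invariant features absent) can be checked mechanically before training. This is only a sketch: the trigger word `r3ncr0ss`, the pattern list, and the function name `lint_caption` are all hypothetical and would need adapting to the real captions.

```python
import re

TRIGGER = "r3ncr0ss"  # hypothetical trigger word, not taken from the post

# Invariant identity features that should stay OUT of the captions
# (hair color, piercings, eye color). The pattern list is an assumption.
INVARIANT_PATTERNS = [
    r"\bplatinum\b",
    r"\bblonde?\b",
    r"\bmint\b",
    r"\bpiercings?\b",
    r"\bbarbells?\b",
    r"\b(?:blue|green|brown|grey|gray|hazel) eyes\b",
]


def lint_caption(caption):
    """Return a list of problems found in a single caption."""
    text = caption.lower()
    problems = []
    if not re.search(rf"\b{re.escape(TRIGGER)}\b", text):
        problems.append("missing trigger word")
    for pattern in INVARIANT_PATTERNS:
        match = re.search(pattern, text)
        if match:
            problems.append(f"mentions invariant feature: {match.group(0)!r}")
    return problems


good = "r3ncr0ss, woman in a leather jacket and gloves, arms covered, hair tied back, grey studio"
bad = "woman with platinum blonde hair and a bridge piercing, tank top, rings visible"

print(lint_caption(good))
print(lint_caption(bad))
```

Variable attributes such as hair state, coverage, and rings are deliberately not flagged, since those are meant to be captioned.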

Mid-training (step 1750 / 3000) results:

✅ Tattoos lock fast and consistently across all prompts

✅ Trigger binding clean: prompts without the trigger generate a random woman, not her

⚠️ Face identity inconsistent — best when the prompt has contextual anchors (jacket + backwards cap); drifts on plain "tank top + grey studio"

❌ Piercings often missing or distorted (the main worry)

⚠️ Mild hair-color leak to non-trigger prompts (cosmetic only — face does NOT leak)
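For context on how far along step 1750 is, here is a rough sketch of the passes over the data so far. The batch size isn't stated above, so `batch_size = 1` is an assumption, and the LoRA scale uses the common alpha / rank convention, which may not match exactly how the trainer applies it.

```python
# Training progress in dataset passes (step counts and dataset sizes from the post).
steps_done = 1750
total_steps = 3000
samples_per_epoch = 74 + 13 * 2   # bare-arm images + covered-outfit images with num_repeats 2
batch_size = 1                    # ASSUMPTION: not stated in the post

epochs_done = steps_done * batch_size / samples_per_epoch
epochs_total = total_steps * batch_size / samples_per_epoch

rank, alpha = 32, 16
lora_scale = alpha / rank         # common LoRA convention: update is scaled by alpha / rank

print(f"passes so far: {epochs_done:.1f} of {epochs_total:.1f}")
print(f"LoRA scale: {lora_scale}")
```

Under those assumptions each image has been seen about 17 to 18 times at the mid-training checkpoint, with about 30 passes planned in total.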

Questions:

Is "leave invariant fine details uncaptioned" actually the wrong call for piercings? Should I caption them explicitly even if it costs the auto-trigger-binding?

Is uint3 quantization the bottleneck on fine details like piercings? Worth retraining at fp8 with CPU offload despite the speed hit?

Is 87 images the floor for a character this feature-loaded — do you really need 150+?

Higher rank (64+) for fine-detail capture, or does that just overfit at this dataset size?

Hard-coupled features (tattoos + rings + piercings always present together) — is one LoRA correct, or would stacked / decomposed LoRAs work better here?

Better captioner than JoyCaption Beta One for this kind of fine detail?

Anything obvious I'm doing wrong?
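For the uint3 vs fp8 question, a back-of-the-envelope estimate of weight memory on the 24GB card. It counts transformer weights only (20B parameters, as stated above) and ignores quantization metadata, activations, the text encoder, the VAE, and optimizer state, so real usage will be higher.

```python
# Rough weight-only memory estimate for a 20B-parameter model at different precisions.
PARAMS = 20e9      # parameter count as stated in the post
VRAM_GB = 24       # RTX 3090


def weight_gb(bits_per_param):
    """Weight storage in GB (1 GB = 1e9 bytes), ignoring any quantization overhead."""
    return PARAMS * bits_per_param / 8 / 1e9


for name, bits in [("uint3", 3), ("fp8", 8), ("bf16", 16)]:
    gb = weight_gb(bits)
    print(f"{name:>5}: {gb:5.1f} GB weights, {VRAM_GB - gb:6.1f} GB left of {VRAM_GB} GB")
```

By this estimate fp8 weights alone take about 20 GB, which leaves very little room for everything else and is why CPU offload comes into the picture.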

Thanks in advance guys :)

(All the images I'm uploading are consistent with each other and come from ChatGPT Images 2.)

https://preview.redd.it/697181dvg82h1.png?width=1122&format=png&auto=webp&s=d73c3932b0eebf5f23d0bf8dfcc680479d68de45

https://preview.redd.it/bsdie1dvg82h1.png?width=1122&format=png&auto=webp&s=d626161840fd609b21230d1ada8f08d805c282e6

https://preview.redd.it/lfpxv1dvg82h1.png?width=1122&format=png&auto=webp&s=e3f0419c44b62dd75bc4007dda29b80ad6b5191d

https://preview.redd.it/jc4n22dvg82h1.png?width=1122&format=png&auto=webp&s=6853cceab81ac87e1c551d9206fd6deca09a3867

https://preview.redd.it/udseq1dvg82h1.png?width=1122&format=png&auto=webp&s=fdc5b1d1275b4d96067319d2f2e307efd7d13ad9

https://preview.redd.it/mlfsy1dvg82h1.png?width=1122&format=png&auto=webp&s=a08fea65f336f942390a5b2246828b6f4a6193dc

https://preview.redd.it/moe5q2dvg82h1.png?width=1122&format=png&auto=webp&s=878b364380ce19f68978c2c33055ec9863d87aa1

How do I create an 85% to 95% accurate LoRA of a complex character?