
Character LoRA tool: GridLoraTester
I've been working on this for a few months and it's finally in a state where I think it might be useful to someone other than me. Sharing it here in case you're trying to train character LoRAs on FLUX-2 and you're tired of guessing.
The premise: every time I train a character LoRA, I end up stuck on two questions.
- Is my dataset actually balanced and identity-consistent, or am I just hoping?
- Once trained, which step actually holds likeness across the whole prompt sweep — not just the one flattering close-up?
GridLoraTester answers both with numbers from face-recognition scores. It's split into two surfaces; you can use either independently.
Dataset curation
- Face recognition (ArcFace via InsightFace buffalo_l) gives every photo a similarity score against a per-dataset centroid (mean of all detected faces). Off-identity photos surface immediately.
- Pose × framing classifier (front / ¾ / profile × close-up / medium / wide / extreme). A dataset-health checklist tells you what's balanced and what's under-represented vs published portrait-dataset targets.
- Prune candidates when you're over a max size — most-redundant photos within over-represented buckets, ranked by k=3 nearest in-bucket cosine. Soft delete, fully reversible.
- External-photo suggestions — link Immich / Google Photos / a local folder, and the engine mines that library for photos that fit the dataset's identity AND fill an under-rep bucket. Pose-tempered scoring so profile shots aren't penalised. Dedup runs both vs the existing dataset AND across the suggestions themselves, so the same photo on Immich + Google Photos collapses to one suggestion.
- BlockHash 256-bit near-duplicate detection (10-bit Hamming threshold) underneath all of the above.
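To make the arithmetic in the list above concrete, here is a minimal sketch in plain Python. It is not lifted from the repo. The function names are hypothetical, the centroid is assumed to be the re-normalized mean of unit-length embeddings, "k=3 nearest in-bucket cosine" is assumed to mean the average similarity to the three closest photos in the same bucket, and the Hamming threshold is assumed to be inclusive.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("zero-length embedding")
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def identity_scores(embeddings):
    """Score every face embedding against the dataset centroid."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    # Assumption: centroid = mean of the unit vectors, normalized again.
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    centroid = normalize(mean)
    return [cosine(e, centroid) for e in unit]

def redundancy(bucket, k=3):
    """Mean cosine to the k nearest photos in the same bucket.

    Higher means more redundant, i.e. a better prune candidate
    (assumed reading of the ranking; the real rule may differ).
    """
    unit = [normalize(e) for e in bucket]
    scores = []
    for i, e in enumerate(unit):
        sims = sorted(
            (cosine(e, other) for j, other in enumerate(unit) if j != i),
            reverse=True,
        )
        top = sims[:k]
        scores.append(sum(top) / len(top) if top else 0.0)
    return scores

def is_near_duplicate(hash_a, hash_b, threshold=10):
    """Compare two 256-bit perceptual hashes stored as Python ints."""
    hamming = bin(hash_a ^ hash_b).count("1")  # number of differing bits
    return hamming <= threshold
```

With real data the embeddings would be the 512-dimensional ArcFace vectors; the toy vectors used for testing are only there to show that the off-identity photo gets the lowest score and the least-similar photo gets the lowest redundancy.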
Grid testing
- One row per checkpoint × one column per prompt, same seed across the grid for fair comparison.
- Every cell scored against the dataset centroid: green ≥ 0.50 / amber ≥ 0.35 / red < 0.35.
- Per-prompt aspect ratio via [3:4] / [16:9] prefixes; resolution comes from a single MP budget. [trigger] placeholder substituted automatically.
- Run history per test: flip between runs to compare quant changes, training continuation, or rescore a past run against an updated centroid without regenerating anything.
- Score-vs-step graph (median / p20 / max). Useful for picking the checkpoint where p20 (consistency) catches up with median (peak) instead of just chasing the spikes.
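The colour bands and the median / p20 / max summary can be sketched as below. This is an illustration under assumptions, not the dashboard's code: the function names are made up, p20 is computed with linear interpolation (the tool may use another percentile method), and `best_by_p20` is one hypothetical way to turn "pick where p20 catches up" into a rule.

```python
import math
import statistics

def band(score):
    """Map a similarity score to the cell colour thresholds listed above."""
    if score >= 0.50:
        return "green"
    if score >= 0.35:
        return "amber"
    return "red"

def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100] (assumed method)."""
    s = sorted(values)
    if not s:
        raise ValueError("no scores")
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def summarize(grid):
    """grid maps checkpoint step -> list of per-prompt scores.

    Returns step -> (median, p20, max), ordered by step.
    """
    return {
        step: (statistics.median(scores), percentile(scores, 20), max(scores))
        for step, scores in sorted(grid.items())
    }

def best_by_p20(summary):
    """Hypothetical pick: the step whose weakest fifth of prompts scores highest."""
    return max(summary, key=lambda step: summary[step][1])
```

Ranking by p20 instead of max favours the checkpoint that holds likeness across the whole prompt sweep, which is the point of the graph; a single flattering close-up only moves the max.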
Tech bits, in case you care
- FLUX-2 Klein via diffusers; FP8 / FP8 dynamic / bf16 / INT8 ConvRot quant paths. INT8 ConvRot uses Hadamard rotation + torch._int_mm (cuBLASLt) → ~2× faster denoise than FP8 weight-only on Ampere (3090/3080), same VRAM (~9 GB transformer for Klein 9B). LoRA bake-in via Tensor.data.copy_() preserves Parameter identity so torch.compile survives swaps.
- Prompt-embedding cache in SQLite. After encoding, Qwen3 text encoder is fully unloaded (del + gc + empty_cache()) so it doesn't squat VRAM during the denoise + VAE.
- Per-shape batching in the grid loop: mixed AR rows don't crash batched inference; prompts grouped by (w, h) before each pipe() call.
- Dashboard is SvelteKit + better-sqlite3 in WAL mode. Python writes back to the same DB the dashboard reads: no IPC marshalling, just shared SQLite.
- Idle-TTL on the face worker frees the ORT BFC arena (~5–6 GB) when not in use; lazy-respawn on next request.
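As an illustration of the per-shape batching idea, here is a small sketch of how aspect-ratio prefixes, a megapixel budget and grouping by (w, h) could fit together. Everything specific is an assumption: the function names, the 1:1 default when no prefix is present, 1 MP meaning 1,000,000 pixels, and snapping sides to multiples of 16 are illustrative choices, not the repo's actual rules.

```python
import math
import re
from collections import defaultdict

# Matches a leading "[W:H]" prefix such as "[3:4]"; "[trigger]" does not match.
PREFIX = re.compile(r"^\s*\[(\d+):(\d+)\]\s*")

def parse_prompt(prompt, trigger, default_ar=(1, 1)):
    """Split off the aspect-ratio prefix and substitute the trigger word."""
    m = PREFIX.match(prompt)
    ar = (int(m.group(1)), int(m.group(2))) if m else default_ar
    text = prompt[m.end():] if m else prompt
    return ar, text.replace("[trigger]", trigger)

def resolution(ar, megapixels=1.0, multiple=16):
    """Turn an aspect ratio into (w, h) with roughly `megapixels` of area."""
    aw, ah = ar
    pixels = megapixels * 1_000_000
    w = math.sqrt(pixels * aw / ah)
    h = w * ah / aw

    def snap(v):
        # Assumed constraint: sides rounded to a multiple of 16.
        return max(multiple, round(v / multiple) * multiple)

    return snap(w), snap(h)

def group_by_shape(prompts, trigger, megapixels=1.0):
    """Group (column index, prompt text) pairs by output shape.

    Each group can then go through one batched pipeline call, since
    every prompt in it shares the same width and height.
    """
    groups = defaultdict(list)
    for col, prompt in enumerate(prompts):
        ar, text = parse_prompt(prompt, trigger)
        groups[resolution(ar, megapixels)].append((col, text))
    return dict(groups)
```

Under these assumptions a [3:4] prompt at 1 MP lands at 864×1152 and a [16:9] prompt at 1328×752. Keeping the column index next to each prompt lets the results be placed back into the right grid cell after the batches finish in a different order.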
What it isn't
- Not a trainer. It eats the LoRA folder your trainer (ai-toolkit, etc.) already produces.
- FLUX-2 only right now. The pipeline-load code is reasonably isolated; FLUX-1 / SD3 / Wan2.2 aren't out of the question if there's demand.
- NVIDIA + ≥ 24 GB VRAM. Linux is the tested path; the dashboard runs on macOS/Windows but the inference side wants Linux + CUDA.
License
Source-available under PolyForm Noncommercial 1.0.0 — free for personal / hobby / research / education. Commercial use is a separate paid license (details in LICENSE). MIT was too permissive for the niche; PolyForm cleanly splits "free for everyone learning" from "paid if you're shipping a product on top".
Repo
→ https://github.com/Mandrakia/GridLoraTester
Bug reports and PRs welcome. Particularly interested in feedback on the suggestion engine's bucket-targeting heuristic and the grid-test sort UX — those are the two surfaces where my own preferences leak into the defaults most.
Screenshots
- Dataset list
- Dataset details
- Dataset stats
- Dataset edit: Prune
- Dataset edit: Suggestions
- Test setup
- Test grid result
- Test graph result