u/Compunerd3

I built an open source hyperparameter search tool for diffusion fine-tunes: pick the winner based on scoring

I kept running the same loop: train a LoRA, look at the samples, decide it’s “fine”, change three things at once, train again, and then, when a new dataset needs training, review all the previous parameters from scratch. So I built something to take the hassle out of this.

It’s called Bracket.

  • You point it at a dataset and a model
  • Set a budget (e.g. how many candidate configs or variations to try).
  • It runs that many short training trials in parallel (Optuna TPE drives the search; a minimal sketch follows this list).
  • Each run gets scored two ways:
    • The training-loss trajectory
    • A local VLM (LM Studio) judging the sample images on prompt-adherence, visual quality, and artifact-freeness.
  • At the end you get a Markdown report with Welch’s t-test confidence on which config wins (the statistics are sketched below). The whole point is to replace “this LoRA looks better to me” with “config X beats baseline by 0.34 with p=0.03 over 4 seeds”.
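
For a sense of the search loop, here’s a minimal Optuna TPE sketch. Only “Optuna TPE” comes from this post; the hyperparameter names, ranges, and the toy scoring stub are illustrative, not Bracket’s actual search space.

```python
# Minimal TPE search sketch; search space and scoring stub are illustrative.
import optuna

def run_short_trial(lr: float, rank: int) -> float:
    # Hypothetical stand-in for launching a short trainer run and scoring it
    # (loss trajectory + VLM judge). Toy score so the sketch runs end to end.
    return -abs(lr - 3e-4) * 1_000 + rank * 0.01

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    rank = trial.suggest_categorical("network_dim", [8, 16, 32, 64])
    return run_short_trial(lr=lr, rank=rank)

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=42),
)
study.optimize(objective, n_trials=8)
print(study.best_params)
```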
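
The statistics behind that report line are just Welch’s t-test over per-seed scores; a sketch with invented example numbers:

```python
# Welch's t-test over per-seed scores, as in the report.
# The score arrays here are made-up example numbers.
from scipy.stats import ttest_ind

baseline  = [6.1, 5.8, 6.0, 5.9]  # judge scores across 4 seeds
candidate = [6.4, 6.2, 6.3, 6.2]

# equal_var=False is what makes this Welch's t-test
t_stat, p_value = ttest_ind(candidate, baseline, equal_var=False)
delta = sum(candidate) / len(candidate) - sum(baseline) / len(baseline)
print(f"candidate beats baseline by {delta:.2f} with p={p_value:.3f}")
```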

It doesn’t reimplement training. It drives musubi-tuner and sd-scripts as subprocesses, so the trainers are exactly what kohya already supports — same args, same outputs. Currently covers SDXL, Z-Image, Flux.1, Flux.1-Kontext, Flux-2-Klein, Qwen-Image (+ Edit), SD3.5, HunyuanVideo, Wan 2.1/2.2, LTX-Video, FramePack. LoRA and full FT for most.

A few engineering bits that might be interesting:

  • Trainers always launch through accelerate, because raw python triggers a 2000-second-per-iteration Accelerator init on Blackwell GPUs. Tqdm is force-disabled because \r writes fill the OS pipe buffer when stdout is captured and freeze the trainer (a launch sketch follows this list).
  • VRAM-tier-aware search space — detects the GPU and only proposes configs the card can actually run. No wasted OOM trials.
  • Curated warm-start: each trainer adapter ships 3-5 known-good configs that run before TPE takes over, so you get useful comparisons in the first 30 minutes instead of the third hour.
  • VLM judge uses OpenAI-spec response_format: json_schema, so the output is grammar-constrained at the llama.cpp level: zero JSON parse failures, no rambling (a request sketch follows this list). There’s a toggle that sends chat_template_kwargs={enable_thinking: false} to skip the <think> preamble on Qwen3-class VLMs.
  • Self-updater built into the React UI — toast when there’s a new commit, click Update, it pulls + rebuilds + relaunches.
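
The launch pattern from the first bullet, sketched: go through accelerate launch and drain the captured stdout continuously so \r progress writes can never fill the pipe buffer. The sd-scripts entry point and --config_file flag are real kohya conventions, but the exact invocation and the TQDM_DISABLE env var (tqdm >= 4.66) are illustrative, not the exact code in the repo.

```python
# Sketch: launch a kohya trainer via accelerate with stdout captured safely.
# TQDM_DISABLE is one way to kill progress bars; the repo may differ.
import os
import subprocess
import threading

cmd = [
    "accelerate", "launch", "--num_processes", "1",
    "sdxl_train_network.py",            # sd-scripts entry point
    "--config_file", "trial_003.toml",  # hypothetical per-trial config
]
env = {**os.environ, "TQDM_DISABLE": "1"}

proc = subprocess.Popen(
    cmd, env=env, text=True,
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
)

def drain(pipe):
    # Consume output as it arrives so the OS pipe buffer never fills
    # and blocks the trainer's writes.
    for line in pipe:
        print(line.rstrip())

threading.Thread(target=drain, args=(proc.stdout,), daemon=True).start()
proc.wait()
```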
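
And the judge request, sketched against LM Studio’s OpenAI-compatible endpoint. The three fields mirror the judging axes above; the port, model name, and prompt are illustrative, and the real schema in the repo is more detailed.

```python
# Grammar-constrained judge call against an OpenAI-compatible server
# (LM Studio's default port shown). Schema is a cut-down illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

judge_schema = {
    "name": "judge",
    "schema": {
        "type": "object",
        "properties": {
            "prompt_adherence": {"type": "number"},
            "visual_quality":   {"type": "number"},
            "artifact_free":    {"type": "number"},
        },
        "required": ["prompt_adherence", "visual_quality", "artifact_free"],
    },
}

resp = client.chat.completions.create(
    model="qwen3-vl-8b",  # whatever VLM is loaded
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Score this sample against its prompt."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
        ],
    }],
    response_format={"type": "json_schema", "json_schema": judge_schema},
    # the Qwen3 thinking toggle mentioned above
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)  # constrained to parse as JSON
```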

MIT, runs locally, no telemetry, no account.

Repo: https://github.com/tlennon-ie/bracket

Honest about what it isn’t: it’s not a magic better-LoRA or finetune generator; it’s a search harness. If the dataset is bad it’ll just tell you “all 8 configs are bad” with high confidence. The value is turning “I think this LoRA is better” into a number you can defend.

https://preview.redd.it/1dg557xytd0h1.png?width=1596&format=png&auto=webp&s=a405ab37837b3e35ce1674b79c6f422838e8b1dd

u/Compunerd3 — 3 days ago

Sharing "cull" : my open-source dataset tool for image scraping & classification & captioning pipeline

I open-sourced a tool I built and am maintaining called Cull. It’s a machine curation engine for AI image datasets, the kind of work that eats hours every time you want to train a LoRA, build a reference library, or just classify an archive so it isn’t a 100,000-file mess.

What it does, end to end

  • Scrapes from Civitai (.com and .red), X/Twitter, Reddit, Discord, plus any URL gallery-dl supports (Pixiv, DeviantArt, the booru family, ArtStation, Tumblr, FurAffinity / e621, Imgur, Flickr, and ~340 others).
  • Drops every image plus its source-side prompt into a local queue. Per-source dedup, no database.
  • Classifies each image with a vision-language model (multiple LM Studio instances locally, Groq in the cloud, or anything OpenAI-compatible), using a strict 17-field JSON schema so you don’t get free-text replies you have to regex into shape.
  • Sorts the keepers into category folders next to their .txt prompt and a .vision.json audit record. Two score gates (overall quality + topic relevance) you tune in the UI (gate logic sketched after this list).
  • Surfaces everything through a Flask + Alpine dashboard: start/stop, source toggles, gallery, prompt editor, ZIP export, per-source stats.
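
To make the gating concrete, here’s a self-contained sketch of the two-gate sort. Field names and thresholds are illustrative; the real record has 17 fields and the gates are tuned in the UI.

```python
# Two-gate sort sketch: quality + relevance thresholds decide keep/reject,
# then keepers land in a category folder next to caption + audit files.
# Field names are guesses at the shape; the real schema has 17 fields.
import json
import shutil
from pathlib import Path

QUALITY_GATE = 7.0    # overall quality threshold
RELEVANCE_GATE = 6.0  # topic relevance threshold

def sort_image(image: Path, out_root: Path) -> None:
    record = json.loads(image.with_suffix(".vision.json").read_text())
    keep = (record["overall_quality"] >= QUALITY_GATE
            and record["topic_relevance"] >= RELEVANCE_GATE)
    dest = out_root / (record["category"] if keep else "rejected")
    dest.mkdir(parents=True, exist_ok=True)
    # Move the image together with its .txt prompt and .vision.json record
    for sidecar in (image, image.with_suffix(".txt"), image.with_suffix(".vision.json")):
        if sidecar.exists():
            shutil.move(str(sidecar), str(dest / sidecar.name))
```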

Two example use cases I actually used it for:

  • LoRA (300 images) & Finetune (100,000 images) dataset prep.
    • Give it a topic such as “Female Influencer” or “{artist} style art”.
    • Set AUTO_CAPTION_ENABLED=true if you want it to caption images, or false to scrape only (still storing any prompts found on the source posts), and set whatever style prompting you want (see the .env sketch after this list).
    • Walk away.
    • Come back to a folder of triaged images split by quality and category, each with a generated SD-prompt .txt next to it.
    • ZIP-export the filtered view straight into your trainer.
  • Ingesting a prompt-less archive. Point LOCAL_IMPORT_DIR at a folder of bare JPEGs (or paste a gallery-dl URL list).
    • Toggle off the prompt requirement, turn on auto-captioning.
    • Every image is classified and sorted, and gets an SD-prompt / booru-tags / natural-language caption written by the same vision call that classifies it.
    • So you can train on a years-old archive without curating prompts by hand.
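
A minimal .env sketch for those two workflows. Only AUTO_CAPTION_ENABLED and LOCAL_IMPORT_DIR are mentioned above; check the repo for the full variable list.

```
# Workflow 1: scrape + caption (walk-away run)
AUTO_CAPTION_ENABLED=true          # false = scrape only, still keeping source prompts

# Workflow 2: ingest a prompt-less local archive
LOCAL_IMPORT_DIR=/data/old_archive # folder of bare JPEGs
```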

Links

Repo: https://github.com/tlennon-ie/cull
Screenshots: https://imgur.com/a/kSvsAW9

The roadmap will keep evolving around what people actually use it for. On my list:
- more vision-worker backends
- an improved requeue UI
- a small headless CLI
- video scraping, classification, etc.

A few things worth mentioning:

- Vision worker is pluggable via a registry. Subclass BaseVisionWorker, register, done (a sketch of the pattern is at the end of this post). Two LM Studio endpoints can run in parallel; there's a keepalive worker that pings every 15s if your local server has aggressive idle-unload, and an idle-unloader for when you want VRAM back.

- It ships with a Claude Code skill bundle in .claude/skills/ (cull-helper, lmstudio-vision, metadata-schema) and three sub-agents in .claude/agents/. If you use Claude Code, Cursor, Aider, Codex, or anything that respects those files, your AI assistant knows cull's load-bearing seams (categories, queue Protocol, vision-worker base class, the strict-output schema) before it touches anything.

- Self-updater is in: toast in the dashboard, click Update, pulls from origin/main and relaunches.

Stack: Python 3.10+, Flask, Alpine.js, Pillow, Playwright (for the X scraper), gallery-dl. Single machine. No Redis, no DB, no Docker required. MIT licensed.
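
Since the vision-worker seam is the main extension point, here’s the registry pattern as a self-contained sketch; the actual class and method names in the repo may differ.

```python
# Reconstructed sketch of a pluggable vision-worker registry;
# real names and signatures in the repo may differ.
from abc import ABC, abstractmethod

WORKER_REGISTRY: dict[str, type] = {}

def register(name: str):
    def deco(cls):
        WORKER_REGISTRY[name] = cls
        return cls
    return deco

class BaseVisionWorker(ABC):
    @abstractmethod
    def classify(self, image_path: str) -> dict:
        """Return the strict JSON record for one image."""

@register("my-backend")
class MyVisionWorker(BaseVisionWorker):
    def classify(self, image_path: str) -> dict:
        # Call your own endpoint here and return the strict-schema record.
        return {"overall_quality": 8.0, "topic_relevance": 7.5}  # placeholder

worker = WORKER_REGISTRY["my-backend"]()
print(worker.classify("example.jpg"))
```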
