u/LKN_Pratim

I combined FLUX Fill with ControlNet for structured inpainting

I've been experimenting with FLUX.1-Fill-dev lately and kept running into the same wall: the Fill model is great for mask-based edits, but there's no built-in way to feed it a ControlNet signal (depth, canny, pose, etc.) at the same time.

So I built one.

The idea is simple:
FLUX Fill handles the mask-based edit, while ControlNet guides the structure using conditioning inputs such as depth, canny, pose, tile, blur, gray, or low-quality images. This makes inpainting much more controllable, especially when the generated object or edit needs to follow a specific structure or composition.
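To illustrate what a control image looks like, here is a minimal PIL-only sketch that turns a reference picture into a rough edge map for the canny-style mode. The `make_edge_control` helper and its defaults are my own for illustration, not part of the project; a real canny preprocessor (e.g. OpenCV's `cv2.Canny`) usually gives cleaner edges.

```python
from PIL import Image, ImageFilter, ImageOps

def make_edge_control(image: Image.Image, size=(1024, 1024)) -> Image.Image:
    """Turn a reference photo into a rough edge map usable as a
    canny-style control image. PIL's FIND_EDGES is a simple stand-in
    for a proper canny preprocessor."""
    gray = ImageOps.grayscale(image.resize(size))   # single-channel
    edges = gray.filter(ImageFilter.FIND_EDGES)     # basic edge filter
    return edges.convert("RGB")                     # pipelines expect 3 channels
```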

Since FLUX.1-Fill-dev was not originally trained jointly with ControlNet, this is an experimental/community implementation. In practice it works well for structured inpainting, but results depend heavily on mask quality, control-image alignment, and conditioning strength.
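Since mask quality matters so much here, one common trick is to feather the mask edges so the inpainted region blends into the background instead of showing a hard seam. A minimal sketch (the blur radius is a starting value to tune, not something from the post):

```python
from PIL import Image, ImageFilter

def feather_mask(mask: Image.Image, blur_radius: int = 8) -> Image.Image:
    """Soften a hard binary mask with a Gaussian blur so the edit
    blends smoothly at the mask boundary."""
    return mask.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))
```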

Links

Code example

    import torch
    from diffusers import FluxControlNetModel
    from diffusers.utils import load_image
    from pipline_flux_fill_controlnet_Inpaint import FluxControlNetFillInpaintPipeline  # custom pipeline file from the project repo
    
    dtype = torch.bfloat16
    device = "cuda"
    
    controlnet = FluxControlNetModel.from_pretrained(
        "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0",
        torch_dtype=dtype,
    )
    
    fill_pipe = FluxControlNetFillInpaintPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Fill-dev",
        controlnet=controlnet,
        torch_dtype=dtype,
    ).to(device)
    
    img  = load_image("imgs/background.jpg")
    mask = load_image("imgs/mask.png")
    ctrl = load_image("imgs/dog_depth_2.png")
    
    result = fill_pipe(
        prompt="a dog on a bench",
        image=img,
        mask_image=mask,
        control_image=ctrl,
        control_mode=[2],  # canny=0, tile=1, depth=2, blur=3, pose=4
        controlnet_conditioning_scale=0.9,
        control_guidance_start=0.0,
        control_guidance_end=0.8,
        height=1024, width=1024,
        strength=1.0,
        guidance_scale=50.0,
        num_inference_steps=60,
        max_sequence_length=512,
    )
    
    result.images[0].save("output.jpg")

If you find this useful, a GitHub star ⭐ would really help support the project.
