u/TensorForger

I made a local, real-time, instruction-based editor for webcam streams using the Flux.2-Klein model and a bunch of custom optimizations.

The project is called Flux Real-Time (FluxRT), and it runs at 30 FPS on a single RTX 5090. RTX 4090 and 3090 cards are also supported.

Flux.2-Klein-4B is a small diffusion model that takes several images as "references" along with the prompt. The prompt is an instruction, e.g. "This man is now wearing this jacket".

But Flux is an image model: generating a single frame takes about 0.4 seconds on a 5090.

To make it run at 30 FPS, I added several things:

  1. A "spatial-aware KV cache" that recomputes only the small areas of each frame where something has changed. This alone gives a 1.5-2.5x speedup.

  2. Frame interpolation that also runs in real time (similar to DLSS frame generation) and multiplies the FPS by a factor of 4.

  3. Model compilation, shared memory buffers, multiprocessing, int8 quantization and other minor optimizations.
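To illustrate the "shared memory buffers" item: a minimal sketch, assuming Python's `multiprocessing.shared_memory` and NumPy, of how a capture process and a model process could exchange frames without pickling or copying them. The `SharedFrame` class and the 512x512 frame shape are hypothetical and not taken from the FluxRT repo.

```python
from multiprocessing import shared_memory
from typing import Optional

import numpy as np

# Hypothetical frame geometry; FluxRT's real resolution and layout may differ.
FRAME_SHAPE = (512, 512, 3)


class SharedFrame:
    """A fixed-size uint8 frame backed by a named shared-memory block."""

    def __init__(self, name: Optional[str] = None, create: bool = False):
        nbytes = int(np.prod(FRAME_SHAPE))
        # When attaching to an existing block, `size` is ignored.
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=nbytes)
        # Zero-copy NumPy view over the shared bytes.
        self.array = np.ndarray(FRAME_SHAPE, dtype=np.uint8, buffer=self.shm.buf)

    def close(self) -> None:
        del self.array  # release the view first, otherwise close() raises BufferError
        self.shm.close()


# Both handles live in one process here for brevity; in a real pipeline the
# producer would sit in the capture process and the consumer in the model
# process, attaching by receiving producer.shm.name.
producer = SharedFrame(create=True)
consumer = SharedFrame(name=producer.shm.name)
producer.array[:] = 200                    # "capture" a frame
seen_mean = float(consumer.array.mean())   # read it without pickling or copying
consumer.close()
producer.close()
producer.shm.unlink()                      # the creator frees the block when done
```

The design point is that only the short block name crosses the process boundary; the pixels themselves stay in one mapping that both sides view as a NumPy array.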

A Gradio demo and some helpful scripts are already included.

https://github.com/tensorforger/FluxRT

u/TensorForger — 4 days ago

Flux.2-Klein pipeline for real-time webcam stream processing at 30 FPS

I have built a pipeline based on the Flux.2-Klein-4B model that processes a video stream with low latency (about 0.2 seconds) on a single RTX 5090 GPU.
It is free and open-source; you can try it locally:
https://github.com/tensorforger/FluxRT

Under the hood, it uses a custom spatial-aware KV cache, so for each frame it recomputes only the small number of image tokens where something is moving or changing.
It also uses frame interpolation with the RIFE model, which can multiply the FPS by a factor of 2, 4, 8, etc. I found that a factor of 4 works best for my setup.
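A toy sketch of the bookkeeping behind "recompute only the tokens that changed", assuming each token covers a 16x16-pixel patch and using a hypothetical mean-absolute-difference threshold. `changed_tokens`, `TokenCache` and `token_work` are illustrative names, not FluxRT's code, and `token_work` is a cheap stand-in for the transformer. In a real diffusion transformer, attention mixes all tokens, so reusing cached keys/values and outputs for unchanged regions is an approximation; this sketch only shows how dirty regions might be detected and tracked.

```python
from typing import Optional, Tuple

import numpy as np

PATCH = 16        # assumed pixels per token side (hypothetical)
THRESHOLD = 8.0   # hypothetical mean |diff| per patch, on a 0-255 scale


def changed_tokens(ref: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Boolean (rows, cols) mask of patches whose content differs from `ref`."""
    rows, cols = frame.shape[0] // PATCH, frame.shape[1] // PATCH
    diff = np.abs(frame.astype(np.float32) - ref.astype(np.float32))
    diff = diff[: rows * PATCH, : cols * PATCH]  # drop partial edge patches
    per_patch = diff.reshape(rows, PATCH, cols, PATCH, -1).mean(axis=(1, 3, 4))
    return per_patch > THRESHOLD


def token_work(patch: np.ndarray) -> float:
    """Cheap stand-in for the expensive model computation on one token."""
    return float(patch.mean())


class TokenCache:
    """Caches one output per token and recomputes only the dirty ones."""

    def __init__(self) -> None:
        self.ref: Optional[np.ndarray] = None  # pixels the cached outputs came from
        self.out: Optional[np.ndarray] = None

    def update(self, frame: np.ndarray) -> Tuple[np.ndarray, int]:
        rows, cols = frame.shape[0] // PATCH, frame.shape[1] // PATCH
        if self.ref is None:
            # First frame: nothing is cached, so every token is dirty.
            self.ref = frame.copy()
            self.out = np.zeros((rows, cols), dtype=np.float32)
            dirty = np.ones((rows, cols), dtype=bool)
        else:
            dirty = changed_tokens(self.ref, frame)
        for r, c in zip(*np.nonzero(dirty)):
            ys = slice(r * PATCH, (r + 1) * PATCH)
            xs = slice(c * PATCH, (c + 1) * PATCH)
            self.out[r, c] = token_work(frame[ys, xs])
            # Refresh the reference only where we recomputed, so slow drift
            # below the threshold still accumulates and is eventually caught.
            self.ref[ys, xs] = frame[ys, xs]
        return self.out, int(dirty.sum())


cache = TokenCache()
frame0 = np.zeros((64, 64, 3), dtype=np.uint8)
_, n0 = cache.update(frame0)   # 16: the whole 4x4 token grid on the first frame
frame1 = frame0.copy()
frame1[0:16, 0:16] = 255       # change exactly one patch
_, n1 = cache.update(frame1)   # 1: only that token is recomputed
```

Comparing against the pixels each cached output was computed from, rather than simply the previous frame, keeps gradual changes from slipping under the threshold forever.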
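Power-of-two factors (2, 4, 8) fall out naturally from recursive midpoint insertion: each level inserts one frame between every adjacent pair, doubling the count. The sketch below assumes that scheme and uses a plain linear blend as a placeholder where RIFE would estimate motion; `interpolate` and `blend_midpoint` are hypothetical names. Because an in-between frame needs the next generated frame to exist, interpolation of this kind adds some latency.

```python
from typing import Callable, List

import numpy as np

Frame = np.ndarray


def blend_midpoint(a: Frame, b: Frame) -> Frame:
    """Placeholder for a learned interpolator such as RIFE: a plain 50/50 blend."""
    mid = (a.astype(np.float32) + b.astype(np.float32)) / 2
    return np.round(mid).astype(a.dtype)


def interpolate(a: Frame, b: Frame, exp: int,
                midpoint: Callable[[Frame, Frame], Frame] = blend_midpoint) -> List[Frame]:
    """Return the 2**exp - 1 frames strictly between a and b, in time order."""
    if exp <= 0:
        return []
    mid = midpoint(a, b)  # frame at t = 0.5 of this interval
    return (interpolate(a, mid, exp - 1, midpoint)
            + [mid]
            + interpolate(mid, b, exp - 1, midpoint))


# Factor 4 (exp=2): each pair of generated frames yields 3 extra frames.
a = np.zeros((2, 2), dtype=np.uint8)
b = np.full((2, 2), 100, dtype=np.uint8)
values = [int(f[0, 0]) for f in interpolate(a, b, exp=2)]  # [25, 50, 75]
```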

Depending on scene dynamics, the output stream achieves up to 50 FPS in mostly static scenes and around 20 FPS when the entire input image is changing rapidly. Benchmark results are in the repo.
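A back-of-the-envelope check linking these numbers to the 0.4 s single-frame time from the first post, assuming the quoted FPS counts interpolated output frames, the interpolation factor is 4, and interpolation itself is free:

```python
def generated_frame_time(output_fps: float, interp_factor: int) -> float:
    """Seconds per diffusion-generated frame needed to reach `output_fps`.

    Assumes each generated frame is followed by (interp_factor - 1)
    interpolated frames and that interpolation itself costs nothing.
    """
    return interp_factor / output_fps


BASELINE_S = 0.4  # single-frame generation time on an RTX 5090, from the first post

for fps in (50, 30, 20):
    t = generated_frame_time(fps, interp_factor=4)
    print(f"{fps} FPS out: {t * 1000:.0f} ms per generated frame, "
          f"{BASELINE_S / t:.1f}x faster than the baseline")
# 50 FPS out: 80 ms per generated frame, 5.0x faster than the baseline
# 30 FPS out: 133 ms per generated frame, 3.0x faster than the baseline
# 20 FPS out: 200 ms per generated frame, 2.0x faster than the baseline
```

Under those assumptions, the 20-50 FPS range corresponds to roughly 200-80 ms per generated frame, i.e. a 2-5x speedup over the unoptimized baseline.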

There is also a Gradio demo, several minimal cv2 examples, and a simple paint-style app with real-time canvas updates.

EDIT: Thanks a lot for the support! I added an int8 quantization mode, so it now runs smoothly on an RTX 4090 too, peaking at 20 GB of VRAM.
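For context on why int8 lowers VRAM: storing weights as int8 instead of 16-bit floats roughly halves their footprint. Below is a minimal sketch of generic symmetric per-output-channel weight quantization in NumPy; it is not necessarily the scheme FluxRT uses, and the 1024x1024 matrix is an arbitrary example.

```python
from typing import Tuple

import numpy as np


def quantize_int8(w: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Symmetric per-output-channel int8 quantization of a 2-D weight matrix.

    Returns int8 weights plus one float32 scale per row (output channel).
    """
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Map each row's largest magnitude to 127; guard all-zero rows.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float32 weights from int8 values and row scales."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w16 = rng.standard_normal((1024, 1024)).astype(np.float16)  # arbitrary example layer
w = w16.astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize_int8(q, scale) - w)  # rounding error, about half a step at most
ratio = w16.nbytes / (q.nbytes + scale.nbytes)
print(f"storage vs fp16: {ratio:.3f}x smaller")  # just under 2x, due to the scales
```

Per-row scales cost a little extra memory but keep one outlier row from crushing the precision of every other row.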

u/TensorForger — 1 day ago