u/Aggressive-Math4027

[Images 1-11: attached example outputs, no captions]

Hey guys,

I've been working on something for the past week and I'm surprised it doesn't seem to exist publicly anywhere.

The concept: a character card whose sole job is to sit alongside your RP session, read the chat context and latest narrative, and output a structured image prompt specifically framed as a POV shot from the protagonist's perspective. Basically an image prompt generator character.

The output is not just a scene description but a fully composed image prompt with:

- Shot type, distance, and camera height

- Body orientation derived from the narrative (who's facing who, from which side, etc.)

- Frame geometry: where each body part sits in frame, which of the POV character's limbs enter and from which edge

- Pose-specific overrides (carrying, straddling, back-to-camera, side-by-side, etc.)

- A few lookup tables, since text models can't reliably infer everything yet
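To make the list above a bit more concrete, here's a minimal Python sketch of how such a structured prompt could be assembled. This is not the actual card: the `PovPrompt` fields, the `POSE_OVERRIDES` table, and every entry in it are hypothetical placeholders picked for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical lookup table: pose keyword -> extra framing instruction.
# Entries are illustrative only, not taken from the actual card.
POSE_OVERRIDES = {
    "carrying": "POV forearms enter from the bottom edge, supporting her weight",
    "back-to-camera": "she faces away from the viewer, back of her head in frame",
    "side-by-side": "she is at the right edge of frame, seen in profile",
}


@dataclass
class PovPrompt:
    shot_type: str       # e.g. "medium shot"
    distance: str        # e.g. "arm's length"
    camera_height: str   # e.g. "eye level"
    orientation: str     # who faces whom, derived from the narrative
    frame_geometry: List[str] = field(default_factory=list)
    pose: Optional[str] = None  # key into POSE_OVERRIDES, if any

    def render(self) -> str:
        """Join all fields into one comma-separated prompt string."""
        parts = [
            "first-person POV",
            self.shot_type,
            f"distance: {self.distance}",
            f"camera height: {self.camera_height}",
            self.orientation,
            *self.frame_geometry,
        ]
        # Append the pose-specific override only when the pose is known.
        override = POSE_OVERRIDES.get(self.pose) if self.pose else None
        if override:
            parts.append(override)
        return ", ".join(parts)


prompt = PovPrompt(
    shot_type="medium shot",
    distance="arm's length",
    camera_height="eye level",
    orientation="she faces the viewer",
    frame_geometry=["her hands at lower left of frame"],
    pose="carrying",
)
print(prompt.render())
```

In the card itself the text model fills these slots from the chat context; the table only covers the poses it tends to get wrong on its own.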

The RP I write is two-character: a POV character (his eyes are the camera, so he is never in frame apart from his limbs and, where relevant, his lower body) and a main female character he interacts with. So the output prompt always describes what the POV character sees, not what the scene looks like from the outside.
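The "only limbs and lower body" rule can be enforced with a simple filter before the prompt is built. A rough sketch, assuming a hypothetical allow-list (`ALLOWED_POV_PARTS` and `filter_pov_parts` are made-up names, not part of the card):

```python
# Hypothetical allow-list of POV-character body parts that may appear in frame.
ALLOWED_POV_PARTS = {
    "hands", "forearms", "arms", "legs", "knees", "feet", "lower body",
}


def filter_pov_parts(parts):
    """Keep only POV body parts the camera (his eyes) could plausibly see."""
    return [p for p in parts if p.lower() in ALLOWED_POV_PARTS]


print(filter_pov_parts(["Hands", "face", "legs"]))
```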

As for my image model workflow: one Z-Image Turbo checkpoint (a variant with more realism and NSFW capability than the vanilla base) and one character LoRA for body shape consistency. Images render in 4-6 seconds on my 5080. I run ST (SillyTavern) on my Android phone, so image generation goes over Wi-Fi/LAN (never liked ST on PC lol). I decided to black out the eyes directly in the prompt (just one extra line) since I couldn't get consistent faces. I've attached some example outputs (not the best ones, since I clean my output folder pretty often).

I'll drop the full card in the comments if there's interest.

So, has anyone already done something like this and refined it further? I can't be the only one trying this; it seems like the typical use case for most RP with image generation.

Looking for prior art before I keep iterating in a vacuum. Happy to share and compare notes.

Cheers!

u/Aggressive-Math4027 — 9 days ago