u/Kahvana

Hey everyone!

Not a native speaker so please correct me if I make mistakes.

Recently I had to migrate a character from an online AI to a local one. Since others might go through the same journey, I wanted to outline mine and show what worked for me and what didn't. Hopefully it's useful to you!

Background

I had a character card I really liked roleplaying with, which used DeepSeek v3.2.

However, on 2026-04-22 DeepSeek's API discontinued v3.2 and replaced it with DeepSeek v4 Flash. Its quality simply couldn't match v3.2, and DeepSeek v4 Pro's pricing is too expensive for me once the discount is gone. With neither a credit card nor crypto (so NanoGPT and OpenRouter weren't options), I had no way left to run v3.2.

Since I do have a computer that can run Gemma4 31B and had heard how good it was, I decided to give it a spin. I branched off at a few points in the story to see responses in different scenarios. Gemma4-26B-A4B missed too much, but Gemma4-31B understood the assignment and had the "heart", even if the quality wasn't there yet. There was a lot I had to improve, but Gemma4-31B had potential.

Porting process

First I tried simple patch-up jobs by expanding the system prompt and the character card with specific rules, but that didn't work.

Since I used to constantly generate user-assistant pair summaries in a "memories" lorebook using STMemoryBook, I had far too many entries (1500 for 3000 messages). I redid my memories lorebook by generating the summaries with v4 Pro, giving it the last 7 entries as context and making only 1 summary per full scene (~30 messages). I landed on 100 entries total. This worked quite a lot better!
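If it helps, here's a rough Python sketch of that scene-summary flow, with the actual model call stubbed out (the `summarize` function, scene length, and context window are my illustration, not STMemoryBook's internals):

```python
# Sketch of the scene-based lorebook rebuild: one summary per ~30-message
# scene, feeding the last 7 summaries back in as context. summarize() is
# a stand-in for whatever model/API you actually use (v4 Pro in my case).

def chunk_into_scenes(messages, scene_len=30):
    """Split a flat message list into scene-sized chunks."""
    return [messages[i:i + scene_len] for i in range(0, len(messages), scene_len)]

def build_lorebook(messages, summarize, scene_len=30, context_window=7):
    """Produce one lorebook entry per scene, passing recent entries as context."""
    entries = []
    for scene in chunk_into_scenes(messages, scene_len):
        context = entries[-context_window:]  # last 7 summaries
        entries.append(summarize(scene, context))
    return entries
```

With 3000 messages this yields 100 entries, which matches what I ended up with.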

Gemma4 31B seemed to take my character card quite literally, so I had to recreate it. I first had v4 Pro (inside chat.deepseek.com as "Expert" to preserve tokens) rewrite the card using past messages and the memories lorebook as examples, but v4 Pro ended up leaning too much into the existing character card's traits.

What finally ended up working for me was redoing the card from scratch: don't include the old card, only the memories lorebook and selected chat messages from different scenarios. Have v4 Pro analyze them (behaviour/speech patterns/appearance/traits/notable events/etc., be specific!), and then use those analyses + lorebook + messages to generate a new character card.

To prevent heavy context use, which degrades response quality, I started a new chat on chat.deepseek.com each time I wanted to make edits. Each one followed the pattern: "Analyze this part of the card for what's good, what's factual, what's not factual, what could be improved, what should be removed, and what should be updated. Don't fix, just analyze", after which I told it to fix the issues I found problematic.

The last edit was to slim down the card. DeepSeek v4 Pro has a tendency to duplicate instructions in various places. Reorganizing the card and removing that redundancy gave it the consistency a smaller model needs.

The result

After all that work, with the new memories lorebook and the recreated character card, my whole character functions as it did before. You can never get 100% accuracy since it's a different model, but it's genuinely 98% of the way there, and it's damn impressive how well Gemma4 31B can embody the character.

No longer having to worry about API costs is a real relief.

So yeah, the summarized process:

  1. Generate a lorebook that has one summarized entry per scene using STMemoryBook, using the last 7 entries as context.
  2. Select messages from a broad range of events and emotional states (happy/angry/sad/the kingdom falling/rebuilding after the war/falling in love/etc.)
  3. Generate very detailed analysis reports using DeepSeek v4 Pro, with only the selected messages and the lorebook of summarized scenes. Be specific in your prompt; "give me all details" is too vague.
  4. Use the reports + lorebook + messages to generate a new character card.
  5. Refine the generated card using reports + lorebook + messages on new instances of DeepSeek v4 Pro each time you want to make an edit.
  6. Finally remove duplication and trim it down with DeepSeek v4 Pro.
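For step 3, a prompt along these lines worked better for me than asking for "all details" (the exact wording here is just an example, adapt it to your character):

```
Analyze the attached chat messages and lorebook for this character.
For each of these categories, be specific and cite examples from the
messages: behaviour, speech patterns, appearance, personality traits,
notable events, relationships. Output one section per category.
Do not rewrite the character card yet; only analyze.
```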

What specifically didn't work for me:

  • Don't expect a local AI to simply embody the cloud AI's character. Your card is built around the nuances of the latter, so you need to adapt it to the former. That means giving it enough info with more specific instructions on how to embody the character, without overloading the context (no more than 8k permanent tokens on the card for a context of 128k; double that for 256k, etc.).
  • Patch-up jobs don't work. They get verbose and redundant quickly; rebuild instead.
  • My user-assistant pair summaries simply didn't work at 3000 messages (1500 summaries); it's too much. One per scene works.
  • Using the same DeepSeek v4 Pro instance for analysis + card creation + editing + refining is simply too much for the context. It may support 1 million tokens of context, but it degrades quickly past 256k, hallucinating and pulling in wrong sections from past iterations. One edit per instance worked for me.
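To put that card-size rule of thumb into numbers (the 8k-per-128k ratio is just my heuristic from above, nothing official):

```python
# Rule of thumb: at most 8k permanent card tokens per 128k of model
# context. The ratio is my heuristic, not a hard limit of any model.

def max_card_tokens(context_size: int) -> int:
    """Permanent-token budget for the card at a given context size."""
    return context_size * 8_000 // 128_000
```

So 128k context gives an 8k budget, 256k gives 16k, and so on.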

I still have to experiment with running an embedding model. I'm using Gemma4's default parameters and talking to it over Chat Completion.

For the preset, the only things I edited are the context size (128k) and response length (2048), and I've set the system prompt to simply <|think|> instead of the default "write your next reply in this fictional roleplay" or similar.
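Roughly, that preset boils down to a Chat Completion request body like this (field names follow the OpenAI-compatible schema most local backends expose; the model name is a placeholder, and how your frontend actually injects the card and lorebook may differ):

```python
# Sketch of the request my preset produces: minimal <|think|> system
# prompt, card and lorebook entries as system messages, 2048-token
# response cap. "gemma4-31b" is a placeholder model name.

def build_request(card, lorebook_entries, chat_history):
    messages = [
        {"role": "system", "content": "<|think|>"},  # replaces the default roleplay instruction
        {"role": "system", "content": card},          # the recreated character card
    ]
    messages += [{"role": "system", "content": e} for e in lorebook_entries]
    messages += chat_history
    return {"model": "gemma4-31b", "messages": messages, "max_tokens": 2048}
```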

There ya go!

After undergoing the full process, it makes me wonder: how do you port your characters from one model to another? Especially when migrating from cloud to local LLMs.
