u/Ok_Alternative_3007

What's your pattern for managing AI client state across a long session?

Working on something that makes a lot of API calls in sequence and running into the usual context management headaches.

Curious what patterns people use in Python or other languages for this:

  • When do you decide to summarize vs truncate old conversation turns?
  • Do you manage message history yourself or rely on something else?
  • Any libraries you've found useful beyond the official SDKs?

Not looking for a framework recommendation necessarily, more interested in how people actually handle this in production scripts or long-running tools. The official docs are pretty thin on this.
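For concreteness, here's one common pattern I've seen (a sketch, not a definitive answer): keep a rolling message list under a token budget, and fold turns that fall off the end into a summary message instead of silently dropping them. `estimate_tokens` below is a crude stand-in for a real tokenizer like tiktoken, and the one-line digest is a placeholder for an actual summarization call.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Swap in a real tokenizer (e.g. tiktoken) for accurate counts.
    return max(1, len(text) // 4)

class History:
    """Rolling chat history kept under a token budget."""

    def __init__(self, system_prompt: str, budget: int = 2000):
        self.system = {"role": "system", "content": system_prompt}
        self.turns = []      # user/assistant messages, oldest first
        self.summary = ""    # rolling digest of evicted turns
        self.budget = budget

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        self._enforce_budget()

    def _enforce_budget(self) -> None:
        # Evict oldest turns until we fit, but always keep the last two
        # so the model sees the most recent exchange verbatim.
        while self._total_tokens() > self.budget and len(self.turns) > 2:
            evicted = self.turns.pop(0)
            # In production you'd ask the model for a proper summary;
            # here a cheap truncated digest marks where that call goes.
            self.summary += f"{evicted['role']}: {evicted['content'][:80]}\n"

    def _total_tokens(self) -> int:
        text = self.summary + "".join(t["content"] for t in self.turns)
        return estimate_tokens(self.system["content"]) + estimate_tokens(text)

    def messages(self) -> list[dict]:
        msgs = [self.system]
        if self.summary:
            msgs.append({"role": "system",
                         "content": "Earlier context: " + self.summary})
        return msgs + self.turns
```

The summarize-vs-truncate decision then reduces to what you put in `_enforce_budget`: pure truncation is just dropping `evicted` on the floor, summarization is spending one extra cheap call to preserve the gist.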

reddit.com
u/Ok_Alternative_3007 — 24 hours ago

Built a tool to stop paying twice for the same LLM tokens

Six months of heavy API usage and my bills felt higher than they should be. Finally sat down and traced exactly where the tokens were going.

Turned out most of it was repetition. The API is stateless, so every call resends the full context window: the whole conversation history, the system prompt, all of it. You're paying for the same information over and over, every single request.

Built ContextPilot to fix it. It sits between your code and the API and compresses context before each call.
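The general shape of that kind of middleware looks something like this (a generic sketch of the pattern, NOT ContextPilot's actual API; the toy strategy and function name are mine): intercept the message list before it hits the provider, keep the system prompt and recent turns verbatim, and collapse everything older into a compact digest.

```python
def compress_messages(messages: list[dict], keep_last: int = 4) -> list[dict]:
    # Toy compression strategy: preserve system messages and the most
    # recent `keep_last` turns, squash older turns into one summary line.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return system + rest
    older, recent = rest[:-keep_last], rest[-keep_last:]
    digest = " | ".join(m["content"][:40] for m in older)
    return system + [{"role": "system",
                      "content": f"Earlier: {digest}"}] + recent
```

A real implementation has to be much smarter than truncated digests (deduplication, semantic summarization, preserving tool-call results), but the interception point is the same: one function between your message list and the API call.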

Saving around 60% on API costs at my usage level. MIT licensed, no account needed, works with OpenAI and Anthropic.

Still early, v0.2.2 on PyPI. Would genuinely appreciate feedback from anyone who gives it a try, especially on edge cases or integrations I haven't thought about.

github.com/msousa202/ContextPilot

contextpilot.org
u/Ok_Alternative_3007 — 4 days ago