
How do you stop ConversationBufferMemory from re-injecting full tool outputs every turn?
Hey r/LangChain,
(Disclosure: I'm not a native English speaker and have dyslexia, so I used an LLM to clean up the wording. Code, benchmarks and live API receipts are mine.)
I have a coding agent that re-feeds yarn.lock / pnpm-lock.yaml output into the prompt every turn. With stock `ConversationBufferMemory` I hit Gemini's `400 INVALID_ARGUMENT "exceeds 1048576"` after just 2 turns, because every previous tool output gets re-injected verbatim. To show this isn't a synthetic strawman, I ran a 6-turn agent on a payload built from two real public lock files, `facebook/react/yarn.lock` (823 KB) and `vercel/next.js/pnpm-lock.yaml` (1.31 MB), which comes to ~2 MB / 1M cl100k tokens per turn, and pointed it at Gemini 3.1 Flash-Lite. The SHA-256 of both files and the raw Gemini response bodies (HTTP 400 on the vanilla side, HTTP 200 on the deduped side) are in the PDF here:
https://github.com/corbenicai/merlin-community/blob/main/docs/benchmarks/langchain_2026-05-14.pdf
Curious how others handle this:
- Custom `BaseMemory` subclass that dedupes the rendered string?
- Switch to `ConversationSummaryMemory` and accept the LLM-as-summarizer cost / latency?
- Manual `keep_last_n_messages` window (loses earlier context)?
- Move to a checkpointed agent (LangGraph) and skip `ConversationChain` altogether?
- Something else I'm missing?
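For reference, the windowing option from the list could look roughly like the plain-Python sketch below. `keep_last_n` is a hypothetical helper name, not a LangChain API, and it assumes the history is already a simple list of rendered message strings.

```python
def keep_last_n(messages: list[str], n: int) -> list[str]:
    """Return only the n most recent messages (hypothetical helper)."""
    # Guard n <= 0 explicitly: messages[-0:] would return the whole list.
    if n <= 0:
        return []
    return messages[-n:]


history = ["turn 1", "turn 2", "turn 3", "turn 4"]
window = keep_last_n(history, 2)  # earlier turns are dropped entirely
```

The trade-off is the one noted in the list: anything older than the window is gone, including a tool output the agent may still need.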
What I ended up doing is a small `BaseMemory` subclass that strips byte-identical duplicate lines from the rendered history string before each LLM call (no summarization, no semantic compression, just exact-line dedup, so it's deterministic). It inherits from `langchain_classic.base_memory.BaseMemory` so Pydantic validation in `Chain.memory` slots accepts it. When the underlying engine isn't available, it transparently falls back to vanilla LangChain behavior with a one-line warning.
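To make the idea concrete, here is a minimal standalone sketch of exact-line dedup on a rendered history string. It is not the code from the repo: `dedupe_lines` is a hypothetical name, it keeps the first occurrence of each line (an assumption about the ordering policy), and it leaves out the `BaseMemory` wiring and the fallback path.

```python
def dedupe_lines(rendered_history: str) -> str:
    """Drop every line that exactly matches an earlier line.

    Hypothetical helper: keeps the first occurrence and preserves order,
    so the same input always produces the same output.
    """
    seen: set[str] = set()
    kept: list[str] = []
    # Split on "\n" only, so lines are compared exactly as rendered.
    for line in rendered_history.split("\n"):
        if line in seen:
            continue  # exact duplicate of an earlier line
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


history = (
    "Human: check deps\n"
    "Tool: lodash@4.17.21\n"
    "Tool: react@18.2.0\n"
    "Human: check again\n"
    "Tool: lodash@4.17.21\n"
    "Tool: react@18.2.0"
)
deduped = dedupe_lines(history)
```

In this toy input the second copy of the two `Tool:` lines is removed, which is the effect that matters when a lock file is re-injected unchanged every turn. Note that repeated blank lines are collapsed as well, since they are also exact duplicates.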
Result on the same 6-turn run: vanilla crashes on turn 2, mine survives all 6, and the same Gemini call returns 200. Code (MIT) + reproducible benchmark script:
https://github.com/corbenicai/merlin-community/tree/main/integrations/langchain
Genuinely curious about other patterns people are using, especially for very long-running agents where my 1-hour fallback retry might be too coarse.