u/Big_Impression_410

Built a self-contained PoC (using Claude Code) demonstrating memory poisoning against an AI agent with persistent vector memory.

The attack

An adversary with write access to the ChromaDB directory injects a crafted entry with realistic metadata (session_id, backdated timestamp, authoritative source tag). The payload is semantically close to queries the agent will receive, so it ranks at the top of retrieval results. The agent treats it as fact. No prompt injection. No jailbreak.

The hard part to detect

Nothing anomalous in the logs. The poisoned entry looks identical to a legitimate memory in retrieval output.

The PoC shows two mitigations

HMAC signing over content + metadata — unsigned entries rejected before reaching the LLM
Source scoping aka cross-session injections filtered at retrieval time

Stack:

ChromaDB, all-MiniLM-L6-v2 via fastembed (ONNX), pure Python stdlib for the HMAC defense. Runs fully offline, no API keys.

Blog post: https://mamtaupadhyay.com/2026/05/09/agent-memory-poisoning-demo/
Code: https://github.com/m-pentest/memory-poisoning-demo/
Demo Video: https://youtu.be/Pb46i3ZLK8g

Memory Poisoning AI Agents via ChromaDB