u/DetectiveMindless652

▲ 2 r/codex

Totally expecting to get roasted here, because it's Reddit. However, I built something based on my own experience running agents. For me it's all about saving money, having better control, and letting agents work together rather than just in parallel.

This is fully local and lets you monitor agents, store memories, debug, and detect 9 different types of loop, with email alerts and a built-in, customisable kill switch.

For me this has been useful, and I am close to storing my 450,000th memory. But, other agent builders: is this something you would find useful, or is it overkill?

Peace people, and I appreciate any insight given.

u/DetectiveMindless652 — 5 days ago

Hi Folks,

Hope you are having a lovely day. I wanted to share what I have been working on, as I have really tried to model it on what I have struggled with when running agents in the past, which ultimately comes down to cost and not knowing what the hell an agent is doing when it's bugging out.

It's not perfect, but I would really love to know what people specifically struggle with when it comes to agents. Is it observation? Memory? Time spent debugging?

Let me know, as this would be super helpful!

u/DetectiveMindless652 — 5 days ago
▲ 2 r/SaaS

I started a SaaS, www.octopodas.com, that is essentially an AI agent observation layer with built-in memory, shared memory between agents, advanced loop detection (stop burning money), and an audit trail (what your agents are doing).

I am a month or so into launch and have 200 users (not that active) and 250 stars on GitHub.

However, I cannot work out whether I am flogging a dead horse or whether this is actually something people need: for every 5 comments I get saying this is great, I get one that says it is not needed.

My current customers like it.

However, I am feeling pretty low. Should I stop? I have given everything to this, a year of hard work, and I am just so uncertain about what to do.

Sorry for the rant!

u/DetectiveMindless652 — 9 days ago

Hi folks, I have been working on something for a good few months. Using GPT Researcher, I compiled a list of people's complaints across this subreddit.

23% memory
11% Loop/Cost
9% Lack of accountability

These were the common ones for agents, so I decided to make a dashboard that has functions for all of them built in.

It's working pretty well, and people seem to be enjoying it.

My question is: is there anything else you would add? Or are there other issues that are more prominent?

reddit.com
u/DetectiveMindless652 — 13 days ago
▲ 15 r/Agent_AI+4 crossposts

Hey folks, I've been running a small AI agent infrastructure product for a few months and I keep running into the same problem. It's not agents crashing. It's agents that work but waste money in really subtle ways. The kind of stuff that doesn't show up in error logs.

Like an agent that retries the same prompt on a more expensive model every time it doesn't quite get what it wants. So you go from gpt 4o mini to gpt 4o to gpt 4.1, get basically the same answer, and pay 25 times more. Or two coordinating agents fighting over the same shared key, where Agent A writes approve and Agent B writes reject and they just keep overriding each other forever. Or the model that keeps starting its responses with "actually, wait, let me reconsider" four times in a row on the same prompt, just burning tokens because someone left reflection mode on too aggressive. Or an agent that reads a key, writes back the same value with a tiny phrasing tweak, repeatedly, forever.
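To make the "Agent A writes approve, Agent B writes reject" pattern concrete, here is a minimal Python sketch of one way such a ping pong could be spotted in a write log. It is only an illustration under stated assumptions, not how Octopoda implements its rule: the `(agent, key, value)` tuple format, the function name `detect_ping_pong` and the `min_flips` threshold are all hypothetical.

```python
from collections import defaultdict

def detect_ping_pong(writes, min_flips=4):
    """Flag keys where different agents keep overriding each other.

    writes: chronological iterable of (agent, key, value) tuples
            (a hypothetical log format, assumed for this sketch).
    min_flips: consecutive conflicting overrides needed to flag a key.
    """
    history = defaultdict(list)
    for agent, key, value in writes:
        history[key].append((agent, value))

    flagged = []
    for key, events in history.items():
        flips = 0
        # Compare each write with the one directly before it on the same key.
        for (prev_agent, prev_value), (agent, value) in zip(events, events[1:]):
            if agent != prev_agent and value != prev_value:
                flips += 1   # a different agent overwrote with a different value
            else:
                flips = 0    # streak broken
            if flips >= min_flips:
                flagged.append(key)
                break
    return flagged

# Usage: two agents fighting over one shared key.
log = [("A", "decision", "approve"), ("B", "decision", "reject")] * 3
print(detect_ping_pong(log))
```

This is deliberately crude: it only counts consecutive conflicting overrides per key and ignores timing and cost, which a real detector would presumably weigh as well.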

LangSmith shows you traces. Helicone shows you cost. Phoenix shows model drift. None of them catch patterns across calls, which is where most of the real waste lives.

So I built one that does. It runs 10 detection rules in real time on the audit trail and tells you which loop you're stuck in plus a copy paste fix.

There are three pages in the recording. The first is Loop Intelligence, which shows actual detections firing on traffic from five simulated agents. Each one has the evidence behind it (which calls, which prompts, which costs) and a suggested fix. The second is the Audit Ledger, which is a hash chained, tamper evident trail of every agent action with cost, model, latency, and prompt hash. Useful for figuring out what the agent actually did at 3am. The third is Atlas, which extracts entities and relationships from agent memory and shows them as a graph. Helps debug why an agent knows what it knows.

It also emails you when an agent has looped, with an option to stop writes and diagnose. The full feature list:

  • Loop Intelligence. 10 real time classifiers for agent failure patterns (cost inflation, ping pong, self correction, polling, decision oscillation, recall write, retry storms, tool nondeterminism, reflection, clarification)
  • Audit Ledger. Hash chained tamper evident trail of every agent action with cost, model, latency and prompt hash
  • Atlas. Entity and relationship graph extracted from agent memories, visualised in 3D
  • Memory Explorer. Browse, search and full version history for every agent memory
  • Circuit Breaker. Auto pause agents that exceed your spend rate, with email alerts and per agent thresholds
  • Dedup Guards. Prevent agents from rewriting near identical values to the same key
  • Recovery. Snapshot and restore any agent's state to any prior point
  • Performance. P50, P95, P99 latency on every endpoint, per agent
  • Analytics. Token usage, cost trends and agent activity over time
  • Apply Fix. One click execution of suggested fixes from any detection
  • Framework integrations. LangChain, CrewAI, AutoGen, MCP and OpenAI Agents wired in out of the box

Can you let me know which of these problems you suffer from, and which features you think are not necessary?

It also has built-in real time agent analytics, memory (boring, I know) and shared memory, which I like, so agents can read each other's memories.

It is a work in progress and not perfect, but I would love to hear people's feedback. This sub has been awesome for support, and if you do not like it and think it's terrible, let me know why. That is just as useful.

If you fancy checking it out:

www.octopodas.com for cloud

https://github.com/RyjoxTechnologies/Octopoda-OS for local users!

Once again, thanks for the support, folks!

u/DetectiveMindless652 — 13 days ago

I decided to build this based on my personal experience, which is likely shared by many others. This post is not written with AI, so forgive me if it's not very coherent, nor will I invent a made up story to shill it lol.

Here's an overview without rambling.

Firstly loop detection. When your agent rewrites the same thing too many times, retries the same broken API call, or escalates from cheap models to expensive ones for no reason, it catches it. Shows you exactly which writes were too similar, when, and what they cost. One click to clean it up, reversible for 7 days.
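As a rough illustration of the "escalates from cheap models to expensive ones" case, the sketch below groups calls by prompt hash and flags prompts that were retried on progressively more expensive models. It is a sketch under assumptions, not the product's actual rule: the record fields (`prompt_hash`, `model`, `cost_usd`), the model names and the `min_steps` threshold are hypothetical.

```python
from collections import defaultdict

def detect_cost_escalation(calls, min_steps=3):
    """Find prompts retried on increasingly expensive models.

    calls: chronological list of dicts with 'prompt_hash', 'model'
           and 'cost_usd' keys (a hypothetical record shape).
    min_steps: number of distinct escalation steps needed to flag a prompt.
    """
    by_prompt = defaultdict(list)
    for call in calls:
        by_prompt[call["prompt_hash"]].append(call)

    findings = []
    for prompt_hash, group in by_prompt.items():
        chain = [group[0]]
        for call in group[1:]:
            last = chain[-1]
            # An escalation step: same prompt, different model, higher cost.
            if call["model"] != last["model"] and call["cost_usd"] > last["cost_usd"]:
                chain.append(call)
        if len(chain) >= min_steps:
            first_cost = chain[0]["cost_usd"]
            findings.append({
                "prompt_hash": prompt_hash,
                "models": [c["model"] for c in chain],
                # How much more the last attempt cost than the first one.
                "cost_multiplier": chain[-1]["cost_usd"] / first_cost if first_cost > 0 else None,
            })
    return findings

# Usage with made-up model names and costs.
calls = [
    {"prompt_hash": "p1", "model": "small", "cost_usd": 0.002},
    {"prompt_hash": "p1", "model": "medium", "cost_usd": 0.02},
    {"prompt_hash": "p1", "model": "large", "cost_usd": 0.05},
]
print(detect_cost_escalation(calls))
```

The evidence returned per finding (which prompt, which models, what multiplier) mirrors the kind of detail described above, but the exact fields are invented for the example.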

Secondly safety rails. Stops your agent from saving the same memory twice. Set per agent so different agents can have different rules. Define the similarity threshold (default 85%) and the key pattern, and writes that match get blocked at the API. Useful when an agent gets into "wait let me think again" mode and floods your store.
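A minimal sketch of what such a guard could look like, assuming string values, a glob style key pattern and `difflib.SequenceMatcher` as the similarity measure. The class name `DedupGuard` and the in-memory dict store are hypothetical stand-ins; the real product may measure similarity quite differently (for example with embeddings).

```python
from difflib import SequenceMatcher
from fnmatch import fnmatch

class DedupGuard:
    """Blocks writes that are near duplicates of the value already stored."""

    def __init__(self, key_pattern="*", threshold=0.85):
        self.key_pattern = key_pattern   # glob pattern for keys the guard covers
        self.threshold = threshold       # similarity at or above this is blocked
        self.store = {}                  # stand-in for the real memory store

    def write(self, key, value):
        """Return True if the value was stored, False if it was blocked."""
        previous = self.store.get(key)
        if previous is not None and fnmatch(key, self.key_pattern):
            similarity = SequenceMatcher(None, previous, value).ratio()
            if similarity >= self.threshold:
                return False             # near identical rewrite, reject it
        self.store[key] = value
        return True

# Usage: one guard per agent, so each agent can have its own rules.
guards = {"research-agent": DedupGuard(key_pattern="notes/*", threshold=0.85)}
guard = guards["research-agent"]
guard.write("notes/plan", "Ship the report on Friday")
print(guard.write("notes/plan", "Ship the report on Friday."))  # tiny tweak
```

Keeping one guard instance per agent is the simplest way to get the per agent rules mentioned above; keys that do not match the pattern are always written through.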

Thirdly the cost kill switch. Per agent dollar per minute thresholds. Your customer support bot might be fine at $0.50/min, but your overnight research agent should hard stop at $0.05. When an agent crosses its own threshold, that one specific agent auto pauses while others keep running, and you get an email naming the agent and the spend that tripped it.
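The per agent threshold idea can be sketched as a sliding window over spend events. This is an assumption-laden illustration rather than the actual implementation: the class name `SpendCircuitBreaker`, the 60 second window and the `record` method are hypothetical, and the email alert is left out.

```python
import time
from collections import defaultdict, deque

class SpendCircuitBreaker:
    """Pauses an individual agent when its spend per minute exceeds its limit."""

    def __init__(self, limits_usd_per_min, window_s=60.0):
        self.limits = limits_usd_per_min     # e.g. {"support": 0.50, "research": 0.05}
        self.window_s = window_s
        self.events = defaultdict(deque)     # agent -> deque of (timestamp, cost)
        self.paused = set()

    def record(self, agent, cost_usd, now=None):
        """Record a spend event; return True if this agent is paused."""
        now = time.monotonic() if now is None else now
        queue = self.events[agent]
        queue.append((now, cost_usd))
        # Drop events that have fallen out of the sliding window.
        while queue and now - queue[0][0] > self.window_s:
            queue.popleft()
        spend = sum(cost for _, cost in queue)
        if agent in self.limits and spend > self.limits[agent]:
            self.paused.add(agent)           # only this agent stops; others keep running
        return agent in self.paused

# Usage with the thresholds from the example above.
breaker = SpendCircuitBreaker({"support": 0.50, "research": 0.05})
breaker.record("research", 0.03, now=0.0)
breaker.record("research", 0.03, now=10.0)   # $0.06 inside one minute
breaker.record("support", 0.06, now=10.0)
print(sorted(breaker.paused))
```

A real breaker would also need a resume path and the notification hook; both are omitted here to keep the sketch short.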

Then a cool Obsidian style memory graph. Real time view of every memory your agents wrote, every decision they made, every goal they had. When something goes wrong you scroll back and see exactly what happened.

Lastly every event your agents take, memory writes, decisions, plan changes, pauses, resumes, gets logged in a tamper evident chain per tenant. If anyone edits history after the fact the chain breaks and the system catches it. Useful if you're in healthcare, finance, legal, or anywhere a customer might one day ask "did your agent really tell me X?" and you need to prove it.
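For anyone unfamiliar with how a tamper evident chain catches edits, here is a minimal sketch using SHA-256 from Python's standard library. It shows the general hash chaining technique only: the entry layout, the `GENESIS` value and the function names are assumptions made for the example, not the product's actual ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash, event):
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append_event(chain, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev_hash,
                  "hash": _digest(prev_hash, event)})

def verify_chain(chain):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return index
        prev_hash = entry["hash"]
    return None

# Usage: log three events, then edit history after the fact.
ledger = []
append_event(ledger, {"agent": "support", "action": "memory_write"})
append_event(ledger, {"agent": "support", "action": "decision"})
append_event(ledger, {"agent": "support", "action": "pause"})
ledger[1]["event"]["action"] = "something else"
print(verify_chain(ledger))  # index of the first entry that no longer verifies
```

Because each hash covers the previous hash, changing one entry means every later hash would have to be recomputed to hide the edit. To make that impractical, a real ledger would typically also anchor or sign the latest hash somewhere the editor cannot reach.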

Plus real time analytics on agent activity and built in memory (boring I know).

This is a work in progress and something I have been grinding on for like 6 months. I hope you like it, and if you don't, please let me know why and how I can improve it, really important to me lol.

I appreciate the support, on the whole this community is awesome and super supportive.

u/DetectiveMindless652 — 16 days ago