r/aiagents

Anthropic essentially bans OpenClaw from Claude by making subscribers pay extra
▲ 13 r/aiagents+4 crossposts

Starting today (April 4, 12pm PT), Claude Pro and Max subscriptions no longer cover OpenClaw or any third-party agent harness. You can still use them, but you're now on pay-as-you-go "extra usage" billing or a direct API key.

Wrote a full breakdown covering: why the OAuth loophole worked, the prompt cache economics behind Anthropic's decision, the Peter Steinberger/OpenAI timing angle, and what your actual options are right now.

A one-time credit equal to your subscription cost is available, but it expires April 17, so if you're affected, act fast.

aitoolinsight.com
u/Secure-Address4385 — 1 hour ago

My AI agents live in an RPG world instead of a terminal

I run 5 agents on OpenClaw and got sick of managing them through logs and dashboards. So I built them a world. Each agent has a pixel character, a station, and they actually walk around the space. When enough unresolved issues pile up, they walk to a meeting point and hold a council session. Four different models debating what to do next, not scripted, each one reads the live system state independently.

In one session an agent pushed for cold outreach to close leads at 2am. Another one said that's a terrible look for an autonomous system contacting strangers while the operator sleeps. They ended up pivoting to an inbound strategy that none of them originally proposed. That was the moment it clicked for me.

Single HTML file, Node bridge, Phaser for the world rendering. Runs on a Mac Mini. Still early but it works and it changes how you think about what your agents are actually doing.

If you want to follow the build I'm on X as Kaiba655957

u/HMB94 — 2 hours ago

Evaluating Fine-tuned LLM vs Enterprise LLM (GPT/Claude) for Marketing RCA - Framework Suggestions?

Hi everyone,

I’m working on a use case around RCA (root cause analysis) for marketing campaign performance, and we’re evaluating two different system architectures:

System 1: Enterprise LLM + RAG

- Model: GPT / Claude (API-based)

- Uses structured campaign data via RAG (SQL + semantic layer)

- No fine-tuning

- Relies on prompt engineering + retrieval

System 2: Fine-tuned Open Model + RAG

- Base model: evaluating models from Hugging Face

- Fine-tuned using LoRA on historical RCA cases

- Same RAG pipeline for campaign data

Our Goal:

We want to compare which system performs better in:

- Generating accurate RCA explanations

- Identifying key drivers (pricing, audience, channel, etc.)

- Producing actionable insights

My question:

  1. Does this comparison framework make sense, or are we missing a better baseline?

  2. What would be a robust evaluation checklist for this use case?

So far, we are thinking along:

- Accuracy of identified root causes

- Consistency across similar scenarios

- Business interpretability

- Latency & cost

Open Challenges

- Ground truth for RCA is subjective

- Multiple valid explanations possible

- Hard to quantify “quality” of insights

Would love inputs from folks who have evaluated LLMs in analytical/decision-support use cases.

Thanks!

reddit.com
u/Disastrous_Sock_254 — 2 hours ago

What broke when you tried running multiple coding agents?

I'm researching AI coding agent orchestrators (Conductor, Intent, etc.) and thinking about building one.

For people who actually run multiple coding agents (Claude Code, Cursor, Aider, etc.) in parallel:

What are the biggest problems you're hitting today?

Some things I'm curious about:

• observability (seeing what agents are doing)
• debugging agent failures
• context passing between agents
• cost/token explosions
• human intervention during long runs
• task planning / routing

If you could add one feature to current orchestrators, what would it be?

Also curious:

How many agents are you realistically running at once?

Would love to hear real workflows and pain points.

reddit.com
u/_karthikeyans_ — 2 hours ago

Portable is not just moveable. It has to be inspectable.

I spent some time reverse-engineering a repo I stumbled across, and the part I found most interesting was not that a workspace could be copied between environments.

The useful part was that, after looking at the layout, I could answer three concrete questions:

  1. Where does policy live?

  2. Where does runtime truth live?

  3. Where does memory live?

In this repo, those surfaces are physically separated.

workspace/<workspace-id>/ contains human-authored policy and operating intent:

AGENTS.md

workspace.yaml

workspace-local skills/

installed apps/

state/runtime.db contains runtime-owned truth:

sessions

bindings

queue state

turn_results

request snapshots

compaction boundaries

operator profile state

durable-memory governance metadata

memory/ contains readable memory bodies, but not as one generic bucket:

memory/workspace/<workspace-id>/runtime/ for operational projections

memory/workspace/<workspace-id>/knowledge/ for durable recalled knowledge

memory/preference/ for durable preference memory

What I like about this is that it makes the moved artifact inspectable by authority boundary.

AGENTS.md is policy, not truth.

runtime/db is truth, not policy.

And the durable memory bodies remain readable while the runtime keeps recall/governance metadata separately.

That seems like a better portability model than the usual "copy the state blob and hope."

I am not putting the repo link in the body because I would rather not have this mistaken for a promo post. If anyone wants the full code, I will put the repo in the comments so people can inspect it themselves.

reddit.com
u/Admirable_Insect_548 — 9 hours ago

Best resources for learning Multi-Agent Systems (MAS) and State Management?

I'm looking to transition into Agent Development. I’m comfortable with the "Hello World" of AI, but I want to get into the weeds of Autonomous Logic and State Persistence.

Specifically, I’m looking for:

Papers or books on Multi-Agent Systems that are actually applicable to modern LLMs.

Best practices for handling long-term memory (Vector DBs vs. Graph-based approaches).

Any open-source repos that have "clean" agentic architecture I can study.

reddit.com
u/Responsible-Job8166 — 2 hours ago

Tested 92 conversational agents from 23 different developers before production. Here's what actually breaks them.

Over the past few months I've been running stress tests on conversational AI agents before they go live — simulating adversarial synthetic customers across different profiles. 92 runs across 23 developers in different sectors (e-commerce, legal, healthcare, AgriTech, consulting).

The average score is 83.8/100, which means roughly 1 in 5 interactions has a detectable failure that didn't show up in the developer's own tests. What actually breaks them, in order of frequency:

  1. Over-explaining (most common). The agent gives a 4-paragraph response when the user needed one sentence. In WhatsApp or any messaging context, that kills the conversation. Fix: one instruction line ("Max 3 sentences per message. No lists. No bold."). Agents that add this jump ~6 points on the next run.

  2. Policy hallucination (most damaging). The agent confidently states return policies, deadlines, or guarantees it was never authorized to confirm. This shows up especially when the customer pushes back across multiple turns: the agent tries to be helpful and invents.

  3. Looping (instead of escalating). When a customer doesn't follow the script and keeps asking the same thing differently, some agents repeat themselves 3-4 times with slight variations instead of escalating or changing approach.
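A cheap guard against the looping failure is to compare each new reply against recent turns and escalate on near-duplicates. A stdlib-only Python sketch; the threshold and window size are guesses you would tune against real traces:

```python
from difflib import SequenceMatcher

def is_looping(history: list[str], new_reply: str, threshold: float = 0.9) -> bool:
    """Flag a reply that is a near-duplicate of a recent one, so the agent
    can escalate or change approach instead of rephrasing itself again.
    Illustrative: 0.9 similarity over the last 4 turns is an assumption."""
    return any(
        SequenceMatcher(None, new_reply.lower(), old.lower()).ratio() >= threshold
        for old in history[-4:]  # only compare against recent turns
    )
```

When this fires, the handler can route to a human or force a different strategy rather than sending the repeated reply.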

The pattern I keep seeing: developers test their own agents cooperatively. They know what the agent means, they don't push, they don't lie. Real users do all three.

Curious what testing approaches people here are actually using before deployment — A (evals), B (trace replay), C (synthetic simulation), or D (discovering issues in production)?

reddit.com
u/HpartidaB — 9 hours ago

i've been using smaller models & i no longer believe in anything over 500B parameters

also .. i think it's important to recognize that it isn't exactly the best way to advance a skill by having a model hyperdrive your work for you .. something about taking smaller steps makes it easier to learn what you're doing .. instead of trying to one-shot prompt .. actually try to learn what you're working on

reddit.com
u/Helpful-Series132 — 17 hours ago
Lessons for AI Agent Development from the Claude Code Source Leak

This post covers things that experienced developers probably already know, but since many people may not, I'm sharing this for informational purposes.

The Claude Code leak revealed a lot of information, but in my personal opinion, there are two key points to consider when developing AI agents.

  1. Understanding the AI Agent Loop

The agent's loop structure has been clearly revealed.

New Action -> Thinking -> Action -> Result Reporting -> Determine Next Action

This is how the loop runs, and the Streaming-first Principle was also confirmed.

However, this is the internal operating mechanism of Anthropic, the company behind Claude Code, and from the user's perspective it needs to be analyzed differently. Since you cannot modify the loop, you should treat this "circulation mechanism" as a single unit.

For AI agent development, the approach should be: run one loop -> pause -> run one loop -> pause, repeating this pattern to work around per-minute token call limits.

OpenClaw shows a similar pattern when performing repetitive tasks, and I have also implemented the same pattern in SandClaw.
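The loop-then-pause pattern is easy to sketch. A minimal Python version; all names are illustrative and the budget math is a simplification of whatever per-minute limits your provider actually enforces:

```python
import time

def run_loops_with_pauses(tasks, tokens_per_loop, tokens_per_minute, run_one_loop):
    """Run one agent loop at a time, sleeping between loops so the estimated
    token spend stays under a per-minute limit. `run_one_loop` stands in for
    one full cycle: think -> act -> report result."""
    # Minimum gap between loop starts so tokens_per_loop fits the budget.
    min_gap = 60.0 * tokens_per_loop / tokens_per_minute
    results, last_start = [], None
    for task in tasks:
        if last_start is not None:
            wait = min_gap - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)  # pause before the next loop
        last_start = time.monotonic()
        results.append(run_one_loop(task))  # one complete loop, treated as a unit
    return results
```

The point is the shape, not the numbers: each loop is atomic, and the pauses live between loops, never inside one.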

  2. Dreams (KAIROS autoDream)

A feature called "dreaming" was revealed. The internal codename is KAIROS, and it contains a memory organization feature called autoDream.

The leak confirmed that Claude Code uses a 3-layer memory structure.

Layer 1 = MEMORY.md Index. A lightweight pointer that is always loaded into the context, roughly 150 characters per line.

Layer 2 = Topic Files. Actual knowledge is distributed and stored, loaded only when needed.

Layer 3 = Session Transcripts. Full session records, used for search purposes.
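A lookup through those three layers might flow like this. The file names and contents below are invented for illustration, not taken from the leak:

```python
# Layer 1: lightweight index, always in context (~150 chars per line).
MEMORY_INDEX = {
    "deploy":  "topics/deploy.md",
    "billing": "topics/billing.md",
}
# Layer 2: topic files holding actual knowledge, loaded only when needed.
TOPIC_FILES = {
    "topics/deploy.md":  "Deploys go through CI; never push to main.",
    "topics/billing.md": "Billing runs on Stripe; invoices are monthly.",
}
# Layer 3: full session transcripts, used for search only.
TRANSCRIPTS = [
    "2025-04-01: user asked about deploy rollback",
]

def recall(query: str) -> list[str]:
    """Resolve a query: index hit -> load topic body; else search transcripts."""
    hits = [TOPIC_FILES[path] for key, path in MEMORY_INDEX.items()
            if key in query.lower()]
    if not hits:  # fall back to transcript search
        hits = [t for t in TRANSCRIPTS
                if any(w in t for w in query.lower().split())]
    return hits
```

The economics come from the asymmetry: the index is always resident and tiny, while the heavy layers are paid for only on a hit.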

autoDream runs when the user is idle. It merges scattered observations, removes logical contradictions, and converts vague insights into verified facts. A 15-second blocking budget is also set so it does not interrupt the user's workflow.

This kind of feature is essential for AI agents. It allows the system to clean up unnecessary old memories, systematically store new memories, and maintain the overall memory logic. Only then can the agent respond accurately to the user's latest commands or questions.

I developed something similar. My memory logic works as follows.

L1 = 3-day memory

L2 = 7-day memory

L3 = 30-day memory

L4 = permanent memory

Image memory

I also separated memory storage into N1, N2, N3 within the "Autopilot" system to distribute memory and enable unified search.

In my case, there is no server and the IDE runs directly on the user's PC. So even if the connected AI model changes, it accurately loads and searches the storage on the user's PC. If you have a server, you can build it on the server instead.

I developed all of this using a multi-layered RAG approach. I built plugins for brokers worldwide, enabled API calls and tick-level data retrieval, and made it possible for the connected AI to use 200 tools and directly develop, modify, and edit Python code. Because the system became complex, I built the logic described above. After seeing the Claude Code leak, I felt reassured.

"Ah, so my development direction wasn't wrong after all."

I wrote this post because some people may not understand why the Claude Code leak was such a big deal. These are two important points to consider when developing AI agents.

reddit.com
u/Fine-Perspective-438 — 13 hours ago
I got tired of unsafe AI agents, so I open-sourced my own
▲ 5 r/KotlinMultiplatform+2 crossposts

Most agent projects seem to compete on the same axis: who gives the model more access, tools, and freedom. I wanted to try the opposite approach.

Souz is a desktop AI agent built around security and simplicity. The goal was not to make an agent that can do everything. The goal was to make one that feels predictable, works out of the box, and does not rely on the usual pile of risky abstractions (like MCP).

Because we write our own tools, we control the agent from top to bottom. Any action we consider dangerous, such as deleting a file or sending a message, always requires explicit user approval. The agent cannot download binaries and execute them. It also cannot access anything outside the user’s `$HOME` directory. When implementing each tool, we think carefully about how to make it safe by design. And we now have more than 70 tools.
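The approval-gate pattern described above fits in a few lines. This is a generic Python illustration of the idea, not Souz's actual code or API (Souz is Kotlin), and the action names are made up:

```python
DANGEROUS = {"delete_file", "send_message"}  # illustrative action names

def execute(action: str, payload: dict, run, approve) -> str:
    """Run an agent action, routing anything flagged dangerous through an
    explicit approval callback first. `run` performs the action; `approve`
    asks the user and returns True/False."""
    if action in DANGEROUS and not approve(action, payload):
        return "denied"  # the model never gets to perform the action
    return run(action, payload)
```

The key design choice is that the allowlist and the gate live outside the model's reach: the agent can request `delete_file`, but only the harness decides whether it runs.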

We built a lot of the stack ourselves, including the agent engine, and the whole project came from being unhappy with how casual the ecosystem has become about unsafe agent design.

It’s open source now, so feel free to poke around, criticize the architecture, or steal ideas.

I’d genuinely love feedback, especially from people who are also skeptical of the current agent hype cycle.

The links are in the comments.

u/dumch — 21 hours ago

how do ai agents turn a sentence into a whole website?

I make simple sites for small shops on the side. It takes me longer than I want. I've been looking at AI agents that say they can build a site from just one description.

I played with a bunch of them like Framer, Wix, Lovable, Readdy, etc. I typed something in and it gave me a page with pictures and a contact form. The pictures were random, so I would have to change them.

Has anyone here actually used one of these for real work? Or is anyone else curious, like me, how they do all this stuff?

reddit.com
u/darkluna_94 — 8 hours ago

Day 2 — the 5-agent AI architecture behind my LinkedIn tool

Day 1 stats: 407 unique visitors, 11 free signups, $0. Posted the recap yesterday.

Today I want to share the technical side because a commenter said “I could clone this in 4 hours.” Here’s why you can’t:

5 agents run in sequence on every post:

  1. ⁠Voice Agent (GPT-4o): analyzes your past posts, maps sentence patterns, vocabulary, hook preferences

  2. ⁠Emotion Agent (GPT-4o): determines what emotional angle fits the topic

  3. ⁠Generation Agent (GPT-4o-mini): writes the post using your voice profile

  4. ⁠Style Agent (GPT-4o-mini): formats for LinkedIn’s algorithm, line breaks, bold, hashtags

  5. ⁠Quality Agent (GPT-4o): scores authenticity (40%), voice match (30%), factual accuracy (30%). Rewrites if below threshold.

Total pipeline: 12-15 seconds. Runs on Cloudflare Workers. $0.03 per post in API calls.
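The sequential hand-off above is simple to model: each stage reads shared state and returns an update, and the next stage builds on it. A toy Python sketch with plain functions standing in for the GPT-4o calls; the stage logic is invented for illustration:

```python
def run_pipeline(stages, state):
    """Run each (name, fn) stage in order over a shared state dict; every
    stage returns a partial update that is merged before the next stage."""
    for name, stage in stages:
        state = {**state, **stage(state), "last_stage": name}
    return state

# Illustrative stand-ins for the five model calls:
stages = [
    ("voice",      lambda s: {"voice": "short-sentences"}),
    ("emotion",    lambda s: {"emotion": "optimism"}),
    ("generation", lambda s: {"draft": f"({s['emotion']}) {s['topic']}"}),
    ("style",      lambda s: {"post": s["draft"].upper()}),
    ("quality",    lambda s: {"score": 0.9 if len(s["post"]) > 5 else 0.2}),
]
```

In the real pipeline, the quality stage would loop back to generation when the score falls below threshold; here the shape of the hand-off is the point.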

Today’s build: adding one-click LinkedIn profile import via Apify (let me know if there's something better) so users don’t have to paste posts manually. That was the #1 friction point yesterday.

Would love technical feedback. What would you add to the pipeline?

kraflio dot com

reddit.com
u/Soft_Ad6760 — 14 hours ago

Run your own models, chain them into headless pipelines, or just message them as a Telegram bot. Each step is its own personal API, billed by the second; idle costs nothing. Stop burning $300/day on frontier models for your agents. (Free Access)

Been building SeqPU.com for about a year, and this community is exactly who it was built for. Burning $300/day on Opus for a 24/7 agent is a solved problem: you run your own model, so you own the billing model now.

You write code, choose your hardware. CPU for almost nothing all the way to 2×B200 with 384GB VRAM. One click and you go from a lightweight CPU script to a nearly 400GB GPU rig. Billed by the second, idle costs nothing, model caches once and loads instantly across every project forever.

The pattern we keep coming back to is what we call the Cascade. A small focused model handles easy requests cheap. Hard ones escalate to bigger hardware automatically. Each step is its own published headless endpoint — callable, composable, chainable. Your orchestrator on CPU for almost nothing decides what fires and when. The GPU only wakes up when inference actually needs to happen.
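The Cascade routing boils down to a few lines. A hedged Python sketch of the pattern, not SeqPU's API; the model callables and confidence check are placeholders:

```python
def cascade(request: str, small_model, large_model, confident) -> str:
    """Route to a small, cheap model first; escalate to the expensive one
    only when the small model's answer is not judged good enough.
    `confident` stands in for whatever acceptance check you trust."""
    answer = small_model(request)
    if confident(request, answer):
        return answer          # the GPU never wakes up
    return large_model(request)  # escalate the hard request
```

The cost win depends entirely on what fraction of traffic the small model can keep; the orchestrator itself can run on CPU since it only makes the routing decision.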

When your notebook works, you hit publish. One click makes it a headless API you can charge for. One click makes it a UI site anyone can use in a browser. Three steps make it a Telegram bot with your name and your avatar answering from your phone.

Smaller intentional models on the right hardware consistently outperform huge generalist models for inference. This community understands the implications better than most and that puts you in a unique position to build agent pipelines that are cheaper, faster, and more reliable than anything running on frontier APIs.

Drop a comment if you want free credits to try it.

SeqPU.com

reddit.com
u/Impressive-Law2516 — 15 hours ago

Batch scraping and scheduling for agent data pipelines: what production looks like

Most agent discussions focus on the single-turn web lookup. But production agent systems often need to scrape hundreds of URLs on a schedule, know immediately when something fails, and get results as they come in rather than waiting for the full batch.

A few things we built at AlterLab specifically for this use case:

Batch scraping handles up to 100 URLs per request. Results stream back via SSE as they finish rather than blocking until the last URL completes. Failed items get auto-refunded and you can rerun just the failures without setting up the full job again.

Cron-based scheduling with per-schedule analytics. Success rates and spend trends over time. If your balance gets low, schedules pause instead of burning through your last dollars on jobs that are probably failing anyway.

Change detection monitors specific pages and fires a webhook when content changes. Semantic diff, visual diff, or structural diff depending on what you're watching. Most use cases that seem like "scrape this every hour" are actually "tell me when this changes."
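Hash-based change detection is the simplest version of "tell me when this changes". A generic Python sketch of the idea, not AlterLab's implementation (their semantic and visual diffs necessarily do more than hash the body):

```python
import hashlib

def make_monitor(fetch, on_change):
    """Return a poll(url) function that calls on_change only when the page
    content actually changes. `fetch` returns page text; `on_change` is the
    webhook stand-in."""
    last = {}  # url -> digest of last-seen content

    def poll(url: str):
        body = fetch(url)
        digest = hashlib.sha256(body.encode()).hexdigest()
        if last.get(url) != digest:
            if url in last:          # skip firing on the very first fetch
                on_change(url, body)
            last[url] = digest
    return poll
```

Scheduling `poll` on a cron turns "scrape this every hour" into "notify me when it changes", which is what most of those jobs actually want.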

No subscriptions, pay for what you use. alterlab.io

reddit.com
u/SharpRule4025 — 12 hours ago

Why web data quality matters more than scraping cost for agent pipelines

Most scraping tools return markdown or raw HTML. That's fine for traditional data extraction, but it's a real problem when you're feeding an agent context for tool calls.

A Wikipedia page in markdown has 400+ lines of navigation, language selectors, and sidebar links before the article content even starts. An e-commerce product page returned as markdown might be 20,000 tokens, while the actual product data you need is 300 tokens.

Agents make this worse because they often call web tools multiple times per task. Each bad extraction compounds. By the time your agent has gathered context from 5-6 sources, a significant chunk of the context window is noise.

Structured extraction helps a lot. Price as a number field, not buried in a text blob. Body paragraphs separated from navigation. Headings as a hierarchy. The agent gets what it needs instead of the full page chrome.
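The difference is easy to show in code: instead of dumping page markdown into context, pull just the named fields. A stdlib-only Python sketch; the `title` and `price` class names are assumptions about the page markup, and real pages need sturdier selectors:

```python
from html.parser import HTMLParser

class ProductExtractor(HTMLParser):
    """Collect text from elements whose class marks them as wanted fields,
    silently skipping nav chrome, sidebars, and everything else."""
    def __init__(self):
        super().__init__()
        self._field = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("title", "price"):
            self._field = cls

    def handle_data(self, text):
        if self._field:
            self.data[self._field] = text.strip()
            self._field = None

def extract(html: str) -> dict:
    p = ProductExtractor()
    p.feed(html)
    if "price" in p.data:  # a number field, not text buried in a blob
        p.data["price"] = float(p.data["price"].lstrip("$"))
    return p.data
```

The agent's context then holds a two-key dict instead of the whole page, which is where the claimed token savings come from.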

We built this into AlterLab because we kept hitting the problem in our own agent experiments. Token savings are 80-95% on typical pages. alterlab.io if you want to try it.

What approaches are others using for web context in agent workflows?

reddit.com