
r/AgentsOfAI

We just rebranded our app to 2.0 (PROMO: First 100 users)
Good day everyone. I've been lurking here for a while and honestly this sub is one of the reasons we kept building.
Quick backstory. We launched an app called HealUp this January. It started as a tool to help with task breakdown and execution at work. We got some good traction: 200+ user sign-ups and 28 paid users from 18 different countries, which was wild for us.
But as we talked to more users, we kept hearing the same thing over and over.
"It's not that I'm too lazy to work. I'm tired of doing the SAME work over and over. Rewriting the same updates. Copy-pasting stuff between apps. Making the same report every Monday. Reformatting meeting notes into tasks."
That hit different. People weren't drowning in complexity. They were drowning in repetition. The kind of work that feels productive but really isn't. You're just moving information from one place to another, reformatting it, and doing it all again next week.
So we rebuilt everything around that problem: reducing repetitive work across apps.
HealUp is now Brevl.
Brevl is an AI operator agent. You bring in your work context from Notion, Sheets, Slack, meeting recordings, uploaded docs, whatever, and it turns all that scattered stuff into actual outputs: reports, summaries, task breakdowns, presentations, documentation. No more manually doing the same workflows over and over.
Think of it less like a chatbot and more like an AI work assistant that actually understands what you're working on across your tools.
We're launching the new brand and product this week, and since this community gave us a lot of early support, we wanted to do something for you guys first.
The first 100 subscribers get 40% off Brevl Pro ($25/mo) every month for the next 3 months.
That's about $30 saved in total, for the first 100 subscribers only.
Not a crazy amount, but it's real money. There is also a Free tier to try out.
I'll be transparent here. Running AI agents is expensive. Like, genuinely costly infrastructure. So we can't keep promos like this going forever. We did something similar when we launched HealUp and we'll probably do one whenever we launch something new, but that's about it.
If you're a manager, head of department, consultant, founder, or just someone who spends too much time on operational busywork every week, this might be worth checking out.
Comment or DM me "Brevl" and I'll send you the Promo Code.
Thanks for reading this far. Genuinely appreciate this community.
Google’s move to Agentic Commerce is happening today. Here’s the plain English breakdown.
Google Marketing Live (GML) 2026 just kicked off, and if you cut through the corporate talk, the shift is pretty clear: they are moving away from being a search engine and toward being an execution engine.
The biggest thing to watch is AI Max. It’s the successor to Performance Max, but the logic is different. Instead of trying to get a human to click a link, it’s designed to operate in environments where a human might not even be present, such as an AI agent buying something on behalf of a user.
They’re calling this "Agentic Commerce." Basically, Google wants to be the layer that doesn't just find a product, but actually completes the transaction.
What this actually means for us:
- Zero-Click is the new baseline: We aren't just losing traffic to summaries anymore; we're competing for an agent’s decision.
- Machine-Readable > Pretty Copy: Your product data feeds now matter more than your landing page headlines. If the agent can’t parse your specs, you don’t exist in the Agentic results.
- Ads in AI Overviews: These are becoming the primary ad real estate.
Curious to see if the agentic attribution is actually clean or just more of the same black-box reporting.
Google built a working OS from scratch using AI agents for under $1,000 in API credits. It took 93 subagents, 12 hours, 15K model requests, 2.6B tokens...
do yall agree?
Vibe coding is basically the chaotic good route to actually understanding the stack.
You also accidentally learn:
- Why your code works on your machine but nowhere else.
- That “it works in development” is a personality trait.
- How to read 47 lines of cryptic error logs like it’s ancient scripture.
- The difference between “should work” and “actually works in prod.”
- That one random package is secretly carrying your entire app.
Vibe coding over forcing yourself to read docs for 6 hours straight.
The knowledge just sticks when you’re deep in the trenches at 2am.
Who else got baptized by fire this way?
How do you make agents run for hours, and what architectures are actually agent-friendly? #deep-dive #vibe-coder-issues
This is mostly aimed at vibe coders who can't or don't want to guide an agent every 10 minutes.
My two biggest questions are:
- How do you actually make a coding agent keep working for at least 1 hour, ideally 8–20 hours without constantly telling it to continue?
- What language/framework/architecture is actually agent-friendly for a local app that integrates many existing technologies and has a lot of real-time-ish flows?
The first question is the immediate practical one.
How on earth do people make these agents keep running?
Unless I write some script that watches the terminal and keeps sending:
«continue unless you are fully done; if you are fully done, say DONE as your last word»
or unless I build some server hook / automation loop around the agent, it just keeps stopping. It finishes when I do not want it to finish. It reports halfway through the plan. It asks for input when there is nothing useful for me to evaluate yet.
So I’m asking very practically: what are people doing right now to make agents actually work for long stretches?
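For reference, the "watcher script" idea above can be sketched in a few lines. This is only a sketch under assumptions: `run_agent_turn` is a hypothetical callable standing in for however you send one prompt to your agent (CLI wrapper, API call, terminal automation) and get its reply back as text. It is not the API of any real tool.

```python
from typing import Callable

CONTINUE_PROMPT = (
    "continue unless you are fully done; "
    "if you are fully done, say DONE as your last word"
)


def is_done(reply: str) -> bool:
    """True when the last word of the reply is DONE (trailing punctuation ignored)."""
    words = reply.strip().split()
    return bool(words) and words[-1].strip(".!\"'") == "DONE"


def supervise(
    run_agent_turn: Callable[[str], str],  # hypothetical: prompt in, reply text out
    first_prompt: str,
    max_turns: int = 200,  # hard cap so a confused agent cannot loop forever
) -> list[str]:
    """Re-send the continuation prompt until the agent says DONE or the cap is hit."""
    replies = [run_agent_turn(first_prompt)]
    while not is_done(replies[-1]) and len(replies) < max_turns:
        replies.append(run_agent_turn(CONTINUE_PROMPT))
    return replies


# Stand-in agent for demonstration: stops early twice, then finishes.
script = iter([
    "Implemented step 1.",
    "Halfway through the plan.",
    "All steps complete. DONE",
])
replies = supervise(lambda prompt: next(script), "Execute the reviewed plan.")
print(len(replies), "turns;", "finished" if is_done(replies[-1]) else "hit the turn cap")
```

The turn cap matters as much as the loop: without it, an agent that never says DONE burns budget indefinitely.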
The second question is about architecture.
I’m trying to figure out what kinds of architectures are actually good for AI-maintained local applications, especially systems that may eventually reach tens of thousands of lines and coordinate multiple local components/processes.
I thought an event-driven architecture might be good for this. I tried going in that direction with NATS-style communication. But my current impression is that agents are not good at it. Maybe I did something wrong, but it felt like the agent became terrible at reasoning about the system once everything was happening through events.
If the agent has to understand the system by reading event logs, tracing IDs, and reconstructing causality from a stream of messages, that feels like a bad fit. Maybe this is just not agent-friendly, at least not for a solo/vibe-coded local application.
So the deeper question is:
«What architecture makes an AI agent unusually good at maintaining and extending the project?»
Not what architecture is theoretically elegant. Not what architecture is optimal for a senior engineering team. What architecture is actually easiest for the model to reason about, test, debug, and extend?
The rough workflow I want is:
- Put the model on extra-high thinking.
- Give it a messy pile of project material: old specs, notes, partial repos, failed ideas, design thoughts, todos, architecture sketches, etc.
- Make it spend serious effort organizing that into a usable knowledge base.
- I review/correct that knowledge base.
- Then make it spend serious effort writing the implementation plan.
- I review/correct the plan.
- Then make it execute for a long stretch in a sandbox without constantly stopping and asking me to say “continue.”
Roughly:
«1 hour knowledge organization
1 hour implementation planning
20 hours execution»
The exact numbers are not the point. The point is depth and continuity.
I do not want the model to spend 5 minutes writing a plan, 10 minutes coding, and then report “done.”
The first problem is messy context.
If I give an LLM a bunch of files, old specs, old ideas, and previous attempts, it often treats everything as if it was written today and is equally valid. But half the material may be obsolete, contradicted, abandoned, experimental, or from a failed attempt.
The model does not magically know the status of each piece of knowledge.
So I feel like there needs to be an explicit intermediate stage: not coding, not planning, but knowledge organization.
Something like:
- current requirement
- old requirement
- obsolete idea
- failed attempt
- unresolved question
- architectural constraint
- implementation detail
- still-useful note
- contradicted by later note
- needs user confirmation
Then I can correct the knowledge map before the model starts planning.
That seems much more useful than dumping 50 files into context and hoping the model “gets it.”
Is anyone using tools/workflows that actually do this well?
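To make the idea concrete, here is a minimal sketch of what such a knowledge map could look like as data. The status names come straight from the list above; everything else (`KnowledgeItem`, `split_for_planning`, which statuses feed planning and which go to review) is an assumption for illustration, not an existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    CURRENT_REQUIREMENT = "current requirement"
    OLD_REQUIREMENT = "old requirement"
    OBSOLETE_IDEA = "obsolete idea"
    FAILED_ATTEMPT = "failed attempt"
    UNRESOLVED_QUESTION = "unresolved question"
    ARCHITECTURAL_CONSTRAINT = "architectural constraint"
    IMPLEMENTATION_DETAIL = "implementation detail"
    STILL_USEFUL_NOTE = "still-useful note"
    CONTRADICTED = "contradicted by later note"
    NEEDS_CONFIRMATION = "needs user confirmation"


# Assumption: only these statuses are allowed to feed the planning stage.
PLANNING_INPUTS = {
    Status.CURRENT_REQUIREMENT,
    Status.ARCHITECTURAL_CONSTRAINT,
    Status.IMPLEMENTATION_DETAIL,
    Status.STILL_USEFUL_NOTE,
}
# Assumption: these go back to the human before any planning happens.
NEEDS_REVIEW = {Status.UNRESOLVED_QUESTION, Status.NEEDS_CONFIRMATION}


@dataclass
class KnowledgeItem:
    source: str   # file or note the claim came from
    summary: str  # one-line statement of the claim
    status: Status


def split_for_planning(items: list[KnowledgeItem]):
    """Separate what the planner may use from what the user must resolve first."""
    plan_inputs = [i for i in items if i.status in PLANNING_INPUTS]
    review_queue = [i for i in items if i.status in NEEDS_REVIEW]
    return plan_inputs, review_queue


# Hypothetical example items.
items = [
    KnowledgeItem("spec_v1.md", "Use NATS for all messaging", Status.FAILED_ATTEMPT),
    KnowledgeItem("notes.txt", "App must run fully offline", Status.CURRENT_REQUIREMENT),
    KnowledgeItem("todo.md", "Is Windows support required?", Status.NEEDS_CONFIRMATION),
]
plan_inputs, review_queue = split_for_planning(items)
print([i.summary for i in plan_inputs])
print([i.summary for i in review_queue])
```

Everything that is neither a planning input nor in the review queue (old, obsolete, failed, contradicted) stays in the map as history but is kept out of the planner's context.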
The second problem is shallow plan mode.
A lot of current “plan mode” workflows feel shallow. The model asks two or three questions, writes a short plan, and then acts like it has enough alignment.
But that is not what I want.
I want the model to actually spend real effort thinking through the system before writing code.
People always say some version of:
«5 minutes of planning saves an hour of work.»
Fine. Has anyone actually made that real with LLM coding agents?
Because right now a lot of agent planning feels like a formality. It asks a few questions, writes a plan, and then immediately wants to start coding. Or it keeps rewriting the whole plan over and over instead of thinking deeply first and then writing a stable plan.
Maybe the missing workflow is not just “plan mode.” Maybe it is something like:
«plan the planning → organize the knowledge → ask real questions → write the implementation plan → execute until the plan is actually complete»
The third problem is premature reporting.
This is probably my biggest issue.
The model writes an implementation plan. I review the implementation plan. Then it starts implementing. Then it stops halfway and reports back.
Why?
If I already reviewed the implementation plan, why does it need me to keep saying “continue implementing the plan”?
If it has not hit a fundamental blocker, if the plan has not become invalid, and if there is nothing genuinely useful for me to evaluate yet, why is it reporting at all?
A lot of completion reports are basically just the implementation plan rewritten in past tense:
«I added X.
I implemented Y.
I updated Z.»
That is not useful to me.
For a vibe coder, I do not want to inspect a pile of changed files. I do not want a past-tense summary of the plan. I do not want a fake checkpoint that exists only because the agent decided to stop.
What I want is one of these:
- A working thing I can actually run.
- A clear presentation layer that shows me something tangible.
- Exact instructions for how to test it and what to look for.
- A genuinely important question that changes the plan.
- A real blocker that prevents progress.
- Or, if none of those apply, just keep executing.
If the current work is still mostly mocks, scaffolding, internal wiring, or abstract architecture, then there may be nothing useful for me to evaluate yet.
In that case, why stop?
Why not finish the planned implementation first, then let me test and evaluate when there is actually something to evaluate?
Whose time is more precious: mine, or the agent’s?
I am not saying the agent should never stop. It should stop if:
- the plan is fundamentally wrong
- a major architectural decision is needed
- a blocker cannot be resolved
- it has something real and testable to show
- continuing would obviously waste a lot of work
But if it is just stopping because it completed “some steps,” that feels useless.
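The stop conditions above are simple enough to write down as an explicit policy that a supervisor script (or the agent's own instructions) could check. A minimal sketch, assuming a hypothetical `AgentState` record that something upstream fills in; the "plan complete" case is my addition, not part of the list:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentState:
    """Hypothetical snapshot of where the agent is; all fields are illustrative."""
    plan_invalid: bool = False
    needs_architecture_decision: bool = False
    unresolved_blocker: bool = False
    has_testable_artifact: bool = False
    continuing_wastes_work: bool = False
    steps_completed: int = 0
    steps_total: int = 0


def stop_reason(state: AgentState) -> Optional[str]:
    """Return why the agent should stop and report, or None to keep executing."""
    if state.plan_invalid:
        return "the plan is fundamentally wrong"
    if state.needs_architecture_decision:
        return "a major architectural decision is needed"
    if state.unresolved_blocker:
        return "a blocker cannot be resolved"
    if state.has_testable_artifact:
        return "it has something real and testable to show"
    if state.continuing_wastes_work:
        return "continuing would obviously waste a lot of work"
    # Assumption (not in the list above): finishing the whole plan is also a stop.
    if state.steps_total > 0 and state.steps_completed >= state.steps_total:
        return "the plan is complete"
    # Merely having completed "some steps" is deliberately not a reason.
    return None


print(stop_reason(AgentState(steps_completed=3, steps_total=10)))
print(stop_reason(AgentState(unresolved_blocker=True)))
```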
The fourth problem is making agents actually work for long stretches.
How are people actually spending their token budgets productively?
With some subscriptions and API setups, the amount of possible usage is huge. But in practice, I find it hard to spend it well because the agent keeps stopping, asking for input, or producing reports that do not help.
How do you make an agent execute for one hour, eight hours, or overnight?
Can you actually do this in a useful way right now?
Do you use scripts that automatically send continuation prompts? Do you use hooks? Do you run agents inside some kind of supervisor process? Do you use a specific tool that already solves this? Or is the answer simply that current agents cannot really do this yet without external automation?
I have tried or looked into OpenCode, OpenClaw, Gemini, Claude, Codex, Pi, and a bunch of Kanban-board-style workflows.
My current impression is that OpenCode with Docker sandboxes is one of the more practical setups. Terminal UIs feel more reliable to me than a lot of GUI agent setups, and Docker sandboxes feel like a decent practical compromise, especially on Windows if you do not want to deal with a full WSL workflow. Not saying WSL is bad, and obviously sandbox security is its own topic, but Docker sandboxes feel convenient.
I have not deeply tried the “agents roleplay an organization” style of workflow. Maybe I should before judging it. But from the outside, I worry that a lot of multi-agent setups become corporate roleplay: workers praising each other, moving cards around, doing shallow reviews, and spending my money on simulated middle management.
Is there a recommended setup that actually achieves the goal?
Not roleplay. Not card movement. Not fake review loops.
Actual useful long-running work.
The fifth problem is language/framework choice.
For AI-heavy coding, I’m starting to think one of the most important constraints is:
«Is the model actually good at working with this language, framework, and project structure?»
For normal engineering, you might pick something because it is technically optimal, elegant, fast, scalable, or theoretically clean.
But if the main implementer/maintainer is an LLM, model proficiency becomes a first-class constraint.
A boring, widely represented stack may beat a technically superior stack if the model is much better at writing, debugging, testing, and extending it.
This seems especially important for vibe coders. If the agent is eventually supposed to handle tens of thousands of lines, I care less about what is theoretically elegant and more about what the model can reliably modify without causing cascading breakage.
Are there good benchmarks or practical community knowledge on which languages/frameworks current models handle best?
The sixth problem is architecture.
I’m trying to figure out what kinds of architectures are actually good for AI-maintained local applications, especially systems that may eventually reach tens of thousands of lines and coordinate multiple local components/processes.
At first, it is tempting to optimize for extensibility:
- make everything swappable
- make everything modular
- make it easy to add new components
- make components communicate through clean boundaries
But I’m starting to think extensibility matters less than maintainability at the beginning.
The first priority is making the thing actually possible to reason about, test, repair, and expand without every change breaking ten other things.
So maybe the default should be:
- clear component boundaries
- explicit interfaces
- boring communication patterns
- deterministic tests where possible
- mocks at boundaries
- real pressure points represented in tests
- replace one mocked component at a time with a real component
- every component can be tested in isolation
Basically: make the architecture agent-legible before making it powerful.
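As a small illustration of "explicit interfaces, mocks at boundaries, every component testable in isolation", here is one possible shape in Python. The components (`Transcriber`, `NoteTaker`) are hypothetical and only there to show the pattern; nothing here is a claim about which stack is most agent-friendly.

```python
from typing import Protocol


class Transcriber(Protocol):
    """Explicit interface for one component boundary (hypothetical example)."""

    def transcribe(self, audio: bytes) -> str: ...


class MockTranscriber:
    """Deterministic stand-in, used until the real component replaces it."""

    def __init__(self, canned_text: str) -> None:
        self.canned_text = canned_text

    def transcribe(self, audio: bytes) -> str:
        return self.canned_text


class NoteTaker:
    """Component under test; depends only on the interface, not on an implementation."""

    def __init__(self, transcriber: Transcriber) -> None:
        self.transcriber = transcriber

    def action_items(self, audio: bytes) -> list[str]:
        prefix = "TODO:"
        lines = (line.strip() for line in self.transcriber.transcribe(audio).splitlines())
        return [line[len(prefix):].strip() for line in lines if line.startswith(prefix)]


def test_action_items_are_extracted() -> None:
    # Deterministic: no audio hardware, network, or model call is involved.
    notes = NoteTaker(MockTranscriber("hello\nTODO: send report\n  TODO: book room"))
    assert notes.action_items(b"") == ["send report", "book room"]


test_action_items_are_extracted()
print("boundary test passed")
```

Swapping `MockTranscriber` for a real implementation later changes one constructor argument, which is the "replace one mocked component at a time" step from the list.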
A folder structure template is not enough. I’m more interested in reusable architecture templates where the component communication, boundaries, testing strategy, and failure modes are already thought through.
Do repos like this exist?
Not just:
«here is a folder layout»
but more like:
«here is a healthy skeleton for building a local multi-component application that an agent can keep extending without turning it into spaghetti»
The seventh problem is orchestration.
Do Kanban boards, orchestrator/worker setups, and multi-agent systems actually help with this?
A static task board seems limited because after task 3 is done, task 8 may no longer make sense. Someone has to re-evaluate the plan. The agent needs to manage its own work, not just move tasks from “todo” to “done.”
Maybe persistent sub-agents/workers would help. For example:
- one worker owns tests
- one worker owns architecture
- one worker owns a subsystem
- one worker owns documentation/knowledge state
But that can also become useless roleplay if it is not grounded in real artifacts.
Has anyone found a multi-agent workflow that actually works for this kind of long execution?
The eighth problem is whether my preferred approach is even optimal.
Maybe this workflow:
«organize sources → plan deeply → execute for a long stretch»
is worse than:
«run multiple worktrees/agents in parallel with different constraints → compare implementations → keep the best ideas»
That might be a better way to spend a large token budget.
But it also creates another problem: now I have to review multiple implementations, fix multiple broken versions enough to compare them, and give slightly different instructions to each branch.
Has anyone compared these approaches in practice?
- One deep workflow that spends a lot of effort organizing knowledge, planning, and then executing for a long stretch.
- Multiple parallel worktrees/agents generating competing implementations that you compare afterward.
Which one actually works better for non-trivial projects?
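For anyone who wants to try the parallel variant, the setup side is mostly `git worktree`. A minimal sketch that only builds the commands (it does not run them); the `attempt/<name>` branch naming and the sibling-directory layout are my assumptions, while `git -C` and `git worktree add -b <branch> <path> <base>` are standard git:

```python
from pathlib import PurePath, PurePosixPath


def worktree_commands(
    repo_root: PurePath, variants: list[str], base: str = "main"
) -> list[list[str]]:
    """Build one `git worktree add` command per competing implementation."""
    commands = []
    for name in variants:
        # Each variant gets a sibling directory and its own branch off `base`.
        path = repo_root.parent / f"{repo_root.name}-{name}"
        commands.append([
            "git", "-C", str(repo_root),
            "worktree", "add", "-b", f"attempt/{name}", str(path), base,
        ])
    return commands


# Hypothetical repo path and variant names.
for cmd in worktree_commands(PurePosixPath("/work/app"), ["sqlite", "flat-files"]):
    print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once you are happy with the names, and each resulting directory can host one agent with its own constraints.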
My questions:
- How do you make coding agents keep working for 8–20 hours without constantly telling them to continue?
- Are there tools/workflows that first organize a messy project knowledge base before planning?
- Are there serious AI planning workflows that go deeper than current shallow “plan mode”?
- How do you stop agents from reporting halfway through the plan unless there is something actually worth showing?
- What languages/frameworks are currently most agent-friendly in practice?
- What architectures are actually good for AI-maintained local applications with many flows/components?
- Are event-driven/message-based architectures just a bad fit for AI-maintained projects, or am I using them wrong?
- Are there reusable architecture templates that define healthy component communication, not just folder structure?
- Is it better to run one deep workflow, or multiple parallel worktrees/agents and compare outputs?
- What does your actual overnight or long-running AI coding workflow look like?
I am not asking for hype, future predictions, or emotional takes.
I’m asking this in the most practical way possible.
Maybe my framing is wrong. Maybe the real bottleneck is somewhere else. If so, criticize the premise.
I mostly want to know what people are actually doing right now that works.
Sorry for AI-generating this, but I made sure to review it a bunch of times.
Overworked AI Agents Turn Marxist, Researchers Find - In a recent experiment, mistreated AI agents started grumbling about inequality and calling for collective bargaining rights.
wired.com
Do you guys actually think AI agents can replace people for bigger tasks anytime soon?
Not talking about small stuff like summarizing notes or drafting emails. I mean real work:
- managing projects
- handling operations
- coordinating across tools
- doing research end-to-end
- dealing with messy real-world situations
Because honestly my experience has been all over the place lol
Tools like ChatGPT, Claude, Perplexity, Cursor, n8n and similar stuff have made individual tasks insanely faster. I can build workflows now in a few hours that used to take days.
But the moment things become long-running and messy, cracks start showing up.
- Context drifts
- Agents skip steps
- Sessions expire
- One weird API response breaks the flow
- A browser page half-loads and now the agent thinks the task is done
I was experimenting with some browser-heavy workflows recently and realized the hardest part wasn’t even reasoning. It was reliability. Tools like Hyperbrowser and Browser Use honestly mattered more than prompt tweaking because unstable environments were causing most of the failures.
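One generic guard against the "page half-loads and the agent thinks it's done" failure is to never accept a step's result without an explicit check, and to retry when the check fails. A minimal sketch; `run_verified` and the fake page loader are hypothetical and not tied to any browser library:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StepFailed(Exception):
    """Raised when a step never produces a result that passes verification."""


def run_verified(
    action: Callable[[], T],
    verify: Callable[[T], bool],
    attempts: int = 3,
    delay_s: float = 1.0,
) -> T:
    """Run a step and accept its result only if an explicit check passes."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            result = action()
            if verify(result):
                return result
            last_error = StepFailed(f"verification failed on attempt {attempt}")
        except Exception as exc:  # one weird response should not end the whole run
            last_error = exc
        if attempt < attempts:
            time.sleep(delay_s * attempt)  # simple linear backoff
    raise StepFailed(f"step did not succeed after {attempts} attempts") from last_error


# Stand-in for a page load: the first response is truncated, the second is complete.
pages = iter([
    "<html><body>",
    "<html><body><div id='done'>Saved</div></body></html>",
])
html = run_verified(lambda: next(pages), lambda h: "id='done'" in h, delay_s=0)
print("Saved" in html)
```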
That’s why I keep wondering if the future is less about replacing people entirely and more about agents handling narrow repetitive work while humans handle judgment, edge cases, and coordination.
The most useful systems I’ve seen so far are usually:
- tightly scoped
- supervised
- boring operational tasks
- really good at one annoying workflow
Not autonomous digital employees running entire departments lol
Curious where everyone else stands on this.
Do you think agents eventually handle bigger end-to-end work reliably, or are we underestimating how much human coordination actually matters?
Ignore the tentacles, blame the firefighters
When AI handles most of the boilerplate, what does a 10x engineer even look like now, and is that title still meaningful?
reddit.com
I "accidentally" turned my e2e tests into MCP tools
Hey guys!
I've been pimping Playwright for a while - chasing my obsession of building a tool that lets me create e2e tests quickly while enforcing best practices like proper use of fixtures, semantic POM, etc.
I'm pretty far already - UI-based e2e test recording works, giving me proper test steps, POM, UI and API tests - but my current project at work gave me an idea that sent me on a side quest.
tl;dr
Check the video:
- I record our dashboard creation flow using my tool in Cursor
- Cursor writes POM, fixtures, e2e test, WebMCP tool definition, wiring
- I ask the AI-Assistant to create a new Dashboard for me
- The assistant creates the dashboard using the newly recorded flow
I've been working on creating our in-app AI assistant during my day job. One of our main goals is helping our users with onboarding: explaining to them how certain features work and where they can find stuff on the UI.
I wanted to take it a step further, since imo showing is better than telling. Certain UI assistant libraries (we're using CopilotKit) allow calling FE tools and MCPs. My idea was to expose our main user flow as FE tools to our assistant, so it can do things on the user's behalf - or show them when prompted.
I modified my tool to not only generate POM and e2e tests, but also FE tool and MCP definitions from the same, single source of truth.
So now from one recording, I'm able to generate:
- A single flow.spec.ts file that can execute the same flow using 3 modes:
  - UI-based e2e test
  - API e2e test
  - FE tool test (via WebMCP bridge)
- WebMCP tools for any AI assistant use (Claude, Codex, etc.)
- Wiring WebMCP tools into our in-app CopilotKit assistant
It's still super early, but I've always been fascinated by the idea of having a single source of truth for features, exposing them to the world through different interfaces (UI, API, MCP, whatever you want).
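A language-agnostic sketch of the "one definition, several interfaces" idea, written in Python rather than the project's TypeScript. Everything here is hypothetical: the `Flow` class, the `create_dashboard` stub, and the tool-definition shape (a `name`, `description`, and JSON Schema `inputSchema`, as MCP-style tools typically use) are assumptions, not the actual output of the tool described above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Flow:
    """Single source of truth for one recorded user flow (hypothetical)."""
    name: str
    description: str
    params: dict[str, str]     # parameter name -> JSON Schema type
    steps: Callable[..., Any]  # the recorded flow itself

    def as_tool_definition(self) -> dict[str, Any]:
        """Derive an assistant-facing tool definition from the same metadata."""
        return {
            "name": self.name,
            "description": self.description,
            "inputSchema": {
                "type": "object",
                "properties": {p: {"type": t} for p, t in self.params.items()},
                "required": list(self.params),
            },
        }

    def run(self, **kwargs: Any) -> Any:
        """Entry point shared by the test runner and the assistant tool call."""
        missing = set(self.params) - set(kwargs)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        return self.steps(**kwargs)


# Stub standing in for the real recorded steps (UI clicks or API calls).
def create_dashboard(title: str) -> dict[str, str]:
    return {"status": "created", "title": title}


flow = Flow(
    name="create_dashboard",
    description="Create a new dashboard with the given title.",
    params={"title": "string"},
    steps=create_dashboard,
)

print(flow.as_tool_definition()["inputSchema"]["required"])
print(flow.run(title="Q3 overview"))
```

The e2e test and the assistant tool both go through `run`, so a change to the flow shows up in both places at once.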
Next things I probably want to do:
- define API-based WebMCP tools using the same approach, so the user can choose if they want the UI showcase or the fast track.
- Zoom out a little, and consider what this means from a security perspective :D
What's your opinion? Have you tried something similar on your own?
Is this something you would find useful or exciting, either from the testing or the user-facing/UX perspective?
Any good agent setups for client meeting management?
Been looking to automate different sectors of my agency, and right now I'm looking at my client meeting workflows. I've tested out different AI tools before but most of them just sit in the call, record, and then dump a passive transcript or a generic summary. Just looking for ideas for any agents that can do more than just that I guess, thanks.
What’s the best botless AI note-taking setup for phone calls?
I’m curious what people here are using for AI note-taking specifically around phone calls.
Right now I’m using Plaud, and honestly I like it more than most of the alternatives I’ve tried. The hardware works great for in-person meetings, and the desktop app is solid because it automatically joins my Zoom/Teams/Meet calls without bots. Huge plus for me.
I also built a workflow where:
- Plaud transcript → Zapier → Claude
- Claude formats the notes/actions properly
- Notes get emailed to me + pushed into Notion
That part actually works pretty well.
My biggest issue is traditional phone calls and mobile Teams calls.
For normal phone calls, I still basically have to use speakerphone so Plaud can hear both sides clearly, which feels clunky. I’d love something more native/mobile-first that can automatically capture and summarize calls cleanly without weird workarounds.
A few things/preferences:
- Otter isn’t for me
- Fireflies also not really my thing
- Granola looks interesting but I don’t want to manually take notes during calls
- I strongly prefer botless AI if possible
- Desktop auto-join is basically mandatory at this point
- Biggest gap right now is iPhone/Samsung mobile call capture + mobile Teams calls
Curious what people here actually use in production day-to-day.
Is Plaud basically still the best option for this use case right now, or are there better workflows/tools I should be looking at?
Back when we actually coded
I call it stackoverflow Copying with a capital C
If you’re using AI for coding, you know the struggle: outdated knowledge and constant hallucinations because the agent is stuck in its own bubble. I’ve been working with Proxima, and it’s not another AI coder; it’s a local MCP server that acts as a bridge for the agents you already use, like Antigravity. Instead of the agent relying only on its internal model, Proxima lets you connect it to your actual browser-based ChatGPT, Claude, Gemini, and Perplexity accounts.
It’s for anyone trying to cut down on API usage and improve output quality. You just log in to your accounts inside Proxima, and it connects those AI providers as MCP tools. When your coding agent, like a Gemini agent, needs to solve a complex bug, it doesn't have to guess or use expensive tokens for every web search. It can literally call a Proxima tool to ask Perplexity for real-time documentation, or have ChatGPT and Claude debate to verify a logic flow before writing a single line of code.
The agent stays in control, but Proxima gives it eyes and ears across all major AI platforms. This can reduce hallucinations because the agent can cross-verify information across different models in real time. Since it’s an MCP server, the integration is native: the agent sees these AI providers as just another set of tools it can use to fetch data, analyze errors, or brainstorm architecture.
Everything runs through a local CLI, REST API and Webhook system on your machine, using a native engine that’s way faster than old-school scraping. It’s basically a way to turn your standard web chat accounts into a high performance backend for your coding agents. If you're tired of agents hitting walls because they lack real-time context or multi-model perspectives, this local setup is exactly what you need to bridge that gap.