r/ReplitBuilders

▲ 4 r/ReplitBuilders+1 crossposts

Anyone else constantly re-teaching AI agents the same behavior?

You spend hours shaping an agent:

  • what tools it can touch
  • what it should ask before acting
  • what counts as risky
  • when it should stop and clarify

Eventually it mostly behaves.

Then the surface changes: new runtime, new coding tool, new MCP server, new workflow…

…and suddenly you're re-explaining the same expectations all over again.

Feels like a lot of this stuff currently lives in prompts, habits, and the operator's head instead of surviving across surfaces.

Curious how others are handling this.

Prompts? Policy files? Wrappers/hooks? MCP? Just accepting the drift?
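For what it's worth, one portable option is a checked-in policy file that a wrapper or hook loads on every surface, so the expectations travel with the repo instead of living in prompts. A hypothetical sketch (the schema, `POLICY`, and `gate` are made up, not any tool's actual format):

```python
# Hypothetical agent policy, kept in the repo so it survives surface changes.
# The schema is illustrative, not any existing tool's actual format.
POLICY = {
    "allowed_tools": ["read_file", "run_tests"],   # what tools it can touch
    "ask_before": ["write_file", "shell"],         # what it should ask before acting
    "risky": ["delete", "deploy", "billing"],      # what counts as risky
}

def gate(tool: str, intent: str) -> str:
    """Decide how a wrapper/hook should handle a proposed action."""
    if any(word in intent for word in POLICY["risky"]):
        return "stop_and_clarify"   # when it should stop and clarify
    if tool in POLICY["ask_before"]:
        return "ask"
    if tool in POLICY["allowed_tools"]:
        return "allow"
    return "deny"
```

The same four rules from the list above become data instead of prompt text, so a new runtime or MCP server only needs the thin `gate` shim, not a re-taught agent.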

reddit.com
u/rohynal — 4 hours ago
▲ 17 r/ReplitBuilders+13 crossposts

I’ve been working on a typing practice website ⌨️👀

Here’s the links 👇
https://www.typinglearn.com

I’m especially looking for feedback on:

  • UI/UX (is it intuitive or confusing anywhere?)
  • Where it can be improved
  • Features you wish typing sites had
  • Performance / responsiveness
  • Anything that feels unnecessary or missing

Try it once!
https://www.typinglearn.com/games
https://www.typinglearn.com/map
https://www.typinglearn.com/community

u/Murky_Ad365 — 3 days ago
▲ 1 r/ReplitBuilders+1 crossposts

After 2 years of vibe coding I realised the AI builder isn’t the problem, your prompt is

I’ve been vibe coding for about 2 years now. Early on I made every mistake possible — jumping straight into Lovable/Bolt/Cursor with a half-formed idea, watching the AI confidently go in completely the wrong direction, then spending hours iterating trying to fix it.

Over time I figured out how to prompt properly. Now I rarely need more than 2 or 3 iterations to get something solid. The AI builder hasn’t changed — my input has.

The difference is almost entirely in what goes in at the start. Most people skip the thinking and go straight to building. That’s where the wasted time and money happens.

Curious if others have found the same — a few questions:
  • How long do you typically spend prompting before you get something usable?
  • How many iterations does an average project take?
  • What’s the most frustrating part of the process?
  • Have you found anything that helps — templates, frameworks, a certain approach?

Not selling anything, genuinely researching the problem. Would love honest answers — especially if your experience is “actually I don’t struggle with this at all.”

u/ButterscotchSevere96 — 2 days ago
▲ 3 r/ReplitBuilders+1 crossposts

I ran into a billing issue with Replit that I think is a documentation and UI problem, not just a user mistake.

I had set a budget limit of $0.01, but it didn’t actually save because Replit requires budgets to be set in $10 increments. Support told me that $0.01 “doesn’t meet the $10 minimum requirement,” so the system couldn’t save it as a valid budget. The problem is that the popup window and the flow I used did not make that clear, so from my perspective it looked like I had a budget limit in place when I didn’t. Looking at their documentation page, it does not say that anywhere that I can find. https://docs.replit.com/billing/managing-spend#set-up-limits-and-budgets

That matters because the result was about $70 in usage charges before I realized the cap was never active. If the system won’t accept values below $10, the interface should say so directly before the user submits it. Right now it reads like the budget was set, when in reality it was silently rejected.
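For illustration, the missing guard is a few lines. This is a hypothetical sketch of what the popup could validate before submit (`validate_budget` and the messages are made up; the $10 minimum/increment rule comes from the support reply above):

```python
# Hypothetical client-side guard for the budget popup. The function and
# messages are illustrative; only the $10 rule comes from Replit support.
MIN_BUDGET_USD = 10

def validate_budget(amount: float):
    """Reject invalid budgets with an explicit message instead of silently."""
    if amount < MIN_BUDGET_USD:
        return False, f"Budget must be at least ${MIN_BUDGET_USD}."
    if amount % MIN_BUDGET_USD != 0:
        return False, f"Budget must be set in ${MIN_BUDGET_USD} increments."
    return True, "Budget saved."
```

With a check like this, a $0.01 submission would be rejected with a visible error rather than appearing to save.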

I’m posting this because the current behavior is misleading. If Replit wants to prevent surprise charges, the minimum budget requirement needs to be obvious in the UI and in the docs, not buried in support replies after the fact.

https://preview.redd.it/q82m79n3spzg1.png?width=539&format=png&auto=webp&s=749b740920bc346764d62f946dfab5031f8b0f4d

https://preview.redd.it/gn58xi71spzg1.png?width=1231&format=png&auto=webp&s=d570fc85a7a2fcbeff79ef9be90998afd012904b

Has anyone else run into this with Replit budgets or usage limits?

u/Fearless_Author_6388 — 6 days ago
▲ 15 r/ReplitBuilders+1 crossposts

Hey everyone. I know Replit’s 10-year anniversary is tomorrow (May 2nd) and they are giving away service for a day. I’m posting this because I need to warn people before they attach a credit card to their accounts to keep using the AI agents after the promo ends.

I want to preface this by saying I have loved Replit up until recently. I'm a relatively heavy user and normally hovered around $1k–$2k a month in spend. I even used to buy the $1,000 credit packs in advance to save money because I relied on the product so much. Before the recent Agent 4 update, the AI was fantastic and well worth the money. You asked for a feature, it did the work, it stopped, and you moved on.

But the new version introduced this "plan-while-building" architecture where it spawns multiple parallel sub-agents on a Kanban-style board. When it first started happening to my projects, I was caught completely off guard. I approved the first few dozen tasks assuming it was just mapping out the steps to complete the small, scoped features I asked for. I didn't realize that approving one task gives the system permission to endlessly invent new ones.

I got caught in a sunk-cost loop. I was terrified that stopping it mid-stream would wreck my codebases and leave my architecture broken, so I thought if I just let it ride, it would hit 100% and stop.

It never stops. I literally tracked the stats on one of my projects: I asked for 23 things, and the system generated 770 tasks to deliver them. On another project, the daily amplification ratio was 11.3x. For every 1 task the AI actually merged, the platform auto-generated 11 brand new follow-up proposals.
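For scale, the overall amplification is easy to compute from the numbers above (a quick sanity check on the reported figures, not platform data):

```python
# Back-of-envelope check: 23 requested features became 770 generated tasks.
requested_features = 23
generated_tasks = 770
amplification = generated_tasks / requested_features  # overall task amplification
print(f"{amplification:.1f}x")  # roughly 33.5x across the whole project
```

That overall 33.5x sits on top of the 11.3x daily ratio reported for the other project.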

I'm posting this constructively because I know Replit staff read this sub, and as a paying customer I need them to know there are some massive architectural flaws right now that are actively billing users for platform bugs:

  1. Forced follow-ups: When an agent finishes a job, a built-in platform skill forces it to propose 1-5 new tasks. We can't opt out of this.
  2. Billing for parallel collisions: Two agents running at the same time don't coordinate. They will propose the exact same follow-up task, and Replit charges you twice to build it. If they touch the same file, they cause a rebase conflict, and the platform forces the second agent to re-run all its code gen and testing just to fix a conflict that the platform's own merge ordering caused. And we pay for that compute. Replit's docs say this isn't the case, but I can promise it does indeed do tons of unnecessary work.
  3. Paying to fix its own E2E pollution: The AI runs E2E tests, pollutes shipped data files with dummy data, merges it, and then the next agent spends paid time (and our money) reverting the pollution the previous agent caused.

I am completely stuck. My Git history is a minefield of hundreds of automated commits. I don't even know what commit I should revert back to because my legitimate feature requests are buried inside a $6,376 avalanche of auto-generated tasks (way beyond my usual monthly spend). My codebase is basically being held hostage by a system I can no longer afford to run.

The worst part is you can't even stop the queue. There is no bulk "Cancel All" button. The API only lets an agent cancel its own follow-ups, so once an agent exits, those tasks are stuck in the queue forever unless you sit there and manually click "Cancel" 130+ times while the active ones keep spawning more.

I've submitted detailed support tickets with commit hashes, task IDs, and logs proving this is a runaway platform loop and not user error, but I haven't heard back in days while my bill just sits there at over six grand.

If you are logging into Replit on May 2nd to take advantage of the free day, enjoy the sandbox. But the moment you attach a credit card for a real project on May 3rd or after:

  • Do not trust the sub-agents.
  • Never run them in parallel.
  • Scrub your task queue constantly.
  • Set hard budget limits immediately because the platform will not pause itself.

I'm really hoping someone from the Replit team sees this and can look into my ticket, because right now the Agent 4 loop is a runaway train. Be careful out there, guys. I know there will be folks quick to criticize (it's the Internet, after all), but I mostly wanted to post this as a warning to others.

u/r3dditor — 13 days ago
▲ 2 r/ReplitBuilders+1 crossposts

I’m starting to think most “agent bugs” aren’t bugs. They’re mismatches between what we think we asked and what the agent thinks we asked.

That got me thinking about how we frame agent observability.

Most of the conversation treats the gap between what an agent claims it’s doing and what it actually does as a governance problem. Catch bad actions. Stop the agent before it deletes the wrong database.

That’s real. But I’m seeing something else.

A lot of developers are using the same idea for a completely different purpose: debugging their own assumptions about the model.

Examples I keep hearing:

  • Someone spent weeks debugging ranking issues, only to realize the prompt wasn’t being interpreted the way they thought.
  • Output drift that wasn’t a bug. The agent was doing exactly what it believed it was asked to do.
  • Instruction-following gaps where the agent technically followed instructions, just not in the way the operator expected.

In all these cases, the developer wasn’t catching the agent. They were catching themselves.

The most useful signal wasn’t the output. It was reconstructing:
what did I think I asked vs what did the agent think I was asking?

That makes me wonder if the “failure/incident” framing for observability is too narrow.

“Intent vs execution” might not just be for governance. It might be one of the most useful debugging primitives for everyday agent work.
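One lightweight way to make that reconstruction concrete is to log three things side by side for every task and diff the first two before any action runs. A minimal sketch (`IntentRecord` and its field names are hypothetical, not an existing tool):

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class IntentRecord:
    """One hypothetical trace entry pairing intent with execution."""
    operator_prompt: str     # what I thought I asked
    agent_restatement: str   # what the agent says it is doing (ask it to restate first)
    actions: list = field(default_factory=list)  # tool calls / outputs actually observed

def to_log_line(rec: IntentRecord) -> str:
    """Serialize one record for a JSONL trace that can be diffed later."""
    return json.dumps(asdict(rec))
```

Diffing `operator_prompt` against `agent_restatement` is exactly the "what did I think I asked vs what did the agent think I was asking" comparison, and it surfaces the ranking and drift cases above before the output ever looks wrong.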

Curious how others are handling this:

  • Are you debugging prompt interpretation / output drift by reconstructing the agent’s understanding?
  • What does that look like in practice? Logs, eval traces, reruns, something else?
  • Does “claim vs action” resonate here, or does it feel like the wrong vocabulary outside governance?

(For context, I’ve been exploring this space and built a small open-source tool around it. Happy to share if relevant, but mostly interested in whether this pattern resonates.)

u/rohynal — 8 days ago
▲ 22 r/ReplitBuilders+6 crossposts

Latest patch includes a newer camera orbit, movement, gem socketing, leg movement, cast-time world teleport instead of instant, mouse-scroll zoom in/out, world PvP, and working dueling/multiplayer.

Workflow includes Replit (Heavy Claude UI at the moment) + Gemini for a custom UI, which I have ready-made but want to ensure all functionality works before polishing the UI.

If you'd like to check it out, let me know; I don't want the post to come off as spam with the link.

u/Unfair-Frosting-4934 — 13 days ago
▲ 2 r/ReplitBuilders+1 crossposts

As a new builder and hobbyist, I came to Replit looking for a low-barrier entry point into creating my own apps. My current project is really exciting, but I’m worried I’ll have to pay more than the baseline $20 for Core every month just to keep it running.

Does anyone have experience keeping their app running, or would you recommend switching to another site builder? For context, I predict roughly 50–100 users per month (free use), and I also use the Google Maps API.

u/Distinct_Glove_6056 — 7 days ago
▲ 7 r/ReplitBuilders+2 crossposts

Hey everyone,

we built a simple scanner for people building apps with Replit, Cursor, Lovable, Bolt and similar tools.

It’s not a code review or a pentest. It’s more of a quick production check: security headers, HTTPS/TLS, GDPR basics, SEO, performance, exposed sensitive files and AI readiness.

The reason is simple: a lot of vibe-coded apps look finished, but they often go to production without the basic checks in place.

You enter a URL and within a few seconds you can see what might be a problem:

https://grovetechai.com/
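As an illustration of the kind of check a scanner like this runs, here is a sketch of a security-header pass (the header list and `missing_headers` are assumptions for illustration, not the actual implementation):

```python
# Illustrative security-header check: which common headers a response lacks.
# The list and function name are assumptions, not the scanner's actual code.
SECURITY_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
]

def missing_headers(response_headers: dict) -> list:
    """Return the expected security headers absent from a response."""
    return [h for h in SECURITY_HEADERS if h not in response_headers]
```

A real run would fetch the URL and pass the response headers in; anything returned is a candidate finding.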

I’d really appreciate honest feedback. What would you want to see in a tool like this before shipping your app?

u/Lopsided-Werewolf805 — 9 days ago
▲ 3 r/ReplitBuilders+1 crossposts

Was wiring token tracking into our Governor and ran into something that's been bothering me.

If one LLM reasoning step produces three tool calls, and your observability stack attributes the same token spend to all three events, your downstream analytics are mathematically wrong. Not slightly wrong. Structurally wrong.

Concrete example from a single agent session I ran:

  • Naive event-level aggregation: 14,436 prompt tokens
  • Attributed correctly at the reasoning-step level: 4,812 prompt tokens
  • A 3x overstatement, silently, on one workflow

The fix is straightforward: every reasoning step needs an identity (we use llm_turn_id), and token spend attaches to the step, not to each downstream tool call. Aggregation becomes dedupe-safe by construction.
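To make the fix concrete, here is a minimal sketch with fabricated trace events; only `llm_turn_id` and the token counts come from the post, while the other field names and tool names are assumptions:

```python
# Hypothetical trace events for ONE reasoning step that produced three tool
# calls. Each event carries the step's id (llm_turn_id) and, in the naive
# setup, a copy of that step's prompt-token count.
events = [
    {"llm_turn_id": "turn-1", "tool": "search",    "prompt_tokens": 4812},
    {"llm_turn_id": "turn-1", "tool": "fetch",     "prompt_tokens": 4812},
    {"llm_turn_id": "turn-1", "tool": "summarize", "prompt_tokens": 4812},
]

# Naive event-level aggregation: the same spend is counted once per tool call.
naive_total = sum(e["prompt_tokens"] for e in events)  # 14,436

# Step-level attribution: spend attaches to the reasoning step, so the
# aggregation dedupes by llm_turn_id by construction.
per_turn = {e["llm_turn_id"]: e["prompt_tokens"] for e in events}
correct_total = sum(per_turn.values())  # 4,812
```

Keying by `llm_turn_id` reproduces the 3x overstatement from the session above: 14,436 naive vs 4,812 attributed correctly.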

What's been bothering me more is the second-order implication.

In non-deterministic agent systems, the normal ways we think about correctness start breaking down. One of the things that starts replacing it is cost. Retries cost money. Loops cost money. Reasoning drift costs money. Every operational pathology shows up, eventually, in tokens.

Which means cost stops being just billing telemetry and becomes one of the few accountability surfaces that survives non-determinism. But only if the attribution is structurally correct. Otherwise you're not measuring agent behavior. You're measuring an artifact of how your trace events were aggregated.

Curious whether others are also starting to read cost as a behavioral signal rather than just billing, or if I'm reading too much into a single workflow.

u/rohynal — 6 days ago

I'm a high school founder at Techstars Startup Weekend in Boston right now. My team is trying to build the AI tool that actually solves distribution for small businesses and B2C founders. But before we write a single line of code, we're trying to hit 100 real conversations first.

If you hate being pitched to -- absolutely NO WORRIES. We're not trying to sell you anything. We literally have NOTHING to pitch -- which is exactly why I'm posting.

If you've ever built something and struggled to get it in front of actual users: what was the hardest part?

- Knowing what to post on social?

- Actually sitting down and making the content?

- Something you've never seen a tool address?

Comments are needed and welcome. I'll also be sliding into some DMs for 5-minute chats if you're open to it - just say the word and I'll come to you.

PS: if we win on Sunday, I'll send Techstars referrals to the first ten people who respond.

u/Crabbythrowaway1530 — 11 days ago

Would you actually complete your tasks if you’d lose money for not doing them?

I’ve been thinking about this:

Most productivity apps don’t work (at least for me).
Reminders, streaks, and to-do lists are easy to ignore.

But the moment real money is involved behavior changes.

So I built a small experiment:

  • You create a task
  • Put some money on it (₹1–₹100)
  • Complete it -> you get it back
  • Miss it -> you lose it

No motivation. Just consequences.
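The flow above can be sketched in a few lines (a hypothetical model; `StakedTask` and the wallet dict are illustrative, not the actual app):

```python
# Hypothetical sketch of the stake flow: create a task, lock money from the
# wallet, refund on completion, forfeit on a miss.
class StakedTask:
    def __init__(self, wallet: dict, name: str, stake: int):
        assert 1 <= stake <= 100          # the post's ₹1–₹100 range
        wallet["balance"] -= stake        # lock the stake up front
        self.wallet, self.name, self.stake = wallet, name, stake
        self.open = True

    def complete(self):
        if self.open:
            self.wallet["balance"] += self.stake  # refund on completion
            self.open = False

    def miss(self):
        self.open = False                 # stake is forfeited
```

Locking the money at creation time, rather than charging on a miss, is what makes the consequence feel real: the balance visibly drops the moment you commit.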

It also has:

  • Wallet (UPI-based)
  • Subtasks with shared stakes
  • Leaderboard for consistency

I’m testing whether loss aversion actually makes people more disciplined.

But I’m not sure:

  • Would you actually use something like this?
  • Or would the “losing money” part turn you off?
  • What would make this feel fair instead of stressful?

If people are curious, I can share the link in comments.

Would love honest feedback.

u/Shuffle4859 — 11 days ago
▲ 3 r/ReplitBuilders+2 crossposts

Beginner here

No experience

What are the must-dos before launching?

I have my payment system set up

It was suggested that I sign up with platforms such as Supabase, and now I’m wondering if there’s anything else I’m missing.

Thanks in advance for your advice

u/MarketDetective1 — 13 days ago