u/Fine-Discipline-818

You don't need a personal brand. Don't blindly go for FOMO.

I’ve been in this industry for almost a year now, working with multiple CEOs through freelancing and jobs, and one thing I’ve realised is:

Not everyone needs a personal brand. And definitely not everyone needs one on LinkedIn.

I had a client, let’s call her Ella. The kind of work Ella does doesn’t really bring clients from LinkedIn. Maybe personal branding there could help her 10 years down the line, but right now her priority is getting clients and growing her business. Platforms like Instagram or Facebook would probably make much more sense for her audience.

But because LinkedIn branding is “the thing” right now, everyone keeps pushing it.

Please choose these services wisely. Before hiring someone, spend some time researching where your audience actually is. Experiment a little yourself too; there's no harm in that. It's better than blindly following trends and later ending up frustrated with the entire industry.

Also, advising every single person to build a brand just for the sake of making money feels wrong. Not every platform works for every person, and pretending otherwise just to sell services is unfair.

u/Fine-Discipline-818 — 2 days ago

Founder of Future AGI got insecure?

So I was going through the launch post for a new company called BentoLabs AI and saw a comment from the founder of Future AGI asking 'how are you guys different from us?' Shocked would be an understatement. Why would a founder comment that, under a fucking launch post of their competitor?! Crazy world. And they just replied with their results, and boom.

u/Fine-Discipline-818 — 4 days ago

Not talking about unit tests. Not talking about eval suites. Talking about the moment your agent does something unexpected on a real user run and you need to figure out why.

I've been running agents in production for a few months now and I've slowly developed a workflow that actually works for me, but it's ugly and I'm curious what everyone else does.

Here's what I've landed on: skim volume, don't deep-dive individual runs. When something feels off, I'll pull up like 100 recent trajectories and just... scan them. Fast. Not reading every step, just looking for patterns. One weird run is noise. The same failure showing up 3 times in a row? That's a real bug.
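
If you wanted to script that scan instead of eyeballing it, it'd look something like this. Just a sketch; the trajectory fields (steps, error, tool) are placeholder names, not any tool's real schema:

```python
from collections import Counter

def failure_signature(trajectory):
    # Reduce a run to a coarse label so repeated failures group together.
    # "steps", "error", "tool" are hypothetical field names for whatever
    # your trace store actually returns.
    for step in trajectory.get("steps", []):
        if step.get("error"):
            return (step.get("tool", "unknown"), step["error"][:80])
    return None  # nothing obviously wrong in this run

def scan(trajectories, min_repeats=3):
    # One weird run is noise; the same signature 3+ times is a real bug.
    counts = Counter(
        sig for t in trajectories
        if (sig := failure_signature(t)) is not None
    )
    return [(sig, n) for sig, n in counts.most_common() if n >= min_repeats]

# usage: suspects = scan(last_100_trajectories)
```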

The other thing that's been surprisingly useful: read trajectories immediately after you ship a change. Like, 30 runs within 15 minutes of deploy. You'll catch if your change silently broke something adjacent way faster than waiting for user complaints. I caught a tool routing regression last week this way: my prompt tweak for one tool somehow made the agent start preferring a different tool in unrelated flows. Would've taken days to notice otherwise.
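
The post-deploy spot check is the kind of thing you could half-automate too. Rough sketch, assuming each run records a timestamp and which tools it called; the field names and the 15% threshold are made up for illustration:

```python
from collections import Counter
from datetime import timedelta

def tool_mix(runs):
    # Share of tool calls going to each tool across a batch of runs.
    calls = Counter(tool for r in runs for tool in r["tools_called"])
    total = sum(calls.values()) or 1
    return {tool: n / total for tool, n in calls.items()}

def post_deploy_diff(runs, deploy_time, window_minutes=15, tolerance=0.15):
    # Compare the window right after a deploy against the runs before it.
    # A big shift in any tool's share is the "agent quietly started
    # preferring a different tool" regression described above.
    window = timedelta(minutes=window_minutes)
    before = [r for r in runs if r["ts"] < deploy_time]
    after = [r for r in runs if deploy_time <= r["ts"] <= deploy_time + window]
    b, a = tool_mix(before), tool_mix(after)
    return {
        tool: (b.get(tool, 0.0), a.get(tool, 0.0))
        for tool in set(b) | set(a)
        if abs(a.get(tool, 0.0) - b.get(tool, 0.0)) > tolerance
    }
```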

But here's the thing. How are you actually debugging your agents when they behave weirdly in production? Because my approach doesn't scale at all. Doing this manually every deploy is brutal. Some weeks I keep up with it, other weeks I just... don't. And then we're flying blind until someone on the team notices something in user feedback.

I've been looking at tooling for this and tried a couple of observability platforms; most of them are fine for traces but don't really help with the "is this a regression from my last change" question. Recently started poking around BentoLabs, which seems to actually treat this as a closed-loop thing (detecting regressions, diffing behavior across versions) rather than just showing me more logs. Still early with it, but the idea of getting alerted in plain English when behavior drifts is appealing vs my current "stare at trajectories and hope I notice" strategy. I don't think they're going to let me use it yet, actually.

Anyway, curious what other people's flow looks like. Do you have something systematic, or is everyone just vibing and hoping for the best? Especially interested if anyone's found a way to make post-deploy checks not feel like a chore.

u/Fine-Discipline-818 — 10 days ago

So I've been running a multi-agent setup with Claude for a few months now, mostly customer-facing stuff, some internal tooling. And I keep running into this problem that I think a lot of people here might be dealing with.

You ship a prompt change. Or you swap from Sonnet to Opus for one step in the chain. Or you add a new tool. And everything looks fine in your evals. You push it. Then three days later someone on the team notices the agent is subtly doing something wrong. Not catastrophically wrong, just... you can sense something's off. Maybe it stopped including a specific field in its output. Maybe it started being way too verbose in one branch of the logic. Whatever.

And then you're sitting there trying to figure out WHEN it broke, and whether it was your change or some upstream thing, and you're basically doing archaeology on your own system. Manually diffing outputs, reading through logs, asking teammates "hey, did you notice anything weird last Tuesday?"
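
That archaeology is at least scriptable once you know which field went missing. Something like this, assuming you can pull historical outputs as (day, parsed_output) pairs from your logs; the names are placeholders:

```python
from collections import defaultdict

def daily_field_rate(outputs, field="summary"):
    # `outputs` is an iterable of (day, parsed_output_dict) pairs pulled
    # from your logs; `field` is whichever key quietly disappeared.
    hits, totals = defaultdict(int), defaultdict(int)
    for day, out in outputs:
        totals[day] += 1
        hits[day] += int(bool(out.get(field)))
    return {day: hits[day] / totals[day] for day in sorted(totals)}

# The day the rate falls off a cliff is your suspect; line that up with
# the deploy log and the archaeology is one pass instead of a week.
```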

I've been thinking a lot about what the fastest feedback loop in agent engineering (the one almost nobody seems to be running) actually looks like. Because right now my loop is: ship change → wait for someone to complain → investigate → fix → hope I didn't break something else. That's... not great. That's like, pre-CI/CD era thinking applied to agents.

The thing is, traditional software has solved this. You write tests, you run them in CI, you get a red/green signal before you merge. But agents are so much messier. The outputs are non-deterministic, "correct" is fuzzy, and the failure modes are subtle behavioral drift rather than crashes. So most teams I talk to (including mine, honestly) end up relying on vibes. Does the agent feel like it's working? Cool, ship it.
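
The closest I can picture to a red/green signal is treating evals as a pass-rate gate instead of exact-match asserts, since outputs won't be byte-identical from run to run. Minimal sketch; run_agent, the checks, and the 90% threshold are all stand-ins for whatever your setup actually has:

```python
def looks_ok(output):
    # "Correct" is fuzzy, so check properties instead of exact strings:
    # required fields present, not suddenly way too verbose, no error status.
    return (
        "answer" in output
        and len(output["answer"]) < 2000
        and output.get("status") != "error"
    )

def eval_gate(cases, run_agent, min_pass_rate=0.9):
    # Red/green signal for CI: fail the merge when the pass rate drops,
    # instead of failing on any single non-deterministic miss.
    passed = sum(looks_ok(run_agent(case)) for case in cases)
    rate = passed / len(cases)
    assert rate >= min_pass_rate, f"pass rate {rate:.0%} is below {min_pass_rate:.0%}"
    return rate
```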

What I really want is something that watches production behavior, notices when things drift from what's expected, and tells me before a customer does. Like, not just tracing. I have tracing; it generates a ton of data that nobody looks at until something is already broken. I mean something that actually closes the loop: detects the regression, connects it to the change that caused it, and ideally feeds that learning back so it doesn't happen again.
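
In my head the check itself isn't complicated, it's just that nobody runs it continuously. Something like this, as a sketch of the idea rather than any product's API; the metrics and threshold are stand-ins:

```python
import statistics

def output_metrics(run):
    # Cheap behavioral features per run; extend with whatever tends to drift.
    out = run["output"]
    return {
        "has_summary_field": float(bool(out.get("summary"))),
        "answer_length": float(len(out.get("answer", ""))),
    }

def drift_check(baseline_runs, recent_runs, z_threshold=3.0):
    # Compare recent behavior to a known-good baseline window and return
    # whichever metrics moved; wire the result into Slack/email/whatever.
    base = [output_metrics(r) for r in baseline_runs]
    recent = [output_metrics(r) for r in recent_runs]
    alerts = {}
    for key in base[0]:
        mu = statistics.mean(m[key] for m in base)
        sd = statistics.pstdev(m[key] for m in base) or 1e-9
        recent_mu = statistics.mean(m[key] for m in recent)
        if abs(recent_mu - mu) / sd > z_threshold:
            alerts[key] = (mu, recent_mu)
    return alerts  # non-empty means "behavior drifted, go look at the last change"
```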

I've looked at a bunch of the observability tools out there (Langfuse, LangSmith, etc.). They're good for what they do, but they still feel like they stop at "here's what happened" rather than "here's what went wrong and here's how to fix it." The closed-loop part is what's missing for me.

Has anyone here actually built a solid feedback loop for their Claude-based agents? Like, something beyond "run evals before deploy and pray"? I'm curious what your setup looks like, whether it's homegrown or you're using something off the shelf. Especially interested if you're running agents at any kind of scale where you can't just eyeball every interaction.

Or am I overthinking this and everyone is just vibing their way through production lol

u/Fine-Discipline-818 — 11 days ago