
r/BlackboxAI_

Is it only me who is really frustrated with AI hallucinations?
For example, Gemini generating made-up facts out of nowhere while writing an analysis, Claude inventing numbers in calculations, or ChatGPT providing citations that don't even exist?
I've also noticed that the lower the model version, the higher the chances of hallucination.
I'm so frustrated with this. Do you guys feel the same?
Vibe coding is basically a stimulus package for senior backend devs
everyone is terrified that vibe coders using UI agents are going to replace engineers. i’m seeing the exact opposite.
we just got hired to rescue a project built by a non-technical founder who 'vibe coded' the whole thing in cursor and Blackbox. the frontend looks like a $50m startup. the backend? it’s making 45 separate database calls per page load, state is completely fractured, and the auth tokens are stored in plain text in localStorage.
we are charging them 3x our normal rate to rip out the backend and write it properly while keeping the shiny ai frontend. 'vibe coding' isn't replacing senior devs (def not yet), it's just generating an infinite pipeline of highly funded tech debt for us to clean up
Altman compares AGI to the ring of power from Lord of the Rings
Why does this endpoint only slow down when multiple users hit it?
I had an endpoint that looked completely fine in isolation. Locally it returned in under 200ms every single time, even with a decent amount of data behind it. Nothing in the logic felt expensive either, just a couple of joins and some filtering before sending the response back.
The problem only showed up when I tested it with concurrent requests. Suddenly the response time would spike unpredictably, sometimes jumping past two seconds without any clear pattern. What made it confusing was that CPU usage stayed relatively normal, and the database didn’t look like it was under obvious stress either.
At first I assumed it was a database bottleneck that I just wasn’t seeing clearly. I spent a while tweaking indexes and simplifying queries, but none of it really changed the behavior under load. Single requests were still fast, and concurrent ones were still inconsistent.
That’s where I brought the whole flow into Blackbox AI, not just the endpoint but also the service layer and the connection handling. Instead of asking it to optimize anything directly, I had an agent walk through what happens when five requests hit the endpoint at nearly the same time.
The interesting part was how it mapped out the execution. It showed that each request was triggering the same sequence of calls, but they were all competing for a limited number of database connections. That part wasn’t surprising on its own, but what I hadn’t noticed was that one of the intermediate steps was doing a blocking transformation on the result set before releasing the connection.
So even though the query itself was fast, the connection stayed occupied longer than expected. Under concurrency, that created a queue effect where requests weren’t waiting on the database query, they were waiting for connections to free up after unnecessary processing.
I used the iterative editing inside Blackbox to move that transformation step outside the connection scope and then re-ran the same simulated concurrent flow. The difference was immediate. The agent showed that connections were being released almost right after the query completed, which reduced the waiting chain between requests.
After applying the change, the endpoint behavior under load finally matched what I expected from the beginning. The response times stabilized, and the spikes disappeared.
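For anyone curious what that fix looks like in practice, here's a minimal runnable sketch of the pattern. The pool, query, and transformation here are toy stand-ins, not the actual code from my project: the point is just that the connection is held only for the query, and the slow work happens after releasing it.

```javascript
// Toy connection pool: a fixed set of "connections" plus a waiter queue.
class Pool {
  constructor(size) {
    this.free = Array.from({ length: size }, (_, i) => `conn-${i}`);
    this.waiters = [];
  }
  async acquire() {
    if (this.free.length > 0) return this.free.pop();
    // No free connection: block until someone releases one.
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release(conn) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn); // hand the connection straight to a waiter
    else this.free.push(conn);
  }
}

const pool = new Pool(2);

// BAD: the slow transformation runs while the connection is still
// checked out, so concurrent requests queue on the pool, not the query.
async function handlerBefore(id) {
  const conn = await pool.acquire();
  try {
    const rows = [id, id, id]; // stand-in for a fast query
    await new Promise((r) => setTimeout(r, 50)); // stand-in for slow transform
    return rows.map((r) => r * 2);
  } finally {
    pool.release(conn);
  }
}

// GOOD: fetch the rows, release the connection, then transform.
async function handlerAfter(id) {
  const conn = await pool.acquire();
  let rows;
  try {
    rows = [id, id, id]; // stand-in for a fast query
  } finally {
    pool.release(conn); // connection freed before any heavy processing
  }
  await new Promise((r) => setTimeout(r, 50)); // slow transform, off-connection
  return rows.map((r) => r * 2);
}
```

With a pool of 2 and five overlapping requests, the "before" version serializes on connection hand-offs while the "after" version only serializes on the queries themselves, which is exactly the queue effect the agent surfaced.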
What made this tricky is that nothing looked wrong when testing normally. It only became visible when multiple requests overlapped, and even then the issue wasn’t where I initially focused. Without stepping through how the system behaved under concurrency, it just felt like random slowness.
The Data Was Correct Until It Was Sorted
The numbers didn’t look wrong at first.
The API returned the right dataset, the counts matched expectations, and nothing was missing. But once the data hit the UI, something felt off. Items that should have been at the top were buried somewhere in the middle, and the ordering changed depending on how often the page refreshed.
It wasn’t random, but it wasn’t predictable either.
My first assumption was that the backend wasn’t enforcing a sort properly. I checked the query and confirmed it was sorting by a timestamp field. Then I logged the raw response before it hit the frontend. The order was correct.
So the issue had to be happening after that.
I suspected the frontend next. Maybe a state update was reordering things unintentionally. I walked through the rendering logic, but nothing stood out. The array was being passed directly into a sort function with a comparator that looked straightforward.
Still, the output didn’t match the input.
I pulled both the backend response and the frontend sorting logic into Blackbox AI and used it to trace how the data was being transformed step by step. Instead of looking at the sort in isolation, I followed the full lifecycle from API response to rendered list.
That’s when something subtle came up.
The timestamp field being used for sorting wasn’t consistent in format. Some entries were ISO strings, others were already parsed Date objects. The comparator function assumed everything was the same type, so under certain conditions, it was comparing strings lexicographically instead of comparing actual time values.
That explained why the order looked “almost right” but occasionally wrong.
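Here's a stripped-down reproduction with illustrative field names (not my actual dataset): mixing ISO strings and Date objects makes a naive comparator unreliable, while normalizing everything to epoch milliseconds first makes the order stable.

```javascript
const items = [
  { ts: "2026-01-09T10:00:00Z" },           // ISO string
  { ts: new Date("2026-01-10T10:00:00Z") }, // already a Date object
  { ts: "2026-01-02T10:00:00Z" },
];

// Buggy comparator: string-vs-string pairs compare lexicographically,
// and string-vs-Date pairs coerce the string to NaN, so `<` is always
// false for them. The result depends on input order, not on time.
const buggy = [...items].sort((a, b) => (a.ts < b.ts ? -1 : 1));

// Fix: normalize to a numeric epoch value before comparing.
const toMillis = (v) => (v instanceof Date ? v.getTime() : Date.parse(v));
const fixed = [...items].sort((a, b) => toMillis(a.ts) - toMillis(b.ts));
```

The `fixed` array always comes out Jan 2, Jan 9, Jan 10 regardless of how many times the component re-renders, because every value is reduced to the same comparable type first.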
I had seen the data structure before, but I hadn’t questioned the consistency of the field itself.
Using iterative edits in Blackbox AI, I normalized the timestamp field before sorting so everything was converted into a consistent numeric value. Then I reran the same transformations through the agent.
This time, the order stayed correct no matter how many times the component re-rendered.
Nothing was wrong with the sort function.
It was doing exactly what it was told. Just not with the data it expected.
are we officially done with local RAG for small-to-medium repos?
two years ago, setting up a local RAG pipeline using chroma or pinecone was the only way to get an LLM to 'read' your codebase.
Now, with 1m+ token context windows becoming the standard on claude and blackbox, you just drag and drop the whole /src folder into the prompt, the model just reads the whole thing in raw text in seconds. are any of you still maintaining local RAG setups for repos under 100k lines of code, or have massive context windows nearly killed the need for vector databases in standard web dev workflows?
Setting up a dev environment is still weirdly painful in 2026 - why?
Every time I help someone get started with coding, the first 2 hours are just... installing things. Wrong Node version. Missing PATH variable.
A command that works on Mac but breaks on Windows. It's exhausting and it kills momentum before they even write a line of code.
So I've been thinking about a tool that fixes this.
You pick your OS, pick what you're building (web dev, backend, ML, DevOps, etc.), and it spits out a clean, ready-to-run install script - with the exact right commands for your platform.
No Googling. No Stack Overflow rabbit holes. Just copy, paste, done.
Think: a guided wizard that generates a personalized setup script. Windows gets winget commands, Mac gets Homebrew, Linux gets apt/curl - all handled automatically.
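The core of the wizard could be as simple as a lookup from (platform, track) to commands. A rough sketch of the shape I'm imagining - the package lists here are made up placeholders, not a vetted recipe:

```javascript
// Hypothetical recipe table: platform -> track -> install commands.
// Real entries would need per-platform testing; these are illustrative.
const COMMANDS = {
  mac: {
    webdev: ["brew install node", "brew install git"],
  },
  windows: {
    webdev: ["winget install OpenJS.NodeJS.LTS", "winget install Git.Git"],
  },
  linux: {
    webdev: ["sudo apt-get install -y nodejs npm git"],
  },
};

// Join the matching commands into a copy-paste-ready script,
// failing loudly if we have no recipe for the combination.
function generateScript(platform, track) {
  const cmds = COMMANDS[platform]?.[track];
  if (!cmds) throw new Error(`no recipe for ${platform}/${track}`);
  return cmds.join("\n");
}
```

The hard part obviously isn't this lookup, it's keeping the recipes current and handling the "wrong Node version already installed" cases, but the interface could really be this small.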
Would something like this have helped you when you started out? And for those of you who mentor or teach others - is this a real pain point you keep running into?
Honest feedback welcome.
Is this worth building properly?
Is the 'Model-Independent Governance' dream dead in Big Tech? They seem to prefer walled gardens.
As an external consultant, I submitted partnership proposals to major technology firms for developing an innovative, model-independent governance layer, but I have yet to receive a response. It's possible that including a professional NDA to protect the intellectual property at such an early stage was premature. Any ideas on how to do this effectively?
The state looked correct until it re-rendered and then everything broke
I was working on a React flow where a form submission triggered a background sync, nothing unusual until I noticed the UI would briefly show the correct updated state and then snap back to the previous values. It wasn’t random either, it happened only when the network request took just long enough to overlap with another state update.
At first it felt like a simple stale state issue, but logging everything made it worse because the logs showed the right values at every step. The mutation finished, the state setter ran, and the component re-rendered with the expected data. Then a second render would quietly override it with something older that shouldn’t even exist anymore.
That’s where it stopped being obvious. The timing didn’t line up with a typical async bug. It felt like something else was writing to the same state, but tracing it manually across multiple hooks and effects was getting messy fast.
I pulled the whole component and its related hooks into Blackbox AI and started with an agent asking it to simulate the render cycle step by step instead of just reviewing the code. That changed things immediately because instead of pointing out syntax or patterns, it started mapping when each effect fired relative to the async call.
It highlighted something I had completely ignored. One of the effects depended on a derived value that was recalculated on every render, and that effect triggered a fallback state update whenever it detected what it thought was an “incomplete” sync. The problem was that during the async window, that derived value briefly matched the fallback condition even though the real update was already in progress.
What made it tricky is that this only happened when two renders overlapped in a very specific order. Manually I kept assuming React would batch things in a predictable way, but the agent simulation showed a different sequence where the fallback effect ran after the correct state was already set.
I used iterative edits inside Blackbox to test a few variations, first by stabilizing the derived value, then by guarding the effect with an additional condition tied to the request lifecycle. Each time I adjusted something, I had the agent re-run the render reasoning so I could see if the sequence still broke.
The fix ended up being a small change, but not one I would have trusted without that step by step breakdown. I introduced a ref to track whether a sync was actively resolving and prevented the fallback effect from firing during that window. Once that went in, the second render stopped overwriting the correct state.
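To make the guard concrete without dragging React into it, here's a small framework-free model of the same idea (all names are illustrative): a plain flag plays the role of the ref, and the fallback writer checks it before touching state. In the real component the flag lives in a useRef so it survives re-renders without triggering them.

```javascript
// Framework-free model of "ref guards a fallback state write".
function makeStore() {
  let state = "initial";
  let syncInFlight = false; // plays the role of the useRef flag

  return {
    get: () => state,
    async runSync(next) {
      syncInFlight = true; // set synchronously, before the async gap
      try {
        await Promise.resolve(); // stand-in for the network request
        state = next;            // the correct update lands here
      } finally {
        syncInFlight = false;
      }
    },
    fallbackEffect() {
      // Without this guard, firing during the async window would
      // overwrite the in-progress sync with stale fallback data.
      if (syncInFlight) return;
      if (state === "initial") state = "fallback";
    },
  };
}
```

The key detail is that the flag is set synchronously before the first await, so any effect that fires inside the async window sees it and backs off, which is exactly what the second render was failing to do in my component.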
What stood out wasn’t the fix itself, it was how misleading the behavior was when looking at logs versus actual render timing. Without walking through the execution path the way Blackbox did, it just looked like React was behaving inconsistently when it really wasn’t.
this shows how ai fills gaps with confident assumptions
Using AI to review my code before pushing
Before pushing anything now, I usually run my code through AI and ask if there’s anything wrong or inefficient.
It often catches small stuff like edge cases, bad naming, or unnecessary complexity that I might overlook.
sometimes it even suggests cleaner or simpler approaches i wouldn’t have thought of in the moment.
it’s not perfect and i don’t blindly trust it, but as a quick second opinion it’s been surprisingly useful.
feels like a lightweight code review without needing another person every time.
curious how others are using it
do you rely on ai for code reviews too, or do you think it creates bad habits in the long run?
Using AI feels like having a pair programmer who never gets tired
The biggest difference for me has been consistency.
i can be stuck at 2am, ask a question, and get help instantly without breaking flow.
no waiting, no digging through threads, no losing momentum for hours over something small.
it’s not perfect and still needs review, but it removes a huge amount of friction from coding.
feels less like getting help and more like having someone always there to unblock you.
curious how others feel about this
does it actually improve your workflow or does it sometimes slow you down / create more confusion?
I got curious which AI agents actually broke out in 2026. They all did the same thing — subtracted something
Slides 1-6 in order.
The 5 agents in the matrix:
- Paperclip (github.com/paperclipai/paperclip)
- Edict (github.com/cft0808/edict)
- MimiClaw (ESP32 firmware)
- Steve (github.com/YuvDwi/Steve)
- TEMM1E (github.com/temm1e-labs/temm1e)
TL;DR: every one of them got interesting by giving up something the mainstream insisted on keeping. The giving-up is the design.
I need your feedback
Hey guys, I've been using AI to build a Codex remote control for Android, since there's currently no good way to use Codex on your Android phone without going through some complex setup and downloads. I want something very easy that connects Codex to your Android phone so you can use it anywhere. The best part is that I'm building it around just two modes:
- When your PC is completely turned on and you can use it for free
- When your PC is turned off and you can still send prompts and everything from your mobile
The second mode is the one I'm still deciding on, because it already costs a lot of money to build an app version that's super easy to install and use on your Android phone. I'm actually building it for myself right now, but if you want one, let me know in the comments and I'll scale it up and build it for you guys too.