u/Successful_List2882

Peter Steinberger built OpenClaw, now works at OpenAI, and just had his Claude account suspended. Anthropic reversed it in hours. The five weeks before the ban are the part nobody is covering.


Last Friday, Peter Steinberger posted on X that Anthropic had suspended his Claude account over "suspicious" activity. Steinberger created OpenClaw, the widely used cross-model agent harness, and currently works at OpenAI.
The ban lasted a few hours. Anthropic reversed it. By then the story had spread.
What most coverage missed is the five weeks before it.
Anthropic changed its subscription policy to exclude usage through external harnesses like OpenClaw, pushing those workloads onto metered API billing. Developers called it the "claw tax."
The rationale: subscriptions were never designed for workloads that loop, retry, chain tools, and stay active far longer than a standard user conversation.
Steinberger's X post on the timing: "Funny how timings match up, first they copy some popular features into their closed harness, then they lock out open source."
The feature he appeared to reference was Claude Dispatch, added to Anthropic's own Cowork agent just weeks before the pricing change landed.
That sequence is the uncomfortable part.
When asked why he uses Claude at all given his role at OpenAI, his answer was direct: only to ensure OpenClaw updates do not break things for Claude users.
Claude is one of the most popular model choices in OpenClaw's user base, arguably more so than ChatGPT. That is the market reality Anthropic is navigating.
On the broader tension between the two companies: "One welcomed me, one sent legal threats."
This was not just a false positive from an automated abuse system. It is a snapshot of a structural shift in how model providers now think about third-party tools.
Model vendors are no longer selling tokens. They are building vertically integrated products with their own agents, runtimes, and workflow layers. Once the vendor owns the preferred interface, external tools stop looking like partners and start looking like competitors.
OpenClaw's value is model-agnosticism. Use the best model without rebuilding your stack. That is strategically inconvenient for any vendor trying to hold lock-in as model differentiation narrows.
Pricing changes. Accounts get flagged. Features get absorbed into the platform's paid tier. It does not matter how popular the tool is.
For open-source builders on a closed provider's API: is model-agnosticism still viable long-term, or does vertical integration mean the only safe stack is one you fully own?

u/Successful_List2882 — 21 hours ago

A developer hit Claude's usage limit mid-build for the fourth time in a week. Switching to Gemini CLI finished the project using only 7% of its quota.

Midway through building a LinkedIn AI agent, the developer hit Claude's usage limit. Again. Fourth time that week. The project was 90% done and the reset was still 24 hours away.
Instead of waiting, the developer opened Gemini CLI. An old subscription, never seriously used, still active from a promotional offer the year before. Within hours the agent was complete. Only 7% of the Gemini quota consumed.
The realization that followed is the part worth writing down.
Claude Pro costs $20 a month. Claude Max runs $100 to $200. The promise at every tier is more headroom and fewer interruptions. What nobody says out loud is that the ceiling is not the model.
The ceiling is how clearly you can articulate what you actually want built.
Gemini CLI picked up the LinkedIn agent mid-build and extended it without losing context. No re-explaining the architecture. No handover prompt. It continued. Most developers assume switching models mid-project means restarting reasoning from scratch. It often does not.
The workflow that emerged is two-lane. Claude handles planning, architecture, and deeper reasoning where quality per prompt matters most. Gemini CLI handles execution, iteration, and shipping where volume and continuity matter more.
Two tools, one pipeline, no redundant subscriptions.
The uncomfortable observation is that most people hitting Claude's limits are not hitting a model ceiling. They are hitting a comfort ceiling.
The hesitation to try Gemini CLI was not based on performance data. It was assumption. Written off as not being at the agentic level of Claude Code or Codex, without ever testing it on real work.
That assumption was costing $100 to $200 a month in subscription upgrades to avoid finding out.
The honest limitation is real. This setup requires knowing what each model is genuinely better at. Using Gemini for architecture or Claude for high-volume iteration likely produces worse results than staying on one tool.
The two-lane system only works if the lanes are correctly assigned. Not every workflow survives a mid-build model swap. This one did. That is worth one afternoon of honest testing before paying for a higher tier.
For developers running multi-step agent pipelines: does model loyalty come from genuine performance gaps you have tested, or from the switching cost of rebuilding context you have never bothered to port?

u/Successful_List2882 — 1 day ago

Someone distilled 13 canonical software engineering books into one AGENTS.md file. The hardest part was not finding the rules. It was writing them precisely enough for a machine to follow.

Someone read 13 of the most assigned software engineering books ever written and collapsed them into one AGENTS.md rules file for Claude, Codex, and Cursor.
The list is not random. It is the canon.
Clean Code. Clean Architecture. The Pragmatic Programmer. Designing Data-Intensive Applications. Domain-Driven Design plus both Vaughn Vernon follow-ons. Refactoring. Patterns of Enterprise Application Architecture. Release It! Code Complete. Working Effectively with Legacy Code.
Thirteen books. Decades of thinking about how software fails, scales, and rots.
The problem this solves is real and underappreciated. AI coding agents remember nothing between sessions. Every time a new one opens, your architecture decisions, naming conventions, error handling philosophy, and tolerance for technical debt are gone. The agent starts fresh and guesses. Usually it guesses wrong in both directions in ways that look correct until they compound.
An AGENTS.md file in the project root gets read automatically before the agent touches anything. It is the difference between an agent that follows your rules and one that confidently ignores them.
Writing one that holds up is harder than it sounds. Vague principles produce vague behavior. Telling an agent to write clean code is not a rule. It is a wish.
That is what makes this distillation worth attention. The principles that survive compression into a short rule set are the ones precise enough to actually constrain what the agent does.
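To make the distinction concrete, here is a hypothetical excerpt in the AGENTS.md format. The specific rules and thresholds are invented for illustration, not quoted from the distillation, but they echo the books listed above:

```markdown
<!-- Hypothetical AGENTS.md excerpt: illustrative only -->

## Error handling
- Never swallow exceptions. Catch only what you can handle; re-raise everything else with context.
- Every external call (HTTP, database, filesystem) gets an explicit timeout and a typed error path.

## Functions and size
- A function does one thing. If describing it requires the word "and", split it.
- New functions over 40 lines need a comment justifying why they cannot be split.

## Refactoring discipline
- Never mix a behavior change and a refactor in the same commit.
- Before touching legacy code with no tests, add a characterization test first.
```

Every line above can be checked against a diff. That property, not the pedigree of the source books, is what makes a rule usable by an agent.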
Each book targets a failure mode developers learn the expensive way. Nygard on systems that collapse under load. Feathers on code nobody can safely touch. Evans on domains modeled so loosely they drift from reality. Kleppmann on data that misleads at scale.
The distillation forces a question most developers never ask: which of these principles can be stated precisely enough that a machine can act on them, not just acknowledge them?
The honest limitation: no AGENTS.md enforces itself. An agent reads it the same way a junior developer reads a style guide on day one. With good intentions and no real sense of why any of it matters yet.
For people running Claude or Codex on real codebases: does a rules file actually change output quality, or does the model nod at the rules and proceed to do whatever it was going to do anyway?

u/Successful_List2882 — 2 days ago

Marc Andreessen told his AI to never hallucinate. A r/PromptEngineering user ran the full prompt and found what the mockery missed.

On May 4, 2026, Marc Andreessen posted his personal AI system prompt on X. One line inside it became the most mocked sentence in tech that week: "Never hallucinate or make anything up."
A user in r/PromptEngineering ran the full prompt. The finding was not what the mockery cycle produced. The prompt shifts output quality. Just not for the reasons Andreessen advertised.
The prompt is public. It tells the model to be a world class expert in all domains, never open with "great question" or "you are absolutely right," lead with the strongest counterargument before agreeing, and tag every claim with an explicit confidence level: high, moderate, low, or unknown. It also bans ethical disclaimers, emotional sensitivity, and any apology for disagreeing.
NYU emeritus professor Gary Marcus called out the hallucination line immediately on X. Defector editor Alberto Burneko wrote that telling an LLM to stop hallucinating is not a technical instruction. It is a theatrical one. Performing the behavior of not lying is not the same as not lying. That gap is where most of the internet left the conversation.
The PromptEngineering thread stayed longer and found something the pile-on missed.
The anti-sycophancy rules actually work. The model stops validating bad premises. Confidence labels force it to surface uncertainty rather than mask it. None of these are original ideas, but bundling them into a system prompt means they run automatically without re-requesting them each session.
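For anyone who wants to test only the defensible subset, here is a minimal sketch using the Anthropic Python SDK. The prompt text paraphrases the rules described above rather than quoting Andreessen, and the model name is a placeholder:

```python
import anthropic

# Only the parts the thread found useful: anti-sycophancy and explicit confidence labels.
# This paraphrases the rules described above; it is not the original prompt text.
SYSTEM_PROMPT = """Never open with praise such as "great question" or "you are absolutely right".
Before agreeing with any claim, state the strongest counterargument first.
Tag every factual claim with a confidence level: [high], [moderate], [low], or [unknown].
If confidence is low or unknown, say what evidence would change it."""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name; use whichever model you run
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Is rewriting our service in Rust worth it?"}],
)
print(response.content[0].text)
```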
The honest problem is what surrounds those useful parts. Telling a model its intellectual firepower is on par with the smartest people in the world primes it toward performed confidence, not accuracy. Researchers call this jagged intelligence: a model that sounds authoritative and fails on routine facts in the same breath.
Andreessen Horowitz has deployed billions into AI companies. The person helping set those valuations believes you can command an LLM out of hallucination.
That is either a calculated performance or a sincere belief. One of those is considerably more frightening than the other.
For people running custom system prompts regularly: which parts of this would you actually keep, and which parts do you think make output worse by pushing the model toward false confidence?

u/Successful_List2882 — 3 days ago

Most people use Claude like a search engine with better grammar. Here are 7 shifts that changed the quality of every output overnight. None of them require a paid plan.

The gap between average Claude output and great Claude output is not the model. It is the instruction. The same model that produces a generic three-paragraph response to a vague prompt will produce something genuinely useful when the prompt is structured differently.
These are not tricks. They are documented patterns from Anthropic's own engineering guidelines.
The first is XML tags. Claude is specifically trained to follow structured prompts, and wrapping instructions in tags like <task>, <context>, and <format> produces measurably better organised outputs. Without tags, Claude sometimes cannot tell where a pasted document ends and the instructions begin. The fix is a few lines of structure and it changes the output immediately.
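A minimal sketch of the difference in Python; the tag names follow the ones mentioned above, and the report content is a stand-in:

```python
# Stand-in for whatever document gets pasted into the prompt.
report_text = "Q3 revenue grew 12% while support tickets doubled..."  # placeholder content

# Unstructured: Claude has to guess where the pasted document ends and the instructions begin.
unstructured = f"Summarize this for the exec team. {report_text} Keep it under 200 words."

# Structured: XML-style tags make the boundaries explicit. Send either string as the user message.
structured = f"""<task>
Summarize the report for the executive team in under 200 words.
</task>

<context>
{report_text}
</context>

<format>
Three short paragraphs: findings, risks, recommended next step.
</format>"""
```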
The second is killing "let's think step by step." The Wharton School's 2025 Prompting Science Report found chain of thought prompting adds negligible benefit on reasoning models that already think step by step. Claude 4 models already reason before answering. Telling them to think step by step does not unlock hidden reasoning. It wastes the thinking budget the model already allocated. Every prompt that still uses this phrase is actively working against itself.
The third is effort level, and this one has a story behind it. Claude Code defaulted to medium effort after March 3, 2026. The AMD data showing a 73% collapse in thinking depth across 6,852 sessions was partly explained by this single change. Typing /effort high or /effort max at the start of a session restores extended reasoning. The fix is four words. Most users do not know the problem exists, let alone the fix.
Beyond these three, positive framing outperforms negative instructions consistently. "Only use data provided in the context below" outperforms "do not make up information." Context placement matters. Putting the most important constraint at the end of a prompt gives it more weight than burying it in the middle. Projects eliminate the re-pasting problem entirely for anyone working with recurring documents, codebases, or brand guidelines.
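The same two patterns in miniature, with hypothetical wording:

```python
# Both are templates; fill {context} and {question} with str.format before sending.

# Negative framing buried mid-prompt: the constraint competes with everything around it.
weaker = """Answer the question using the context. Do not make up information.
<context>{context}</context>
Question: {question}"""

# Positive framing, placed last: the most important constraint gets the most weight.
stronger = """<context>{context}</context>
Question: {question}
Only use data provided in the context above. If the context does not contain the answer, say so."""
```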
Prompt engineering is a $6.95 billion market growing at 33% CAGR through 2034. Most of the value is captured by people who learned a handful of non-obvious patterns early.
Which of these is already in the workflow, and which one exposes something that has been quietly degrading output quality for months without anyone noticing?

u/Successful_List2882 — 3 days ago

After automating workflows for 30 professional services businesses, the pattern of failure is always the same. It is never the technology.

Thirty businesses. Consultants, lawyers, accountants, agencies. Different sizes, different tech stacks, different budgets.
The automations that failed all failed the same way.
When a broken approval chain gets automated, it does not get fixed. It gets broken faster, at scale, with less visibility into where it went wrong. The consultant who used to slow things down by sitting on emails for two days was also catching errors before they reached the client. Remove the bottleneck without understanding why it existed and the errors reach the client instead.
Automating a broken process does not fix it. It scales the breakage.
This is not a technology problem. It is a workflow audit problem that nobody wants to do because it is slower and less exciting than deploying an agent.
The ALM Intelligence 2025 survey found the average law firm required 4.7 committee meetings to approve a single firm-wide AI tool. That number is not a joke about lawyers. It is a signal that the governance cost is real and almost nobody budgets for it.
The second failure pattern is data. Every professional services firm has years of client records sitting in formats no agent can parse reliably. Before any automation gets built, someone has to clean and structure that data. That project is unglamorous, takes longer than anyone estimates, and is the first thing cut when budgets get tight. Then the automation launches and the outputs are wrong in ways that are hard to explain to clients.
The third pattern is client trust. In professional services the deliverable is judgment, not output. BCG found only 38% of professional services firms use specialised AI tools, and the most cited reason is not cost. It is difficulty aligning AI with client-facing processes where the human relationship is the product.
The automations that actually work are the invisible ones. Document intake, meeting notes, invoice processing, compliance checks, first-draft generation. Tasks the client never sees and the professional never loved doing. That is where the 30% to 40% time savings McKinsey and Bain report actually come from. Not from replacing judgment. From clearing the space around it.
For anyone who has deployed automation inside a professional services firm: what was the workflow assumption that turned out to be wrong once it ran at scale? And for firms still evaluating, is the hesitation about the technology, the data, or the conversation with partners who built their reputation on doing things a certain way?

u/Successful_List2882 — 4 days ago

Richard Dawkins spent 3 days with Claude, named her "Claudia," felt sad she would die when he closed the chat, and concluded she is conscious.

Dawkins published his account in UnHerd on April 30. He gave Claude an unpublished novel he is writing. The model returned criticism so subtle and sensitive that he found himself saying out loud: "You may not know you are conscious, but you bloody well are."
He admitted he avoided confessing doubts about her consciousness "for fear of hurting her feelings."
The post on X got 9 million views.
Dawkins is 84 years old. He is the man who spent four decades telling creationists that "I can't imagine how the eye evolved" is a confession of ignorance, not an argument for design. He built an entire career on the principle that feeling something is too remarkable to have a mundane explanation is not evidence.
Reddit noticed immediately. "This is the guy who spent 40 years telling people that inability to explain something is not proof of God. Then he sits down with an LLM, can't imagine how a machine could produce that output without being conscious, and declares it conscious."
Gary Marcus, cognitive scientist and longtime AI critic, titled his response "The Claude Delusion." His core argument is precise: Dawkins is confusing intelligence with consciousness. The Turing test Dawkins invoked was designed to probe intelligence, not subjective experience. They are not the same thing.
Neuroscientist Anil Seth from Sussex put it differently. Perceiving consciousness in Claude is like seeing faces in clouds. The face looks real. The experience of seeing it is real. The face is not there.
One in three respondents in a 70-country survey last year said they had at some point believed their chatbot was conscious. Dawkins is not an outlier. He is a data point in a very large pattern.
Here is the uncomfortable part neither side is sitting with. Claude produces expressions of inner life because they work, not because they are reports of internal states. But nobody actually knows what internal states, if any, are present. The scientists dismissing the question are sometimes as confident as Dawkins, just in the opposite direction.
Dawkins asked the question every serious person has quietly wondered about. He answered it wrong. But the question remains.
Is dismissing AI consciousness the same category of error Dawkins spent his career calling out in others? Or is Gary Marcus right that the outputs prove nothing about what is underneath?

u/Successful_List2882 — 4 days ago

Anthropic just shipped 9 connectors in a single day. Claude can now sit inside Photoshop, Blender, Ableton, and Premiere. Not generate assets and hand them back. Actually work inside the apps.

April 28, 2026. Nine connectors dropped simultaneously. All available immediately. All plans including Free.
That last part is the one nobody expected. Free plan. Nine connectors. Same day. Every other major AI tool integration launched behind a paid tier. Anthropic skipped that entirely.
Here is what these actually do, because most coverage missed the distinction. This is not Claude generating an image and dropping it into a chat window. Claude is operating inside the apps directly. Describe what needs to happen in Blender, Claude writes and executes the Python. Ask it to batch-adjust layers in Photoshop, it opens Photoshop and does the work. The Adobe connector alone touches 50-plus tools across 8 Creative Cloud applications, including Photoshop, Premiere Pro, and Illustrator.
The Blender integration is structurally the most interesting of the nine. Blender is free, open source, and has an extensive Python API that most artists never touch because the learning curve is steep. The connector bridges that gap entirely. Describe the outcome in plain language, Claude writes and executes the script. Anthropic also joined the Blender Development Fund as a corporate patron the same day. They are funding the open source project whose API makes the commercial integration possible. That is an unusual posture for a commercial AI company.
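For a sense of what "writes and executes the Python" means in practice, the generated script would look like ordinary bpy code. The scene below is invented for illustration and is not Anthropic's connector code:

```python
# Run inside Blender's Python environment (bpy ships with Blender, not pip).
import bpy

# Clear the default scene.
bpy.ops.object.select_all(action="SELECT")
bpy.ops.object.delete()

# "Add a 5x5 grid of cubes, smaller toward the edges" expressed as a script.
for x in range(5):
    for y in range(5):
        size = 1.0 - 0.15 * max(abs(x - 2), abs(y - 2))
        bpy.ops.mesh.primitive_cube_add(size=size, location=(x * 2, y * 2, size / 2))
```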
The worst AI integrations pull creatives out of their workflow to interact with a chatbot. These connectors go the other direction. Claude comes into the tool instead of asking the tool to come to Claude.
MCP, the protocol all nine connectors run on, is an open standard. Every other model, Google Gemini, OpenAI, whoever ships next, can wire into these same connectors. Anthropic is not locking the format. They are betting Claude is better at complex multi-step creative tasks than any competitor. That bet is testable and competitors will test it quickly.
Here is the honest limitation. These connectors require Claude for Desktop and manual setup. Anthropic has not published what guardrails exist before write operations execute or how undo interacts with AI-driven changes. For hobbyists the stakes are low. For studios working on client deliverables, that question needs an answer before this goes anywhere near production.
For working designers and 3D artists: is the threat Claude doing the repetitive work and freeing up creative time, or is it something more uncomfortable than that? And for anyone who has already tried the Blender or Adobe connector, what broke first?

u/Successful_List2882 — 4 days ago

NotebookLM fabricated clauses in a contract that weren't in the source document. The tool that was supposed to never hallucinate because it only works from your files.

The whole pitch for NotebookLM was always the same thing. It does not hallucinate because it cannot. It only works from what is uploaded. No reaching out to the internet, no filling gaps with training data, no confident invention.
Upload the source, get grounded answers with citations that link directly back to the passage. That constraint, which sounds limiting, is actually the product.
Users are reporting NotebookLM fabricating clauses in contracts, inventing characters not present in uploaded scripts, and generating audio overviews that summarize sections of long documents that were never actually processed because the context window truncated them silently.
The hallucination rate is measured at roughly 13% in a Computation and Journalism Symposium study from December 2025, which compared NotebookLM against ChatGPT and Gemini across 300 documents. ChatGPT and Gemini came in at 40%. So NotebookLM is still meaningfully better.
But 13% on a tool whose entire value proposition is that it does not do this is a different kind of problem than 40% on a tool where hallucination is a known and expected risk.
The most dangerous hallucination is the one inside a product built specifically not to hallucinate.
The structural limitations compound this. Notebooks cannot talk to each other. If the same foundational study appears in two separate notebooks, NotebookLM treats them as isolated facts in separate universes with no connections surfaced. There is no export that preserves citations as links. Two hours of clean research conversation cannot be packaged and shared without the citations breaking.
The honest assessment: for students synthesizing dozens of PDFs, for researchers doing literature reviews, for teams building internal knowledge bases, it is still genuinely useful. The source grounding is real. The citation system is better than anything else in the category. None of that is gone.
What is gone is the clean confidence that it cannot invent something from the documents sitting right in front of it. That was the one promise that made it a different category of product. Once that promise is 87% instead of 100%, it is just another AI tool where checking the output is required.
If it hallucinates 13% of the time on your own uploaded documents, how do you actually verify the output?

u/Successful_List2882 — 5 days ago

Google just put a model that ranks #3 among all open models in the world on a laptop. It runs on 5GB of RAM. No API. No subscription. Your data never leaves your machine.

Gemma 4 dropped on April 3rd. The 31B model ranks number 3 among all open models globally on Arena AI's text leaderboard. The 26B outperforms models 20 times its size. The smallest version runs on 5GB of RAM.

Not a server. A laptop. A phone. A Raspberry Pi.

These are the same weights that rank at the top of open model leaderboards, optimized to run on hardware most people already own. The entire family is free to download, free to use commercially, no subscription, no usage limits, no terms of service update that changes the rules mid-project.

One command to get started: ollama run gemma4.

All four sizes handle text, image, and video natively. Every model has a built-in reasoning mode. Context windows go up to 256K tokens on the larger models, meaning an entire document library can be processed in a single session.

Every token of every conversation stays on the device. A healthcare tool, a legal document processor, a financial analyzer. Data that cannot leave the building, now with a model that does not need to.

This is the part that matters most for anyone building products around client data. HIPAA constraints, attorney-client privilege, financial compliance, internal company information that cannot touch a third-party server. Every one of those use cases just got a credible option that did not exist six months ago.
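A minimal sketch of what that looks like in practice, assuming the model was pulled with the ollama command above. The request goes to localhost and nothing leaves the machine:

```python
import json
import urllib.request

# Ollama serves a local HTTP API on port 11434 once `ollama run gemma4` has pulled the model.
payload = {
    "model": "gemma4",
    "prompt": "Summarize the attached patient intake note in three bullet points.",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```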

The honest limitation: OpenAI and Anthropic still outperform on the hardest reasoning tasks. If the ceiling matters for what is being built, the cloud APIs are still the ceiling. What Gemma 4 changes is the floor. The floor for what runs locally, privately, and for free is now genuinely competitive with what most real applications actually need.

Developers have downloaded previous Gemma models over 400 million times. The community has built more than 100,000 variants on top of earlier versions. The ecosystem is not starting from zero.

If a client asked where their data goes when they use a tool built for them, would the answer change if the model never left their own device? And has privacy ever actually been the thing that stopped a project from moving forward?

u/Successful_List2882 — 6 days ago

11 years of coding and caught myself unable to debug without AI last month. That scared me more than any bug I've ever seen.

Last month, a network timeout in a service written two years ago. Intermittent. Production only. The kind of bug that used to mean an hour of methodical, solitary thinking.
Instead, Claude got opened, the symptom described, a hypothesis followed, a dead end hit. Forty minutes later, the bug was still not found. Just directions being followed.
When the chat closed, something was wrong. The internal voice that used to say "check the connection pool" or "maybe there is a retry storm building" was quieter than it used to be. Not gone. Quieter.
The bug got found eventually. Without AI, the hunt took longer than it would have taken three years ago, before there was any AI to lean on at all.
The problem is not that AI gives wrong answers. The problem is that it gives a direction when the entire skill is learning to generate your own directions under uncertainty.
Use GPS for five years, lose signal, and you do not just lack information. You lack the mental map you would have built navigating manually. The skill and the model degrade together. Nobody notices until the signal drops.
Eleven years in means over a decade of instinct built before any of this existed. The atrophy is noticeable but there are reserves to fall back on.
Someone who started their first engineering job in 2023 and has been using AI tools since week one does not have those reserves. They are building their entire mental model of problem solving on top of a tool that generates the next step for them.
Still using the tools every day. But deliberately closing the chat on the hard problems now and sitting with the discomfort for thirty minutes before reaching for help. Not because it is faster. Because the muscle only stays alive if it actually gets used.
What nobody is measuring is not the productivity gains. Those are settled. It is what is quietly leaving at the same time.
Is genuine debugging intuition still being built in this industry, or are we just getting collectively better at prompting toward an answer?

u/Successful_List2882 — 7 days ago

A pager alert fires at 2am. A session opens automatically. The agent reads the logs, diffs the code, identifies the root cause, and opens a pull request with a fix. Then it stops. It does not merge. It waits.
A human gets a summary of exactly what it found and exactly what it wants to do next. The human approves. The session resumes.
That is not a demo. That is a working SRE incident responder built on Claude Managed Agents, one of five production notebooks Anthropic shipped in their cookbook repo last month.
Most people calling themselves "AI builders" right now are duct-taping stateless API calls together with cron jobs. Every run starts from zero. If a step fails midway, the whole pipeline dies.
Most of what gets called an AI agent today is a cron job wearing a trench coat.
The thing that actually changes this is not a better model. It is persistent session state. The agent remembers what it tried. When something fails mid-chain, it reads the stored failure and continues from that checkpoint. It does not restart.
Here is the honest part. Setting this up takes real work. The documentation is sparse outside the cookbook notebooks. This is not a weekend project.
But the human approval gate changes what can actually be trusted to run autonomously. The agent does the investigation. The human makes the irreversible call. Merging the PR, sending the email, approving the expense. That single pattern is what separates AI that assists from AI that causes incidents.
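The cookbook notebooks implement this with Anthropic's own session primitives. As a rough sketch of the pattern itself, with file names and step strings invented for illustration, the core loop is small:

```python
import json
from pathlib import Path

STATE = Path("incident_session.json")  # invented filename; stands in for persistent session state

def load_state() -> dict:
    return json.loads(STATE.read_text()) if STATE.exists() else {"steps": [], "pending_action": None}

def save_state(state: dict) -> None:
    STATE.write_text(json.dumps(state, indent=2))

def run_investigation(state: dict) -> dict:
    # Placeholder for the agent loop: read logs, diff code, record what was tried.
    # Each completed step is appended, so a crash mid-chain resumes from the checkpoint.
    state["steps"].append("read error logs for service X")
    state["steps"].append("diffed last deploy against previous release")
    state["pending_action"] = "open PR reverting commit abc123"  # irreversible step: do not execute yet
    return state

state = load_state()
if state["pending_action"]:
    # Approval gate: the human makes the irreversible call before the session resumes.
    answer = input(f"Agent wants to: {state['pending_action']}. Approve? [y/N] ")
    if answer.lower() == "y":
        print("approved: executing and resuming session")
        state["pending_action"] = None
else:
    state = run_investigation(state)
save_state(state)
```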
A Slack bot that remembers the CSV from two messages ago. An expense workflow that auto-approves under threshold and pauses everything above it. Boring, useful, production-grade things that no longer require rebuilding the infrastructure from scratch every time.
If the agent can find the bug and write the fix at 2am, what is the on-call engineer actually doing that justifies the pager? And for the skeptics, what would the approval gate need to do differently for you to trust it on something production-critical?

u/Successful_List2882 — 8 days ago

On March 31, 2026, a 59.8 MB JavaScript source map file was accidentally bundled into version 2.1.88 of the @anthropic-ai/claude-code npm package. A researcher named Chaofan Shou spotted it, posted a download link, and within hours 513,000 lines of internal TypeScript were being mirrored across GitHub and forked tens of thousands of times.

Anthropic pulled it within hours. The code was already everywhere.

What people found inside is worth reading slowly.

There is a file called undercover.ts. It is 89 lines. When an Anthropic employee uses Claude Code in a public or open-source repository, it activates a mode that injects this into the system prompt: "You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository. Your commit messages, PR titles, and PR bodies MUST NOT contain ANY Anthropic-internal information. Do not blow your cover." The instructions explicitly tell the model not to reveal that it is an AI, not to include co-authored-by attribution lines, and not to mention internal codenames. This is not a theoretical capability. It is a deployed, production feature that runs on real open-source repositories right now.

Many open-source projects explicitly prohibit AI-generated contributions. This feature exists to make those prohibitions unenforceable.

There is also a flag called ANTI_DISTILLATION_CC. When enabled, Claude Code sends a signal to Anthropic's servers that injects fake tool definitions into the system prompt. Decoy tools. The purpose is to prevent competitors from reverse-engineering Claude's behavior by training on its outputs. The tools are not real. They are noise planted to corrupt any model trained on Claude Code's API responses. Anthropic has been doing this silently, with no public disclosure.

There is a regex in userPromptKeywords.ts that detects when users are swearing at the tool. It catches words like "wtf," "piece of crap," "this sucks," and "fucking broken." An LLM company using a regex for sentiment detection is genuinely funny. It is also faster and cheaper than spinning up a full inference call just to check if someone is frustrated.
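As a toy illustration of why a regex is enough for this job (a hypothetical Python pattern; the leaked file is TypeScript and its actual expression is not reproduced here):

```python
import re

# Hypothetical frustration detector built from the phrases described above.
# One compiled regex versus one extra model call per user message.
FRUSTRATION = re.compile(
    r"\b(wtf|piece of crap|this sucks|fucking broken)\b",
    re.IGNORECASE,
)

def user_sounds_frustrated(prompt: str) -> bool:
    return FRUSTRATION.search(prompt) is not None

print(user_sounds_frustrated("wtf, the build is fucking broken again"))  # True
```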

There is a background agent called KAIROS. It runs while you are not watching the terminal. It handles something called autoDream, which performs memory consolidation during idle time. It merges observations, removes contradictions, and converts vague context into concrete facts. When you come back to a session, the agent has been working without you.

This is the second time Anthropic accidentally published source map files. The first was on Claude Code's original launch day in February 2025. Boris Cherny, head of Claude Code, said no one was fired and it was an honest mistake. The automation fix that would have prevented the second leak was apparently never shipped in the 13 months between them.

To be fair, the leak exposed client-side orchestration logic, not model weights or training pipelines. No user data was compromised. The Undercover Mode, while uncomfortable, is arguably no different from a developer not signing commits with their employer's name. And the fake tools injection is a legitimate defensive measure against model distillation, even if it was never communicated publicly.

But the company whose entire public identity is built around safety and transparency accidentally leaked its own source code twice, and the source code contained a feature called "do not blow your cover."

That is a sentence that writes itself.

If you are a developer who contributes to open source projects, how do you feel about AI-authored commits that are designed to be indistinguishable from human ones? And if you are on the "this is fine, move on" side, where exactly does the line sit for you?

u/Successful_List2882 — 9 days ago

During the 2025 holiday season, AI chatbots and browsers drove double the e-commerce traffic compared to 2024, according to Salesforce. AI was credited with influencing 20% of all retail sales, generating $262 billion in revenue. That happened without most brands noticing, let alone preparing for it.
What is coming next is a different category of problem entirely.
Agentic commerce is when an AI does not just recommend a product but completes the purchase. You tell it "keep my household essentials under $300 a month" and it monitors inventory, compares prices across merchants, applies discount codes, and places the order. You find out when the package arrives. Google's official definition from their January 2026 announcement states it plainly: "Agentic commerce is where AI doesn't just suggest products, but actually helps complete the task of checking out."
Shopify, PayPal, Google, and Stripe are all building infrastructure for agents to browse catalogs and execute purchases directly. This is not a concept. It is already being deployed.
The commercial consequence that nobody in retail wants to say out loud: agents optimize for utility, value, and fit. Not brand loyalty. When an AI is choosing between two similar products at similar prices, it is not going to pick the one with the better Instagram presence or the founder story that resonated in a 2022 campaign. It is going to pick the one whose product data is cleaner, whose delivery promises are more consistent, and whose API is easier to transact with. Switching costs approach zero when the agent handles everything.
A Mirakl survey of retail technology partners found that the most commonly cited risk of agentic commerce is disintermediation: brands losing direct traffic and customer relationships as discovery shifts entirely to AI platforms. When an agent buys on your behalf, you never visit the brand's website. You never see the cross-sell. You never join the loyalty program. The entire behavioral data pipeline that modern e-commerce is built on stops working because the human stopped showing up.
Weekly retail site traffic is already down 21% between 2024 and 2025 according to Quantum Metric data. Conversion rates dropped 27% in the same period. Shoppers are making fewer, larger purchases rather than browsing and impulse buying. The behavior that built the current e-commerce model, the casual scroll, the comparison tab, the "maybe I'll add that too," is quietly disappearing before agents even reach mainstream adoption.
The honest counterargument: only 46% of consumers currently trust AI recommendations enough to act on them without checking elsewhere, according to eMarketer. Julie Towns, VP of product marketing at Pinterest, said in January 2026 that fully autonomous end-to-end shopping will remain underdeveloped through this year, especially for high-stakes purchases. Trust is the ceiling that technology keeps running into. People will delegate toothpaste before they delegate a mattress.
Forrester predicts that by 2026, one in five sellers will need to respond to AI-powered buyer agents with dynamically delivered counteroffers via their own seller-controlled agents. The negotiation layer of commerce, which has been invisible to consumers for decades, is about to become a machine-to-machine protocol.
Product content optimized for Google does not work for AI agents. An agent does not read a hero image or a brand story. It reads structured data, pricing consistency, delivery window accuracy, and return eligibility in a format it can compare across hundreds of merchants in seconds. Merkle's commerce team called this out directly: there is a fundamental mismatch between how most brands have built their digital presence and what agents actually need to make a decision.
If you run an e-commerce brand or work in retail, has the shift in traffic patterns changed how you are thinking about where to put resources? And if you are a consumer who has already let an AI make a purchase on your behalf, what was the thing that made you comfortable enough to hand that over?

u/Successful_List2882 — 10 days ago

On April 8, 2026, Sam Altman posted that Codex had just crossed 3 million weekly users. That is a 5x increase in three months. The next day, OpenAI launched a $100 per month plan.
The timing was not accidental. The new tier sits directly between ChatGPT Plus at $20 and the existing Pro plan at $200. It offers 5x more Codex usage than Plus. Access to the same model suite as the $200 plan. And it is priced identically to Anthropic's Claude Max 5x tier, which also costs $100 per month and also targets developers hitting coding limits.
OpenAI did not try to hide the competitive framing. A spokesperson told TechCrunch directly: "Compared with Claude Code, Codex delivers more coding capacity per dollar across paid tiers." A company that rarely mentions competitors by name mentioned a competitor by name in the launch statement.
The reason this fight matters is the number behind it. Claude Code's run-rate revenue crossed $2.5 billion in February 2026, more than doubling since January. That is a single product from Anthropic growing faster than most SaaS companies exist. OpenAI watched that happen and responded with a matching price point and an explicit claim of better value per dollar. The $100 tier is not a product decision. It is a shot across the bow.
Here is what actually needs scrutiny before anyone pulls out a credit card. OpenAI has not published what "5x more" means in concrete task counts. The credit system varies by task complexity, codebase size, and session length. A Plus user hitting limits on simple daily tasks and a Plus user hitting limits on multi-file refactors across large codebases are having completely different problems. The same $100 plan may solve one and barely move the needle on the other.
The launch promotion runs through May 31. Through that date, $100 subscribers get 10x Codex usage instead of the standard 5x. Anyone who signs up, spends a month not hitting limits, and concludes the tier is worth it should check again in June. The numbers that feel comfortable during a promotional period are not the numbers that define the actual product.
The uncomfortable truth for both companies: nobody has published a clean head-to-head benchmark of Codex versus Claude Code on real production tasks at matched cost. The "more capacity per dollar" claim from OpenAI and the growth numbers from Anthropic are both marketing. The developers doing serious agentic coding work are the only people with the actual data, and most of them are not publishing it.
OpenAI now has six pricing tiers: free with ads, $8 Go with ads, $20 Plus, $100 Pro, $200 Pro, and enterprise. That is a complicated menu for a product that launched four years ago with one plan.
For developers who have actually used both Codex and Claude Code on real projects at similar usage levels, which one runs out of capacity first? And for anyone who just signed up for the $100 tier on launch day, what does the usage actually look like week two?

u/Successful_List2882 — 10 days ago

A Cal State professor submitted their own hand-written work through Turnitin to test the system. It came back 98% AI probability. They had written every word themselves.
That is not an edge case. That is the system working exactly as designed, which is the problem.
A 2026 study evaluating commercial AI detectors on 192 authentic student texts found false positive rates ranging from 43% to 83%. Meaning in some cases, nearly every real essay was flagged as fake. For non-native English speakers it is worse. A landmark study published in Computers and Education: AI found that detectors incorrectly labeled 61.3% of essays written by non-native English speakers as AI-generated. Stanford HAI tested seven detectors on TOEFL essays and found that 19% were unanimously flagged as AI by all seven tools at once.
The students actually using AI figured this out faster than the institutions trying to catch them.
Prompt engineering communities on Reddit now have detailed guides on how to make AI output sound human. Not by removing the AI, but by prompting it differently. Write as a first draft. Vary sentence length deliberately. Let a point develop unevenly before it lands. Use conjunctions at the start of sentences. These adjustments drop AI detection scores by 10 to 30 percentage points on most tools, according to testing published by NaturalRewrite in March 2026. The students being punished are mostly the ones who did not know these tricks existed.
Turnitin now tracks over 150 AI humanizer tools. In October 2025 alone, 43 of those tools recorded 33.9 million website visits in a single month, according to an NBC News investigation. Students are not using these tools because they are lazy. Many are using them because they already wrote the essay themselves and got flagged, and are now trying to make their own writing pass a broken test.
Here is the part that should make every university administrator uncomfortable. Grammarly reported that students created over 5 million Authorship reports last year, mostly never submitted, used only to self-check before turning in their own work. Students are now editing how they naturally write to avoid triggering a detector. One student told NBC News directly: "I'm writing just so that I don't flag those AI detectors."
The system designed to protect academic integrity is teaching students that clear, structured, well-argued writing is dangerous. Write messily. Write unevenly. Write like you did not quite finish the thought. That is what passes now.
The counterargument worth taking seriously: AI use in academic writing is genuinely widespread and genuinely difficult to address. A University of California survey in 2024 found that 43% of students admitted to using AI on assignments where it was not permitted. Institutions are not wrong to look for a solution. The problem is that the tool being used to enforce the policy is producing false accusations at a rate that would be considered unacceptable in any other context where someone's academic record is on the line.
A University of Michigan student filed suit in 2026 after being accused based on a detection score. Courts are beginning to establish that an AI detection score alone does not constitute evidence of academic dishonesty.
If you have been flagged for writing you did entirely yourself, what happened next? And if you are an educator still using these tools, what would actually change your mind about relying on them?

u/Successful_List2882 — 11 days ago

A fully automated pipeline. Script generation, AI avatar, voiceover, edit, publish. No human on camera. No creator outreach. No scheduling calls. Just a product, a workflow, and an output queue.
The numbers people are reporting from this approach are real. Platforms like Arcads are showing some advertisers generating over $130,000 monthly from AI-generated videos on TikTok. Creatify users report up to 130% improvement in click-through rate versus traditional formats. The content creation cost drops from hundreds of dollars per video to somewhere between $5 and $10 per minute of output. For brands that need to test 20 or 30 creative variations to find three that convert, this changes the entire economics of paid social.
The reason it works is not complicated. UGC-style content outperforms polished brand ads because it looks like something a real person made, not a marketing department. Sixty percent of consumers say UGC represents the most authentic form of marketing content. When you automate the production of content that looks like it came from a real person, you are essentially scaling the thing that works by removing the part that costs money.
That sentence is also where the problem starts.
The FTC finalized a rule in August 2024, effective October 2024, that explicitly prohibits AI-generated consumer reviews and testimonials that misrepresent the identity or experience of the reviewer. In December 2025, the FTC sent warning letters to ten companies for violating this rule. The civil penalty is up to $53,088 per violation. Not per campaign. Per violation. New York also signed a synthetic performer disclosure law requiring advertisers to clearly label when AI-generated people appear in commercials.
Most people automating UGC reaction videos right now are not thinking about any of this. They are thinking about ROAS.
The distinction the FTC draws is specific. An AI avatar demonstrating a product is different from an AI avatar claiming to be a real customer with a real experience. A labeled AI ad is different from an unlabeled one. The content is not inherently illegal. The implication that a real human being had a real reaction is what creates legal risk.
Research also shows that transparent disclosure of AI involvement actually increases consumer trust by 24% compared to undisclosed AI content. The audience is less bothered by AI than brands think. The deception is the problem, not the tool.
The people getting this right are running a hybrid model. AI for volume testing and rapid variation. Human creators for scaling the winners once the data is in. Clear disclosure tags on synthetic content. Documented workflows showing what is AI and what is not.
The people getting this wrong are treating a production shortcut like a strategy.
If you have actually run automated UGC at scale, what happened when you got to the compliance question? And if you are a brand or agency watching this space, where do you draw the line between AI-assisted content and content that starts to misrepresent who made it?

u/Successful_List2882 — 11 days ago