u/Pale-Entertainer-386

**hot take: anthropic & openai might not make it 🤷‍♀️**

ok so i've been thinking about this way too much, but low-key think both companies are kinda cooked long-term. not the tech (that's fire), but like... how do they actually *make money* without burning VC cash forever?

tbh the revenue numbers look crazy, but most of that is just enterprises... trying stuff out? like, i saw somewhere that 95% of AI pilots never even hit production. that's not a stable revenue base, that's a bunch of "let's see if this works" experiments.

and here's the thing nobody talks about: selling more tokens = burning more cash. the real winners are aws/google who own the GPUs. it's like... you're the one digging for gold, but the shovel store is raking it in.

if i were running one of these? i'd slow down the hype train, focus on cash flow, and try to build something enterprises will actually pay for (like palantir did). but yeah, their valuations are already insane so... good luck with that 🍳

(btw idk if this is relevant but i feel like amazon back in the day was just... weirdly paranoid about cash? could be totally wrong but feels like the opposite vibe here lol)

anyway that's my ramble. change my mind 👇

(sources if you care: that mit report on ai pilot failures was wild)

u/Pale-Entertainer-386 — 3 days ago

OpenAI just entered management consulting. That's not expansion—it's a confession.

**TL;DR**: OpenAI and Anthropic aren't suddenly "good at enterprise transformation." They're partnering with private equity and consulting firms because LLMs still hallucinate by design, enterprises don't trust black-box outputs, and valuations demand revenue *now*. This isn't a go-to-market strategy—it's valuation maintenance disguised as innovation. The .com bubble didn't kill e-commerce; it killed companies that tried to compress social adoption with capital. Watch closely: this playbook looks familiar.

---

## 🎭 The News Hook: Why Are AI Startups Suddenly "Consulting Firms"?

Last month, two moves flew under the radar:
- **Anthropic** × Blackstone / Hellman & Friedman / Goldman Sachs → new *enterprise AI services company*
- **OpenAI** × TPG / Bain Capital → *"The Deployment Company"*

On the surface: smart distribution.
Reality check: **They're outsourcing credibility**.

If LLMs were truly ready for enterprise workflows, you wouldn't need McKinsey to convince a CIO to adopt them. You'd just show the ROI. But when your core technology:
- Hallucinates by statistical design ([OpenAI Blog](https://openai.com/index/why-language-models-hallucinate/))
- Struggles with "lost in the middle" in long contexts
- Can't guarantee safe, auditable multi-agent coordination

...you don't sell *capability*. You sell *urgency*. And who's best at manufacturing urgency for enterprise boards? Consultants.

> 🧭 **Burry-style insight**: When a tech company starts acting like a consulting firm, ask: *Is this scaling a product—or scaling a story?*

---

## 🔍 Follow the Incentives: Valuation Pressure ≠ Product Readiness

Let's be clear: OpenAI and Anthropic aren't "expanding into consulting" because they discovered a new competitive advantage. They're doing it because:

| Pressure | Reality |
| --- | --- |
| **Valuations** | OpenAI's implied valuation ($460B–$852B) demands enterprise-scale revenue *yesterday* |
| **Adoption friction** | Enterprises won't replace human workflows with models that "might hallucinate" |
| **Distribution gap** | Direct sales cycles for complex AI are 12–18 months. Consultants compress that to 3–6. |
| **Narrative defense** | "AGI is coming" creates FOMO. "Helpful copilot" does not. |

This isn't unique to AI. In the late 1990s, e-commerce startups hired "digital transformation" consultants to sell the *idea* of online shopping—before payments, logistics, and trust were ready. The result? Consultants got paid. Startups burned cash. The ecosystem matured... and Amazon, not Pets.com, captured the value.

> 📉 **Pattern recognition**: When distribution depends on selling anxiety rather than demonstrated ROI, the business model is fragile.

---

## ⚙️ The Technical Gap They're Not Solving (But Papers Already Have)

> The core issue isn't "LLMs are bad." It's that *deployment-ready safeguards exist in research papers but not in products*. Rushing enterprise adoption before they're integrated is how you get scale-level hallucination disasters.

### 1️⃣ Lost in the Middle → Structured Indexing Works
- **Problem**: Long contexts dilute attention; critical info in the middle gets ignored.
- **Solution**: Pre-structure data with **dual-layer summaries + indexes** that guide retrieval instead of forcing search-in-noise (rough sketch after this list).
- **Paper**: [Self-Describing Structured Data with Dual-Layer Guidance](https://www.researchgate.net/publication/403842614_Self-Describing_Structured_Data_with_Dual-Layer_Guidance_A_Lightweight_Alternative_to_RAG_for_Precision_Retrieval_in_Large-Scale_LLM_Knowledge_Navigation)
- **Reality check**: Not productized. Enterprises deploying vanilla RAG today are accumulating *hallucination debt*.
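
To make the dual-layer idea concrete, here's a toy sketch. All the names (`Section`, `build_index`, `fetch`) are mine for illustration, not the paper's API: the model first scans a short summary index, and only the sections it selects get injected in full, so nothing critical sits buried mid-context.

```python
# Toy sketch of dual-layer pre-structuring. Names are illustrative, not
# the paper's API. Layer 1 is a compact summary index the model scans
# first; layer 2 holds the full bodies, injected only after selection,
# so key facts never drown in the middle of the context.

from dataclasses import dataclass

@dataclass
class Section:
    section_id: str
    summary: str  # layer 1: one line, used purely for navigation
    body: str     # layer 2: full text, fetched on demand

def build_index(sections: list[Section]) -> str:
    """Render the short navigation layer the model sees first."""
    return "\n".join(f"[{s.section_id}] {s.summary}" for s in sections)

def fetch(sections: list[Section], chosen_ids: list[str]) -> str:
    """Inject only the sections the model selected into the final prompt."""
    by_id = {s.section_id: s for s in sections}
    return "\n\n".join(by_id[i].body for i in chosen_ids if i in by_id)

sections = [
    Section("refunds", "Refund policy and exception rules", "...full policy text..."),
    Section("sla", "Uptime commitments per contract tier", "...full SLA text..."),
]

# Step 1: the model picks IDs from a short index instead of reading everything.
nav_prompt = "Which sections are relevant? Reply with IDs:\n" + build_index(sections)
# Step 2: the answer prompt carries only the chosen sections, in full.
answer_context = fetch(sections, ["refunds"])
```

The contrast with vanilla RAG: retrieval is guided by an explicit map the data carries about itself, not by similarity search over noise.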

### 2️⃣ Prompt Security → Semantic Monitoring Is Broken
- **Risk**: AI can hide intent *inside* seemingly benign output (steganographic collusion). Semantic monitoring alone won't catch it.
- **Papers**:
  - [Steganographic Intent Detection](https://openreview.net/forum?id=Ylh8617Qyd)
  - [Instruction Following ≠ Reward Function](https://arxiv.org/pdf/2602.20021)
  - [Circuit-Breaking for MARL Safety](https://www.researchgate.net/publication/402611883_Beyond_Reward_Suppression_Reshaping_Steganographic_Communication_Protocols_in_MARL_via_Dynamic_Representational_Circuit_Breaking)
- **Practical fix**: Compress agent communication to simple signals (red/green) plus statistical anomaly detection (toy sketch below). Not sexy. Not deployed.
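
Here's a toy version of that fix, under my own assumptions (a two-token vocabulary, a 90% historical green rate, and a chi-square-style cutoff picked for illustration). Constrain the channel so hard a covert payload has nowhere to live, then watch the distribution for drift instead of trying to read meaning.

```python
# Toy version of "simple signals + statistical monitoring". The
# vocabulary, base rate, and cutoff are illustrative assumptions,
# not a deployed protocol.

from collections import Counter

ALLOWED = {"red", "green"}

def validate(signal: str) -> str:
    # Hard vocabulary constraint: no free-form text between agents,
    # so there is almost no bandwidth left to hide a payload in.
    if signal not in ALLOWED:
        raise ValueError(f"disallowed signal: {signal!r}")
    return signal

def drift_score(observed: list[str], green_rate: float = 0.9) -> float:
    """Chi-square-style statistic vs. the historical green/red base rate."""
    counts = Counter(observed)
    n = len(observed)
    expected = {"green": n * green_rate, "red": n * (1 - green_rate)}
    return sum(
        (counts.get(k, 0) - expected[k]) ** 2 / expected[k]
        for k in ALLOWED if expected[k] > 0
    )

window = [validate(s) for s in ["green"] * 70 + ["red"] * 30]
if drift_score(window) > 6.63:  # ~chi-square cutoff, 1 d.o.f., p = 0.01
    print("anomaly: signal distribution drifted; audit this agent pair")
```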

### 3️⃣ Real AGI Needs Constraints, Not Just Scale
- **Framework**: Predefine business-logic "elements," let LLMs *compose within verified boundaries* rather than invent freely (sketch after this list).
- **Human-AI Handoff**: AI handles pattern matching & retrieval; humans handle boundary judgment & value tradeoffs.
- **Papers**:
  - [Constraint-Driven Human-AI Collaboration](https://www.researchgate.net/publication/403842380_A_Constraint-Driven_Framework_for_Process-Traceable_HumanAI_Collaboration)
  - [Auditable Behavioral Inference Library](https://www.researchgate.net/publication/403951418_From_Explicit_Elements_to_Implicit_Intent_A_Predened_Library_for_Auditable_Behavioral_Inference)
- **Key insight**: Experts don't know all answers—they know *when their reasoning fails*. Current LLM deployments simulate the opposite.
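
Roughly what "compose within verified boundaries" might look like in code. The element library, the plan shape, and `validate_plan` are my framing, not the papers' API: the model can only reference predefined, verified elements, and anything it can't ground gets escalated to a human instead of improvised.

```python
# Rough sketch of composing within verified boundaries (my framing, not
# the papers' API). An unknown element is a hard error, not a creative
# opportunity; ungroundable steps route to a human.

ELEMENT_LIBRARY = {
    "lookup_order":   {"args": {"order_id"}, "side_effects": False},
    "issue_refund":   {"args": {"order_id", "amount"}, "side_effects": True},
    "escalate_human": {"args": {"reason"}, "side_effects": False},
}

def validate_plan(plan: list[dict]) -> list[dict]:
    """Reject any step the element library does not define."""
    for step in plan:
        spec = ELEMENT_LIBRARY.get(step["element"])
        if spec is None:
            raise ValueError(f"unknown element: {step['element']}")
        if set(step["args"]) != spec["args"]:
            raise ValueError(f"bad args for {step['element']}: {step['args']}")
    return plan

# A plan as an LLM might emit it (JSON-shaped). A stricter version would
# also require human sign-off on any step with side_effects=True.
plan = validate_plan([
    {"element": "lookup_order", "args": {"order_id": "A-1001"}},
    {"element": "escalate_human", "args": {"reason": "refund exceeds policy"}},
])
```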

---

## 📉 The .com Parallel: It's Not About the Tech, It's About Timing

| 2000 E-commerce | 2026 Enterprise AI |
| --- | --- |
| Pets.com spent millions teaching people to buy pet food online | OpenAI spends millions teaching enterprises to trust hallucinating models |
| Webvan built warehouses before last-mile logistics existed | Anthropic builds "contextual retrieval" before enterprise data is structured |
| **Failure mode**: Burned cash educating a market that wasn't ready | **Failure mode**: Burning cash deploying models that aren't ready |
| **Winner**: Amazon, which picked "books" (lowest ecosystem dependency) | **Potential winner**: Whoever picks the "books" of AI (code assist? knowledge retrieval?) and waits for the ecosystem to catch up |

> 🧭 **The Burry Thesis**: Intrinsic value ≠ narrative valuation.
> Amazon survived 2000 not by being "more visionary," but by picking a product that required minimal ecosystem support.
> Who's picking the "books" of AI today? (Hint: It's not the $460B-valued startups rushing into consulting partnerships.)

---

## ⚖️ Final Question for the Room

If you were a value investor looking at OpenAI/Anthropic today:
- Do you see a company with durable competitive advantages?
- Or a company with durable *capital pressure* to perform before readiness—and now using consulting partnerships to bridge the gap?

> *"The market can stay irrational longer than you can stay solvent."*
> But when a tech company starts selling urgency through consultants instead of ROI through capability? That's when the clock starts ticking.

*Not financial advice. Just connecting dots the market is ignoring.*

---

**Sources for deeper digging**:
- Hallucinations: [OpenAI: Why Language Models Hallucinate](https://openai.com/index/why-language-models-hallucinate/)
- Technical papers: All ResearchGate/OpenReview/arXiv links embedded above
- Financing context (background): [Bloomberg](https://www.bloomberg.com/news/articles/2026-05-08/softbank-cuts-target-for-openai-margin-loan-by-40-to-6-billion) | [AInvest](https://www.ainvest.com/news/softbank-6b-openai-loan-cut-signals-collateral-crack-64-6b-leveraged-bet-2605/)

*What's your take? Are AI-consulting JVs a smart distribution play—or a signal that the tech isn't ready for prime time? Happy to discuss with data.*

u/Pale-Entertainer-386 — 4 days ago

My last post here was perhaps too abstract, and judging by the feedback, I didn't quite bridge the gap between "theory" and "practice." After some reflection, I want to try again—this time focusing strictly on the architectural shift required for true Agent personalization.

https://www.reddit.com/r/PromptEngineering/s/67OZSq8fPF

The rapid rise and fall of tools like Manus and OpenClaw prove a brutal reality: the market is starving for personalized AI, but our current tools are fundamentally static. We are trying to solve a deep, fluid human need with frozen code and hard-coded prompts.

Here is the clinical breakdown of why current approaches fail, and the technical path forward.

### I. The Core Mismatch: Engineers vs. Everyone Else

Current AI coding tools (Hermes, etc.) target developers. It's a capped market. Data from mass-market platforms proves that the real, untapped demand comes from non-technical users. Their needs are fluid, but they are treated as generic noise by current Agent architectures.

### II. The Failure of "Pseudo-Personalization"

Most agents today rely on pre-written, "frozen" system prompts. This is broadcast, not service. True personalization cannot be achieved through static code because human context is non-linear. The only way out is the dynamic generation of both prompts and execution logic.
### III. The Technical Path: Dynamic Compilation

Instead of finding a prompt, we need to compile a system. I have prototyped a pipeline that does exactly this (rough sketch after the list):

- **Input**: Raw natural-language requirement.
- **Phase 1 (Compiler)**: Intent is compiled into a structured Intermediate Representation (IR) via a workflow_manifest.json.
- **Phase 2 (Optimization)**: The IR logic is validated.
- **Phase 3 (Generation)**: Specific Python modules and dedicated System Prompts are auto-generated based on that IR.
- **Execution**: The LLM is called within this bespoke, temporary environment.

The key distinction: the prompt is no longer a starting template; it is a compiled artifact.
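
To make that concrete, here's a minimal sketch of the pipeline shape. The IR fields and helper names are illustrative guesses; workflow_manifest.json is the real artifact name, but its schema isn't published here, and the actual Phase 1 is an LLM call, stubbed below for a runnable example.

```python
# Minimal sketch of the compile pipeline. IR fields and helper names are
# illustrative; the real Phase 1 is an LLM call, stubbed here.

import json

def compile_intent(requirement: str) -> dict:
    """Phase 1 (Compiler): raw natural language -> structured IR."""
    return {
        "intent": requirement,
        "steps": [{"name": "track_habit", "inputs": ["habit", "date"]}],
        "constraints": ["local-only storage"],
    }

def validate_ir(ir: dict) -> dict:
    """Phase 2 (Optimization): structural checks before generation."""
    assert ir["steps"], "IR must contain at least one step"
    return ir

def generate_module(ir: dict) -> str:
    """Phase 3 (Generation): emit a bespoke Python module from the IR."""
    fns = "\n\n".join(
        f"def {s['name']}({', '.join(s['inputs'])}):\n    ..." for s in ir["steps"]
    )
    return "# auto-generated from workflow_manifest.json\n" + fns + "\n"

ir = validate_ir(compile_intent("I want an app that tracks daily habits"))
with open("workflow_manifest.json", "w") as f:
    json.dump(ir, f, indent=2)       # the IR artifact
module_source = generate_module(ir)  # the compiled artifact, not a template
```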

### Discussion

True personalization happens the moment a requirement is compiled into a bespoke execution structure, not when a user ticks boxes in a settings menu.

My question to you: Do you think the industry will accept the reliability risks of dynamic code generation (latency, potential crashes) in exchange for breaking through the scalability ceiling of current agents? Or is the complexity of dynamic compilation too high to ever be production-stable?

Note: I have attached the referenced papers in the comments below for those who want to dive deeper into the academic side of this.

https://www.researchgate.net/publication/403842380_A_Constraint-Driven_Framework_for_Process-Traceable_HumanAI_Collaboration

u/Pale-Entertainer-386 — 7 days ago

A very different way of building: instead of asking AI to write code directly,
I first let it generate prompts, and the code comes after.

I've been experimenting with a development workflow that feels genuinely different
from normal AI coding.

The usual pattern is: you ask the model to write code, then you keep iterating on the code.

But what I've been testing is closer to this:
**Don't ask for code first. Ask for prompts first.**

More specifically: ask the system to turn a vague goal into a more executable
prompt structure, and only then let that structure produce the code.

That sounds like a small difference, but in practice it changes a lot.

Instead of treating the prompt as a one-shot instruction, I'm treating it more like
an **intermediate translation layer** between intent and implementation.

---

### The Flow:

**Phase 1 — Intent Capture**
Start with a simple, vague, abstract requirement. Not a real spec. Just the intent.
*(Example: "I want an app that helps me track daily habits.")*

**Phase 2 — Prompt Structuring**
Have the model expand that intent into an operational prompt layer:
• Subtask decomposition (auth, data model, UI components)
• Constraints & boundaries (platform, tech stack, privacy rules)
• Success criteria (what "done" actually means)
• Tool-use logic (when to call APIs, when to generate local code)
• Checkpoints for human review

**Phase 3 — Tool-Level Prompt Generation**
Use the structured prompt to drive targeted, executable instructions:
• "Generate a React + TypeScript login component with form validation"
• "Design a localStorage schema for habit entries with timestamps"
• "Write unit tests covering edge cases for streak calculation"

**Phase 4 — Implementation + Human Steering**
• AI produces actual code based on the tool-level prompts
• Human focuses on direction, constraint adjustment, and correction at key branch points
• Iteration happens at the *prompt-structure* level, not just the code level (sketch of that layer below)
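
Here's one way to sketch that middle layer as data. The field names are illustrative, not a fixed schema: Phase 2 fills the structure in, and Phase 3 renders each subtask into a targeted instruction with the constraints and success criteria attached.

```python
# One sketch of the middle "prompt structure" layer as data; field names
# are illustrative, not a fixed schema. Phase 2 fills this in, Phase 3
# renders each subtask into a targeted, executable instruction.

from dataclasses import dataclass, field

@dataclass
class PromptStructure:
    intent: str                  # Phase 1: the vague goal, kept verbatim
    subtasks: list[str]          # decomposition: auth, data model, UI...
    constraints: list[str]       # platform, stack, privacy boundaries
    success_criteria: list[str]  # what "done" actually means
    checkpoints: list[str] = field(default_factory=list)  # human review points

def render_tool_prompts(ps: PromptStructure) -> list[str]:
    """Phase 3: one executable instruction per subtask."""
    shared = "Constraints: " + "; ".join(ps.constraints)
    done = "Done when: " + "; ".join(ps.success_criteria)
    return [f"{task}. {shared}. {done}" for task in ps.subtasks]

ps = PromptStructure(
    intent="I want an app that helps me track daily habits",
    subtasks=[
        "Generate a React + TypeScript login component with form validation",
        "Design a localStorage schema for habit entries with timestamps",
    ],
    constraints=["web only", "no server-side storage"],
    success_criteria=["streak calculation survives timezone edge cases"],
)
for prompt in render_tool_prompts(ps):
    print(prompt)  # Phase 4: iterate on ps, not on the emitted code
```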

---

> 📌 **Note on scale & execution**
> The entire prompt system is organized into **3 core Phases**, totaling ~**50,000 tokens**.
> It's a highly complex, logically rigorous instruction architecture.
> As shown at the start of the video: all context data and prompt files are bundled
> into a single archive and fed to Claude to kick off the task.
> **In theory, interactive checkpoints could be inserted—but for this DEMO, the model
> handles the entire flow autonomously, with zero human intervention.**

---

What surprised me is that this often works better than asking for code directly —
**not because the AI is smarter, but because the interface between human intent
and machine execution is clearer**.

My current take: the important thing isn't "writing a cleverer prompt."
It's building a **prompt system that can keep translating high-level intent into
lower-level executable steps**, with stability coming from constraints, not wording tricks.

### A few things that matter more than people think:

🔹 **Don't force the model to jump straight to final code.**
Having it generate the next executable layer first is often much more stable.
Errors get caught earlier, in the planning stage.

🔹 **Constraints matter more than wording tricks.**
Boundaries, failure conditions, tool selection, and explicit success criteria
seem more important than fancy phrasing. A well-constrained vague prompt often
outperforms a perfectly-worded unconstrained one.

🔹 **The real leverage is in the middle layer.**
Not the original request, not the final code, but the generated prompt structure
in between. That's where the "professional judgment" of software engineering gets encoded.

🔹 **Good results come from prompt evolution, not prompt perfection.**
It's less about one perfect instruction, more about a chain that keeps refining
itself into executable work — with human feedback guiding the refinement.

---

So to me, this feels less like "AI-assisted coding" in the old sense,
and more like a **new coding paradigm**.

**The human role shifts:**
• Less direct authoring of syntax
• More stewardship of intent, constraints, and quality gates
• Less "how do I write this loop?"
• More "does this output actually solve the problem I care about?"

That's also why "everyone can code" suddenly feels less like a slogan and more
like something that might actually become true — **not because code became
unnecessary, but because the interface to creating software is changing**.

### Realistic caveats:
✅ This doesn't mean "no thinking required." You still need to express intent
clearly and recognize when output misses the mark.
✅ Basic logical reasoning and constraint-design intuition still matter —
garbage in, garbage out.
✅ The toolchain needs to be stable enough that AI-generated code can actually
integrate and run.

But if those conditions are met?
Then yes: someone who can articulate a goal, define success, and spot logical
gaps can absolutely drive the creation of working software —
**without memorizing syntax or debugging webpack configs**.

---

I recorded the whole process on video. The video isn't the main point —
it's just proof that this workflow is viable:
a very simple, abstract, blurry requirement can turn into a working software
system with surprisingly little manual intervention, **because the heavy lifting
of "requirements → technical plan → executable steps" is handled by the
~50k-token prompt-translation layer**.

If people think this is worth digging into, I can put more details on GitHub later:
• Example prompt-structure templates
• Constraint-checklist patterns
• A minimal starter workflow for experimentation

🎥 Demo: https://www.youtube.com/watch?v=Q61NQtQYHHI
