u/bralca_

Anthropic is going to charge 50X more for Claude Code on June 15th. You need to make your workflow provider agnostic. Here is Why (And How).

AI coding is built on two assumptions that will not hold forever:

  1. Frontier intelligence feels cheap through flat subscriptions.
  2. The user is assumed to be an engineer babysitting a chat agent.

Both are changing.

When subscription arbitrage narrows, AI coding must allocate intelligence efficiently. At the same time, companies will reorganize around smaller AI-native teams and builders who own more of the feature lifecycle.

Chat-based tools are not the right architecture for that world.

The next layer is an Intelligence Factory: a system where the feature becomes the durable artifact, planning manufactures context, tasks are routed across models and providers, and verification makes cheaper intelligence usable without asking the user to coordinate every step.

The Elephant in the Room: Subscription Arbitrage

I analyzed my own usage over the last nine months. Priced as direct API consumption, it would have cost more than $500,000. Instead, I paid a few hundred dollars per month.

To be clear, this is not a claim about what the providers paid to serve my usage. It is the retail API-equivalent price of the same kind of heavy frontier-model consumption, estimated from observed usage and public API pricing. The point is not precision to the dollar. The point is the gap.

That gap changes behavior.

When frontier intelligence feels almost free at the margin, the default strategy becomes brute force: use the strongest model, run it longer, retry more, paste more context, and hope the agent eventually gets there.

That works while the economics are subsidized by flat subscriptions.

It becomes fragile when the system has to face the real marginal cost of intelligence.

The Arbitrage Will Narrow

The arbitrage may not disappear overnight. Inference costs may continue falling. Open models may keep improving. Providers may preserve flat plans for some user segments.

But the unlimited-feeling version of frontier intelligence will narrow.

Maybe through stricter limits. Maybe through higher prices. Maybe through usage tiers.

The mechanism matters less than the direction.

AI coding will eventually have to care much more about where intelligence is spent.

Today, most AI coding discussion is about capability.

Which model writes better code? Which editor has the stronger agent? Which CLI can run longer? Which assistant feels smartest?

The post-arbitrage question is different: How do we allocate intelligence efficiently?

Models are starting to look less like the product and more like the energy source. Providers sell access to intelligence. The valuable layer is the system that turns that intelligence into shipped work efficiently.

In that world, the expensive model becomes the escalation path, not the default runtime.

Cheaper models handle bounded work where the task is clear and verification can catch mistakes. Premium models handle ambiguity, architecture, deep debugging, integration risk, and final acceptance.

The largest frontier spend should sit near the verification boundary, where the system checks whether the feature meets its acceptance criteria, identifies uncertainty, and decides whether escalation is needed.
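
To make the routing idea concrete, here is a minimal sketch in Python. The model names, complexity labels, and thresholds are illustrative assumptions, not anything a real provider ships:

```python
# Illustrative sketch of tiered model routing. Model identifiers, labels,
# and rules are assumptions for the example, not real provider defaults.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    complexity: str       # "bounded" or "ambiguous"
    failure_cost: str     # "low" or "high"
    verifiable: bool      # can automated checks catch mistakes?

CHEAP_MODEL = "small-fast-model"    # hypothetical model identifiers
PREMIUM_MODEL = "frontier-model"

def route(task: Task, prior_failures: int = 0) -> str:
    """Bounded, verifiable, low-risk work goes to the cheap tier;
    ambiguity, high failure cost, or a failed attempt escalates."""
    if prior_failures > 0:
        return PREMIUM_MODEL   # escalation path, not the default runtime
    if task.complexity == "bounded" and task.verifiable and task.failure_cost == "low":
        return CHEAP_MODEL
    return PREMIUM_MODEL       # ambiguity, architecture, integration risk

print(route(Task("rename a config flag", "bounded", "low", True)))      # small-fast-model
print(route(Task("redesign the auth flow", "ambiguous", "high", False)))  # frontier-model
```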

Current Tools Have the Right Primitives, but State Is Too Scattered

Current AI coding tools are improving fast.

They already expose many of the right primitives: repository access, file edits, shell commands, planning modes, memory, subagents, worktrees, hooks, cloud tasks, checkpoints, and resumable sessions.

Those primitives matter. They are the execution layer.

But execution is not the core problem anymore. The core problem is state.

Chat Is a Good Interface, but a Bad State Container

In most chat-based products, the conversation, thread, or agent run still acts as the source of truth.

The feature state gets scattered across the initial prompt, the model’s plan, later corrections, tool output, summaries, memory files, branches, commits, test logs, checkpoints, and the user’s own memory.

Those pieces exist, but they do not form one durable artifact. They do not reliably talk to each other.

That is why the human quietly becomes the coordinator.

The user restates intent, pastes logs, corrects drift, reminds the model what changed, restarts failed runs, and decides whether the final result still matches the original request.

That works when AI is an assistant. It breaks down when AI becomes part of the delivery system.

The problem is not chat as an interface.

Chat is still useful for intent, clarification, review, and approval.

The problem is chat as the state container.

Chat Discovers Too Much While Spending

The recent /goal release in Codex is a perfect illustration of this point.

A user can give the agent an objective, and the runtime can continue working toward that goal across turns, with controls to create, pause, resume, and clear the goal.

That is a real improvement. It moves the tool closer to long-running autonomous work.

But it also exposes the next bottleneck.

A persistent goal is still not the same thing as a durable feature artifact.

If the path is unclear, the agent still has to discover the plan while it is already running. It has to decide what matters, inspect the repo, infer dependencies, choose the next step, test, recover, and judge whether the goal is satisfied from inside the same expensive loop.

That loop needs frontier intelligence end to end because too much of the work remains ambiguous during execution.

The system keeps spending while it is figuring out the shape of the work.

How the Intelligence Factory Solves the Problem

The Intelligence Factory would handle the same problem differently.

It would turn the goal into a feature seed, inspect the repository before execution, extract acceptance criteria, build a task graph, classify task complexity, decide routing policy, generate focused task briefings, and only then start executing.

The long-running loop still exists, but it is no longer a dumb loop asking one frontier agent to keep pushing until the goal looks done.

It becomes an orchestrated production line: goal → feature seed → repo analysis → task graph → routed execution → verification → escalation if needed.

The Intelligence Factory helps the system know what should happen next, who should do it, what context they need, how expensive the step should be, and how completion should be verified.
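
Here is a runnable skeleton of that production line. Every stage is a stub that returns a placeholder; what matters is the ordering, with discovery and planning finished before execution spends anything:

```python
# Runnable skeleton of the production line. Every stage is a stub returning
# a placeholder; the point is the ordering: discovery and planning complete
# before any execution tokens are spent, and escalation only fires on failure.
def make_feature_seed(goal):   return {"goal": goal, "non_goals": [], "open_questions": []}
def analyze_repo(seed):        return {"conventions": [], "integration_points": []}
def extract_criteria(seed, repo):           return ["placeholder acceptance criterion"]
def build_task_graph(seed, repo, criteria): return [{"id": "T1", "complexity": "bounded"}]
def execute(task, model):      return {"task": task["id"], "model": model, "ok": True}
def verify(result, criteria):  return result["ok"]

def run_feature(goal):
    seed = make_feature_seed(goal)                 # goal -> feature seed
    repo = analyze_repo(seed)                      # inspect before spending
    criteria = extract_criteria(seed, repo)
    for task in build_task_graph(seed, repo, criteria):
        model = "cheap-model" if task["complexity"] == "bounded" else "frontier-model"
        result = execute(task, model)              # routed execution
        if not verify(result, criteria):           # verification boundary
            execute(task, "frontier-model")        # escalation if needed

run_feature("add CSV export to the reports page")
```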

This is the lossy projection problem.

Using chat or a single agent loop as the durable container for software delivery is like trying to represent a cube on a flat plane: you can draw the faces, label the edges, and add shadows, but the object is still compressed into the wrong dimension.

A smarter model inside the loop still inherits the constraints of the loop.

Why the Durable Artifact Is the Feature

By feature, I mean a bounded unit of software delivery: large enough to represent real user or business value, but small enough to plan, route, verify, recover, review, and merge.

A feature can be a new capability, a bug batch, a refactor, a migration, a performance pass, or a full-stack change.

The category matters less than the lifecycle. A feature has intent, scope, acceptance criteria, implementation work, verification, and a handoff or merge boundary.

That makes it the right durable artifact for AI coding.

Why not the Project?

The project is too broad. A project contains old decisions, stale assumptions, unrelated work, conflicting priorities, and background knowledge that should not enter every task. Project knowledge should inform the work, but it should not become the active work artifact.

The feature sits at the right level.

It is bounded enough to control context and cost. It is large enough to represent shipped value.

What the feature has to preserve

Treating the feature as the durable artifact does not mean creating a bigger spec.

It means preserving the state required to keep delivery coherent across models, providers, sessions, failures, and reviews.

A feature has to preserve four kinds of state.

Intent State

Intent state records what the user wants, what is out of scope, which assumptions are accepted, and which questions still matter. Without this, every model call slowly reinterprets the original request.

Execution State

Execution state records the technical plan, task graph, dependencies, owned surfaces, and current progress. Without this, autonomy becomes a long-running loop with no durable understanding of what remains.

Economic State

Economic state records task complexity, failure cost, routing policy, preferred model or provider, fallback route, and escalation rule. Without this, the system cannot allocate intelligence before spending it.

Trust State

Trust state records verification targets, test results, unresolved gaps, recovery points, and review status. Without this, cheaper-model routing becomes risky and long-running work becomes hard to trust.

Verification does not make cheap intelligence magically safe. It makes cheap intelligence usable by bounding the work, checking known contracts, surfacing uncertainty, and escalating when unresolved risk remains.
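
One way to make the four state buckets concrete is a single durable feature record. This is a sketch with guessed field names, not a real schema:

```python
# Minimal sketch of the four state buckets as one durable feature record.
# Field names are illustrative guesses, not a real schema.
from dataclasses import dataclass, field

@dataclass
class IntentState:
    goal: str
    non_goals: list[str] = field(default_factory=list)
    accepted_assumptions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

@dataclass
class ExecutionState:
    plan: str = ""
    task_graph: dict[str, list[str]] = field(default_factory=dict)  # task -> dependencies
    owned_surfaces: list[str] = field(default_factory=list)         # files/modules touched
    progress: dict[str, str] = field(default_factory=dict)          # task -> status

@dataclass
class EconomicState:
    complexity: dict[str, str] = field(default_factory=dict)  # task -> "bounded"/"ambiguous"
    routing: dict[str, str] = field(default_factory=dict)     # task -> model or provider
    fallback: str = "frontier-model"
    escalation_rule: str = "escalate after one failed verification"

@dataclass
class TrustState:
    verification_targets: list[str] = field(default_factory=list)
    test_results: dict[str, bool] = field(default_factory=dict)
    unresolved_gaps: list[str] = field(default_factory=list)
    review_status: str = "pending"

@dataclass
class Feature:
    intent: IntentState
    execution: ExecutionState = field(default_factory=ExecutionState)
    economics: EconomicState = field(default_factory=EconomicState)
    trust: TrustState = field(default_factory=TrustState)
```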

Planning Is the Context Factory

The feature starts as a seed

The user should not need to write a perfect PRD.

A normal request should be enough.

The system’s first job is to turn that request into a feature seed: a small, structured starting point that makes the work actionable without pretending everything is already known.

A good feature seed answers three questions.

What is being changed? The system extracts the goal, expected behavior, visible constraints, and non-goals from the request.

What needs to be clarified? The system inspects the repository before asking questions. It should only interrupt the user for decisions that change scope, architecture, routing, or verification.

What would make this complete? The system turns the request into early acceptance criteria so later work can be verified against something stable.
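
For a plain request like "let users export their reports as CSV", a feature seed might look like this (a hypothetical example of shape, not a product schema):

```python
# Hypothetical feature seed for an illustrative request. Everything here
# is an example of shape, not a real schema or real project data.
feature_seed = {
    "change": {
        "goal": "users can export any report as CSV",
        "expected_behavior": "an Export button on the report page downloads a CSV",
        "constraints": ["reuse the existing report query layer"],
        "non_goals": ["PDF/XLSX export", "scheduled exports"],
    },
    "to_clarify": [
        # only questions that change scope, architecture, routing, or verification
        "should exports respect the viewer's row-level permissions?",
    ],
    "acceptance_criteria": [
        "CSV contains the same rows and columns as the on-screen report",
        "export of a 10k-row report completes without timing out",
    ],
}
```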

This is the first moment where the system stops being a chat assistant and starts becoming a delivery system.

Planning manufactures operating context

Planning is not overhead. Planning manufactures the context that makes autonomy and routing possible.

A plan inside a .md file is fragile because it doesn't produce structured machine-readable knowledge. A plan promoted into feature state becomes reusable operating context.

The planning step has three jobs.

First, it aligns intent. It separates facts, assumptions, open questions, and non-goals. It asks only the questions that change implementation.

Second, it structures execution. It maps requirements to a technical approach, breaks the work into tasks, identifies dependencies, and defines which files or surfaces each task is likely to touch.

Third, it creates the control points for cost and trust. It classifies task complexity, chooses routing policy, defines verification targets, and records where recovery should resume if the workflow fails.

The most important output is not the plan document.

The output is clean structured context that allows downstream activities to run as efficiently as possible.

Each model call should receive a focused briefing: the task goal, relevant requirements, accepted decisions, constraints, likely files, integration contracts, and verification steps.
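
A sketch of assembling that briefing from feature state instead of chat history (field names and structures are illustrative):

```python
# Sketch of a focused per-task briefing built from durable feature state
# rather than conversation history. All field names are illustrative.
def build_briefing(feature_state: dict, task: dict) -> dict:
    reqs = feature_state["requirements"]  # requirement ID -> requirement text
    return {
        "task_goal": task["goal"],
        # only the requirements this task satisfies, pulled by ID
        "requirements": [reqs[rid] for rid in task["requirement_ids"]],
        "decisions": feature_state["accepted_decisions"],
        "constraints": feature_state["constraints"],
        "likely_files": task["files"],
        "contracts": task["integration_contracts"],
        "verification": task["verification_steps"],
    }
```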

That is what reduces context rot.

That is what makes providers interchangeable.

That is what makes cheap models usable.

That is what lets the system run longer without the user babysitting every step.

The plan is the context factory. Without it, every model call has to rediscover the work.

----

PS: I built a tool that embodies all the principles above (and much more that I left out so as not to write a poem). Happy to share more with anybody interested.

----

u/bralca_ — 16 hours ago

I’ve been building Afkode because the models got good enough that I started feeling like I was the bottleneck.

They can write code. They can reason through a feature. They can fix a lot on their own. But for bigger features, there is still a lot of repetitive work around the actual building: keeping context from rotting, carrying knowledge forward, doing real planning, checking the implementation, and making sure the tests actually validate the thing.

I want to focus on the big thinking, then have the system handle the loop around it. That’s what Afkode is for.

Give it a complex feature.

It plans it, builds it, tests it, reviews it, and keeps going until it works.

What it does:

  • Runs multiple projects in parallel
  • Ships multiple features at different stages at the same time
  • Audits your test setup and adds what’s missing for real validation
  • Supports E2E tests, integration tests, and complex interaction flows
  • Lets planning, execution, and review use different models
  • Works with Claude, Codex, Kimi, Gemini, OpenCode, subscriptions, or APIs
  • Knowledge compounds: what execution learns, planning remembers

What you get:

  • Less agent babysitting
  • More product judgment
  • More shipped work

I built this for people who are already using agents in real dev workflows and want them to handle bigger features with less hand-holding.

Would love feedback, especially from anyone pushing coding agents past small tasks.

You can try it here: https://afkode.ai

u/bralca_ — 16 days ago

TLDR

  • Why most SDD setups collapse, and a different bet that avoids it
  • A testing model where every requirement has a test before code gets written
  • How knowledge compounds from shipping, not from written specs
  • A context engine that keeps sessions lean instead of bloated
  • Running multiple AI features in parallel without stepping on each other

I've been building afkode for the past few months. Started with spec-kit, which is what got me thinking in specs in the first place. Great for 0→1, but after a while the specs/ folder was a graveyard: some of it matched the code, some didn't, and nobody was really sure which.

So I stopped treating specs as documents. Five decisions came out of that, each one answering a problem I hit.

1. The code is the spec.

Planning artifacts are generated per feature, kept in a local DB, and thrown away after the feature ships. Nothing lives alongside the code except operational learnings (more below). Planning starts with a live pass over the current codebase, analyzing the pattern catalog, conventions, and integration points, so every feature is grounded in whatever main looks like today.

Why: a committed spec has two failure modes. Either it doesn't get updated (drift), or it does and now you have two sources of truth. Deleting the committed artifact removes both. You lose git blame on the spec, but nothing drifts, because nothing persists past the feature lifecycle.

2. Planning produces a graph, not a checklist.

Requirements carry IDs. Architecture components reference those IDs. The task graph has dependencies, coherence groups, and QC tasks placed where they matter.

Why a graph: at execution time each task needs a briefing built from only the parts of the plan that apply to it. A task in the auth module shouldn't be briefed with the marketing page. IDs and dependencies are what make that filtering possible.
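
For illustration, the graph might carry something like this shape (a toy sketch, not afkode's actual schema):

```python
# Toy sketch of a planning graph: requirements carry IDs, components
# reference those IDs, tasks carry dependencies. Not a real schema.
plan = {
    "requirements": {
        "REQ-1": "users can export any report as CSV",
    },
    "components": [
        {"name": "csv_exporter", "satisfies": ["REQ-1"], "files": ["reports/export.py"]},
    ],
    "tasks": [
        {"id": "T1", "builds": "csv_exporter", "deps": []},
        {"id": "T2", "builds": "export_button", "deps": ["T1"]},  # a QC task would follow T2
    ],
}
```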

3. Testing is 1:1 linked to acceptance criteria.

Every requirement carries an ID. Every test case is declared against one of those IDs. QC tasks sit in the task graph right after their integration tasks, so a component gets tested as soon as it's wired up. Before planning is accepted, the system verifies every requirement has at least one test covering it. When execution finishes, every requirement has a passing test, and you can trace which test covers which requirement.

Why this structure: "we wrote some tests" and "we verified the acceptance criteria are met" are very different things.
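
The gate itself can be tiny. A toy sketch of the coverage check, with illustrative structures:

```python
# Sketch of the pre-acceptance gate: every requirement ID must have at
# least one declared test case. Data here is illustrative.
requirements = {"REQ-1": "export respects permissions",
                "REQ-2": "CSV matches the on-screen report"}
test_cases = [{"id": "TC-1", "covers": "REQ-1"},
              {"id": "TC-2", "covers": "REQ-2"},
              {"id": "TC-3", "covers": "REQ-2"}]

def uncovered(requirements: dict, test_cases: list) -> list[str]:
    covered = {tc["covers"] for tc in test_cases}
    return [rid for rid in requirements if rid not in covered]

missing = uncovered(requirements, test_cases)
assert not missing, f"planning rejected, uncovered requirements: {missing}"
```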

4. Knowledge compounds from shipping.

Two artifacts get updated as every feature ships:

  • An operational journal — per feature, task by task. What was attempted, what worked, what didn't, what had to be backed out.
  • A testing knowledge base — project-wide. What tests exist, what patterns are used, what utilities are shared, and what gotchas to be aware of.

Each new feature's planning reads both before generating anything. The requirements layer references integration points that already exist. New tests match patterns already in the project. Errors the agent made on feature 7 show up in the briefing for feature 12.

Why this compounds: the journal and knowledge base are written as a side effect of running. Each feature updates them by shipping. The next feature reads them fresh alongside main.

Nothing has to be maintained, because nothing is documentation — it's all operational record. The spec-as-document model needs a human to keep the doc current, which is a chore that gets skipped. This one doesn't.

5. Fresh sessions backed by a context engine.

Every task runs in a clean context window. The briefing for that session is assembled at runtime from five sources:

  • The requirement(s) the task satisfies, pulled by ID from the specification layer.
  • The architecture components the task is expected to touch, pulled by the references carried on those requirements.
  • The file paths those components map to, with actual file content for direct-edit tasks.
  • Prior-task journal entries from this feature, filtered to only the earlier tasks that touched overlapping components.
  • Project-wide patterns from the testing knowledge base — nearby test patterns, shared utilities, existing coverage in adjacent files.

Each task ends up with a briefing of around 5–15k tokens. The window is mostly empty by design.

Why empty matters: context rot is a signal-to-noise problem, not a window-size problem. A longer window with more history makes retrieval worse. The model has more irrelevant material to ignore on every token it emits.

The right briefing for task 6 is not "everything that happened in tasks 1 through 5." It's "the specific things from prior tasks that task 6 actually depends on."

The plan graph does the retrieval. Requirement IDs, architecture references, and task dependencies are explicit. If the plan says task 6 depends on task 3's output, task 6's briefing gets task 3's output. If the plan says task 6 doesn't touch the auth module, the auth code never enters the window.
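
A toy sketch of that retrieval rule, with illustrative data shapes:

```python
# Sketch of graph-driven retrieval: a task's briefing includes only the
# journal entries of prior tasks it depends on or shares components with.
tasks = {
    3: {"components": ["auth"],    "deps": [],  "journal": "added session middleware"},
    5: {"components": ["reports"], "deps": [],  "journal": "built the CSV serializer"},
    6: {"components": ["auth"],    "deps": [3], "journal": None},
}

def prior_context(task_id, tasks):
    me = tasks[task_id]
    entries = []
    for tid, t in sorted(tasks.items()):
        if tid >= task_id or not t["journal"]:
            continue  # only earlier tasks that produced a journal entry
        if tid in me["deps"] or set(t["components"]) & set(me["components"]):
            entries.append(t["journal"])
    return entries

print(prior_context(6, tasks))  # ['added session middleware']; task 5 never enters the window
```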

Beta Testers Wanted

We just went live and are looking for beta testers, specifically people who actually run SDD day-to-day. Drop a comment or DM if you're in.

u/bralca_ — 28 days ago