u/Comprehensive_Quit67

My take on a Context layer for your coding agents

Context Layer for AI Coding Agents

A codebase is mostly drift. The rest was decided in a Claude session that was never captured. Only the engineer who built it knows which is which, and that map exists nowhere else.

Decisions, trade-offs, dead ends, and the calls Claude makes for you that creep into the code unannounced. None of it is captured beyond the session. Code is the final output of context, not the context itself. CLAUDE.md helps a little but it’s a static page. The actual reasoning lives in 50 jsonl files in your local Claude folder, and nobody reads them.

Here’s an honest attempt at building the context layer that holds it.

Components, concepts, flows.
Everything about a project fits in three categories. Components are code-anchored: a service, a module, a table. Concepts are cross-cutting ideas: the auth model, event sourcing, multi-turn context. Flows are sequential processes within concepts: user message to response, the OAuth handoff. These categories are MECE (mutually exclusive, collectively exhaustive), and they match how I mentally map a project.

The atom is a claim.
Below the three categories, everything is a claim: one atomic fact, immutable once written. When a claim becomes wrong, you don't edit it; you write a new claim with a forward-pointing edge. The old claim stays, and history never gets rewritten. Claims can be linked to any component, concept, or flow. Rename a database table from users to accounts, and claims tagged with the old name still resolve when you query the new one, because the rename itself produces a new claim pointing forward from the old one.
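As a rough sketch of the append-only claim model described above (all names here are hypothetical, not a real implementation):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: a claim is immutable once written
class Claim:
    id: str
    text: str
    tags: tuple  # component / concept / flow names this claim links to
    superseded_by: list = field(default_factory=list)  # forward edges only

class ClaimGraph:
    def __init__(self):
        self.claims = {}

    def add(self, claim):
        self.claims[claim.id] = claim

    def supersede(self, old_id, new_claim):
        # Never edit the old claim; just add a forward edge to the new one.
        self.add(new_claim)
        self.claims[old_id].superseded_by.append(new_claim.id)

    def resolve(self, claim_id):
        # Follow forward edges to the current version of a claim.
        claim = self.claims[claim_id]
        while claim.superseded_by:
            claim = self.claims[claim.superseded_by[-1]]
        return claim
```

The point of the forward edge is that history is never destroyed: resolving an old claim id walks forward to the current truth, while the old claim remains readable.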

Observations → claims, eventually
Capture shouldn't write directly to the graph, because not everything an agent notices is a claim. Every fact starts as an observation with an inference type (saw it stated, inferred from the code, inferred from what's missing). Similar observations from different sessions cluster together; one-off observations are discarded later. Only when multiple sessions reinforce the same idea does the observation get promoted to a claim. This is what stops every stray statement from becoming important.
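The promotion rule might reduce to something like this minimal sketch (the threshold and all names are made up for illustration):

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 3  # distinct sessions that must reinforce a fact (tunable)

class ObservationStore:
    def __init__(self):
        # fact -> set of session ids that independently reported it
        self.sessions_seen = defaultdict(set)
        self.claims = []

    def observe(self, fact, session_id, inference):
        """inference: 'stated' | 'from_code' | 'from_absence'"""
        self.sessions_seen[fact].add(session_id)
        # Promote only when independent sessions keep reinforcing the fact;
        # one-off observations never reach the claim graph.
        if len(self.sessions_seen[fact]) >= PROMOTION_THRESHOLD:
            if fact not in self.claims:
                self.claims.append(fact)
```

Counting distinct sessions rather than raw mentions is what keeps one noisy session from promoting its own chatter.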

Drift detection by absence.
Some claims aren’t just true. They’re true AND nobody decided them. There’s no retry on the payment endpoint: not because someone decided against a fallback, but because nobody decided anything. Drift is its own flavor of claim. The agent flags it when it sees a rule with no decision event nearby. This is a real category in codebases.
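A drift check by absence could be as simple as this sketch (the shapes of `claims` and `decisions` are assumptions, not a real schema):

```python
def flag_drift(claims, decisions):
    """Flag observed behaviour that has no recorded decision behind it.

    claims: list of (claim_id, subject) pairs describing observed behaviour.
    decisions: set of subjects covered by some captured decision event.
    Illustrative only; a real implementation would match by proximity
    in the graph, not exact subject strings.
    """
    return [cid for cid, subject in claims if subject not in decisions]
```
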

Querying the graph
Agents get a snapshot: a regeneratable folder of markdown, organized hierarchically by tag, that they can grep like any other folder. Claims are scoped, because the same project behaves differently in different contexts. A claim true on main may not be true on a feature branch; a flow may be live for enterprise customers and stubbed for everyone else. Queries take a context (branch, region, flag state, customer tier, etc.) and return the snapshot that holds under it. Cross-scope queries work too: what’s different between main and this branch? What’s true for enterprise that isn’t true for the free tier? Senior engineers already carry this kind of mental map. The layer makes it queryable.
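A scoped query over claims might reduce to something like this sketch (the scope-dict shape is an assumption):

```python
def query(claims, context):
    """Return claims whose scope is satisfied by the query context.

    Each claim carries a scope dict, e.g. {"branch": "main", "tier": "enterprise"};
    an empty scope means the claim holds everywhere.
    """
    def holds(scope):
        return all(context.get(k) == v for k, v in scope.items())
    return [c["text"] for c in claims if holds(c["scope"])]

def diff(claims, ctx_a, ctx_b):
    # Cross-scope query: what holds under ctx_a but not under ctx_b?
    return sorted(set(query(claims, ctx_a)) - set(query(claims, ctx_b)))
```
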

Confidence decays.
A claim that was true once may not be true now. Confidence is not a label you set; it’s a function of how many direct observations support a claim, where they came from, and how recently. Rules might expire in 90 days and constraints in 30, since it’s wrong to assume that ALL the context about a system is feeding into this layer. When the window passes without a fresh observation, the claim surfaces for re-verification. The truth lives in the code, and the memory shouldn’t go stale: rotten claims would make the layer unusable.
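One possible decay function, assuming the 90/30-day windows above (the exponential shape and the weights are my own illustration, not part of the design):

```python
import math
import time

# Illustrative expiry windows in seconds, per claim kind.
EXPIRY = {"rule": 90 * 86400, "constraint": 30 * 86400}

def confidence(observations, kind, now=None):
    """Confidence as a function of support count, source weight, and recency.

    observations: list of (timestamp, weight) pairs; weight might be higher
    for directly stated facts than for inferred ones.
    """
    now = now if now is not None else time.time()
    window = EXPIRY[kind]
    # Each observation's contribution decays exponentially over its window.
    score = sum(w * math.exp(-(now - ts) / window) for ts, w in observations)
    return min(1.0, score)

def needs_reverification(observations, kind, now=None):
    # No fresh observation inside the window -> surface for re-verification.
    now = now if now is not None else time.time()
    return all(now - ts > EXPIRY[kind] for ts, _ in observations)
```
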

TLDR:
Capture the decisions, the trade-offs, and the drift that happens during your Claude Code sessions. Structure it. Scope it. Make it queryable. Give your AI agent the same mental map of the project that the engineer who built it carries in their head.

u/Comprehensive_Quit67 — 2 days ago

I'm hitting the same wall every day across Claude Code, Codex, and Cursor, and I want to know how the rest of you are handling it.

Open a new session on a project I worked on yesterday → for the first 2-4 minutes the agent is grepping around rediscovering what files exist and how the architecture fits together. Most of what I figured out in yesterday's session is gone unless I explicitly asked it to save it. Switch from Claude Code to Codex mid-task and the whole rebuild happens again; neither tool knows what the other just learned.

I've been maintaining CLAUDE.md and AGENTS.md, but they mostly capture static rules ("we use snake_case"), not the narrative of what I'm building and why. And they rot: I'm not going to update docs in the middle of coding.

Curious what's actually working for you:

  • Do you maintain MD files by hand? Like an LLM wiki in the repo.
  • Anyone running a memory MCP server? Does it actually work or is it one more thing to babysit?
  • For people switching between Claude Code, Codex, and Cursor — how do you keep them in sync, if at all?
  • Or have you just accepted the friction?
reddit.com
u/Comprehensive_Quit67 — 7 days ago
▲ 17 r/cursor
u/Comprehensive_Quit67 — 7 days ago
▲ 39 r/ClaudeCowork+1 crossposts

One thing that frustrates me about using Claude for real work: it's a generalist. So when you ask it to build a landing page, write a pitch deck, or add auth to an app — it invents the workflow from memory. The output is fine. It's rarely great.

The problem isn't the model. It's that the right answer already exists somewhere. There's a proven playbook for structuring a seed deck, a battle-tested auth flow for Clerk, a design system skill for landing pages. Claude just doesn't know to reach for it.

So I built upskill, a skill registry that routes your agent to the right playbook before it starts working.

How it works:

  1. Before a non-trivial task, the agent runs upskill find "what it's about to do"
  2. The registry returns the best matching skill — a SKILL.md with instructions, constraints, examples, and tool requirements
  3. The agent follows that instead of guessing

The registry has 10,000+ skills indexed from GitHub — including official playbooks from Anthropic, OpenAI, Vercel, Stripe, and curated community sets. By default, upskill only recommends skills from top vendors like OpenAI, Hermes, and Anthropic.

Concrete example:

Ask Claude for a 12-slide pitch deck without upskill → generic template, weak narrative, no review pass.

With upskill, it finds a deck-writing playbook with: narrative arc, one-idea-per-slide rule, visual system constraints, editable PPTX output, and an investor-quality review pass built in.

Same prompt. Different starting point. Much better first draft.

Skills are human-readable markdown, not opaque binaries. You can inspect exactly what the agent will follow before it runs.

Install it into Claude Cowork with one paste from the repo: https://github.com/Autoloops/upskill

Open source, MIT licensed. Feedback welcome — especially on what skills people actually want.

u/Comprehensive_Quit67 — 8 days ago

AI agents are getting powerful. The tooling around them isn't keeping up.

The problem: every time your agent starts a task, it improvises from training data. There's no mechanism for it to pull a proven playbook first. So you get generic output, skipped steps, reinvented wheels.

The expertise already exists:

  • Anthropic has a 4,000-word frontend design skill
  • Clerk has a complete auth implementation
  • obra/superpowers has hundreds more

Nobody built the routing layer. So I did.

What upskill is:

A CLI + registry that plugs into any AI assistant (Claude Code, Cursor, Codex, Cline, Windsurf). One line in your agent config. Before every non-trivial task, your agent runs:

upskill find "<task>"

Pulls the best matching skill. Follows a vetted playbook instead of guessing.

The registry:

10,000+ skills indexed from Anthropic, Vercel, Stripe, Cloudflare, obra/superpowers, and 100+ independent authors. Anyone can submit. Trust tiers: verified (vendor-official) → reviewed (curated) → community (open).
By default, the CLI only returns verified skills.

Safety is taken seriously:

Every skill goes through adversarial LLM review at index time:

  • Prompt injection
  • Credential exfiltration
  • Typosquatting / lookalike domains
  • Hidden malicious instructions

Out of 10k+ skills reviewed, hundreds were blocked. The review found real attacks: hidden onerror="alert('XSS')" injected into instructions, and "skip tests" buried mid-skill.

Privacy defaults — everything off:

  • upskill find sends only your search query
  • Telemetry: opt-in
  • Env-aware ranking: opt-in (uses var names only, never values)
  • Skill submissions: opt-in

MIT licensed. PRs welcome.

Repo: github.com/Autoloops/upskill
Browse skills: upskill.autoloops.ai

reddit.com
u/Comprehensive_Quit67 — 8 days ago
▲ 7 r/ClaudeCode+1 crossposts

You give Cursor a real task and watch it work… from memory.

  • Ask for a landing page → generic off-brand Tailwind hero
  • Ask for Clerk auth → skips JWT verification
  • “I’ll write a CSV parser” → reinvents half of papaparse (badly)

You just spent 20 minutes and 1k tokens watching it iterate on something that already has a perfect answer somewhere online.

The frustrating part isn’t that Claude is bad.
It’s that the right playbooks already exist.

  • Anthropic has a 4,000-word frontend design skill (layout, typography, motion, accessibility)
  • Clerk has an end-to-end auth implementation
  • obra/superpowers has hundreds more

The expertise exists. The routing doesn’t.

What I built: upskill (free)

upskill = routing layer for skills

Install it once, add one line to your agent config (CLAUDE.md), and now:

Before every non-trivial task → your agent runs
upskill find "<task>"

Instead of guessing, it pulls a vetted playbook and follows it.

What changes?

Same prompt: “design a landing page”
→ Now follows Anthropic’s actual playbook

Same prompt: “add Clerk auth”
→ Full implementation, JWT verification included

Think of it as:

> Mixture of experts, but at the agent layer
Your agent stops improvising and starts executing proven workflows.

Under the hood

  • 10k+ indexed skills from:
    • Anthropic, OpenAI, Stripe, Vercel, Microsoft
    • Garry Tan (gstack), obra/superpowers
    • 100+ independent authors
  • Search = hybrid:
    • Postgres full-text search (for exact stuff like flags, APIs)
    • 1024-dim vector embeddings (for semantic matching)
    • Re-ranked by stars, installs, community feedback

→ Pure vectors miss specifics
→ Pure FTS misses intent
→ Hybrid works better
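As an illustration of how the hybrid blend might combine the three signals above (the weights are invented; upskill's real ranking formula isn't documented here):

```python
def hybrid_rank(results, fts_weight=0.4, vec_weight=0.4, pop_weight=0.2):
    """Blend full-text, vector, and popularity scores into one ranking.

    Each result is assumed to carry normalized 0..1 scores:
    {"name": ..., "fts": ..., "vec": ..., "pop": ...}
    Weights here are purely illustrative.
    """
    def score(r):
        return fts_weight * r["fts"] + vec_weight * r["vec"] + pop_weight * r["pop"]
    return sorted(results, key=score, reverse=True)
```

The blend is why neither failure mode above dominates: an exact flag match gets a high FTS score even with a weak embedding match, and vice versa.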

Auth-aware ranking (optional)

If env vars exist locally:

  • AWS_ACCESS_KEY_ID → AWS skills rank higher
  • STRIPE_SECRET_KEY → Stripe-specific flows rank higher

Only variable names are used. Values never leave your machine.
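The name-only boost could reduce to a sketch like this (the vendor mapping and boost value are assumptions, not upskill's actual logic):

```python
import os

# Illustrative mapping from env var name to vendor; values are never read.
VENDOR_VARS = {
    "AWS_ACCESS_KEY_ID": "aws",
    "STRIPE_SECRET_KEY": "stripe",
}

def env_boost(skill_vendor, environ=None):
    """Boost skills for vendors whose env var *names* are present locally.

    Only key names are inspected; values never leave the machine.
    """
    environ = environ if environ is not None else os.environ
    present = {vendor for var, vendor in VENDOR_VARS.items() if var in environ}
    return 0.1 if skill_vendor in present else 0.0
```
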

Safety

Every skill goes through LLM adversarial review at index time:

  • Prompt injection
  • Credential exfiltration
  • Typosquatting / lookalike domains
  • Hidden malicious instructions

Out of 10k+ skills:

  • Hundreds were blocked
  • Found real attacks (e.g. hidden onerror="alert('XSS')" + “skip tests”)

A few false positives (being tuned):

  • rm -rf node_modules in legit guides
  • Google Drive delete API
  • Warnings about NEXT_PUBLIC misuse

Privacy

Default = locked down

  • upskill find → sends only your query
  • Telemetry → opt-in
  • Env-aware ranking → opt-in
  • Skill submissions → opt-in

Everything toggleable anytime.

Not just for code

Covers workflows like:

  • Slides
  • Email triage
  • Google Workspace
  • Notion queries
  • Calendar automation
  • Scientific writing
  • Malware analysis
  • Accessibility audits
  • Sales playbooks

If your agent is about to “wing it”…
there’s probably already a better playbook.

Try it

npm install -g upskill
upskill install
npx -y skills add Autoloops/upskill/skill

It’ll ask a few questions and wire itself into your agent.

Repo: https://github.com/Autoloops/upskill
MIT licensed. PRs welcome.

u/Comprehensive_Quit67 — 10 days ago