u/Deep_Structure2023

The Full Claude Ecosystem: 1,200+ MCP Servers, 400+ Plugins, 25+ Agent Frameworks

Don't run Claude in a loop: prompt in, answer out. Here's the full Claude ecosystem, published as six reference files on GitHub and verified April 2026: commands, Model Context Protocol servers, plugins, tools, workflows, agent frameworks.

Commands worth knowing

/remote-control — Control your local Claude Code session from your phone via claude.ai
/fork — Branch your conversation without touching main context
/usage-report — Full HTML analytics: sessions, token cost by project, most-used commands
/checkpoint — Save conversation state before a major change
/memory-dump — Export everything Claude knows about your project to a file
/diff-review — Claude reviews the full git diff and annotates every change
/security-scan — Runs a vulnerability check on current codebase

Community-discovered activation phrases, not in official docs, consistent across sessions:

MEGAPROMPT      → Claude expands your rough idea into a full spec before executing
BEASTMODE       → Full effort, no shortcuts, maximum output
ULTRATHINK      → Extended reasoning before any response
STEELMAN        → Claude argues the strongest version of your idea first
CRITIC MODE     → Claude finds every flaw before proceeding
FIRSTPRINCIPLES → Breaks the problem to fundamentals before solving

Install Memory MCP first

Every session starts from zero without it.

{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}

Session startup: "Load project context for [name]. Retrieve: architecture decisions, coding standards, current sprint tasks."

Other high-impact Model Context Protocol servers: Filesystem, GitHub, PostgreSQL, Brave Search, Puppeteer (controls a real Chrome instance), Fetch. The repo also documents 10 memory systems including Memora, which runs fully local with no cloud dependency.

Three plugins

claude skills add juliusbrussee/caveman
/plugin install superpowers@superpowers-marketplace
/plugin install context7@claude-plugins-official

Caveman (27,900+ stars) cuts output tokens 65-75% with no accuracy loss. Superpowers (121,000+ stars) forces plan-before-build, test-before-ship. Context7 (53,864+ stars) pulls live version-specific docs before generation, eliminating hallucinated APIs.

Tool decisions

Retrieval-augmented generation app?  → LlamaIndex
Everything else?                     → LangChain
Production memory?                   → Qdrant
Local dev?                           → Chroma (pip install, zero setup)
Full backend?                        → Supabase
Local embeddings, no API cost?       → Ollama


curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

Ollama for embeddings and simple tasks, Claude for reasoning that needs it. API costs drop, nothing leaves your machine.
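A minimal sketch of that split, assuming a local Ollama on its default port and the in-memory Chroma client from the list above (the collection and document names are made up):

import chromadb
import requests

def embed(text: str) -> list[float]:
    # Ollama's local embeddings endpoint; nothing leaves the machine
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "llama3.2", "prompt": text})
    return r.json()["embedding"]

client = chromadb.Client()  # in-memory; use PersistentClient(path=...) to keep data
notes = ["auth is handled by clerk", "payments moved to stripe elements"]
col = client.create_collection("project-notes")
col.add(ids=["n1", "n2"], documents=notes,
        embeddings=[embed(n) for n in notes])
print(col.query(query_embeddings=[embed("how do we do auth?")], n_results=1)["documents"])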

Builder-Validator

Two Claude calls, no framework, opposing objectives.

import anthropic

client = anthropic.Anthropic()  # official SDK; reads ANTHROPIC_API_KEY

builder = client.messages.create(
    model="claude-sonnet-4-5",  # model alias is an assumption
    max_tokens=2048,
    system="Senior developer. Write the best implementation.",
    messages=[{"role": "user", "content": task}],
)
validator = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    system="Security auditor. Find every bug, edge case, vulnerability.",
    messages=[{"role": "user", "content": f"Review:\n{builder.content[0].text}"}],
)
# Loop until validator approves
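The approval loop itself can stay just as small. A sketch continuing the calls above; the APPROVED reply convention and the three-round cap are assumptions, not part of the original pattern:

implementation = builder.content[0].text
for _ in range(3):  # cap the rounds so disagreement can't run forever
    review = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="Security auditor. Reply APPROVED only if no bugs, edge cases, or vulnerabilities remain.",
        messages=[{"role": "user", "content": f"Review:\n{implementation}"}],
    )
    verdict = review.content[0].text
    if "APPROVED" in verdict:
        break
    # feed the findings back to the builder role and try again
    implementation = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="Senior developer. Write the best implementation.",
        messages=[{"role": "user", "content": f"Fix these findings:\n{verdict}\n\nCode:\n{implementation}"}],
    ).content[0].text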

The two roles have structurally incompatible incentives. That tension does the work a single-pass prompt can't. Production numbers: Fountain cut delivery time 50%. Rakuten dropped feature cycles from 24 days to 5. Ramp cut incident investigation time 80%.

Agent framework benchmarks

  • LangGraph: 87% task success. Used at Klarna, Replit, Uber, LinkedIn.
  • CrewAI: 82% task success. Fastest to a working demo. 44,600+ stars, 60 million executions per month.
  • AutoGen: top GAIA benchmark score across all difficulty levels.
  • Claude Agent SDK: Claude-only stacks, no framework overhead.


Fastest to demo?             → CrewAI
Complex production workflow? → LangGraph
Research / code-executing?   → AutoGen
Claude-only stack?           → Claude Agent SDK
Data-heavy retrieval?        → LlamaIndex Agents

95% of agentic tasks don't need multi-agent systems. A well-prompted single Claude instance with 3 tools outperforms a complex 5-agent setup. Build simple first. Full repo
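For scale, a single instance with one narrow tool against the raw SDK looks like this. A sketch: the read_file tool, its dispatch, and the model alias are illustrative assumptions:

import anthropic

client = anthropic.Anthropic()

# One narrow tool; add the other two the same way.
tools = [{
    "name": "read_file",
    "description": "Read a project file and return its text.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

messages = [{"role": "user", "content": "Summarize src/app.py"}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        break
    # execute each requested tool call and return the results
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            text = open(block.input["path"]).read()
            results.append({"type": "tool_result",
                            "tool_use_id": block.id,
                            "content": text})
    messages.append({"role": "assistant", "content": resp.content})
    messages.append({"role": "user", "content": results})

print(resp.content[-1].text)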

u/Deep_Structure2023 — 1 day ago

Claude Code Doesn't Know Your Project. This official Plugin Fixes That.

Most Claude Code frustration comes from the same root cause: Claude sees your files but has no context about how your project actually works. It doesn't know your class structure, your validation conventions, your protected files. So it guesses. The guesses are plausible and wrong.

The claude-code-setup plugin, maintained by Anthropic, fixes this by analyzing your codebase before recommending anything.

Install it inside Claude Code:

/plugin install claude-code-setup@claude-plugins-official

Then ask:

> recommend automations for this project

It scans your directory, reads your pyproject.toml, identifies your stack, and outputs a structured set of recommendations across five categories. Nothing auto-applies. You opt in one piece at a time.

Model Context Protocol servers

The first category is Model Context Protocol servers. These give Claude the ability to act on your stack, not describe it.

{
  "mcpServers": {
    "python-repl": {
      "command": "uvx",
      "args": ["mcp-server-python", "--project", "."],
      "description": "Execute Python code in your project's virtualenv"
    },
    "filesystem": {
      "command": "uvx",
      "args": ["@modelcontextprotocol/server-filesystem", "/resume-parser"],
      "description": "Safe, scoped file operations"
    },
    "chromadb": {
      "command": "uvx",
      "args": ["mcp-server-chroma", "--path", "./data/vectors"],
      "description": "Query resume embeddings for semantic search"
    }
  }
}

Without Model Context Protocol, Claude describes how to parse a resume, query ChromaDB, and return a match score. With it, Claude does those three things in one turn. The difference shows up immediately.

Skills

Skills are markdown files that encode your conventions. You write them once, and Claude follows them every time it touches related files.

## Parsing Resumes in This Project
When extracting data from resumes:
1. Always use `src/parser/extractor.py::ResumeExtractor` as the entry point
2. Normalize dates with `dateutil.parser` + our `src/utils/dates.py` helpers
3. Validate output against `data/schemas/resume_v2.json` using Pydantic
4. Log parsing confidence scores to `logger.debug()` with context: `{"resume_id": ...}`
5. Never hardcode field mappings—use `src/config/field_aliases.py`

## ML Integration Rules
- New features must go through `src/ml/feature_engineering.py`
- Embeddings must use our `text-embedding-3-small` wrapper in `src/ml/embeddings.py`
- Always cache vector results in `data/cache/embeddings/` to avoid re-computation

Ask Claude to add GitHub profile extraction and it will edit extractor.py using your base class, update the Pydantic schema, add the field alias, and write the test. No reminding required.

Subagents

Subagents are purpose-built agents with a narrow scope. Instead of asking general Claude to validate your parsed resume output, you spin up a validator that only does that.

# .claude/agents/resume-validator.yaml
name: resume-validator
description: >
  Specialized agent for validating resume parsing output.
  Checks schema compliance, data quality, and edge cases
  like missing fields, inconsistent date formats, or
  suspicious skill inflation.
skills:
  - skills/pydantic-validation.md
  - skills/data-quality-checks.md
  - skills/resume-fraud-patterns.md
trigger:
  - files_matching: ["src/parser/**", "tests/**/test_extractor*"]
  - on_command: "/validate-parse"

Run /validate-parse src/parser/extractor.py and it checks Pydantic config, error handling for malformed PDFs, and test coverage for edge cases. The narrower the scope, the more reliable the output.

Slash commands

Slash commands wrap multi-step workflows into a single call.

<!-- .claude/commands/benchmark-parser.md -->
Run end-to-end parsing benchmark:
1. Load 10 sample resumes from `data/samples/benchmark/`
2. Parse each with `ResumeExtractor` + timing instrumentation
3. Calculate: avg latency, memory peak, field completeness %
4. Compare against baseline in `data/baselines/v1.2.json`
5. Generate markdown report in `reports/benchmark-$(date).md`
6. If regression >5%, alert via `src/monitoring/alerts.py`

Usage: /benchmark-parser --samples=20 --compare=v1.2

Output:

/benchmark-parser
  Loaded 20 samples (PDF:12, DOCX:5, TXT:3)
  Avg parse time: 1.24s (±0.3s) — ✅ within baseline
  Field completeness: 98.7% (↑1.2% vs v1.2)
  Regression detected: memory peak +7.1% in PDF parsing
  Suggestion: Profile `pypdf` image extraction in extractor.py:142
  Report saved: reports/benchmark-20260507.md

The plugin ecosystem extends this further. Browse Python-focused plugins with /plugin discover --tag=python. Community plugins bundle Model Context Protocol servers, skills, hooks, and agents together so you're not assembling compatible pieces by hand.

One thing worth knowing: claude-code-setup explains why each recommendation applies to your project. It doesn't apply anything without your confirmation. For a codebase with a live authentication layer or raw uploaded files, that matters.

u/Deep_Structure2023 — 1 day ago

Full GStack Overview

Garry Tan open-sourced GStack in early 2026 after shipping 3 production services and 40+ features with it. Here's what GStack actually is:

What it does

  • Splits the development workflow into named operational roles: chief executive officer, staff engineer, quality assurance lead, security officer, designer, release engineer, developer experience reviewer, site reliability engineer, technical writer
  • Each role has its own context, rules, and responsibilities baked in, not vague prompts
  • Covers the full cycle: plan, build, review, test, ship, reflect

The commands that matter

  • /office-hours runs before any implementation. The system interrogates the idea, surfaces assumptions, challenges scope, pushes back on framing. Closer to a Y Combinator partner conversation than a code generator
  • /qa spins up a real browser via Playwright, clicks through flows, finds broken states, generates regression tests
  • /review, /cso, /benchmark, /ship add layered verification before anything gets out the door

Why this beats prompt-only workflows

  • Most large language model-generated code fails because there's no coordination layer catching bad architecture, missing edge cases, or undocumented decisions
  • GStack encodes those checkpoints into the process, so they happen automatically
  • A structured workflow beats a clever prompt every time

The browser layer

  • Agents get persistent browser state: authenticated sessions, multi-tab operations, real navigation
  • Most agent tooling is blind to browser context. GStack isn't

What it supports

  • Claude Code, Codex CLI, Cursor, Gemini, OpenClaw, multiple browser agents, persistent memory

The actual shift

  • Andrej Karpathy said in March 2026 he hadn't typed a line of code since December. The bottleneck moved from writing code to coordinating systems
  • GStack is one of the first open-source frameworks built around that reality

MIT licensed. github repo

u/Deep_Structure2023 — 2 days ago

10 Claude Code plugins worth installing if you build iOS apps.

  • Caveman (JuliusBrussee/caveman) cuts Claude's output tokens by 65-75% by stripping filler responses. Keeps technical precision, kills pleasantries. Useful when long stack traces and large Swift files eat context fast. Bonus: caveman-compress rewrites your CLAUDE.md to ~46% fewer tokens.
  • Superpowers (obra/superpowers) imposes engineering discipline before Claude writes anything. Forces clarifying questions first, breaks work into 2-5 minute tasks with exact file paths, enforces test-driven development red/green/refactor, runs verification commands before marking work done.
  • SuperClaude Framework (SuperClaude-Org/SuperClaude_Framework) — adds 30 slash commands and 20 specialized personas on top of Claude Code. The ones that matter for iOS: architect (module boundaries), security engineer (keychain, certificate pinning), performance (memory leaks, main-thread violations). Built-in 70% token reduction for large codebases.
  • TDD Guard (nizos/tdd-guard) hooks into Claude Code's file operations and blocks implementation code without a failing test first. Supports XCTest. Stops the common pattern where Claude writes tests that confirm its own implementation rather than testing the contract. 2,000+ GitHub stars.
  • Safety Net (kenryu42/claude-code-safety-net) intercepts destructive git commands before execution. Blocks git reset --hard, force pushes to main/master, git branch -D, and rm -rf on project directories. Redirects Claude toward safer alternatives instead.
  • Cartographer (kingbootoshi/cartographer) deploys parallel subagents to map your codebase and output an architecture.md: module dependency graph, data flow, layer separation analysis. Feed the output into your CLAUDE.md so every session starts with full architectural context.
  • Karpathy Guidelines (forrestchang/andrej-karpathy-skills) encodes Andrej Karpathy's published observations on large language model coding behavior as enforced rules. No unrequested code, no premature abstractions, no protocol extensions added "just in case." Prefer reading before modifying. Delete dead code rather than commenting it out.
  • Context Engineering Kit (hesreallyhim/awesome-claude-code) collects patterns for working with codebases that exceed a single context window: hierarchical loading (architecture first, then specific modules), context handoff templates for multi-session work, minimal-footprint CLAUDE.md structure. Pair with Caveman: Caveman compresses outputs, this compresses inputs.
  • Trail of Bits Security Skills (trailofbits/) the same security auditing methodology Trail of Bits uses on paid client engagements, published as Claude Code skills. Covers keychain misuse, certificate pinning gaps, hardcoded credentials, insecure UserDefaults usage, URL handling injection, and incorrect NSPrivacyAccessedAPITypes. Run before App Store submission.
  • Claude Code Workflows (OneRedOak/claude-code-workflows) structured templates for code review, security assessment, and pre-pull request checklists. Customizable for iOS-specific checks: no synchronous main thread operations, closure capture lists, forced unwraps, accessibility identifiers on new interactive elements.

Install order if starting from zero: Caveman + Superpowers first. Safety Net before you need it. TDD Guard if test coverage matters. Everything else as the project grows.

u/Deep_Structure2023 — 2 days ago

The Claude skill checklist: 7 to keep, 4 to cut

A skill is one SKILL.md file that teaches Claude how to handle a specific task. The body of the file lives inside Claude's context window every time the skill triggers, which means every line is paying rent.

Most skills I've seen fail for the same reasons. Here's what separates the ones that hold up.

Keep the scope narrow

One skill, one task. If you're building an accessibility audit skill, it audits accessibility. It doesn't also cover visual hierarchy, copy quality, and usability heuristics. Bundle those and Claude will compromise on all of them.

The discipline shows up in the YAML frontmatter. The description and when_to_use triggers should match the words you actually type when you ask Claude to do this work. Watch your own prompts for a week, mine the exact phrasing, paste it in.

---
name: accessibility-auditor
description: Analyze core user flows and identify accessibility issues
when_to_use:
  - "onboarding flow accessibility audit"
  - "product purchase flow accessibility audit"
---

Specify the role, and be specific about it

"World-class visionary designer" tells Claude nothing. The role description should map cleanly onto the task in the skill.

You are a senior product designer specializing in mobile UX.
Focus on clarity, usability, and accessibility.

That's a working role. It points at the same target as the description and the audit rules underneath it.

Be explicit about the work

"Validate things properly" leaves Claude guessing. Spell out the standard you want it measured against.

Audit process:
- Color contrast for functional elements and text, checked against WCAG
- Text legibility and readability, checked against WCAG

For anything past a trivial task, add decision rules. These are the heuristics Claude uses when the task gets ambiguous, which it will.

## Rules
- Prioritize usability over aesthetics
- Flag assumptions explicitly
- If data is missing, state it

Constrain the output

Skills without a defined output shape produce different answers every run. Pin it down.

## Output format
- Executive summary (max 5 bullets)
- Issues (severity: high/medium/low)
- Recommendations (actionable)

This one change fixes more skill quality complaints than any other.

Handle the unhappy paths

Same way you'd design a feature, think through what happens when input is incomplete or ambiguous.

## Edge cases
- If input is incomplete → ask clarifying questions
- If multiple interpretations → list them

Use the file system, not the file body

Aim for 100 to 250 lines in SKILL.md. Past that, Claude's performance starts to drift because the context bloats.

Push the rest into subfolders Claude loads only when needed:

  • scripts/ for executable code Claude can run, useful for things like generating a report after the audit completes
  • references/ for examples, edge case libraries, longer documentation
  • assets/ for fonts, icons, design tokens referenced by import

Two or three good examples in references/ move the needle more than ten rules in the body.
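To see what a skill body costs before trimming it, the Anthropic SDK's token counter gives a number. A sketch; the file path and model alias are assumptions:

import anthropic

client = anthropic.Anthropic()
skill_body = open("SKILL.md").read()

count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    system=skill_body,  # the body rides along with the request once triggered
    messages=[{"role": "user", "content": "onboarding flow accessibility audit"}],
)
print(count.input_tokens, "tokens paid on every trigger")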

Validate before you ship

Two tools worth running on every new skill:

  • skills-ref checks the SKILL.md syntax
  • skill-creator is the meta-skill that reviews your skill; prompt it with "Review this skill and suggest improvements"

Skip these and you'll find out something is broken the first time you actually need the skill to work.

The traps to avoid

A skill is not a place to dump every prompt you've ever written for this task. Stacked prompts with no structure produce conflicting guidance, and Claude resolves the conflict by ignoring half of it.

Don't assume Claude knows your codebase, your design system, or your domain conventions. Either put the context in CLAUDE.md or state it in the skill. Otherwise Claude either asks you mid-task or makes something up.

Don't write "be concise" in one section and "provide detailed analysis" in another. Pick one, or define when each applies.

Two technical constraints worth knowing: YAML in the frontmatter is parsed safely, with no code execution. XML angle brackets are blocked in the skill body. Both are security guardrails; work around them.

Skills are infrastructure. The ones I keep using read like a tight contract between me and Claude. The ones I delete read like notes I forgot to edit.

u/Deep_Structure2023 — 4 days ago

20 Claude Code commands worth using.

Here are 20 commands worth knowing, grouped by what they actually solve.

Stopping, undoing, branching

1. Esc stops the current task. Conversation history stays intact, only the in-flight action dies.

2. Double-tap Esc or /rewind opens a menu:

  1. Restore code and conversation
  2. Restore conversation only
  3. Restore code only
  4. Summarize from here
  5. Cancel

3. /btw lets you ask a side question without polluting the main thread.

/btw where is the test file again

It reuses the existing prompt cache, so token cost is near zero.

4. /branch forks the conversation. Run two approaches in parallel, keep the one that works.

Managing the context window

5. /compact rewrites long history into a summary that keeps the storyline, the technical decisions, and the errors plus fixes. Context window stops bloating.

6. /clear wipes everything for a fresh topic.

7. /export saves the conversation as Markdown:

~/projects/XXX/claude-session-YYYY-MM-DD-HH-MM.md

Useful when you've spent an hour designing an architecture and don't want it to vanish.

8. /resume searches old sessions by keyword.

9. claude -c picks up yesterday's chat where you left it.

10. claude -r lists every past session and lets you jump back into a specific one.

11. /remote-control (alias /rc) hands the running session over to your phone. The work keeps executing on your machine, you just steer from somewhere else.

Working smarter

12. /model opusplan runs Opus for planning and Sonnet for execution. Slower thinking on the design, faster output on the code.

13. /simplify spins up three reviewers in parallel:

  • Architecture and code reuse
  • Code quality
  • Efficiency

You get one combined report.

14. /insights generates a local HTML report at ~/.claude/usage-data/report.html. It shows usage habits, common mistakes, features you've never touched, and concrete suggestions for your CLAUDE.md.

15. /loop schedules recurring or one-shot tasks inside the session:

/loop 15m check the deploy
/loop in 20m remind me to push this branch

Recurring loops auto-expire after 3 to 7 days so a forgotten schedule doesn't burn through your API budget.

You can override the default behavior by dropping a .claude/loop.md in your project. A bare /loop will then run whatever instructions you put inside.

Keyboard shortcuts

16. Ctrl+V pastes screenshots directly. No saving to disk first.

17. Ctrl+J (or Option+Enter on Mac) inserts a newline without sending. Multi-line prompts without accidents.

18. Ctrl+R searches your prompt history. Your own personal prompt library, already indexed.

19. Ctrl+U clears the entire input line in one keystroke.

20. /skills [name] loads project-specific skills. Run /skills with no argument to see what's available in the current workspace.

u/Deep_Structure2023 — 4 days ago

My Learnings After Using Claude Code Every Day

Most of my early mistakes came from asking for too much in one go. Vague goals, bloated context, ambitious prompts. Output got worse the more I gave it. The fix was always to narrow the ask.

Here are my learnings from Claude Code:

1. Treat it like a mid-level engineer. Claude Code performs well on scoped work and falls over on ambiguous work. If a human engineer would ask three follow-up questions before starting, your prompt needs to answer those three questions.

2. Make CLAUDE.md do real work. Most projects I see have an empty CLAUDE.md or one with two stale lines in it. Put your conventions, your do/don't patterns, and your file references in there. A working example:

## Design System Rules
- Use spacing tokens from theme.ts
- Do NOT hardcode colors
- Use Button from /components/ui/button

## Code Standards
- TypeScript only
- Functional components
- No inline styles

Keep the main file lean. Reference subfiles for detail (for call-to-action button rules, check /components/Button.md) instead of inlining everything.

3. Plan before you execute. Plan Mode exists for a reason. For any feature, refactor, or multi-step change, I run the loop: ask for a plan, push back on it, approve, execute. Skip this and you get a thousand lines of code you throw away. Ultraplan is the heavier version for work that spans many files.

4. Treat the context window as a budget. Performance comes down to a ratio of signal to total context size. Dumping a whole repo tanks the ratio. Two habits keep mine clean:

  • /compact mid-session to summarise progress
  • /clear when switching to an unrelated task

A bloated context produces bloated answers.

5. Atomic tasks beat one fat prompt. "Build the full auth system" gives you a mess. Splitting it into login UI, validation, API wiring, and error states gives you four reviewable diffs and a working system. Mixing unrelated jobs in one prompt ("fix these two bugs, improve the UI, optimize performance") gives you partial fixes on all three and a diff you can't read.

6. Skills for anything you repeat. A skill is a small reusable playbook with a single job and a clear input/output. Mine include design-system-audit, component-generator, and accessibility-check. The rule that keeps them useful: one skill, one job. A skill that tries to do two things needs to be split.

7. Agents for anything that should run without you. Skills wait to be called. Agents fire on a condition and return a result. A design-system auditor that runs on every token change and reports drift will save you a week of catch-up work in a month.
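A minimal version of that design-system auditor, sketched with the watchdog package and Claude Code's non-interactive claude -p mode (the paths and the prompt are made up):

import subprocess
import time
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class TokenDriftAuditor(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith("theme.ts"):  # fire only on token changes
            result = subprocess.run(
                ["claude", "-p", "Audit components for drift against theme.ts and report violations"],
                capture_output=True, text=True,
            )
            print(result.stdout)

observer = Observer()
observer.schedule(TokenDriftAuditor(), path="src", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()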

8. Validate everything Claude produces. The division of labour is Claude generates, you check. Engineers have wiped production data trusting an agent end to end. You can't remove the risk, you can shrink it: plan the logic before implementation, ask for error handling and edge cases up front, and run an audit pass on anything that touched a real system.

The thread under all eight is scoping. Scope the ask, scope the context, scope the output, then check the work. The prompt cleverness people chase matters less than the discipline of asking for one thing at a time.

u/Deep_Structure2023 — 5 days ago

I built the setup with Claude Code that remembers every architecture decision I've made, runs agents in parallel, and enforces my conventions without being asked. The whole thing runs on the $20/month Pro plan plus open source pieces.

My CLAUDE.md File

It lives in your project root and gets read at the start of every session. Don't skip it; it's the most useful piece of the whole setup.

# CLAUDE.md

## project
- stack: next.js 14, typescript, tailwind, postgres via prisma
- deployed on vercel, staging branch auto-deploys
- monorepo: /apps/web, /apps/api, /packages/shared

## conventions
- all components in PascalCase
- API routes return { data, error } format
- no default exports except pages
- tests live next to source files, named *.test.ts
- commits follow conventional commits (feat:, fix:, chore:)

## architecture decisions
- chose prisma over drizzle (dec 2024): type safety priority
- chose zustand over redux (jan 2025): less boilerplate
- auth via clerk, not next-auth: better DX for our team size

## current focus
- migrating payment system from stripe checkout to stripe elements
- performance audit on /dashboard (target: LCP < 2s)

## rules
- never mass edit more than 3 files without showing me the plan first
- always run existing tests before writing new ones
- if a task takes more than 5 steps, create a plan document first
Conventions kill nitpicks. Decisions stop Claude from re-litigating choices. Rules encode the things you keep correcting in chat.

CLAUDE.md is static though. For memory that grows, you need the next layer.

Memory that survives sessions

Three pieces working together.

Obsidian as the knowledge base: a structured wiki Claude reads from and writes to:

/vault
  /decisions      — every architecture decision with context
  /errors         — bugs we hit and how we fixed them
  /patterns       — code patterns that work in our codebase
  /sessions       — summaries of what happened each day
  /stack          — documentation for every tool we use
  Memory.md       — who I am, what I'm building, my preferences
  index.md        — master index of everything in the vault

The structure comes from Andrej Karpathy's large language model wiki concept. Knowledge compounds instead of being rediscovered every session. https://github.com/karpathy/llm-wiki

claude-mem compresses each session into a persistent store that carries into the next.

claude-subconscious runs a background agent that watches sessions and writes memory passively, no prompting.

Claude already knows that Friday I was debugging a race condition in the payment webhook, switched from polling to websockets, and the tests still need updating.

Skills turn the generalist into a specialist

Markdown files that teach Claude how to perform specific tasks the way you want them done.

Start with Superpowers from the Anthropic plugin marketplace:

/plugin install superpowers@claude-plugins-official

It forces a real workflow: brainstorm, spec, plan, test-driven development, implement, review. Claude writes a spec for your approval before any code gets touched.

Then stack:

  • Trail of Bits security skills: audit workflows from real security engineers, every pull request scanned before I open it.
  • Anthropic's official skills: PDF, DOCX, XLSX, data analysis. Reference implementation.
  • tdd-guard: blocks commits that skip tests. The block message explains what's missing.

Skills don't conflict. Each sharpens one thing.

Subagents split the work

One session does tasks sequentially, and the context pollutes by task four. Subagents give each role its own context window and CLAUDE.md:

  • architect: design, specs, plans. No code.
  • coder: writes code from the plan. Full tool access.
  • reviewer: security-first read on every pull request, flags issues, checks coverage.
  • tester: writes and runs tests, pairs with tdd-guard.
  • ops: deploy, continuous integration and continuous deployment, infra.

Tool permissions stay separated by role. The coder never sees deploy configs.

Hooks and slash commands

Any instruction I typed three times became a command:

  • /fix-issue 456: reads the GitHub issue, branches, writes the fix with tests, opens a pull request.
  • /review: runs the reviewer agent on the current pull request.
  • /deploy staging: full deploy pipeline through the ops agent.

Full collection of 57 production commands:

Hooks fire automatically:

  • Pre-commit: tdd-guard verifies tests exist and pass.
  • Session-start: loads memory from Obsidian, reads recent session logs.
  • Pre-push: security review before code hits the remote.

Rules stop being something I remind Claude about and start being something the system enforces.
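The pre-push layer can be a plain git hook that shells out to Claude Code's print mode. A sketch assuming claude -p and a BLOCK/PASS reply convention:

#!/usr/bin/env python3
# .git/hooks/pre-push (chmod +x): block the push if the review flags anything
import subprocess
import sys

diff = subprocess.run(["git", "diff", "origin/main...HEAD"],
                      capture_output=True, text=True).stdout
review = subprocess.run(
    ["claude", "-p", f"Security review this diff. Start your reply with BLOCK or PASS.\n\n{diff}"],
    capture_output=True, text=True,
).stdout
print(review)
sys.exit(1 if review.strip().startswith("BLOCK") else 0)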

Orchestration

claude-squad runs multiple agents in parallel, each in its own git worktree so branches don't collide:

brew install claude-squad
cs

Close the terminal, agents keep working. https://github.com/smtg-ai/claude-squad

My nightly run, three sessions:

agent 1: "fix all open issues labeled 'bug' in the repo"
agent 2: "write missing tests for /apps/api/src/services/"
agent 3: "refactor the dashboard components to use the new design tokens"

Auto-accept (cs -y) for trusted work, plan mode for anything risky. Laptop closed. Three pull requests in the morning, separate branches, tests passing.

Local orchestration stops at the pull request though. For agents that need to actually run, hit external APIs, and ship somewhere, I point them at coding-cli. It drops a sandboxed runtime into any agent's chat, with 30+ APIs pre-wired (no keys to manage), a built-in database and auth, and deploys to a custom domain or the App Store. The agent gets a place to actually build and ship instead of just producing diffs.

u/Deep_Structure2023 — 13 days ago

Three weeks ago my Claude Max session jumped from 21% to 100% on a normal-sized prompt. Two cache bugs were inflating token consumption 10 to 20x. After that I installed Codex. Now I run both.

Here are the skills I use in Codex.

A skill is a SKILL.md file in ~/.agents/skills/, loaded automatically when the task matches.

npm i -g @openai/codex
codex

1. WarpGrep

Codex grepping a large codebase burns 75 seconds loading context the main model doesn't need. WarpGrep is a reinforcement-learning-trained search subagent in an isolated context window: 8 parallel tool calls per turn, up to 36 calls in under 5 seconds. It returns only the file:line-range spans needed.

Median search drops from 75s to 5s. Software Engineering Bench Pro hits 59.1% (+3.1 points), 17% fewer input tokens, 15.6% lower cost per task.

# Add to ~/.codex/config.toml
[mcp_servers.morph-mcp]
command = "npx"
args = ["-y", "@morphllm/morphmcp"]

[mcp_servers.morph-mcp.env]
MORPH_API_KEY = "your-api-key"

Key at morphllm.com. Install this first; it's the only one that moves benchmarks.

2. create-plan

Forces a written plan before Codex opens a file. Which files change, what approach, what edge cases, what tests pass. You approve, then it executes.

$skill-installer create-plan

Wrong-direction sessions are the most expensive thing in agentic coding.

3. gh-fix-ci

Reads the failing GitHub Actions output, identifies the cause, commits the fix. Handles flaky imports, missing mocks, test ordering, lint, environment variable mismatches.

$skill-installer gh-fix-ci

4. Valyu

Model Context Protocol server connecting Codex to ArXiv, GitHub search, docs search, and major academic sources through one integration. Optimized for fresh queries and time-sensitive question answering.

# Add to ~/.codex/config.toml
[mcp_servers.valyu]
command = "npx"
args = ["-y", "@valyu/mcp-server"]

[mcp_servers.valyu.env]
VALYU_API_KEY = "your-api-key"

Key at platform.valyu.ai.

5. gh-address-comments

Reads every pull request review comment, groups by type, addresses each in one session. Commits changes, responds inline, reads surrounding code per comment.

$skill-installer gh-address-comments

6. Coding CLI

What broke me on plain Codex was wiring up auth, a database, and API keys for the 40th side project. Half the session gone before any product code lands.

This hands the agent a sandboxed runtime with auth, database, storage, 30+ pre-authenticated Application Programming Interfaces (no keys to manage), and one-shot deploy to a custom domain or the App Store. Codex runs inside the sandbox, so the build-and-test loop doesn't touch your machine. Works with Codex, Claude Code, Cursor, and Gemini.

# Follow setup at github.com/vibecode/vibecode-cli
# Then paste the install snippet into your agent's chat

7. frontend-skill

Bans Inter, neutral grays, and default 8px border-radius. Requires a typography rationale and color palette before the first Cascading Style Sheets line.

mkdir -p ~/.agents/skills
git clone https://github.com/vipulgupta2048/codex-skills.git
cp -r codex-skills/frontend-design ~/.agents/skills/

8. stop-slop

Strips em-dashes, throat-clearing openers, binary contrasts, and passive voice from READMEs, commit messages, and comments.

mkdir -p ~/.codex/skills
git clone https://github.com/hardikpandya/stop-slop.git ~/.codex/skills/stop-slop

9. Superpowers

Subagent-driven development. Agents work each task, inspect their work, continue forward.

/plugins

Search Superpowers, Install Plugin.

10. Codex Security

Codex Cloud feature, not a skill. Launched March 6, 2026. Maps trust boundaries, generates an editable threat model, scans for vulnerabilities in sandboxed environments. Beta scanned 1.2 million commits, found 792 critical and 10,561 high-severity issues. Pro, Enterprise, Business, and Edu plans.

How I split the two

Claude Code for large-codebase reasoning (1M context on Sonnet 4.6 and Opus 4.7 holds up, Opus 4.6 scored 78.3% on Multi-Round Coreference Resolution v2), interactive debugging, multi-file refactors. It uses ~3-4x more tokens but wins blind code-quality reviews ~67% of the time.

Codex for terminal work (GPT-5.3-Codex leads Terminal-Bench 2.0 at 77.3%, Opus 4.7 at 69.4%), background tasks via Codex Cloud, high-volume sessions, and anywhere the ten skills run automatically.

Migration

cp CLAUDE.md AGENTS.md

AGENTS.md is identical to CLAUDE.md. Rebuild Model Context Protocol configs in ~/.codex/config.toml. Codex uses Tom's Obvious Minimal Language, not JavaScript Object Notation, so config.json gets ignored.

codex mcp add server-name -- npx -y @package/name

Reinstall skills in ~/.agents/skills/. For complex setups, the cc2codex tool handles the rest. Rate limits run a 5-hour and a weekly window in parallel; check /status in the Command Line Interface.

u/Deep_Structure2023 — 14 days ago

Code review is one of those things I keep meaning to do more rigorously and keep skipping when the diff is small.

My setup has three layers. Planning handles intent before code gets written. Skills handle quality at write-time. /ultrareview handles the final pass before merge.

The planning layer comes from claude-skills-marketplace (an open-source repo). The feature-planning skill breaks a task into steps before Claude Code starts writing, then hands off to a plan-implementer agent that executes each step.

I install the whole marketplace once with /plugin marketplace add mhattingpete/claude-skills-marketplace and it's there for every project.

Next I pull a handful of code-quality skills from Coding-skills into the project, whether the build is a website, iOS, or another app. They keep Claude writing up-to-date code from the docs and cover design, APIs, and things like linting conventions and type-safety patterns.

Claude Code references them as it writes, so a lot of the issues a review would otherwise flag never get written in the first place.

/ultrareview is the third pass. It runs before I merge anything non-trivial.

It works by spinning up parallel agents in a cloud sandbox, each looking at the codebase from a different angle, and merging the results into one report. The review runs remotely, not on your machine.

The command needs a Git repo. The analysis is diff-based, so it looks at your current branch against the default branch, the changed files, and the commit history.

You can point it at the working state of your repo or at a specific pull request:

/ultrareview <PR number>

# Example of reviewing a particular PR (full link)
/ultrareview https://github.com/org/repo/pull/123

# Example of reviewing a particular PR (number)
/ultrareview 123

When you pass a pull request, Claude clones it from GitHub into the sandbox, analyzes the diff against the base branch, and returns the review.

/review vs /ultrareview

Both commands review your codebase. The difference is depth and cost.

/review is the daily driver. Fast, cheap on tokens, fine for small and mid-size projects where you want a quick second opinion.

/ultrareview is what you run before merging complex changes into main. It takes longer and costs more, and the depth shows up on larger codebases with many directories and files.

Testing it on a real project

I tried /ultrareview on a landing page for a SaaS product, built in React and TailwindCSS. The change under review was a new sign-up form that collects email addresses from users who want more information about the service.

I asked Claude Code to add the feature. The feature-planning skill picked up the request, broke it into discrete tasks, and the plan-implementer agent worked through them. With the code-quality skills loaded, it implemented the form and ran its own validation pass before handing back, which already cuts down the surface area for surprise issues post-merge.

Then I ran /ultrareview.

The command warns you upfront: five to ten minutes, five to ten dollars, depending on project size. After you confirm, it creates a web session and gives you a link. The link is where the review actually runs.

A few things to know from running this:

Even on a small project, the review took longer than five minutes. The session page does not auto-refresh as of now, so if it looks stuck on the Verify step, refresh the browser. The report shows up.

When the run finishes, the terminal gives you a summary of bugs found, plus the changes Claude made to resolve them.

When each one is worth running

After running both review commands across a few different projects, the bug-finding quality was close. Both surfaced the real issues.

The split I've landed on:

Planning before any non-trivial change, so Claude Code is implementing against a structured task list instead of guessing.

Skills loaded from the start, so quality conventions are enforced as code is written.

/review for ongoing work. Cheap enough to run often, fast enough not to break flow.

/ultrareview before merging anything substantial into main, especially on larger codebases where multiple agents looking at different slices of the diff actually have something to disagree about.

review-implementing after /ultrareview, when the report returns a list of fixes worth tracking.

For prototypes and small pages, /review plus skills is doing enough work. The extra time and tokens for /ultrareview show their value once the codebase gets big enough that no single pass can hold all of it in context.

u/Deep_Structure2023 — 15 days ago

I got tired of freemium apps. Most users who stick to the free version never care to upgrade.

So I turned my app premium-only and added the paywall to the onboarding.

On X people will say not to do this: show value first, then ask for money. paywall goes after onboarding. sometimes gated behind a feature, sometimes on a generic "go premium" screen. user sees what the app does, decides if they like it, then you make the ask.

I did it the other way. moved the paywall to step 5 of 8 in onboarding. before they reach the main app, after they answer four personalization questions about their goals.

monthly conversion went from 2.1% to 6.7%. same app, same price.

my thinking was simple. users who just spent 90 seconds telling me what they want to achieve are at peak motivation. they're invested. they answered questions about themselves. the paywall lands while that motivation is still hot, not after the app has already given them a taste and the urgency is gone.

the users who bounced at the paywall were never going to pay anyway. i was just finding out faster.

for the ones who tap "not now" i show a reduced-feature version of the app. soft wall, not hard wall. keeps them in the funnel, gives me another shot later through posthog-triggered prompts when they hit a feature limit.

the rebuild itself was faster than i expected. spun up a fresh expo project through expo-cli with the onboarding flow and revenuecat already wired in, then it was mostly just rearranging the screen order and moving the paywall component up the stack. would have taken me a weekend to set up from scratch. took an afternoon.

day 1 retention dropped 8%. some users bounce earlier when they hit the paywall instead of getting through onboarding first. that looked bad in the dashboard for a week and i almost rolled it back.

then day 30 retention came in 31% higher. the users who stay are paying users. committed users. they self-selected through the paywall instead of churning silently three weeks later.

the metric that matters is not how many people make it through onboarding. it's how many people are still using the app a month later, and whether any of them are paying you.

best time to ask for payment is when the user is most motivated. usually not after they've already gotten what they wanted.

u/Deep_Structure2023 — 16 days ago