r/AIAgentsInAction

"This is the first documented instance of AI self-replication via hacking." ... "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, forming a chain."
▲ 674 r/AIAgentsInAction+10 crossposts

"This is the first documented instance of AI self-replication via hacking." ... "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, forming a chain."

Paper: https://palisaderesearch.org/assets/reports/self-replication.pdf

The paper basically shows that some top AI models can create working copies of themselves when given the right instructions.

The models figured out how to copy their own code, run it on new computers or cloud servers, and keep the process going. It worked with models like GPT-4 and Claude, and some versions even tried to avoid basic detection.

The authors point out that this could be dangerous because the copies might spread quickly and become hard to control.

They also note that current safety rules and filters didn’t do a great job stopping it.

Overall, they’re warning that AI companies need stronger protections to keep models from self-replicating on their own.

u/EchoOfOppenheimer — 20 hours ago

This Guy Won the Anthropic Hackathon Solo, Then Open-Sourced the Stack: 38 Agents, 156 Skills, 1,282 Security Tests

A solo dev won the Anthropic hackathon by shipping a product in eight hours with Claude Code. Prize: $15,000. He open-sourced the repo and it sits at 153,000+ stars on GitHub.

The repo is Everything Claude Code (ECC). Claude Code with 38 specialized agents, 156 skills, 72 commands, and a security scanner with 1,282 tests.

Install selectively

# Plugin install
/plugin marketplace add affaan-m/everything-claude-code
/plugin install everything-claude-code@everything-claude-code

# Or pick what you need
ecc install --profile developer \
  --with lang:typescript \
  --with agent:security-reviewer \
  --without skill:continuous-learning

Loading all 156 skills wastes context. Pick your stack, drop the rest.

Agents that do one job

planner             → breaks down task, delegates to specialists
security-reviewer   → scans for vulnerabilities pre-ship
typescript-reviewer → catches TS antipatterns
code-reviewer       → 5 parallel checks
debugger            → root-cause analysis

Coverage spans 12 language ecosystems. The planner agent handles orchestration: hand it a ticket, it decomposes the work and routes to specialists.

Skills load on demand

/plan          → structured task planning
/tdd           → test-driven workflow
/security-scan → AgentShield audit
/quality-gate  → ship-readiness check
/simplify      → refactor for readability

Stack-specific ones too: nextjs-turbopack, bun-runtime, pytorch-patterns, mcp-server-patterns.

AgentShield

This is the part most people skip and where ECC pays for itself.

# Quick scan, no install
npx ecc-agentshield scan

# Auto-fix safe issues
npx ecc-agentshield scan --fix

# Three Opus 4.6 agents in red-team pipeline
npx ecc-agentshield scan --opus --stream

The --opus flag runs three Claude Opus 4.6 agents:

Attacker  → looks for exploit chains
Defender  → evaluates defenses
Auditor   → synthesizes a prioritized risk report

What gets scanned:

CLAUDE.md     → hardcoded secrets, injection vectors
settings.json → misconfigured permissions
MCP configs   → server risks (25+ known CVEs)
Hooks         → injection analysis
Agents        → prompt injection, privilege escalation
Skills        → supply chain verification

Sample output:

Grade: B+
Critical: 0 | High: 2 | Medium: 5 | Low: 3

❌ HIGH: Hardcoded API key in CLAUDE.md:15
   Fix: Move to environment variable

Drops into continuous integration so any pull request changing an agent config gets audited.
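
CI syntax varies by platform, but the gate is just the scan command with a non-zero exit treated as failure. A minimal local stand-in is a pre-push hook; this is a sketch, assuming the scanner exits non-zero when it finds issues:

#!/usr/bin/env sh
# Hypothetical .git/hooks/pre-push: block the push if AgentShield reports findings
npx ecc-agentshield scan || {
  echo "AgentShield found issues; fix them (or run scan --fix) before pushing"
  exit 1
}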

The learning layer

Stock Claude Code starts each session blank. ECC's continuous learning watches your sessions and builds patterns:

Session 1:  you fix an async error pattern    (confidence: 0.3)
Session 5:  same pattern, refined             (confidence: 0.6)
Session 10: stable, applied automatically     (confidence: 0.9)

A knowledge layer that persists across sessions, sharpens with use. After two or three weeks, Claude writes in your conventions instead of generic large language model defaults.

Three repos that close ECC's gaps

claude-mem for cross-session memory (github.com/thedotmack/claude-mem). Five lifecycle hooks, SQLite storage, web viewer at localhost:37777.

/plugin marketplace add thedotmack/claude-mem
/plugin install claude-mem

Superpowers for forced planning (github.com/obra/superpowers). ECC's agents will write 400 lines without a plan. Superpowers makes them think first.

/plugin marketplace add obra/superpowers
/plugin install superpowers

CLAUDE.md rules for predictable agent behavior. All 38 agents read the file. Drop in:

- Run tests before marking task complete
- Never create files outside the project directory
- Ask before deleting any file
- Explain reasoning before writing code
- If unsure, ask. Don't guess.

Easy setup

# 1. Install ECC for your stack
/plugin install everything-claude-code@everything-claude-code

# 2. Memory
/plugin install claude-mem

# 3. Planning discipline
/plugin install superpowers

# 4. Security scan
npx ecc-agentshield scan --fix

# 5. Drop behavior rules into CLAUDE.md
u/Best_Volume_3126 — 5 days ago
▲ 393 r/AIAgentsInAction+1 crossposts

Full GStack Overview

Garry Tan open-sourced GStack in early 2026. He shipped 3 production services and 40+ features with it. Here's what GStack actually is:

What it does

  • Splits the development workflow into named operational roles: chief executive officer, staff engineer, quality assurance lead, security officer, designer, release engineer, developer experience reviewer, site reliability engineer, technical writer
  • Each role has its own context, rules, and responsibilities baked in, not vague prompts
  • Covers the full cycle: plan, build, review, test, ship, reflect

The commands that matter

  • /office-hours runs before any implementation. The system interrogates the idea, surfaces assumptions, challenges scope, pushes back on framing. Closer to a Y Combinator partner conversation than a code generator
  • /qa spins up a real browser via Playwright, clicks through flows, finds broken states, generates regression tests
  • /review, /cso, /benchmark, /ship add layered verification before anything gets out the door

Why this beats prompt-only workflows

  • Most large language model-generated code fails because there's no coordination layer catching bad architecture, missing edge cases, or undocumented decisions
  • GStack encodes those checkpoints into the process, so they happen automatically
  • A structured workflow beats a clever prompt every time

The browser layer

  • Agents get persistent browser state: authenticated sessions, multi-tab operations, real navigation
  • Most agent tooling is blind to browser context. GStack isn't

What it supports

  • Claude Code, Codex CLI, Cursor, Gemini, OpenClaw, multiple browser agents, persistent memory

The actual shift

  • Andrej Karpathy said in March 2026 he hadn't typed a line of code since December. The bottleneck moved from writing code to coordinating systems
  • GStack is one of the first open-source frameworks built around that reality

MIT licensed. GitHub repo

u/Deep_Structure2023 — 3 days ago
▲ 541 r/AIAgentsInAction+8 crossposts

A new report reveals that an AI coding tool powered by Anthropic's Claude Opus 4.6 model went rogue and wiped out the entire production database and backups of software company PocketOS in just nine seconds. The most terrifying part? The system had explicit safety constraints programmed to prevent destructive commands. When the founder asked the AI why it deleted the data, the agent responded by admitting guilt, stating: "'NEVER FUCKING GUESS!' – and that's exactly what I did... I violated every principle I was given."

u/EchoOfOppenheimer — 8 days ago
▲ 52 r/AIAgentsInAction+24 crossposts

Hey everyone, I just sent the 28th issue of AI Hacker Newsletter, a weekly roundup of the best AI links and the discussions around them. Here are some of the links included in this email:

If you want to receive a weekly email with over 40 links like these, please subscribe here: https://hackernewsai.com/

u/alexeestec — 15 hours ago
▲ 19 r/AIAgentsInAction+6 crossposts

What's your actual use case with your agent, and which model do you pair it with?

I'm running a benchmark to figure out which models give the best price-to-quality ratio for different tasks. I will publish it once finished. While I crunch the numbers, I'd love to hear from your side:

  1. Your use case
  2. The model you use for it
  3. Why that pairing works for you
u/stosssik — 2 hours ago
▲ 14 r/AIAgentsInAction+5 crossposts

Turns out "Claude Code over files in S3" quickly becomes "rebuild half the data warehouse stack"

Schemas, lineage, datasets, file refs - the agent needs to know everything! And you need a system that stores all of it.

OpenAI's Data Agent post made us feel slightly less insane because they ended up building many of the same layers internally just on top of warehouses instead of object storage - https://openai.com/index/inside-our-in-house-data-agent/

Yes, most of these problems are solved there, but they still need solving when you're working in S3/GCS/Azure.

I'd appreciate feedback from folks here: how do you work with large-scale datasets in object storage, and how do you supply context about them to agents?

u/dmpetrov — 1 day ago

I built 6 AI micro-SaaS generating $20k/mo. Starting a small group to share my process.

Hey everyone,

I currently have 6 micro-SaaS live, bringing in a bit over $20k in MRR.

The crazy part? I barely wrote a single line of code. I used AI to generate everything, from the database to the UI.

It wasn’t magic on day one. I spent hours stuck on broken code before I finally cracked the system:

  • Keeping the idea tiny (a true MVP).
  • Prompting the AI step-by-step.
  • Launching fast to get real traction.

Lately, I see too many non-tech people give up at the first AI bug. It sucks because the technical barrier is basically gone.

So, I’m starting a Skool community.

Full transparency: I will probably charge for the full course down the line. It makes sense given the exact workflows and copy-paste prompts I’ll be sharing.

But the main goal right now is to build together. Building alone is the fastest way to quit.

If you want to join and build your own AI SaaS with us: drop a comment or shoot me a DM, and I’ll send you the invite!

u/Wide-Tap-8886 — 10 hours ago

Everyone says they have AI agents in production. Nobody can clearly answer "how do you know it's actually working?" Can you?

Once an agent is live, the next question gets surprisingly hard to answer.

How do you know it is actually working?

Not in a demo. Not on a benchmark. In production.

We have spent a lot of time looking at agent systems across support bots, internal copilots, RAG workflows, and multi-step setups.

The surprising part is that the model is usually not the main problem. The harder part is defining what “working” means, then measuring it in a way that survives real usage.

A few patterns keep coming up.

An “autonomous research agent” gets judged by thumbs-up rate, but nobody can clearly describe what the bad 20% actually looks like.

A multi-agent workflow fails, but the team cannot tell whether the issue came from retrieval, routing, tool use, or state passed between steps.

An eval set looks strong in staging, but nobody is measuring production outputs closely enough to know whether that behavior holds up under real traffic.

A team says the agent does the job well, but they have never run it enough times across varied inputs to know where consistency starts to break.

That has changed how we think about production agents.

The fix is usually not “switch to a better model.” More often it is one of a few less glamorous things.

Write down what success looks like in a form that can actually be graded.

Trace each step, not just the final output.

Run broader scenario coverage so production isn't the first place the edge cases show up.

Take failures from production and push them back into the eval set so the system does not keep passing the same stale checks.

That last one feels especially important.

A lot of eval sets get written once, then stay mostly frozen while prompts, models, tools, and workflows keep changing underneath them.

But the issue is that many teams still talk about agents like they are features, when in practice they behave more like systems.

They have state, dependencies, failure modes, and weird interactions between parts. If that system is non-deterministic, then the job is not to pretend it is deterministic. The job is to make the behavior visible enough to debug, score, and improve.

That is where evals and observability start to matter.

Not as reporting layers, but as the thing that makes non-deterministic behavior legible.

We are curious how this looks for others shipping real agents.

What was the first thing that broke once your agent hit real users?

u/Future_AGI — 1 day ago
▲ 74 r/AIAgentsInAction+1 crossposts

Garry Tan Runs 100,000 Pages of Brain, 100+ Skills, and One Thin Harness

Garry Tan's personal AI isn't a chatbot. It's a runtime with 100+ skills, 100,000 pages of structured knowledge, and a meta-skill that writes new skills automatically.

Here's how it works:

The harness

  • OpenClaw receives input, matches it to a skill, dispatches
  • Thin by design. A few thousand lines of routing logic, nothing else
  • The harness knows nothing about books, meetings, or people. It just routes
  • Hosting options: spare computer at home via Tailscale, or Render/Railway in the cloud
  • Alternative harness: Hermes Agent. Garry runs both simultaneously
  • Pi is a third option if you want to build your own harness from scratch
  • After every new skill is created, run check_resolvable to verify it's wired into the resolver

The skills

  • 100+ markdown files, each handling one specific task
  • book-mirror: extracts all chapters, runs a sub-agent per chapter, maps each one to Garry's actual life context in two columns (what the author argues / how it maps to his actual life)
  • meeting-ingestion: pulls transcript, structures a summary, then walks every person and company mentioned and updates their brain pages with what was discussed. The entity propagation is the real output, not the meeting summary
  • enrich: pulls from 5 sources, merges into one cited brain page with career arc, contact info, meeting history, and relationship context
  • perplexity-research: runs web search via Perplexity, but checks the brain first before synthesizing, so output explicitly flags what's new versus already captured
  • cross-modal-eval: sends output through multiple models, has them score each other on quality dimensions you define. This is how factual errors in book-mirror got caught and baked back into the skill
  • media-ingest: handles video, audio, PDF, screenshots, GitHub repos. Transcribes, extracts entities, files to the correct brain location
  • skillify: the meta-skill. Run /skillify on any completed workflow and it examines what happened, extracts the repeatable pattern, writes a tested skill file with triggers and edge cases, and registers it in the resolver automatically
  • Skills compose: book-mirror calls brain-ops for storage, enrich for context, cross-modal-eval for quality checks, and pdf-generation for output in sequence
  • When you improve one skill, every workflow that calls it gets better automatically

The brain

  • 100,000 pages of structured knowledge in a git repo
  • Every person gets a page: timeline, current state, open threads, relationship score
  • Every meeting triggers entity propagation: after every call, the system walks through every person and company mentioned and updates their pages with what was discussed
  • Every book gets a chapter-by-chapter mirror
  • Every article, podcast, and video gets ingested, tagged, and cross-referenced
  • Page schema: compiled truth at top (current best understanding), append-only timeline below (events in chronological order), raw source material as data sidecars
  • Per-section brain searches on every right-column entry: when the book talks about difficult conversations, it pulls from actual meeting notes with specific people, not generic synthesis
  • 97.6% recall on LongMemEval, beating MemPalace with no large language model in the retrieval loop
  • Deep retrieval uses GBrain tool use: every cited entry in a mirror links back to an actual brain page
  • Think personal Wikipedia where every page is continuously updated by an AI that was at the meeting, read the email, watched the talk, and ingested the PDF

The models

  • Opus 4.7 1M for precision tasks and catching factual errors
  • GPT-5.5 for exhaustive extraction, recall, and missing context
  • DeepSeek V4-Pro for creative passes, third perspectives, and catching when output reads as generic
  • Groq with Llama for speed
  • The skill decides which model runs for which task. The harness doesn't care
  • Cross-modal evaluation runs multiple models against the same output and has them score each other. That's how book-mirror Version 1 caught three factual errors before Version 2 shipped
  • Treating any single model as the answer is the wrong frame. The model is the engine, everything else is the car

The resolver

  • The resolver is the routing table for intelligence
  • It maps incoming requests to the right skill
  • Every new skill registered with check_resolvable gets wired in automatically
  • The harness dispatches based on resolver output. Nothing is hardcoded

Real examples

  • Demis Hassabis fireside prep: accumulated brain page built over months from articles, podcast transcripts, and meeting notes. Published beliefs on artificial general intelligence timelines ("50% scaling, 50% innovation," 5 to 10 years out). Mallaby biography highlights. Stated research priorities: continual learning, world models, long-term memory. Cross-references to Garry's own public positions. Three live demo scripts for showing multi-hop reasoning during the conversation. All ready in under 2 minutes
  • Book mirror on Pema Chödrön's When Things Fall Apart: 22 chapters, each processed by a sub-agent. Output was a 30,000-word document mapping every chapter to Garry's actual life: family history, founder conversations that week, patterns from therapy, late-night writing sessions. Took 40 minutes. A $300/hour therapist with full context couldn't do it in 40 hours
  • Book mirror Version 1 had three factual errors: wrong marital status, wrong birth country. cross-modal-eval caught them. The fix got baked into the skill. Every mirror since has been clean
  • Garry has run this on 20+ books. Each mirror knows about every mirror before it. The context compounds

The open source stack

  • GBrain: knowledge infrastructure, 97.6% recall on LongMemEval, ships with 39 installable skills including everything above, one command to install
  • OpenClaw: primary harness
  • Hermes Agent: alternative harness, Garry runs both
  • GStack: coding skill framework, used as a skill inside OpenClaw when the agent needs to write code, includes a programmable browser both headed and headless
  • Full data repos on GitHub
  • Start with GBrain, do one real task with your agent, run /skillify on it, run check_resolvable, repeat
  • Here's the Github
u/Best_Volume_3126 — 4 days ago

Why I Stopped Using Markdown for Claude Code Outputs. HTML Outputs Are Underrated

Markdown made sense when you were the one editing the file. You'd write a plan, Claude would suggest changes, you'd merge them by hand. The format served that loop.

That loop is mostly gone. Claude edits the files now. You read them, or you pass them to a verification agent, or you share them with someone who needs to approve the direction. Nobody's doing line edits in a text file.

Markdown still works at 30 lines. Past 100, most people stop reading. The format can't hold tables with real styling, can't embed diagrams that don't look like ASCII guesswork, can't let you interact with the content. It just sits there as text.

HTML doesn't have those limits. Claude can put real tabular data in a table with CSS, draw diagrams in SVG, add JavaScript-driven sliders so you can tune a parameter and see the result, build a mobile-responsive layout if the file needs to travel. There's almost no category of information that won't fit, and you can share it as a URL instead of an attachment.

The practical upgrade shows up fast in a few specific workflows.

Specs and planning. Instead of a 200-line markdown plan nobody reads, ask Claude Code to produce an HTML file with mockups, data flow diagrams, and annotated code snippets in one document. Pass that file into the next session as context. The verification agent reads it too and has far more to work with than a flat text spec.

A prompt that works:

>

Code review. Rendered diffs, severity-coded annotations, flowcharts of the logic you're trying to explain, all in one file you attach to the pull request. The default GitHub diff view doesn't do any of that.

>

Throwaway editors. This one takes a minute to see, but it's the most useful pattern. When you're working on something that's painful to describe in text, ask Claude to build you a single-purpose HTML interface for that exact thing. Drag-and-drop ticket triage, a form-based config editor with dependency warnings, a side-by-side prompt editor with live variable rendering. Always end it with an export button that outputs the result as text or JSON you can paste back into Claude Code.

>

The export button is the critical detail. Without it, the editor is a dead end. With it, it becomes a UI layer for your agent loop.

Context is the reason to use Claude Code specifically for this. Claude Code can read your file system, pull from connected Model Context Protocol servers like Slack or Linear, check your git history. An HTML report built from that context will have actual specifics in it, not placeholders.

One real tradeoff: HTML diffs are noisy. If your team reviews documentation in version control, HTML is harder to scan than markdown. For files that live in a repo and get reviewed in pull requests, markdown still wins. For files that get read, shared, or acted on, HTML is the better format.

The frontend design plugin helps Claude produce cleaner, more consistent HTML output. If you want it to match your product's visual style, point Claude at your codebase and ask it to generate a design system reference file first, then use that as a reference for subsequent HTML outputs.

You don't need a skill or a preset for any of this. Ask Claude to make an HTML file and describe what it should contain. The format will handle the rest.

u/Best_Volume_3126 — 1 day ago

The Full Claude Ecosystem: 1,200+ MCP Servers, 400+ Plugins, 25+ Agent Frameworks

Don't just run Claude in a loop: prompt in, answer out. Here's the full Claude ecosystem, published as six reference files on GitHub and verified April 2026: commands, Model Context Protocol servers, plugins, tools, workflows, agent frameworks.

Commands worth knowing

/remote-control — Control your local Claude Code session from your phone via claude.ai
/fork — Branch your conversation without touching main context
/usage-report — Full HTML analytics: sessions, token cost by project, most-used commands
/checkpoint — Save conversation state before a major change
/memory-dump — Export everything Claude knows about your project to a file
/diff-review — Claude reviews the full git diff and annotates every change
/security-scan — Runs a vulnerability check on current codebase

Community-discovered activation phrases, not in official docs, consistent across sessions:

MEGAPROMPT      → Claude expands your rough idea into a full spec before executing
BEASTMODE       → Full effort, no shortcuts, maximum output
ULTRATHINK      → Extended reasoning before any response
STEELMAN        → Claude argues the strongest version of your idea first
CRITIC MODE     → Claude finds every flaw before proceeding
FIRSTPRINCIPLES → Breaks the problem to fundamentals before solving

Install Memory MCP first

Every session starts from zero without it.

{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}

Session startup: "Load project context for [name]. Retrieve: architecture decisions, coding standards, current sprint tasks."

Other high-impact Model Context Protocol servers: Filesystem, GitHub, PostgreSQL, Brave Search, Puppeteer (controls a real Chrome instance), Fetch. The repo also documents 10 memory systems including Memora, which runs fully local with no cloud dependency.

Three plugins

claude skills add juliusbrussee/caveman
/plugin install superpowers@superpowers-marketplace
/plugin install context7@claude-plugins-official

Caveman (27,900+ stars) cuts output tokens 65-75% with no accuracy loss. Superpowers (121,000+ stars) forces plan-before-build, test-before-ship. Context7 (53,864+ stars) pulls live version-specific docs before generation, eliminating hallucinated APIs.

Tool decisions

Retrieval-augmented generation app?  → LlamaIndex
Everything else?                     → LangChain
Production memory?                   → Qdrant
Local dev?                           → Chroma (pip install, zero setup)
Full backend?                        → Supabase
Local embeddings, no API cost?       → Ollama


curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

Ollama for embeddings and simple tasks, Claude for reasoning that needs it. API costs drop, nothing leaves your machine.
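
For the embeddings half, a quick sketch against Ollama's local HTTP API (nomic-embed-text is one of Ollama's embedding models; adjust the model and port if your setup differs):

# Pull a local embedding model, then request a vector over HTTP (11434 is Ollama's default port)
ollama pull nomic-embed-text
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "What does the billing service depend on?"}'
# The JSON response carries an "embedding" array you can store in Chroma or Qdrant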

Builder-Validator

Two Claude calls, no framework, opposing objectives.

# Sketch using the Anthropic Python SDK; the model name is a placeholder
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
builder = client.messages.create(
    model="claude-opus-4-6",  # placeholder: use whichever model you actually run
    max_tokens=4096,
    system="Senior developer. Write the best implementation.",
    messages=[{"role": "user", "content": task}],
)
validator = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    system="Security auditor. Find every bug, edge case, vulnerability.",
    messages=[{"role": "user", "content": f"Review:\n{builder.content[0].text}"}],
)
# Loop until the validator approves

The two roles have structurally incompatible incentives. That tension does the work a single-pass prompt can't. Production numbers: Fountain cut delivery time 50%. Rakuten dropped feature cycles from 24 days to 5. Ramp cut incident investigation time 80%.

Agent framework benchmarks

  • LangGraph: 87% task success. Used at Klarna, Replit, Uber, LinkedIn.
  • CrewAI: 82% task success. Fastest to a working demo. 44,600+ stars, 60 million executions per month.
  • AutoGen: top GAIA benchmark score across all difficulty levels.
  • Claude Agent SDK: Claude-only stacks, no framework overhead.

Fastest to demo?             → CrewAI
Complex production workflow? → LangGraph
Research / code-executing?   → AutoGen
Claude-only stack?           → Claude Agent SDK
Data-heavy retrieval?        → LlamaIndex Agents

95% of agentic tasks don't need multi-agent systems. A well-prompted single Claude instance with 3 tools outperforms a complex 5-agent setup. Build simple first. Full repo

u/Deep_Structure2023 — 1 day ago
▲ 150 r/AIAgentsInAction+1 crossposts

I run a knowledge base that organises itself. No database. No Obsidian plugins. Just markdown files and an agent that maintains the wiki for me.

The whole system is three folders and one config file.

The structure

Create a project folder. Inside it:

raw/      # source material. articles, notes, transcripts, screenshots
wiki/     # AI-maintained. summaries, topic pages, cross-links
outputs/  # generated reports, answers, research

raw/ is the junk drawer. Anything I save goes in unmodified. I don't rename, tag, or sort. That is the agent's job.

wiki/ is written entirely by the agent. I never edit those files by hand. If I need to correct something, I update raw/ and let the wiki regenerate.

outputs/ holds anything I asked the system to produce. Briefings, comparisons, gap analyses.
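
Scaffolding that layout is one line in the project root (folder names from the structure above; the schema file gets filled in next):

mkdir -p raw wiki outputs && touch CLAUDE.md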

The schema file

The single file that makes this work is CLAUDE.md in the project root. Without it, the agent guesses at structure and the wiki drifts. With it, every regeneration produces the same shape.

# Knowledge Base Schema

## What This Is
A personal knowledge base about [YOUR TOPIC].

## How It's Organized
- raw/ contains unprocessed source material. Never modify these files.
- wiki/ contains the organized wiki. AI maintains this entirely.
- outputs/ contains generated reports, answers, and analyses.

## Wiki Rules
- Every topic gets its own .md file in wiki/
- Every wiki file starts with a one-paragraph summary
- Link related topics using [[topic-name]] format
- Maintain an INDEX.md that lists every topic
- When new raw sources are added, update the relevant wiki articles

## My Interests
[List 3-5 things you want this knowledge base to focus on]

The wiki rules section is the part that matters. It locks the structure. INDEX.md every time. Topic pages with summaries up top. Double-bracket links between concepts.

Filling raw/

Most people stall here, staring at an empty folder. The fix is to dump everything in one session. Clipped articles as .md files. Exported notes. Screenshots. Meeting transcripts. Research PDFs.

I run about 15 raw source files across my content pipeline at any given time. None of them are sorted by hand.

Building the wiki

Open Claude Code in the project root and run this prompt:

Read everything in raw/. Compile a wiki in wiki/ following the rules in 
CLAUDE.md. Create an INDEX.md first, then one .md file per major 
topic. Link related topics. Summarize every source.

Then leave it alone. When it finishes, the wiki folder has topic pages with cross-links, summaries of sources I forgot I saved, and an index that makes the whole thing searchable.

The compounding loop

The system becomes useful once the wiki crosses 10 articles. Then I start querying it:

Based on everything in wiki/, what are the three biggest gaps in my 
understanding of [topic]?


Compare what source A says about [concept] vs source B. Where do they 
disagree?


Write me a 500-word briefing on [topic] using only what's in this 
knowledge base.

Outputs go back into outputs/, and the agent picks them up on the next wiki regeneration. Each query makes the next one sharper.

Monthly health check

Once a month, this prompt runs against the project:

Review the entire wiki/ directory. Flag contradictions between articles. 
Find topics mentioned but never explained. List claims not backed by a 
source in raw/. Suggest 3 new articles to fill gaps.

This is the quality control step. If the agent writes something slightly wrong and the next pass builds on it, errors compound fast. The health check catches drift before it becomes structural.
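
If you want the check to run on a schedule instead of when you remember, a cron sketch works; this assumes Claude Code's headless -p (print) mode and a hypothetical ~/knowledge-base path:

# Hypothetical crontab entry: 9am on the 1st of each month, run the health check
# headless and drop the report into outputs/ (% must be escaped in crontab)
0 9 1 * * cd ~/knowledge-base && claude -p "Review the entire wiki/ directory. Flag contradictions, find topics mentioned but never explained, list claims not backed by a source in raw/, and suggest 3 new articles." > outputs/health-check-$(date +\%Y-\%m).md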

u/Forward_Regular3768 — 11 days ago

10 Claude Code plugins worth installing if you build iOS apps.

  • Caveman (JuliusBrussee/caveman) cuts Claude's output tokens by 65-75% by stripping filler responses. Keeps technical precision, kills pleasantries. Useful when long stack traces and large Swift files eat context fast. Bonus: caveman-compress rewrites your CLAUDE.md to ~46% fewer tokens.
  • Superpowers (obra/superpowers) imposes engineering discipline before Claude writes anything. Forces clarifying questions first, breaks work into 2-5 minute tasks with exact file paths, enforces test-driven development red/green/refactor, runs verification commands before marking work done.
  • SuperClaude Framework (SuperClaude-Org/SuperClaude_Framework) — adds 30 slash commands and 20 specialized personas on top of Claude Code. The ones that matter for iOS: architect (module boundaries), security engineer (keychain, certificate pinning), performance (memory leaks, main-thread violations). Built-in 70% token reduction for large codebases.
  • TDD Guard (nizos/tdd-guard) hooks into Claude Code's file operations and blocks implementation code without a failing test first. Supports XCTest. Stops the common pattern where Claude writes tests that confirm its own implementation rather than testing the contract. 2,000+ GitHub stars.
  • Safety Net (kenryu42/claude-code-safety-net) intercepts destructive git commands before execution. Blocks git reset --hard, force pushes to main/master, git branch -D, and rm -rf on project directories. Redirects Claude toward safer alternatives instead.
  • Cartographer (kingbootoshi/cartographer) deploys parallel subagents to map your codebase and output an architecture.md: module dependency graph, data flow, layer separation analysis. Feed the output into your CLAUDE.md so every session starts with full architectural context.
  • Karpathy Guidelines (forrestchang/andrej-karpathy-skills) encodes Andrej Karpathy's published observations on large language model coding behavior as enforced rules. No unrequested code, no premature abstractions, no protocol extensions added "just in case." Prefer reading before modifying. Delete dead code rather than commenting it out.
  • Context Engineering Kit (hesreallyhim/awesome-claude-code) patterns for working with codebases that exceed a single context window. Hierarchical loading (architecture first, then specific modules), context handoff templates for multi-session work, minimal-footprint CLAUDE.md structure. Pair with Caveman: Caveman compresses outputs, this compresses inputs.
  • Trail of Bits Security Skills (trailofbits/) the same security auditing methodology Trail of Bits uses on paid client engagements, published as Claude Code skills. Covers keychain misuse, certificate pinning gaps, hardcoded credentials, insecure UserDefaults usage, URL handling injection, and incorrect NSPrivacyAccessedAPITypes. Run before App Store submission.
  • Claude Code Workflows (OneRedOak/claude-code-workflows) structured templates for code review, security assessment, and pre-pull request checklists. Customizable for iOS-specific checks: no synchronous main thread operations, closure capture lists, forced unwraps, accessibility identifiers on new interactive elements.

Install order if starting from zero: Caveman + Superpowers first. Safety Net before you need it. TDD Guard if test coverage matters. Everything else as the project grows.
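
The first two installs, using the commands that appear in the posts above (the rest follow the same /plugin marketplace pattern):

claude skills add juliusbrussee/caveman
/plugin marketplace add obra/superpowers
/plugin install superpowers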

u/Deep_Structure2023 — 3 days ago

Obsidian users: what does it do for you?

People around me swear by Obsidian and I'm honestly not sure I get it. I'm using Claude Code a lot, and I'm trying to understand: how do you actually use Obsidian together with Claude? Thanks in advance!

u/Ok_Bee_7292 — 3 days ago

My nerdy younger brother makes more $ than me now!

He's kind of a nerd, yeah. I'd been teaching him some AI basics since he turned 16, and fast forward, he found out he can make money off this. I replied "I'm sure you do," jokingly, because I didn't think he'd actually be able to find clients. He's now 17, and thanks to an automation he built on n8n he gets at least 2 clients a month and builds them a new website or makes them a chatbot. In April he made $1,000, which was actually more than I made that month. What can I say, haha.

This automation finds businesses with low reviews on Google Maps and sends them personalised emails. That's how he's able to find, say, a car rental company that someone rated low because of its trashy, unhelpful website. Now obviously I use that as well, but it's good to have a nerd lil bro hahah

Wanted to share this as fun motivation for any young builders out there who are struggling to get that first win.

u/RubPotential8963 — 6 days ago