u/Illustrious-Bug-5593

I left an AI loop running overnight. Woke up to 20 shipped agents.

So last month Karpathy dropped autoresearch. Autonomous loop, runs experiments overnight, keeps what works, throws away what doesn't. I watched it blow up and thought: this pattern is sick. But I don't do ML, and I don't have training runs to optimize. What I do have is a problem that's been eating at me: finding good ideas to build.

In 2026, finding a problem worth solving is harder than actually solving it. Every obvious pain point has 12 SaaS tools already fighting over it. The interesting stuff is buried in Reddit threads at 2am where someone rants about something nobody's built for. I used to scroll those manually. Now I don't.

I took that same loop pattern and pointed it somewhere else. My system scrapes Reddit, HN, GitHub, and Twitter for real problems, then scores them on demand, market gap, and feasibility. If something clears the threshold, it builds a standalone AI agent, validates that it works, and commits it. The threshold ratchets up after every build, so the ideas have to keep getting better. Leave it running overnight, wake up to new agents.
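The score-then-ratchet loop can be sketched in a few lines of TypeScript. Everything here is illustrative, not from the repo: the field names, the 0-10 scales, and the +1 ratchet step are assumptions, and the `shipped.push` line stands in for the actual scaffold/validate/commit work.

```typescript
// Hypothetical sketch of the ratcheting research loop: score each idea,
// build the ones that clear the bar, raise the bar after every build.
type Idea = { name: string; demand: number; gap: number; feasibility: number };

const score = (i: Idea) => i.demand + i.gap + i.feasibility;

function runLoop(ideas: Idea[], startThreshold: number): string[] {
  let threshold = startThreshold;
  const shipped: string[] = [];
  for (const idea of ideas) {
    // GAP: 0 means the space is already served; reject outright.
    if (idea.gap === 0 || score(idea) < threshold) continue;
    shipped.push(idea.name); // stand-in for: scaffold agent, validate, commit
    threshold += 1; // ratchet: the next idea has to beat a higher bar
  }
  return shipped;
}
```

The point of the ratchet is that a mediocre idea that would have shipped on day one gets rejected on day three, which is why the rejection log keeps growing faster than the agent list.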

Here's the part that surprised me. The system rejected over 80 ideas before shipping 20. Resume ATS optimizer? GAP: 0, there are already 10+ free tools. Salary negotiation advisor? GAP: 0. Insurance policy analyzer? GAP: 0. Food ingredient scanner? Yuka has 8M users. The research log reads like a graveyard of "obvious" ideas that are already solved.

But then it found that wage theft affects 82M workers and no free tool combines FLSA exemption analysis with state-specific overtime calculation. Built wage-rights-advisor. Found that only 5% of homeowners appeal their property tax, but 30 to 94% of those who do succeed. Built property-tax-appeal-advisor. Found that 70M Americans are contacted by debt collectors annually, and every AI tool in that space serves the collectors, zero serve consumers. Built debt-collection-rights-advisor.

Now let me be real, because I know someone's going to ask: how do you verify quality? Honest answer: not fully automated. The system boots each agent, sends a test prompt, and checks whether the output is useful. But these are MVPs. Some are rough. The point was never "autonomous startup factory." The research log with all the scored and rejected ideas is almost more valuable than the agents themselves. I wake up, look at what shipped, look at what got rejected and why, and pick the most promising direction as my next real project. It's an idea machine that also writes the first draft of the code. When every obvious idea feels taken, the 80+ rejected ideas with documented reasoning for why they failed are honestly the best part.
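The "sends a test prompt, checks if the output is useful" step is basically a smoke check, and a sketch of one fits in a single function. This is my guess at the shape, not the repo's code: the 200-character floor, the refusal-boilerplate strings, and the keyword list are all made up for the example.

```typescript
// Hypothetical smoke check on an agent's reply to a test prompt.
// Cheap heuristics only; thresholds and keywords are illustrative.
function looksUseful(reply: string, mustMention: string[]): boolean {
  if (reply.trim().length < 200) return false; // too thin to be a real answer
  const lower = reply.toLowerCase();
  if (lower.includes("i cannot") || lower.includes("as an ai")) return false; // refusal boilerplate
  return mustMention.every((kw) => lower.includes(kw.toLowerCase())); // domain terms present
}
```

For wage-rights-advisor the keyword list might be `["FLSA", "overtime"]`: a reply that never mentions either probably isn't answering the question, no matter how long it is.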

Three files. program.md tells Claude Code where to research and what bar to hit. seed/ is a minimal Next.js template with 7 tools. run.sh launches Claude Code headless and auto-restarts on context limits. No LangChain, no CrewAI. TypeScript, MIT, runs on OpenRouter or Ollama. Each agent is standalone: clone and run.
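The restart-on-context-limit behavior in run.sh boils down to one while loop, modeled here in TypeScript rather than shell. The exit-status type, the attempt cap, and the `launch` callback are stand-ins; the real script shells out to the Claude Code CLI instead.

```typescript
// Hypothetical model of run.sh's restart behavior: relaunch the headless
// session whenever it dies on a context limit, stop on a clean exit.
type Exit = "done" | "context-limit";

function runWithRestarts(launch: () => Exit, maxRestarts = 50): number {
  let attempts = 0;
  while (attempts < maxRestarts) {
    attempts++;
    if (launch() === "done") return attempts; // clean exit, stop looping
    // context limit hit: fall through and relaunch with a fresh context
  }
  throw new Error("gave up after too many restarts");
}
```

The cap matters more than it looks: without it, an agent that hits the context limit on every launch would burn tokens all night with nothing to show for it.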

Repo: https://github.com/Dominien/agent-factory

In the first session it found freelancers asking about missed tax deductions and built freelancer-deduction-finder. Found people confused about overtime exemptions, built wage-rights-advisor. Found people overwhelmed by data broker opt-outs, built data-broker-opt-out. 20 agents shipped so far.
