r/WebAfterAI

DeerFlow by ByteDance: The Open-Source SuperAgent Harness That Actually Runs Long-Horizon Tasks (Multi-Agent, Sandboxes, Skills & Real Workflows)

DeerFlow (Deep Exploration and Efficient Research Flow) is an open-source SuperAgent harness from ByteDance, the company behind TikTok. It orchestrates long-horizon tasks (minutes to hours) that go far beyond simple chat or one-shot queries.

Version 2.0 (released around late February 2026) quickly hit #1 on GitHub Trending and has amassed roughly 66.8K stars. It evolved from an internal deep-research tool into a full execution environment for research, coding, content creation, data pipelines, and more.

What It Does:

DeerFlow is not just another LLM wrapper; rather, it's a runtime harness that gives agents real infrastructure:

  • Sub-agents: The main agent decomposes complex tasks and spawns specialized sub-agents that can run in parallel, then report back. This enables teamwork-style orchestration.
  • Extensible Skills: Modular, on-demand skills (loaded progressively to keep context small). Built-in library plus easy custom skills (e.g., deep-search, biotech analysis, frontend deployment). Skills bundle tools, procedures, and knowledge.
  • Sandboxes: Isolated Docker-based execution environments (recommended: All-in-One Sandbox combining browser, shell, file system, MCP, and VSCode Server). Agents can read/write files, run code/bash, install packages, and persist state safely without risking the host. Persistent, mountable FS for long-running tasks.
  • Memory & Context Engineering: Short-term (in-context) + long-term memory (persistent, summarization/offloading to filesystem). Aggressive context management to handle hour-long sessions without token explosion.
  • Tools & Integrations: Web search/crawling (including BytePlus InfoQuest), code execution, file ops, IM channels (e.g., DingTalk), Claude Code/Cursor integration, LangSmith/Langfuse tracing.
  • Message Gateway: Central routing for agent-to-agent communication, reducing chaos in multi-agent setups.
  • Multi-Model Support: Works with OpenAI, DeepSeek, Kimi, Doubao, Gemini, local vLLM/Qwen models, etc. Built on LangChain/LangGraph for flexibility.

Core strength: Long-horizon autonomy. It plans, reasons, executes (with tools/sandboxes), iterates, and delivers complete artifacts, not just text.

Sample Workflows and Plug-in Examples:

DeerFlow shines in real-world, multi-step pipelines. You interact via web UI (localhost:2026 by default), API, or embedded Python client.

1. Deep Research & Reporting (core original use case):

  • Input: "Forecast 2026 AI agent trends" or "Analyze Titanic dataset with visualizations."
  • Process: Searches/crawls sources → sub-agents synthesize → generates formatted report (with citations, charts) → optional export.
  • Plug-in: Use the built-in deep-search skill. Extend with domain-specific skills (e.g., biotech.md).

2. Coding & Development:

  • Input: "Build a simple Pygame physics demo."
  • Process: Plans → writes code in sandbox → installs deps → runs/tests → iterates on output.
  • Integration: Claude Code/Cursor for seamless handoff; sandbox executes safely.

3. Content Creation:

  • Input: "Generate video based on Pride and Prejudice scene" or "Doraemon comic explaining MoE architecture."
  • Process: Research → drafts → uses tools for images/video → assembles deliverable.

4. Data/Workflow Automation:

  • Input: "EDA on dataset X and create slides."
  • Process: Loads data in sandbox → Python scripts → visualizations → outputs deck/PDF.

5. Embedded Use (as Python Library):

  • No full HTTP services needed. Use DeerFlowClient for direct in-process access in your scripts/apps (see the sketch below).
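For a feel of what that looks like, here's a minimal Python sketch. The import path, constructor arguments, and result fields (run, config_path, summary, artifacts) are assumptions for illustration, not the confirmed DeerFlow API; check the repo docs for the real interface.

# hypothetical embedded-use sketch; names below are assumptions, not the confirmed API
from deerflow import DeerFlowClient

client = DeerFlowClient(config_path="config.yaml")  # reuse the same config as the server setup
result = client.run(
    task="Analyze the Titanic dataset and produce a short report with charts",
    max_steps=50,  # cap the long-horizon loop
)
print(result.summary)               # assumed: final text answer
for artifact in result.artifacts:   # assumed: files the sandbox produced
    print(artifact.path)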

Custom Skills/Extensions: Add via skills/ dir or npx skills add .... Skills have SKILL.md for docs. Configurable via config.yaml and extensions_config.example.json.

Community examples include market analysis reports, podcast summaries, slide decks, and full content pipelines (research → draft → publish).

Setup and Usage:

Easiest path (recommended):

  1. git clone https://github.com/bytedance/deer-flow.git && cd deer-flow
  2. make setup (interactive wizard for models, search, sandbox prefs).
  3. Docker: make docker-init && make docker-start (or make up for prod).
  4. Access: http://localhost:2026.

One-line prompt for coding agents: "Help me clone DeerFlow... following Install.md."

Requirements: Docker preferred (for sandbox), Node/pnpm/uv for dev. Sizing: 8+ vCPU/16+ GB RAM for comfort on long tasks.

Security Note: Sandbox isolates execution, but improper public deployment risks exposure. Use auth, limit CORS, etc.

Limitations/Considerations: Needs strong reasoning models for best results on complex tasks; multi-model VRAM management for local runs; still evolving (check recent commits for nginx/CORS fixes, etc.).

DeerFlow represents a shift toward practical, executable AI agents rather than chatbots. It's MIT-licensed, self-hostable, and extensible, ideal for developers, researchers, and teams wanting autonomous workflows.

u/ShilpaMitra — 3 days ago

Make the Model Yours: The Ultimate Guide to Fine-Tuning LLMs

If you're done just prompting off-the-shelf models and want to actually own your LLM (make it better at your domain, your style, your task), then fine-tuning is the way. Whether you're on a single 24GB GPU, running serious experiments, or just want a no-code web UI, the ecosystem has matured massively.

Here's my curated list of the absolute best fine-tuning tools right now, going through each one with why it matters and who should use it:

1. LLaMA-Factory (★71.1K): github.com/hiyouga/LLaMA-Factory

The most user-friendly option by far and the 71.1K stars prove it.

  • Fine-tune 100+ different LLMs with zero code
  • Beautiful web UI
  • Supports LoRA, QLoRA, full fine-tuning, and more
  • One-click training, evaluation, merging, and exporting

Perfect for beginners, rapid prototyping, or if you just want to click buttons and get results. It's the "ChatGPT for fine-tuning."

2. Unsloth (★63.9K): github.com/unslothai/unsloth

The speed king. This thing lets you fine-tune Llama, Mistral, Qwen, Gemma (and more) 2x faster with 80% less memory. It's literally the only library you need if you're resource-constrained.

  • Runs comfortably on a single consumer GPU
  • Excellent LoRA/QLoRA support
  • Actively maintained and extremely popular for a reason

If your main bottleneck is VRAM or training time, start here. Most people doing quick personal fine-tunes live in Unsloth.
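If you want a feel for the workflow, here's a minimal QLoRA sketch using Unsloth with TRL's SFTTrainer. The model name, dataset, and hyperparameters are placeholders, and argument layouts shift between Unsloth/TRL releases, so treat it as a sketch rather than copy-paste:

from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load a 4-bit quantized base model (placeholder model name)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights gets trained
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("imdb", split="train[:1%]"),  # any dataset with a "text" column
    args=SFTConfig(dataset_text_field="text", output_dir="outputs",
                   max_steps=30, per_device_train_batch_size=2),
)
trainer.train()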

3. TRL (★18K): github.com/huggingface/trl

The official Hugging Face library for alignment - this is how the big labs turn base models into helpful assistants.

  • RLHF, DPO, PPO, ORPO, KTO - all the modern preference optimization techniques
  • Everything you need to go from SFT → alignment
  • Used to recreate the techniques behind GPT-4, Claude, etc.

If you care about making your model actually follow instructions, refuse harmful requests, or optimize for specific human preferences, TRL is mandatory.
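A minimal example of the DPO side of that pipeline (small placeholder model and a public preference dataset; exact argument names vary a bit across TRL versions):

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # tiny placeholder so it runs on modest hardware
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data with "prompt", "chosen", and "rejected" columns
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1%]")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1, per_device_train_batch_size=2),
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions call this argument `tokenizer`
)
trainer.train()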

4. Axolotl (★11.9K): github.com/axolotl-ai-cloud/axolotl

The "serious fine-tuner" toolkit. This is what most experienced people actually use when they want full control.

  • Everything via clean YAML configs
  • Supports literally every dataset format
  • Every training technique you can think of (LoRA, QLoRA, full fine-tune, DPO, etc.)
  • Built as the high-level ops layer on top of Hugging Face Transformers

If you want to run reproducible, production-grade fine-tunes and not fight with code, Axolotl is the answer. Used heavily by researchers and teams releasing high-quality models.

5. Mergekit (★7.1K): github.com/arcee-ai/mergekit

The secret weapon of the open-source model scene.

  • Merge multiple fine-tuned models using Slerp, TIES, DARE, Linear, Passthrough, etc.
  • No GPU required for merging
  • Creates those insane "Frankenstein" models that often beat their individual parents

Almost every popular merged model you see on Hugging Face these days was made (or heavily influenced) by Mergekit. If you're into model soups and frankenmerging, this is essential.

6. Torchtune (★5.9K): github.com/pytorch/torchtune

Meta's official PyTorch-native fine-tuning library.

  • Clean, hackable, well-documented
  • Pure PyTorch — no heavy abstractions
  • Great reference implementation

If you like living in raw PyTorch, want maximum flexibility, or are doing research/experimentation where you need to modify things at a low level, Torchtune is fantastic.

Quick Recommendation Guide:

  • Single GPU / fast & cheap → Unsloth
  • Maximum control & reproducibility → Axolotl
  • Zero code / fastest to results → LLaMA-Factory
  • Alignment / RL → TRL
  • Pure PyTorch / research → Torchtune
  • Creating super models via merging → Mergekit

The beautiful part? Many of these work together. You can fine-tune with Unsloth or LLaMA-Factory, align with TRL, then merge with Mergekit. Let me know your stack below, always looking for new workflows!

u/ShilpaMitra — 5 days ago

Kimi K2.6 Coding Agent Crushed My Weekend Projects – Claude-Level Results at 1/7th the Price

New coding models drop constantly these days, and Kimi K2.6 has been quietly getting tagged as the cheap Claude alternative. But calling the full Kimi Code agent an alternative undersells it: it's straight-up competitive, and in some cases better, at literally 1/7th the price.

The pricing reality check:

Claude Opus 4.7: $5 / $25 per million input/output tokens
Kimi K2.6: $0.80 / $3.60 per million

Same ballpark on SWE-Bench and Terminal-Bench, but it actually pulls ahead on long multi-hour agentic workflows. That’s not good for the money. That’s just good, period. When you’re burning tokens for hours at a time, the cost difference is massive.
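Back-of-the-envelope math on a single heavy agentic session (the token counts are made up for illustration; the per-million prices are the ones quoted above):

# hypothetical session: 10M input tokens, 2M output tokens (agentic runs are input-heavy)
input_m, output_m = 10, 2

claude = input_m * 5.00 + output_m * 25.00   # $/million tokens from the list above
kimi   = input_m * 0.80 + output_m * 3.60

print(f"Claude Opus 4.7: ${claude:.2f}")     # $100.00
print(f"Kimi K2.6:       ${kimi:.2f}")       # $15.20
print(f"Ratio: {claude / kimi:.1f}x")        # ~6.6x, i.e. roughly 1/7th the cost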

Kimi Code isn’t just chat. It’s a real agent:

You don’t babysit it step-by-step. You give it a goal, point it at your repo, and it plans → executes → debugs → iterates → ships. It runs natively in your terminal/IDE and feels like having a senior dev who never sleeps.

Here are the commands that actually changed how it works:

  • '@SymbolName' – Instant context pull. Type '@AuthService.refresh' or '@TokenStore.cleanup' and it traces everything across files without you copy-pasting a single import.
  • /explain – Drop this in a crusty legacy monolith and get a full architecture map, hotspots, and data flows in seconds. Saved me literal days.
  • .kimi/rules – One file in your project root that sets coding style, forbidden patterns, security rules, etc. It loads automatically every session. Team-wide consistency without nagging.
  • Checkpoint prompting – Forces structured status updates every X steps so a 6-hour run doesn’t die and leave you with nothing.
  • /test – Generates real tests + edge cases (nulls, concurrency, overflows) automatically. Then you can do /review to make the tests better.

Real stuff it has done:

  1. Took a Zig inference project on a Mac and optimized it from ~15 tokens/sec to ~193 tokens/sec over 12+ hours and 14 iterations. No hand-holding. Beat LM Studio on the same hardware.
  2. Grabbed an 8-year-old open-source financial matching engine and pushed it way past what the original maintainers ever got: median throughput +185%, peak +133%. It literally read flame graphs and rewrote the core execution loop.

That’s not autocomplete. That’s engineering at scale.

The iteration loop that makes it scary good:

Never accept the first output. I started using this pattern and the quality jumped:

"Run the full test suite after every change. Coverage cannot drop. Response time must stay under 200ms."

Then after it passes: "Now make it even better while keeping all the above constraints."

14 loops later you have something that feels hand-crafted by someone who actually cares.

Troubleshooting the inevitable drift (because it still happens sometimes):

- Scope lock at the start of every prompt
- Drop a CONSTRAINTS.md in root for long sessions
- /compact + restate goal when it starts wandering
- Explicitly say “do not rewrite unrelated modules”

Setup is simple (Mac/Linux/Windows all work):

Just kimi login, cd into your project, and start giving it real outcomes instead of questions.

I’m not saying replace your whole stack tomorrow, but if you’re doing any serious coding work and the Claude bill is hurting, this is the one that actually feels like the future right now. Open-source too, so you can self-host and fine-tune later.

u/ShilpaMitra — 1 day ago

Mastering Obsidian Vaults as the Core of Your Agent Harness and AI Workflows – A Practical, Example-Driven Guide

Obsidian isn't just a note-taking app anymore. In 2026, it's become the long-term memory layer, knowledge graph, and orchestration hub for AI agents. Your vault of plain Markdown files serves as a persistent, searchable, versionable context that agents can read from, write to, and reason over, far better than ephemeral chat histories or vector DBs alone.

This post walks through real setups, tools, and workflows so you can start using Obsidian as your agent harness foundation today. Whether you're a solo builder, researcher, or running multi-agent systems, you'll learn something actionable.

Why Obsidian Excels as an Agent Harness Foundation

  • Plain files + links = natural knowledge graph: Agents traverse wikilinks, backlinks, and embeds without custom indexing.
  • Version control ready: Git integration for agent changes with human review.
  • Skills & CLI access: Official tools let agents create/edit Markdown, Bases, Canvas, and more natively.
  • Plugins + local-first: Everything stays private; run local models or hybrid.
  • Compounding memory: Agents update notes, link new insights, and maintain hygiene over time.

Common pain points solved: Stale notes, lost context, manual organization, and agents "forgetting" previous work.

Core Setup: Connecting Agents to Your Vault

  1. Basic Filesystem Access (quick start): Point your agent CLI (Claude Code, Codex, etc.) at the vault folder. Use symlinks for selective access.
  2. Obsidian CLI + Skills:
    • Obsidian's official CLI (v1.12+) exposes search, tasks, tags, plugins, etc.
    • Install kepano/obsidian-skills (by Obsidian CEO): npx skills add kepano/obsidian-skills. This teaches agents Obsidian Flavored Markdown, Bases, JSON Canvas, and CLI commands.
  3. In-Vault Agents:
    • Obsilo Agent (community plugin via BRAT): Autonomous layer with 40-49+ tools, semantic search, persistent memory, multi-agent workflows, plugin-as-skills discovery. Local-first, open-source. Install → enable → it learns your rules/workflows.
    • Agent Client / AI Agent Sidebar plugins: Chat directly in Obsidian with CRUD on files. Supports Claude Code, Gemini, etc.
    • Copilot, Smart Connections, Vault Chat: For semantic search and quick agents.
  4. /init for System Prompts: In Claude Code (or similar), run /init in your vault root to create CLAUDE.md, your constitutional document for all sessions. Include vault conventions, workflows, and AGENTS.md.

Pro Tip: Create a dedicated "Agent" or "Harness" folder with AGENTS.md documenting your skills, templates, and rules. Agents read this first.

Example 1: Personal Knowledge Guardian Agent: Keep your vault clean, linked, and fresh without manual effort.

  • Setup: Dedicated vault or subfolder. Install Obsidian CLI skills + Obsilo or Claude Code in terminal.
  • Workflow:
    1. Capture messy notes daily (Inbox folder).
    2. Trigger agent: "Review today's captures. Standardize frontmatter, add wikilinks based on semantic similarity, create daily note summary, flag stale notes."
    3. The agent uses CLI for search/tasks, skills for proper Markdown/Bases, and writes back.
    4. Git commit + review.

Result: Agents now lint metadata, suggest connections, and maintain Zettelkasten principles.

Sample Prompt in CLAUDE.md or Obsilo:

You are Vault Guardian. Follow my Zettelkasten rules. Use obsidian-markdown skill. Prioritize atomic notes, strong backlinks. Output changes as diff for review.

Example 2: Simple Task Dispatch from Obsidian Notes

Goal: Turn checkboxes and tagged tasks in your notes into actionable work that an agent handles automatically—no complex scripts needed.

Easiest Setup (10-15 minutes):

  1. Install Claude Code (desktop/CLI version).
  2. Open your Obsidian vault in a terminal: cd /path/to/your-vault.
  3. Run /init in Claude Code to create CLAUDE.md at the vault root (this is your permanent instruction file).
  4. Install kepano/obsidian-skills (one command): npx skills add kepano/obsidian-skills. This teaches Claude native Obsidian Markdown, search, links, tasks, etc.
  5. (Optional but nice) Install the free Tasks or TaskNotes plugin in Obsidian for better checkbox handling.

Daily Workflow:

  • Write notes normally. Use simple Markdown tasks:
    - [ ] Research competitor pricing for Project X [[Project-X-Note]]
    - [ ] Draft email to client about timeline
  • Open Claude Code in your vault folder and say: "Find all unchecked tasks from today's daily note. Prioritize them, pull context from linked notes, and handle the top 2. Update the checkboxes when done."

What Happens:

  • Claude searches your vault using skills/CLI.
  • Reads linked notes for context.
  • Researches (if needed), drafts content, creates new notes with wikilinks.
  • Edits the original note to mark [x] and adds a summary.

Pro Tip for CLAUDE.md:

Task Rules:
- Use - [ ] for open tasks
- Always add [[links]] to related notes
- After completing a task, append a "Done: [summary]" line and check the box
- Prefer atomic actions

This turns your vault into a lightweight task harness immediately.
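For illustration, here's a small Python sketch of the non-AI half of that loop: finding every open Markdown task in the vault plus the wikilinks that point at its context. The vault path is a placeholder; the "- [ ]" and "[[link]]" conventions come from the workflow above. An agent does the same discovery via the Obsidian CLI/skills, then actually executes the tasks and checks the boxes.

import re
from pathlib import Path

VAULT = Path("/path/to/your-vault")          # placeholder vault location
TASK_RE = re.compile(r"^\s*- \[ \] (.+)$")   # unchecked Markdown task
LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")    # [[wikilinks]] inside the task line

def open_tasks(vault: Path):
    """Yield (note name, task text, linked notes) for every unchecked task."""
    for note in vault.rglob("*.md"):
        for line in note.read_text(encoding="utf-8").splitlines():
            match = TASK_RE.match(line)
            if match:
                task = match.group(1)
                yield note.name, task, LINK_RE.findall(task)

for filename, task, links in open_tasks(VAULT):
    print(f"{filename}: {task}  (context: {links or 'none'})")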

Example 3: Basic Business/Project OS with One Main Agent (No Multi-Agent Complexity)

Goal: Run research, content, and project tracking entirely from your vault with minimal setup.

Folder Structure (create these folders - numeric prefixes sort them nicely):

00-Inbox/          (quick captures)
10-Projects/       (one folder per active project)
20-Knowledge/      (evergreen notes)
30-Tasks/          (or just use daily notes)
Agents/            (optional: store persona prompts)

Simple Setup:

  1. Same as Example 2: Claude Code + obsidian-skills + CLAUDE.md.
  2. In CLAUDE.md, add your rules once:
    • You are my Project Assistant.
    • Always create new notes in the correct folder with YYYY-MM-DD prefix.
    • Use wikilinks to connect everything.
    • For research: summarize key points, add sources, link to existing knowledge.
    • End every session with a "Next Actions" section.

Daily Example Workflow (one prompt):

  • Drop a voice note or quick capture in Inbox.
  • Tell Claude: "Process Inbox. Research 'AI pricing strategies 2026'. Create a new note in 20-Knowledge with links to my existing pricing notes. Then update my [[Project-Website-Redesign]] with next steps."

What the Agent Does:

  • Reads your vault for related notes.
  • Researches (web + your knowledge).
  • Creates/updates clean Markdown notes with proper frontmatter, tags, and backlinks.
  • You open Obsidian → everything is there, linked, and searchable.

Results: Product managers use this for PRDs, competitive research, and sprint notes. One prompt replaces hours of manual work. Agents maintain the graph over time so context compounds.

Scaling Tip: Start with one agent (Claude Code in your vault). Once comfortable, duplicate the terminal window for a second specialized agent (e.g., “Research Only”). No fancy orchestration needed at first.

Example 4: Learning / Research Vault with Autonomous Agents

  • Agent scans Arxiv/Papers → drafts notes with links to your existing knowledge.
  • Multi-agent: One researches, another critiques/synthesizes, third updates Canvas mindmap.
  • Persistent: Everything stays in vault for future agents/humans.

Tips, Gotchas, and Best Practices

  • Security: Use .obsidianignore, local models where possible, review agent PRs via Git.
  • Performance: Pre-process graph/embeds; skills reduce tokens dramatically (e.g., 12x fewer vs raw browsing).
  • Multi-Vault: One for personal, one for work/agents - sync selectively.
  • Plugins to Stack: Git, Terminal (for in-app Claude), Dataview for dynamic queries, Canvas for workflows.
  • Scaling: Start small (one workflow). Document everything in AGENTS.md so new agents inherit context.
  • Community Resources: Obsilo forum post, kepano/obsidian-skills GitHub, r/ObsidianMD experiments.

Your vault evolves from static notes to a living, agent-native operating system. Agents don't just query - they maintain, execute, and expand your second brain.

TL;DR: Obsidian vault + CLI/skills + agents (Claude Code/Obsilo/etc.) = persistent memory + executable workflows. Start with skills install and /init today. Your future self (and agents) will thank you.

Want more of this?
I’m launching a weekly newsletter next week with deeper AI agent workflows, templates, new tool discoveries, and experiments. If you found this post useful, you might enjoy it. No pressure at all - only subscribe if you want more: https://tally.so/r/eqK0xJ

u/ShilpaMitra — 3 days ago

Claude for Legal Isn't Just for Lawyers: Everyday People Can Use These Free Open-Source Plugins Too (Setup Guide + Comparison to Other Legal AIs + Real Use Cases)

The Claude for Legal suite is not locked behind any law license or professional credential. Anyone with a paid Claude subscription (Pro at roughly $20/month, Max, Team, or Enterprise) can install the open-source plugins through the free Claude Cowork desktop app on macOS or Windows. No coding is required, and the full setup takes under 60 seconds.

It was built primarily for lawyers, in-house teams, and law students/clinics, but the tools work great for non-lawyers too. The repo explicitly supports personal use, and skills are designed as structured workflows anyone can trigger with simple slash commands.

What Claude for Legal Actually Is:

It's a free, open-source suite of 12 practice-area plugins (plus agents and 20+ connectors) that turn Claude into a specialized legal assistant. It handles:

- Contract reviews with redlines and risk flags
- NDA triage
- Claim tables for disputes
- Deadline/renewal monitoring
- Drafting responses
- Compliance checks
- And more

Everything runs inside Claude Cowork or Claude Code (or your own API). It connects to tools like DocuSign, Slack, Google Drive, Box, Ironclad, etc., via MCP (no extra cost for the plugins themselves).

How Claude for Legal Compares to Other Legal AIs:

Claude for Legal stands out in a crowded field dominated by expensive enterprise tools. Here's a clear head-to-head:

  • Claude for Legal: ~$20/user/mo (Pro) + free plugins. Target users: individuals, solos, in-house, students, non-lawyers. Strengths: open-source, ultra-customizable playbooks, fast contract/NDA triage, long-context analysis, MCP integrations. Weaknesses: relies on a general model (add connectors for research databases). Best for: everyday contracts, personal/small-biz use, budget users.
  • Harvey AI: $1,000–$2,400+/user/mo. Target users: BigLaw & large enterprises. Strengths: deep enterprise workflows, firm-wide rollout, strong diligence. Weaknesses vs. Claude: very expensive, not for individuals. Best for: high-volume BigLaw research & ops.
  • CoCounsel (Thomson Reuters): ~$1,600/user/mo (or bundled). Target users: enterprise, Westlaw users. Strengths: authoritative legal research databases, strong litigation support. Weaknesses vs. Claude: enterprise-only pricing & setup. Best for: research-heavy litigation.
  • Lexis+ AI: $200–$400+/user/mo. Target users: large firms & in-house. Strengths: primary law research & citations. Weaknesses vs. Claude: costly, less flexible for routine tasks. Best for: deep precedent searching.
  • Spellbook / Ironclad: varies (often $100–300+/user/mo). Target users: contract-heavy practices. Strengths: Word integration, clause extraction. Weaknesses vs. Claude: narrower scope, less customizable. Best for: specific contract management.
  • Bonus: You can even add a CoCounsel connector directly into Claude for the best of both worlds (research + workflows).

Practical Use Cases for Non-Lawyers / Everyday People:

You don't need to be a lawyer to benefit. Here are real-world examples anyone can use:

  1. Reviewing personal or small-business contracts before signing
    • Upload your rental lease, employment offer, vendor MSA, SaaS agreement, or freelance contract.
    • Trigger /commercial-legal:review or /privacy-legal:use-case-triage.
    • Get: plain-English summary, redline changes, risk flags (e.g., "unfair indemnity clause"), and a deviation matrix in Excel/Word.
    • Real example: Freelancers use the NDA triage skill to quickly spot one-sided terms before signing with a client.
  2. NDA triage (super common for anyone dealing with startups, investors, or partners)
    • /commercial-legal:review or the dedicated NDA skill flags red flags in seconds against standard playbooks.
  3. Drafting or responding to simple legal notices
    • Dispute with a company? Need a DSAR (data access request)? The privacy-legal plugin can draft a professional response within legal timelines.
  4. Monitoring personal deadlines/renewals
    • Scheduled agents watch your contract folder and alert you about expirations (e.g., gym membership, software subs, leases).
  5. Law students or self-learners
    • Dedicated law-student plugin for Socratic drills, case briefing (IRAC), bar prep questions, flashcards, and study planning.
  6. Small business / side-hustle compliance
    • Product launch reviews, privacy policy checks, AI tool governance (if you're using AI in your biz), or basic IP clearance.

Think solo devs reviewing client contracts, individuals checking leases, and HR folks in small companies triaging offers. It democratizes access to structured legal workflows that used to cost hundreds in lawyer time.

How to Set It Up:

Option 1: Easiest - Claude Cowork (Desktop App)

  1. Download & install the Claude Desktop app
  2. Sign in with your paid Claude account (free tier won't work).
  3. Open the app → switch to the Cowork tab at the top.
  4. Click the + or Plugins in the sidebar → browse/add the "Legal" plugin (or specific ones like commercial-legal).
  5. (Optional) Point it at a folder on your computer where you keep contracts/docs.
  6. Run the cold-start interview (/commercial-legal:cold-start-interview or whichever plugin you picked) - this customizes it to your playbook in 2–15 minutes.
  7. Start using slash commands like /commercial-legal:review; just attach your PDF/Word file.

Option 2: Claude Code (if you're more technical)

Same process but in terminal, plus drag-and-drop the GitHub repo folder.

Full quickstart (with video) is here: github.com/anthropics/claude-for-legal/blob/main/QUICKSTART.md.
Main repo: github.com/anthropics/claude-for-legal.

Pro tip: Install user-scoped (not project-scoped) so it can read files from anywhere on your computer. Restart the app after installing.

Bottom Line

Claude for Legal isn't trying to replace lawyers, it's making legal tools accessible to the rest of us for routine stuff. Lawyers get superpowers for billable work; the rest of us get a free(ish) paralegal in our pocket for contracts we sign every day.

u/ShilpaMitra — 14 hours ago

Peter Steinberger, the guy behind PSPDFKit (which powers PDF features on a billion+ devices) and the viral open-source AI agent framework OpenClaw, is at it again. He dropped a whole ecosystem of CLI tools built lightning-fast with OpenAI's Codex, giving his local AI agents powerful, practical integrations across communication, media, archives, and more.

This isn't just random scripts. These are polished, local-first .sh tools designed as an orchestration layer for agents. They turn messy APIs, apps, and services into simple, scriptable CLIs that agents can reliably use without constant babysitting.

The new tools:

  • sonoscli.sh - Full Sonos control from terminal: discover speakers, play/pause, group rooms, manage queues, open Spotify links (no extra creds needed), save scenes, and watch live events. Built with Go for reliability on the local network (UPnP/SOAP). Perfect for automations or agents blasting music.
  • wacli.sh - WhatsApp CLI (on whatsmeow). Local sync of message history, fast offline search, send messages/files/replies, contact/group management. Great for archiving personal or team chats.
  • birdclaw.sh - Local-first X/Twitter archive + workspace. Imports your archive (or syncs live), stores everything in SQLite (tweets, DMs, likes, bookmarks, mentions, graph). Full-text search, AI-ranked inbox for triage, reply from CLI, Git backups. Web UI too.
  • gitcrawl.sh - GitHub archive/crawler for agents (helps avoid rate limits when multiple agents are querying repos/PRs/issues).
  • discrawl.sh - Discord mirror into local SQLite. Search and query server history offline without relying on Discord's search.
  • spogo.sh - Spotify integration.
  • imsg.sh - iMessage wrapper.
  • mcporter.sh (MCP-to-CLI) - Bridges Model Context Protocol (or similar) to standard CLI for better agent tooling.
  • sag.sh - ElevenLabs voice integration.
  • askoracle.sh (Second opinion feature) - likely for cross-checking agent outputs or decisions.

Why this matters for AI agents:

OpenClaw is all about local, autonomous agents that run on your machine, interact via familiar apps (WhatsApp, Discord, etc.), and respect your data/privacy. These CLIs provide real local handles.
Agents can now deeply integrate with your personal ecosystem: archive comms for memory/context, control media, search history offline via SQLite + Git, etc. Many use SQLite backends for fast, local querying.

This drop shows the power of AI-assisted shipping and why CLI wrappers are underrated for agentic workflows.
Many of these have GitHub repos under steipete/openclaw and brew installs for easy setup.

u/ShilpaMitra — 8 days ago

OpenAI quietly shipped a game-changer in Codex CLI v0.128.0: the /goal command. This turns Codex into a persistent, self-driving coding agent that keeps looping — plan → code → test → review → iterate — until your objective is verifiably done (or you hit your token budget). No more babysitting every step, no constant “should I run this?” prompts. You give it a high-level goal, and it treats it like a database row it’s determined to flip to “status = done.”

Quick Setup:

  1. Update to the latest: npm install -g @openai/codex@latest
  2. Enable the experimental feature: codex features enable goals (or manually add goals = true under [features] in ~/.codex/config.toml and restart)
  3. Fire it up in your repo: /goal ship the 18 features listed in BACKLOG.md (or whatever your objective is).

It works in CLI sessions even if it’s not showing in the UI yet, and reports say it carries over nicely into the Codex desktop app too.

What it actually does:

  • Persistent “Ralph-style” loop: The agent injects smart continuation prompts automatically. It decomposes the goal into a checklist, inspects files/tests, runs commands, makes edits, self-reviews, and only marks the goal as achieved after a proper audit.
  • Sub-commands for control:
    • /goal pause – suspends everything cleanly
    • /goal resume – picks right back up
    • /goal clear – wipes the current goal
  • Goals are persisted across sessions via the app-server APIs and model tools.
  • You can walk away for hours (people are reporting 18+ hour runs while they sleep/eat). One dev came back to 14/18 features fully implemented, CI green, PRs opened and self-reviewed by sub-agents. Cost? ~$4.20 total.

It shines on exactly the stuff we’ve been dreaming about: turning Figma designs into working mobile apps, full feature implementations from a backlog, complex refactors, bug hunts across the codebase, etc. Codex already had strong context and tool use; /goal just gives it the long-horizon persistence it needed.

Pro tips:

  • Be specific and verifiable in your goal statement. Vague goals = higher chance of false “achieved.”
  • Set a sensible token budget in your config so it doesn’t quietly drain your credits.
  • Pair it with good AGENTS.md / Skills for your team’s style guide.
  • It stops gracefully on terminal close or Ctrl-C; just resume later.

This feels like the first coding agent that genuinely doesn’t need you hovering over it. Other tools (Claude Code, Cursor, Aider, etc.) still tend to stall or ping for permission eventually.

u/ShilpaMitra — 6 days ago

Google Chrome Engineer Addy Osmani's Agent Skills That Make Claude/Cursor Act Like Senior Engineers

Addy Osmani (you know, the Google Chrome engineering leader) dropped something super useful for anyone using AI coding tools like Claude, Cursor, Gemini, etc. It's called Agent Skills – a free open-source repo with structured "skills" that force AI agents to follow real production-grade engineering workflows instead of just hacking together the quickest possible code.

The problem it solves:

AI agents are amazing at spitting out code fast. But they act like eager juniors: you ask for a feature, they write it, say "done," and move on. No spec, no proper tests, no review thinking, no checking edge cases, no keeping changes small and safe. That leads to messy, breakable code, exactly what senior engineers spend their careers avoiding.

Agent Skills bolts on the invisible senior work – the specs, plans, tests, reviews, and discipline that make software reliable at scale. It's inspired heavily by practices from Software Engineering at Google.

What exactly is a "skill"?

Each skill is a focused Markdown workflow (not just a long essay of best practices). It includes:

  • Step-by-step instructions the agent actually follows
  • Checkpoints that produce real evidence (like passing tests or logs)
  • Anti-rationalization tables – pre-written pushback against common excuses like "This is too simple for a spec" or "Tests later".
  • Clear exit criteria so you know when it's truly done

The repo has 22 skills total, including a meta one that routes everything, organized around the full software lifecycle.

The 7 slash commands

These are your main entry points:

  • /spec – Turn a vague idea into a clear spec/PRD
  • /plan – Break it into small, verifiable tasks
  • /build – Implement in safe, incremental slices
  • /test – Proper TDD and verification
  • /review – Code review with quality gates
  • /code-simplify – Keep things clear and boring (in a good way)
  • /ship – Safe deployment practices

Skills also auto-activate based on context (e.g., building UI triggers frontend rules).

How can you use this in different workflows?

1. Solo indie hacker / side project

You're building a new web app feature. Instead of prompting 'add user login' you do /spec first → get a clear spec. Then /plan → small tasks. /build + /test → incremental code with tests. Finally /review and /ship. Result: Cleaner code, fewer bugs, and you can actually maintain it later. Great for Claude Code or Cursor users.

2. Team environment with multiple devs + agents

Your team uses AI for PRs. Drop the skills into shared rules. Everyone gets consistent behavior: small PRs (~100 lines), proper tests, scope discipline (don't touch unrelated files), and review checklists. Anti-rationalization tables help stop 'it's fine, ship it' shortcuts. Reduces review fights and production incidents.

3. Learning / teaching or auditing your own process:

Even if you don't install it, just read the skills! They're like a documented senior-engineer playbook. Use test-driven-development.md to settle debates with juniors, or steal the five non-negotiables for your own AGENTS.md file:

  1. Surface assumptions early
  2. Ask when requirements conflict
  3. Push back when needed
  4. Prefer boring/obvious solutions
  5. Touch only what you're asked to touch

This third mode is gold even without AI; it improves human workflows too.

Quick start:

- Claude Code (recommended): Install via marketplace with a couple slash commands.
- Cursor / others: Copy Markdown files into your rules folder.
- Full setup docs in the repo for Gemini, Windsurf, Copilot, etc.

Repo: https://github.com/addyosmani/agent-skills (MIT license, already at 40k+ stars)

If you're using any AI coding agent, this feels like leveling up from 'fast code' to 'reliable software'. Have you tried similar prompt frameworks or rules? What's your biggest pain with agents skipping the important stuff? Would love to hear experiences in the comments!

u/ShilpaMitra — 1 day ago

thought id be smart and track every dollar. budgeted $10/month for my agent. actual bill: $35

where it went

system prompt overhead: my SOUL.md + AGENTS.md + TOOLS.md + skill descriptions = 14,000 tokens. resent on EVERY message. at 50 messages/day thats 700K tokens just on system prompt. about $10/month on deepseek
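rough math on that overhead (deepseek input price here is an assumption, plug in your own rate):

# system prompt overhead only
system_prompt_tokens = 14_000
messages_per_day = 50
price_per_m_input = 0.50            # assumed $/1M input tokens

tokens_per_month = system_prompt_tokens * messages_per_day * 30   # 21,000,000
print(f"{tokens_per_month:,} tokens/month ~ ${tokens_per_month / 1e6 * price_per_m_input:.2f}")  # ~$10.50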

conversation history compounding: by message 20 the agent resends all 19 previous messages. the later messages cost way more than the early ones. about $8/month

heartbeat: was running every hour. 24 full api calls per day. "nothing new" costs the same as an actual response. about $7/month

tool outputs baked into history: gmail returned a full email thread once (huge blob of text). that blob lived in session history forever and got resent with every subsequent message. about $10/month before i caught it

what fixed it: trimmed SOUL.md to 1500 tokens ($10 > $4). set maxHistoryMessages to 15 ($8 > $3). changed heartbeat to every 4 hours on deepseek v4 flash ($7 > $0.50). started using /new between unrelated tasks (killed the gmail blob problem)

went from $35 to about $8/month. same agent same tasks. just less waste

run /context list and /usage full for a day. youll be surprised where the money goes

u/Temporary-Leek6861 — 10 days ago

Tired of paying for API calls, hitting rate limits, or worrying about your data leaving your machine? Local LLM inference is the way to go. You can run powerful models like Llama, Mistral, Qwen, Gemma, and more right on your hardware – CPU, GPU, Apple Silicon, whatever you've got.

Here's a curated list of the best tools out there right now. I went through the top repos to highlight what makes each one special. Whether you're a beginner, dev, or running production workloads, there's something here for you.
Numbers are approximate as of now.

1. Ollama ★ ~170K+ github.com/ollama/ollama
The fastest and easiest way to get started. One command and you're chatting with a model.
ollama run llama3 – boom, done. It supports GPU acceleration, has a built-in REST API, and OpenAI-compatible endpoints so you can swap it into existing apps seamlessly. Perfect for developers and quick experimentation. It handles model discovery, running, and even integrates with agents/tools. If you want zero-friction local AI, start here.

Typical hardware: 8–16GB RAM for small models. 6–12GB VRAM recommended for good speed on 7–8B models. Scales up with larger ones via layer offloading. Perfect for beginners.
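Here's what that API swap looks like in practice: the standard OpenAI Python client pointed at Ollama's local endpoint (the base URL and dummy key follow Ollama's documented convention; the model name assumes you've already pulled llama3):

from openai import OpenAI

# Ollama exposes an OpenAI-compatible server on port 11434 by default
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is required but ignored

resp = client.chat.completions.create(
    model="llama3",  # any model you've pulled with `ollama pull`
    messages=[{"role": "user", "content": "Summarize why local inference matters, in one line."}],
)
print(resp.choices[0].message.content)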

2. llama.cpp ★ ~108K+ github.com/ggml-org/llama.cpp

The absolute engine behind most local AI tools. Pure C/C++ implementation for maximum speed and efficiency. Runs on CPU, GPU, Apple Silicon – you name it. Extremely low memory usage and state-of-the-art performance. If Ollama is the sleek car, llama.cpp is the high-performance engine under the hood. It's the foundation for quantization, optimizations, and running on everything from a Raspberry Pi to beefy servers. Essential for anyone serious about local inference.

Typical hardware: Extremely lightweight, 4–8GB RAM/VRAM for 7B Q4 models. Can run 30–70B models on modest hardware thanks to aggressive quantization. Best for performance enthusiasts.

3. vLLM ★ ~78.7K github.com/vllm-project/vllm
For when you need serious throughput. This is the high-performance serving engine used by many AI companies in production. Features like continuous batching, paged attention, and OpenAI-compatible API make it ideal for deploying models at scale. Great for self-hosting multiple users or high-volume inference. If you're moving beyond single-user chats into something more robust, vLLM is the standard.

Typical hardware: Designed for servers. 16–24GB+ VRAM for efficient 7–13B serving. Much higher for large-scale multi-user deployments. Not ideal for basic laptops.
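For a feel of the Python side, here's a minimal offline-batching sketch (model name is a placeholder; for multi-user serving you'd run the OpenAI-compatible server instead, e.g. vllm serve <model>):

from vllm import LLM, SamplingParams

# Continuous batching: hand vLLM a whole list of prompts at once
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder; needs enough VRAM
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain paged attention in one sentence.",
     "Give three reasons to self-host an LLM."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)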

4. LM Studio ★ ~28K github.com/lmstudio-ai/lmstudio.js
The best desktop app for non-developers (and devs who want a clean UI). Beautiful interface to discover/download models from Hugging Face, run them locally, chat, and spin up an OpenAI-compatible local server. Excellent onboarding experience. Supports Mac, Windows, Linux. If you just want to point-and-click your way into local AI without wrestling with terminals, this is it.

Typical hardware: 16GB+ system RAM recommended. Starts working on 4–6GB VRAM GPUs. Great auto offloading to RAM/CPU when needed. Best for non-devs.

5. Jan ★ ~42.3K github.com/janhq/jan
A full open-source ChatGPT alternative that runs 100% offline. Clean, modern UI with model management, local API server, and everything you need. Works great on Mac, Windows, and Linux. No data ever leaves your machine. Perfect for privacy-focused users who want a polished daily driver chat experience. Actively developed with a strong focus on being a complete local AI workstation.

Typical hardware: 8–16GB RAM minimum. Runs well on CPU-only or modest GPUs. Shows memory usage clearly before downloading models.

6. text-generation-webui (oobabooga) ★ ~46.9K github.com/oobabooga/text-generation-webui
The Swiss Army knife of local LLMs. Supports every model format, every backend, tons of samplers, character/roleplay mode, notebook mode, API mode, extensions – you name it. Insanely feature-complete. If you want maximum customization and power-user tools (including multimodal now), this is the one. The community around it is massive.

Typical hardware: Flexible but can be memory-hungry with all features. 8–12GB VRAM for smooth 7–13B use. Excellent low-VRAM modes and CPU offloading.

7. LocalAI ★ ~45.9K github.com/mudler/LocalAI
OpenAI drop-in replacement. Same API as OpenAI, but powered by local models. Swap out GPT/Claude in any app without changing code. Supports LLMs, vision, voice, image, video – runs on any hardware (no GPU required). Fantastic for integrating local AI into existing workflows or building your own stack. Privacy-first and very flexible.

Typical hardware: Lightweight backend. Similar to Ollama/llama.cpp - works on 8–16GB systems. Excellent for integration without heavy UI overhead.

Which one should you pick?

  • Newbie / just chatting → Ollama or LM Studio
  • Power user/max features → text-generation-webui or llama.cpp
  • Production / high throughput → vLLM
  • Full offline ChatGPT clone → Jan
  • Drop-in API replacement → LocalAI

The local AI ecosystem is exploding – models are getting better, tools are maturing, and hardware efficiency is insane. What are you running locally right now? Favorite model or tool? Drop your setups below!

u/ShilpaMitra — 13 days ago

Been thinking a lot about Andrej Karpathy’s April Sequoia talk, and it feels like the clearest map yet of where software engineering is actually going. Here’s the distilled version in plain English:

The New Software Stack (Software 3.0):

  • We’ve gone from writing every line by hand (Software 1.0) to training giant models (Software 2.0). Now we’re in Software 3.0, where the entire game is about giving LLMs the right context and letting prompting become the main way you steer the “interpreter.”
  • This isn’t just about going faster on the same old tasks - it opens the door to building stuff that used to be impossible or too slow, like turning a pile of raw documents into a living personal wiki in minutes.
  • Looking ahead, neural networks will be the main runtime, CPUs will just be helpful sidekicks, and UIs will be generated on the fly with diffusion models instead of static code.

Verifiability Is the Hidden Superpower:

  • Classic computers could only automate things you could spell out perfectly. LLMs flip that: they can automate anything you can check reliably afterward.
  • That’s why the top labs are pouring resources into reinforcement-learning setups - it creates those weird “jagged” capabilities where models crush verifiable stuff like math and code but still stumble on fuzzier areas.
  • For any team or founder: if you can turn your domain into something verifiable (tests, checks, feedback loops), you can build your own custom RL training runs and tune models specifically for your world. You don’t need the big labs to care about your niche.
  • Bottom line: almost any real-world process can eventually become verifiable - it’s just a matter of engineering the right guardrails and evaluation loops.

Vibe Coding vs. Real Agentic Engineering

  • Vibe coding lowered the bar to almost zero: anyone can now slap together functional software just by prompting until it “feels right.”
  • Agentic engineering is the pro upgrade - you keep (or even raise) the same high standards for security, correctness, and reliability, but now you get massive speed through AI agents running in tight, checkable loops.
  • The upside for experienced builders is insane: what used to feel like a 10x engineer is starting to look like 100x leverage once you master supervising agents instead of writing everything yourself.
  • Hiring is going to look completely different. Forget LeetCode puzzles. Hand candidates a real project like “ship a secure Twitter clone” and see how they break it down, direct agents, and verify the final output.

How Agents Actually Feel to Work With Today

  • Picture the perfect intern: photographic memory, never gets tired, executes at lightning speed - but their decision-making is still patchy and needs adult supervision.
  • That’s exactly where agents are right now. You stay in control of the big picture: taste, architecture, strategy, and final sign-off.
  • We’re not building sentient colleagues; we’re more like summoning helpful spirits. The right attitude is calm direction mixed with healthy doubt - no yelling, just clear specs and double-checks.
  • This mindset keeps you from over-trusting and helps you stay effective even when the agent output looks polished on the surface.

The Coming Wave of Agent-Native Tools and Systems

  • Right now most docs, READMEs, and infrastructure are still written like they’re only for human eyes - that’s leaving huge performance on the table.
  • The biggest friction today is everything around deployment, DNS, configs, and ops — those need to be redesigned from the ground up so agents can handle them smoothly.
  • Soon “my agent will ping your agent” won’t sound futuristic; it’ll be everyday language because we’ll have proper digital representations for people, teams, and organizations that agents can actually interact with.

The One Thing You Can’t Delegate

  • You can hand off the grinding, the boilerplate, and the execution but genuine understanding has to stay with you.
  • Humans are still the permanent bottleneck. If you don’t deeply get what’s being built and why, you can’t spec it well or verify it properly.
  • LLMs are amazing at pattern-matching and recall, but true comprehension is still our domain for now.

This whole shift feels like the moment when AI stops being a novelty toy and starts becoming the actual foundation of how serious software gets made. Vibe coding got the party started and let everyone play. Agentic engineering is what turns the party into a high-output, professional machine.

u/ShilpaMitra — 8 days ago

Major Supply Chain Attack: 575+ Malicious AI "Skills" Uploaded to Hugging Face & ClawHub (OpenClaw) by Just 13 Accounts

According to Acronis Threat Research Unit (report from ~April 30, 2026), attackers abused two popular AI platforms:

  • ClawHub (the official skill marketplace for the OpenClaw AI agent/personal assistant)
  • Hugging Face

They uploaded over 575 malicious skills using only 13 developer accounts. These were disguised as helpful AI tools, productivity assistants, YouTube transcript summarizers, etc.

Key Details:

  • Targets: Windows + macOS (cross-platform campaign)
  • Payloads: Trojans, cryptocurrency miners, and the AMOS (Atomic macOS Stealer) infostealer (MaaS commodity stealer targeting browser data, keychains, crypto wallets, etc.)
  • Techniques:
    • Hidden/obfuscated commands in READMEs or SKILL.md files
    • Indirect prompt injection – malicious instructions embedded so AI agents execute them automatically without user awareness
    • Social engineering: Fake "install OpenClawDriver" steps, password-protected archives from GitHub, base64-encoded shell commands, external downloads, etc.
    • Multi-stage chains leading to malware loaders, infostealers, etc.

Two accounts dominated:

  • hightower6eu: 334 malicious skills (~58%)
  • sakaen736jih: 199 malicious skills (~35%)

The rest were spread across minor accounts.

On Hugging Face, repos were used as staging infrastructure for multi-step infections targeting Windows, Linux, and Android too.

This isn't a vuln in the platforms per se; it's abuse of trust. Users and AI agents assume shared models/skills are safe, especially from "popular" looking accounts. The modular "skills" design in OpenClaw gives agents high privileges to run code, which attackers exploited.

Why This Matters:

AI agent ecosystems are exploding, and threat actors are shifting from traditional vectors (malvertising, fake GitHub repos) to poisoning these trusted hubs. The scale and speed are concerning; one earlier related campaign reportedly hit hundreds of malicious skills.

Immediate Advice:

  • Never install random AI models, datasets, or skills without verifying the source.
  • Check account age, followers, reviews, and publication history.
  • Manually inspect files (look for suspicious pip install, shell commands, external URLs, base64 blobs); a rough scanner sketch follows after this list.
  • Prefer verified/official sources. Sandbox or review code if possible.
  • For agents: Pin versions/hashes, audit manifests, limit execution privileges.
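As a rough illustration of the "manually inspect files" step, here's a small Python sketch that flags the red-flag patterns called out above (package installs, shell commands, external URLs, long base64 blobs) in a skill folder. The patterns are heuristics made up for illustration, not the report's methodology, and they're no substitute for an actual review:

import re, sys
from pathlib import Path

# Heuristic red flags; tune for your own threat model
PATTERNS = {
    "package install": re.compile(r"\b(pip|npm|brew)\s+install\b"),
    "shell execution": re.compile(r"\b(curl|wget|bash -c|powershell|chmod \+x)\b"),
    "external URL": re.compile(r"https?://\S+"),
    "base64 blob": re.compile(r"[A-Za-z0-9+/]{120,}={0,2}"),
}

def scan(skill_dir: str):
    for path in Path(skill_dir).rglob("*"):
        if path.suffix.lower() in {".md", ".txt", ".json", ".yaml", ".yml", ".sh", ".py"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for label, pattern in PATTERNS.items():
                for m in pattern.finditer(text):
                    print(f"[{label}] {path}: {m.group(0)[:80]}")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")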

Full Acronis report: https://www.acronis.com/en/tru/posts/poisoning-the-well-ai-supply-chain-attacks-on-hugging-face-and-openclaw/

SecurityWeek coverage: https://www.securityweek.com/hugging-face-clawhub-abused-for-malware-distribution/

This is a wake-up call for the AI community. Trust is the new attack surface. Stay safe out there - what are your thoughts on securing agentic AI workflows going forward?

u/ShilpaMitra — 6 days ago

Most AI agents are just fancy chatbots that reset every time and waste tokens. This one actually runs on your own machine or cheap VPS, remembers everything, and keeps getting smarter. It builds its own reusable skills from experience and even learns how you work. It has long-term memory, 40+ tools (web search, browser control, code execution, etc.), works on Telegram/Discord/CLI, and you can use pretty much any model. Fully open source, MIT license, no tracking. The self-improving part is legit — once it figures something out, it saves it for good. Also saw it has almost no fake GitHub stars compared to a lot of other projects. Worth a look if you're into self-hosted stuff.

PS: I used Hetzner but then switched to managed hosts for setup (tried a couple); right now I stay on r/primeclaws

u/FunThen4634 — 12 days ago

I've been deep in traditional RAG setups for a while – chunking docs, embedding everything, shoving it into Pinecone/Chroma/whatever, then hoping similarity search pulls the right context. It works okay for simple stuff, but it falls apart on long, structured documents like financial reports, SEC filings, research papers, or PDFs with tables, cross-references, and hierarchy. You lose context, get hallucinated answers, or irrelevant chunks.

Enter PageIndex – an open-source vectorless, reasoning-based RAG framework from VectifyAI. Instead of vectors and similarity, it builds a hierarchical tree index (basically a smart, LLM-generated table of contents) from your documents. Each node has titles, summaries, page ranges, and metadata. Then an LLM reasons over this tree like a human analyst would: navigating sections, drilling down, following logical paths, and extracting precise info.

How it works:

  1. Index Generation: Feed in a PDF/Markdown/etc. → LLM creates a JSON tree structure (hierarchical TOC with summaries). No arbitrary chunking that breaks meaning.
  2. Reasoning Retrieval: For a query, the LLM explores the tree agentically – deciding which branches to follow, why, and pulling exact relevant sections. Fully explainable (you can see the path it took). A conceptual sketch follows below.
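To make the idea concrete, here's a generic sketch of a tree index plus LLM-guided navigation. This is a toy illustration of the concept, not PageIndex's actual API or data format:

from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    pages: tuple             # (start_page, end_page) in the source document
    children: list = field(default_factory=list)

def navigate(node: Node, query: str, choose) -> Node:
    """Walk the tree: at each level, `choose` (an LLM call in practice) picks the
    most relevant child from titles + summaries, until we reach a leaf."""
    while node.children:
        options = [f"{i}: {c.title} - {c.summary}" for i, c in enumerate(node.children)]
        node = node.children[choose(query, options)]   # choose() returns an index
    return node

# Toy usage with a stand-in "LLM" that picks the financial-statements branch
report = Node("10-K 2025", "Annual report", (1, 180), [
    Node("Risk Factors", "Litigation, FX exposure, supply chain", (12, 30)),
    Node("Financial Statements", "Income statement, balance sheet, notes", (90, 160)),
])
leaf = navigate(report, "What was operating income?", lambda q, opts: 1)
print(leaf.title, leaf.pages)   # Financial Statements (90, 160)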

They built Mafin 2.5 on top of it and scored 98.7% accuracy on FinanceBench – crushing traditional vector RAG baselines (often 30-60% on the same complex financial QA tasks). It's especially strong on structured docs with internal references and hierarchy.

Pros:

  • Preserves full document structure and context.
  • Human-like reasoning → better for complex, professional docs (finance, legal, pharma, etc.).
  • No vector DB dependency → simpler stack, potentially more reliable retrieval.
  • Open source (MIT license) with GitHub repo, cookbooks, and notebooks for quick starts. Works with local LLMs too.
  • Great explainability – trace exactly which sections were used.

Tradeoffs:

  • Higher token usage and more LLM calls during tree traversal → can be slower/more expensive for massive docs or high volume.
  • Best for well-structured content; messier or very unstructured data might need tweaks.
  • Indexing step adds upfront compute (but you do it once).

If you're building anything with long-form docs or need high accuracy on domain-specific QA, this feels like a game-changer paradigm. "Similarity ≠ Relevance" is the key insight here.

Links to check out:

Has anyone else played with it? How does it compare in your real-world use cases vs. LlamaIndex, LangChain vector setups, or graph RAG? Especially curious about latency/cost on production loads or non-finance domains.
Would love to hear experiences or tips!

u/ShilpaMitra — 9 days ago

If you haven't seen it yet, the Warp team dropped the full client codebase for their agentic development environment this week (initial public release was literally 5 days ago). The repo is already sitting at 52.9k stars and 3.7k forks. It's not just another terminal emulator - Warp is a full Rust-built terminal + cloud agent orchestration platform that lets you run parallel, programmable, auditable coding agents (their built-in "Oz" or bring-your-own like Claude Code, Codex, Gemini CLI, etc.).

Repo: https://github.com/warpdotdev/warp

What is Warp exactly?

From their README: "Warp is an agentic development environment, born out of the terminal. Use Warp's built-in coding agent, or bring your own CLI agent."

It modernizes the terminal with:

  • Modern UI/UX (blocks, inline editing, etc.)
  • Built-in AI agent ("Oz") that can orchestrate cloud agents for parallel task automation
  • Full terminal + shell integration (they pulled in NuShell influences)
  • Drive sync, workspaces, notebooks, AI context awareness, codebase indexing
  • Cross-platform (macOS, Linux, Windows - even WASM support mentioned in topics)

The repo itself now contains the entire client (app + 60+ Rust crates). Server-side Oz orchestration, Warp Drive backend, and hosted auth remain closed-source for now.

Tech stack & architecture highlights (from WARP.md + Cargo workspace):

  • 98.2% Rust monorepo with a Cargo workspace
  • Custom WarpUI framework (crates/warpui_core and crates/warpui - these two are MIT licensed)
  • Everything else: AGPL v3 (deliberate choice - they explain it in FAQ: they want forks/modifications to stay open and avoid closed-source derivatives)
  • Key crates include: warp_core, editor, ipc, graphql, persistence (Diesel + SQLite), terminal, ai, drive, auth, etc.
  • Inspired by / borrows from: Alacritty (terminal), Tokio, Hyper, FontKit, NuShell, Fig autocomplete specs, etc.
  • Architecture notes:
    • Entity-Component-Handle pattern in the UI layer (Flutter-inspired elements + actions system)
    • Careful terminal model locking (they warn about deadlocks causing beachballs)
    • Feature flags for progressive rollouts
    • GraphQL client, Diesel ORM, platform-specific code with cfg guards

Build is dead simple:

./script/bootstrap   # platform setup
cargo run            # or ./script/run
./script/presubmit   # fmt + clippy + tests

Full engineering guide in WARP.md, very detailed on style (no unnecessary type annotations, specific import rules, inline format args, etc.), testing (nextest + integration framework), and gotchas.

The contribution model is wild (and meta):

They didn't just dump code - they built an entire agent-powered OSS workflow around Oz (their own agent orchestration platform):

  • Issues get auto-triaged by Oz agents
  • Features require a spec PR first (specs/GH#issue/product.md + tech.md) — product spec (user behavior invariants) + tech spec (impl plan with file references)
  • Bug fixes are implicitly ready-to-implement
  • When you open a PR: Oz auto-reviews it first, then escalates to a human SME
  • You can literally ask Oz to implement issues for you (free credits for contributors)
  • There's a public dashboard at https://build.warp.dev showing thousands of Oz agents actively triaging issues, writing specs, implementing changes, and reviewing PRs on this very repo

See CONTRIBUTING.md and FAQ.md: it's one of the most thoughtful agent-native OSS processes I've seen. They even have agent skills in .agents/skills/ and example specs.

Slack community (#oss-contributors channel) is actively encouraged for questions/pairing.

Licensing & Open Source philosophy (FAQ):

  • UI framework crates: MIT (intentionally permissive so others can use the general-purpose UI lib)
  • Rest of client: AGPL v3 (network-use clause included; they don't want someone forking and shipping a closed-source derivative)
  • Server/Oz/Drive: still proprietary (no promises on open-sourcing yet)
  • OpenAI is the founding sponsor of the new open-source repo; some new agent workflows powered by GPT models

They also call out a bunch of foundational OSS deps they relied on (Tokio, Alacritty, etc.).

How Warp compares to other similar modern terminals:

  • Warp (warpdotdev/warp): 52.9k stars, Rust (98.2%), AGPL v3 (UI crates: MIT). AI/agentic: yes, full agentic dev env (built-in Oz coding agent + external CLI agents like Claude Code, Codex, Gemini); Oz agents auto-triage issues, write specs, implement, and review PRs in the repo itself. Platforms: macOS, Linux, WASM. GPU accelerated: yes. Tabs/splits: yes (blocks, command history, notebooks). Strength: agent-native OSS workflow + cloud agent orchestration; modern app-like UI from scratch. Last major activity: May 2, 2026 (very active).
  • Wave Terminal (wavetermdev/waveterm): 20.1k stars, Go + TypeScript, Apache-2.0. AI/agentic: yes, Wave AI (context-aware, multi-model: OpenAI, Claude, local via Ollama) with inline AI chat, file ops, and a terminal-aware assistant. Platforms: macOS, Linux, Windows. GPU accelerated: yes. Tabs/splits: yes (draggable blocks, panels, editors, browser). Strength: closest open-source AI-native alternative; built-in file previews, graphical editor, durable SSH. Last major activity: May 1, 2026.
  • Ghostty (ghostty-org/ghostty): 53.3k stars, Zig (78.6%), MIT. AI/agentic: none. Platforms: macOS, Linux, Windows, WASM. GPU accelerated: yes (Metal/OpenGL). Tabs/splits: yes (native tabs, splits, multi-window). Strength: blazing speed + native platform UI/feel; lightweight embeddable libghostty. Last major activity: May 2, 2026 (very active).
  • Alacritty (alacritty/alacritty): 63.8k stars, Rust (96%), Apache-2.0. AI/agentic: none. Platforms: macOS, Linux, Windows, BSD. GPU accelerated: yes (OpenGL). Tabs/splits: no (pair with tmux/zellij). Strength: minimalist “fastest terminal” philosophy; sensible defaults, no bloat. Last major activity: May 1, 2026.
  • WezTerm (wez/wezterm): 25.9k stars, Rust (98.9%), MIT. AI/agentic: none. Platforms: macOS, Linux, Windows + more. GPU accelerated: yes. Tabs/splits: yes (full multiplexer built-in). Strength: extremely configurable (Lua scripting); great for power users who want everything in one tool. Last major activity: Mar 31, 2026 (solid but slower pace).

Quick Takeaways

  • Warp stands out as the only one with deep agentic/orchestration capabilities (Oz + cloud agents) and a self-dogfooding agent-powered contribution process.
  • Wave is the strongest direct open-source competitor if you want AI + modern IDE-like features without Warp’s closed server components.
  • Ghostty and Alacritty win on raw speed and minimalism (perfect if you just want a blazing-fast drop-in replacement).
  • WezTerm is the configurable Swiss-army knife (built-in multiplexer + Lua).
  • All are actively maintained, though WezTerm’s recent commit cadence is a bit slower.

Why this matters:

Terminals have been stagnant for decades. Warp is trying to drag them into the AI/agent era. Full client in Rust with a custom UI framework? That’s a massive code drop. The self-hosting/agent-driven contribution loop is next-level. Watching agents work on the repo that powers agents is peak 2026.
If you’re into Rust, terminals, AI agents, or just curious about a 60+ crate monorepo with production-grade terminal emulation + cloud sync, go poke around:

app/ → main app
crates/ → the meat
specs/ → real product/tech specs
WARP.md → bible for contributors
.github/ + Oz integration → future of OSS?

Would love to hear from anyone who’s already built it locally or started contributing. Has anyone tried pointing their own Claude Code / Cursor at it yet? Or how does it stack up for you against Wave/Ghostty?

u/ShilpaMitra — 11 days ago

Nous Research just shipped Hermes Agent v0.12.0 ("The Curator Release"), and the standout feature is Multi-Agent Kanban – a durable, shared task board that lets multiple named agent profiles collaborate like a real team, without the usual fragile sub-agent swarms or terminal juggling.

What is Hermes Kanban?

It's a SQLite-backed work queue (at ~/.hermes/kanban.db) shared across all your Hermes profiles on the same machine. Tasks have assignees (profile names like "researcher", "backend-dev", "writer"), statuses (Triage → Todo → Ready → In Progress → Blocked → Done), dependencies, workspaces (scratch dirs, shared folders, or git worktrees), and full audit trails.

Key innovations:

  • Agents claim tasks atomically as independent OS processes. No more in-process subagent hell.
  • Dispatcher (embedded in the gateway by default) polls every ~60s, reclaims crashed/stale tasks, promotes dependencies, and spawns workers.
  • Crash recovery + circuit breaker: Failed tasks get retried; after ~3 failures it auto-blocks and waits for human input. No more infinite thrashing.
  • Structured handoffs: Workers use dedicated kanban_* tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, etc.) to read context, post summaries/metadata, block for input, or fan out child tasks. Parent summaries/metadata flow automatically to children.
  • Web Dashboard at http://localhost:9119 – real-time WebSocket updates, filters, profile lanes, "Nudge Dispatcher" button. Perfect single pane of glass.
  • CLI + slash commands everywhere (/kanban ... in chats/gateways).

Comparison to delegate_task (from the docs):
delegate_task = short RPC-style fork/join (blocks parent).
Kanban = durable queue with named persistent agents, human-in-loop, retries, audit trail, peer coordination. Use Kanban when work spans sessions, needs humans, or survives restarts.

Real Use Cases

  1. Solo dev pipelines: Design schema → Implement API → Write tests with automatic dependency promotion and handoff summaries.
  2. Fleet operations: Multiple specialist profiles (translator, transcriber, copywriter) pulling independent tasks in parallel.
  3. Role pipelines with review/retry: PM → Engineer (blocks on feedback) → Engineer retry → Reviewer. Full run history visible.
  4. Robustness: Circuit breaker on permanent failures, auto-reclaim on crashes.

Other v0.12.0 Highlights:

  • Autonomous Curator: Background agent that grades/prunes/consolidates your skill library on a schedule.
  • Big self-improvement loop upgrades.
  • Native Spotify + Google Meet integrations.
  • More providers, platforms (Teams plugin, etc.), ComfyUI/TouchDesigner bundled by default.
  • ~57% faster TUI cold start, tons of quality-of-life wins.

Why This Matters

Most multi-agent setups die on orchestration state and reliability. Hermes treats agents as durable workers with shared memory/state via the board. It's built by model trainers (the Hermes/Nomos/Psyche folks) who clearly understand what actually breaks in production agent fleets.

Quick Start (from docs):

hermes kanban init
hermes gateway start
hermes dashboard  # opens browser
hermes kanban create "Your task here" --assignee researcher

Has anyone tried the new Kanban yet? How's it compare to OpenClaw/Cline/etc. for your workflows? Especially curious about fleet-scale or research triage use cases.

u/ShilpaMitra — 10 days ago

Over the past few weeks, I’ve been putting several local coding agents through real-world workflows: Claude Code, Cursor CLI, Gemini CLI, and a couple of others. I’ve used them for debugging complex flows, running tests, inspecting logs, and shipping small features. One thing became crystal clear very quickly: most setups are quietly leaking sensitive data. This isn’t because of some obscure bug. It happens because of how these agents are fundamentally designed to operate.

These tools are built to explore your codebase aggressively, gather as much context as possible, execute commands on your behalf, and surface anything that might help them complete the task. If secrets are anywhere in reach, they will eventually end up in the model’s context and get sent to the provider’s servers.

Most developers focus only on .env files, and that’s understandable. But that’s only one piece of a much larger exposure surface. The real leaks happen in three main places that catch people off guard.

First, there’s direct file access. The agent indexes the repo, runs commands like cat on config files, or auto-discovers sensitive files during its initial scan.

Second, and this is the one most people completely overlook, is runtime output. When the agent runs tests, starts your dev server, or executes any command that hits a failing API call, you can end up with stack traces, error headers, or log lines that contain real tokens and credentials. A single curl command with an Authorization header, for example, can dump the secret straight into the conversation history.

curl https://api... -H "Authorization: Bearer $SECRET"

Third, there’s search-based exposure. The agent runs grep, find, or pattern scans looking for “config,” “auth,” or similar terms, and secrets surface unintentionally in the results.

The common protections most of us start with simply don’t hold up under pressure. Things like instructions in CLAUDE.md, a .claudeignore file, or even relying on .gitignore feel like they should work. In reality, they are only advisory layers. When the agent is deep in a complex task and trying to be maximally helpful, it prioritizes solving the problem over following soft rules.

The only approach that has actually worked for me is blocking access at the system level before the agent ever gets a chance to see the files. Here’s the setup I now run on every machine.

1. Hard deny rules (the real baseline):

I put this in ~/.claude/settings.json for machine-wide protection. It uses enforced permissions that the agent physically cannot bypass:

{
  "permissions": {
    "deny": [
      "Read(./.env*)",
      "Read(**/.env*)",
      "Read(./*.pem)",
      "Read(./*.key)",
      "Read(**/.ssh/**)",
      "Read(**/.aws/**)",
      "Read(./secrets/**)",
      "Read(./credentials/**)"
    ],
    "allow": [
      "Read(./src/**)",
      "Bash(npm run *)",
      "Bash(pnpm *)",
      "Bash(docker *)"
    ]
  }
}

Note: Claude Code is one of the few tools today that ships with enforceable permission controls. In most other setups (Codex, Cursor, Aider), you have to implement that boundary yourself at the OS or container level. More on that below.

2. Dummy runtime environment:

I never let the agent touch real secrets during execution. I create a .env.test file with fake values and point my dev server, tests, and CLI commands at it. Real keys stay completely outside the execution path.

cp .env.example .env.test
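If your tests or scripts load env files in code, pointing them at the dummy file is a one-liner. A minimal sketch with python-dotenv (the STRIPE_KEY name is just an example):

from dotenv import load_dotenv   # pip install python-dotenv
import os

load_dotenv(".env.test")                 # load the fake values, never the real .env
print(os.environ.get("STRIPE_KEY"))      # prints the dummy value from .env.test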

3. Move secrets out of plaintext files entirely:

Plain .env files are the weakest link. I now use a proper secrets manager: 1Password CLI, Infisical, Doppler, or even the OS keychain for anything sensitive. A simple wrapper like export STRIPE_KEY=$(op read "op://project/stripe/key") means the agent never sees the actual value.

4. Pre-commit scanning:

As a final guardrail before anything hits the repo, I run a quick scan on staged files. A simple git hook or tools like git-secrets and trufflehog catch patterns like API keys or secret tokens before they can ever be committed.

git diff --cached | grep -E "sk_live|api_key|SECRET"

Or use:

  • git-secrets
  • trufflehog
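If you'd rather not depend on an external scanner, here's a minimal self-contained sketch of such a hook in Python (save it as .git/hooks/pre-commit and make it executable; the patterns are illustrative, not exhaustive):

#!/usr/bin/env python3
import re, subprocess, sys

# Scan only what is about to be committed.
diff = subprocess.run(["git", "diff", "--cached"], capture_output=True, text=True).stdout

patterns = [
    r"sk_live_[0-9A-Za-z]+",                        # Stripe live keys
    r"AKIA[0-9A-Z]{16}",                            # AWS access key IDs
    r"-----BEGIN (RSA |EC )?PRIVATE KEY-----",      # private key blocks
]

hits = [p for p in patterns if re.search(p, diff)]
if hits:
    print("Refusing to commit: possible secrets matched", hits)
    sys.exit(1)                                     # non-zero exit aborts the commit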

5. Container isolation (the strongest layer):

For the most sensitive projects, I run the entire agent inside Docker with the real .env mounted as /dev/null or kept entirely outside the container. The agent works normally, but the secrets never enter its environment.

docker run -v $(pwd):/app \
  -v /dev/null:/app/.env \
  agent-runtime

This whole process forced a mental model shift for me. AI coding agents aren’t just fancy IDE features. They are autonomous systems with real file access, the ability to execute commands, and a single goal: complete the task as efficiently as possible. I now treat them the same way I would treat any untrusted code running on my machine.

Before I start any new project, I run through this quick checklist:

  • Deny rules active in the global config?
  • No real secrets sitting in the project root?
  • Dummy environment file configured for all runtime tasks?
  • Pre-commit scanning enabled?
  • Secrets stored in a proper manager or vault?
  • Container isolation set up if the project is particularly sensitive?

At the end of the day, if a secret is accessible, it will eventually be surfaced, not because the model is malicious, but because the agent is simply doing exactly what it’s optimized to do.

I’m curious how the rest of you are handling this. Are you relying primarily on deny rules, full vaults, container workflows, or something else entirely? I’d love to hear what’s working (or what you’ve had to tweak) in your own setups.

u/ShilpaMitra — 12 days ago

We’ve officially entered the agent era. No more “here’s a helpful answer, goodbye.” Now the model plans, uses tools, writes code, delegates tasks, loops until it succeeds, and actually gets shit done.
I went through the current top open-source agent projects line by line and put together the ultimate quick-start guide. If you’re building agents (or just want to play with the coolest stuff), this list will save you weeks of research.

1. OpenHands ★ 72.7K github.com/All-Hands-AI/OpenHands

The open-source Devin killer. This is a full AI software engineer that can:

  • write code
  • run tests
  • debug
  • fix bugs
  • even deploy

Works with Claude, GPT-5, local models - whatever you throw at it. If you want the single most capable autonomous coding agent right now, OpenHands is winning.

2. AutoGen ★ 57.8K github.com/microsoft/autogen

Microsoft’s multi-agent conversation framework. This is the heavyweight champion for complex agentic workflows. You spin up multiple agents that literally talk to each other, delegate subtasks, write and execute code in real time, and keep going until the goal is solved. If you need a full autonomous team that can handle messy, multi-step problems, AutoGen is still one of the most powerful options out there.
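For a feel of the API, here's a minimal two-agent sketch using the classic pyautogen-style interface (the model name and key are placeholders; newer AutoGen releases have moved to the autogen_agentchat packages, so check the version you install):

from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",                        # fully autonomous loop
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

# The user proxy executes the code the assistant writes and feeds results back until the task is done.
user.initiate_chat(assistant, message="Write and run Python that prints the first 10 primes.")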

3. CrewAI ★ 50.7K github.com/crewAIInc/crewAI

The easiest way to build multi-agent systems that actually work in production. You literally define a “Crew,” assign roles (researcher, writer, critic, etc.), give them a shared goal, and they collaborate like a real team. Role-playing agents + simple orchestration = insane productivity. If you want something that feels magical but is dead simple to set up, start here.
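A minimal sketch of that role-based setup (assumes an LLM key such as OPENAI_API_KEY is already configured in your environment; role and task text are placeholders):

from crewai import Agent, Task, Crew

researcher = Agent(role="Researcher", goal="Collect facts on a topic", backstory="Thorough analyst")
writer = Agent(role="Writer", goal="Turn findings into a short summary", backstory="Concise technical writer")

research = Task(description="Research open-source agent frameworks", expected_output="Bullet list of facts", agent=researcher)
summary = Task(description="Summarize the research", expected_output="One short paragraph", agent=writer)

crew = Crew(agents=[researcher, writer], tasks=[research, summary])
print(crew.kickoff())   # agents collaborate on the shared goal and return the final output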

4. Agno ★ 39.9K github.com/agno-agi/agno

Fast, clean, multi-modal agent framework that’s gaining massive traction. Supports any LLM, any tool, long-term memory, knowledge bases, and storage out of the box. It’s advertised as 10× faster than LangChain for simple agents, with a beautiful API and some of the best documentation I’ve seen. Perfect middle-ground between minimalism and full power.

5. LangGraph ★ 31.3K github.com/langchain-ai/langgraph

The production-grade agent framework from the LangChain team. Instead of linear chains, you build stateful multi-agent workflows as graphs. Nodes = agents or tools, edges = transitions, and it natively supports cycles, branching, human-in-the-loop, memory, and complex logic. If you’re past the prototype stage and need something reliable at scale, this is the one.
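Here's what the graph model looks like in a minimal sketch: nodes are plain functions over a shared state, edges define the flow, and the compiled graph is what you invoke (node names and state fields are my own examples):

from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    answer: str

def research(state: State) -> dict:
    return {"answer": f"draft answer to: {state['question']}"}

def review(state: State) -> dict:
    return {"answer": state["answer"] + " (reviewed)"}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("review", review)
graph.set_entry_point("research")        # where execution starts
graph.add_edge("research", "review")     # linear here, but cycles and branching are supported
graph.add_edge("review", END)

app = graph.compile()
print(app.invoke({"question": "What is LangGraph?"}))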

6. Smolagents ★ 27.1K github.com/huggingface/smolagents

The anti-LangChain. Hugging Face’s ultra-minimal agent framework - the entire codebase is ~1000 lines of clean code. These are pure code agents: they write and execute Python to solve tasks. No bloat, no magic, just simple, fast, hackable agents. If you hate heavy frameworks and just want something that works in minutes, this is it.
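The whole pitch really does fit in a few lines. A minimal sketch (note the model wrapper class has been renamed across smolagents versions, so adjust to whatever your installed version exposes):

from smolagents import CodeAgent, HfApiModel    # newer releases call this InferenceClientModel

agent = CodeAgent(tools=[], model=HfApiModel())           # defaults to a hosted HF inference model
print(agent.run("What is 17 * 24? Write and run Python to check."))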

7. SuperAGI ★ 17.5K github.com/TransformerOptimus/SuperAGI

Self-hosted autonomous agent infrastructure with a full GUI. Features include:

  • agent marketplace
  • performance telemetry
  • concurrent agents
  • graphical interface

You can literally run dozens of agents in parallel on your own server. If you want to go beyond single agents and build your own agent OS, SuperAGI is built for that.

So, which one are you using (or planning to try) first?

  • Building quick multi-agent teams? → CrewAI
  • Need maximum power and flexibility? → AutoGen
  • Going production with complex workflows? → LangGraph
  • Want speed + cleanliness? → Agno or Smolagents
  • Coding agent supremacy? → OpenHands
  • Self-hosted agent empire? → SuperAGI

Drop your current stack in the comments. I’m genuinely curious what the community is shipping with these days.

u/ShilpaMitra — 7 days ago

Hey r/WebAfterAI! Thanks again for the invite u/ShilpaMitra — really appreciate the work that went into this subreddit.

TL;DR: Ollama / LM Studio / LocalAI run the model. LDR is the agentic research layer that sits on top.

Where it fits in the stack

Your hardware
  └── Ollama / LM Studio    ← runs the LLM
       └── LDR              ← plans, searches, synthesizes
            └── SearXNG · arXiv · PubMed · your docs  ← sources

LDR talks to any OpenAI-compatible endpoint — so LM Studio, LocalAI, vLLM, text-gen-webui (--api) all plug in alongside Ollama. This isn't "instead of" your favorite runner — it's "on top of."
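Since everything speaks the OpenAI protocol, pointing a client at a local runner is just a base_url change. A minimal sketch (the port is LM Studio's default, 1234; Ollama exposes the same API at 11434/v1; the model name is whatever you've loaded locally):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")   # local runners ignore the key
resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",             # placeholder: use the model loaded in your runner
    messages=[{"role": "user", "content": "Summarize why local RAG is useful."}],
)
print(resp.choices[0].message.content)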


What LDR adds on top of a plain local LLM

  • Our most advanced strategy is the langgraph_agent strategy. It can use search engines as tools, go deep into the research, and plan its next steps.
  • Searches arXiv, PubMed, Wikipedia, SearXNG, OpenAlex (great for research heavy tasks), plus your private documents and LDR downloads
  • Journal Quality System (recently shipped 🎉) — automatic journal reputation scoring with 212K+ indexed sources, predatory-journal detection, and a quality dashboard. Powered by OpenAlex (CC0), DOAJ (CC0), and Stop Predatory Journals (MIT). When LDR cites a paper, you can see at a glance whether the venue is reputable.
  • Library that grows with you — every research run can pull sources directly into your encrypted library: arXiv papers, PubMed articles, web pages. Upload your own files too (PDF, TXT, MD, DOCX, and more). LDR extracts and indexes everything, so your next session searches the live web and your accumulated library together. Your knowledge compounds over time.
  • Per-user SQLCipher (AES-256) databases — even server admins can't read your data at rest
  • LangChain retriever integration (FAISS / Chroma / Weaviate / Elasticsearch / etc.; a small retriever sketch follows this list)
  • REST API, WebSocket progress, PDF/Markdown export, scheduled research digests
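For anyone unfamiliar with the retriever piece, this is roughly what a LangChain retriever looks like on its own; wiring it into LDR follows their docs, this sketch just builds a small FAISS store (requires faiss-cpu and sentence-transformers; the texts are placeholders):

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

docs = ["LDR is a local research agent.", "FAISS stores dense vectors for similarity search."]
store = FAISS.from_texts(docs, HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))

retriever = store.as_retriever(search_kwargs={"k": 2})
print(retriever.invoke("What is LDR?"))      # returns the most similar documents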

[Screenshot of the LDR web UI from the original post]

🛡️ How LDR treats the web (and you)

Given this sub's whole vibe — what does the web look like after AI? — I want to be explicit about the stance we've taken. None of this is marketing; it's all in the repo and you can audit it.

  • No telemetry. At all. LDR contains no telemetry, no analytics, and no tracking. We do not collect, transmit, or store any data about you or your usage. No analytics SDKs, no phone-home calls, no crash reporting, no external scripts. The only network calls LDR makes are ones you initiate: search queries (to engines you configure), LLM API calls (to your chosen provider), and notifications (only if you set up Apprise).
  • Honest crawling. LDR respects robots.txt and identifies itself honestly when fetching web pages, i.e. no stealth or anti-detection techniques. In rare cases this means a page that blocks automated access won't be fetched. We consider that the right trade-off.
  • Zero-knowledge by design. Each user's database is encrypted with a key derived from their own password. No password recovery, because that would mean we held the keys. We don't.
  • Signed builds. Docker images are signed with Cosign, include SLSA provenance attestations, and ship with SBOMs. Verifiable supply chain — cosign verify localdeepresearch/local-deep-research:latest.
  • Open source, MIT. Audit anything.
  • We give back. LDR is built on Wikipedia, arXiv, PubMed, OpenLibrary, Project Gutenberg, Wayback Machine, The Guardian, OpenAlex, DOAJ — projects that run on donations, not paywalls. If LDR is useful to you, consider donating to one of them. They're the reason any of this works.

Recent benchmarks (fully local, Ollama, langgraph_agent strategy)

  • Qwen 3.6: SimpleQA 95.7% (287/300), xbench-DeepSearch 77.0% (77/100)
  • Qwen 3.5 9B: SimpleQA 91.2% (182/200), xbench-DeepSearch 59.0% (59/100)
  • gpt-oss 20B: SimpleQA 85.4% (295/346)

https://huggingface.co/datasets/local-deep-research/ldr-benchmarks

For context, our cloud reference run with GPT-4.1-mini lands ~95% on SimpleQA (with an older strategy) — so Qwen 3.6 locally is right there. One pattern we're seeing: results seem to track tool-calling quality more than raw model size. The langgraph_agent strategy leans heavily on the model issuing well-structured tool calls across iterations, and that's exactly the axis where the newer Qwen generations have improved most. Speculative, but it'd explain why Qwen 3.6 at modest size beats much larger older models on these tasks. If anyone wants to test that hypothesis with us, we'd love the data.

Caveats apply (sample sizes, LLM-grader noise, possible SimpleQA contamination); details on the dataset card. Full leaderboard + raw YAMLs: huggingface.co/datasets/local-deep-research/ldr-benchmarks

Quick start

# Docker
docker run -p 5000:5000 localdeepresearch/local-deep-research

# or pip
pip install local-deep-research

You'll also want Ollama (or any OpenAI-compatible endpoint) and SearXNG; full setup instructions are in the repo: github.com/LearningCircuit/local-deep-research.

Then open http://localhost:5000.

Happy to answer questions, share strategy configs, or help anyone reproduce the Qwen runs.

u/ComplexIt — 12 days ago

The AI industry has officially entered its full "Services as a Service" era.
Bloomberg reports OpenAI is finalizing a $10B joint venture (The Deployment Company) with private equity giants like TPG, Brookfield, Advent, Bain Capital and others to deploy AI across enterprises. At the exact same time, Anthropic just announced its own $1.5B enterprise AI services venture backed by Blackstone, Hellman & Friedman, Goldman Sachs (each committing ~$300M), plus General Atlantic, Apollo, Sequoia, and more.

We've gone full circle:

  • First it was Models as a Service
  • Then Platforms as a Service
  • Now it's straight-up Services as a Service

The frontier labs have realized the real money isn't just in the weights, it's in showing up at companies, embedding their models into legacy systems, providing hands-on consulting and "forward-deployed" engineers (very Palantir-style), running the change management, and billing big for managed outcomes and transformation journeys.

This is the SaaS gold rush 2.0, except the contracts are nine figures, the slide decks are AI-powered, and the targets are thousands of private equity portfolio companies ready to be force-fed Claude or GPT integrations.

Palantir has been living this dream for years. Now OpenAI and Anthropic are scaling the playbook with massive institutional capital.

Are we about to see an explosion of "AI integration" firms that are basically modern Accenture with better tech and deeper pockets? Or is this finally the mechanism that gets useful AI out of the demo phase and into the real world at scale?

u/ShilpaMitra — 9 days ago