Title suggestion

I built Aura Agent: a goal-driven supervisor for long-running coding tasks, with workers, watchdogs, and reflection loops

Hi everyone, I’m open-sourcing a project called Aura Agent:
https://github.com/erickong/aura-agent

Aura is a two-layer autonomous task orchestrator for long-running coding goals.

Instead of asking one chat session to do everything, Aura runs a persistent Layer 1 orchestrator that wakes up periodically, checks evidence, updates a task tree, and launches bounded Layer 2 worker processes through backends like claude_code or ds_code.

It also has a lightweight watchdog layer around the loop: it monitors worker processes, wakes the orchestrator early when something stops or when an external wake signal appears, and helps prevent long-running tasks from silently drifting forever.

我做了一个双层自主编程 Agent。不是单次聊天式 coding agent，而是一个会周期性醒来、检查文件证据、维护任务树、启动/杀掉 worker、记录决策历史的长期任务编排器。它还有 watchdog 机制，用来监督 worker、处理提前唤醒、发现进程停止等情况。

Why I built it

Claude Code and similar CLI agents are powerful, but for multi-hour / multi-day tasks I kept wanting a higher-level supervisor:

What has actually been completed?
Which worker is stuck?
Did a task produce real files, or just say “done”?
What decision changed the task state, and what evidence supported it?
Can the system keep iterating without losing context?
Is the short-term work still aligned with the long-term goal?

Aura is my answer to that.

Architecture

goal.md
  -&gt; Aura Orchestrator
       - persistent task tree
       - progress report
       - memory
       - evidence-based decisions
       - periodic review / reflection
  -&gt; Watchdog
       - monitors worker processes
       - wakes the orchestrator early
       - detects stopped or stuck workers
  -&gt; Layer 2 Workers
       - claude_code or ds_code
       - isolated workspace per task
       - result.md / logs / artifacts

Aura can run from any project directory, similar to Claude Code. The current directory becomes the project root, and runtime state goes into .aura/.

Global setup is separate:

aura setup

This writes global config to:

~/.aura/config.env
# or on Windows:
%USERPROFILE%\.aura\config.env

Then in any project:

aura start --task-file goal.md

Reflection loop

Aura also has a configurable reflection system. By default, it can run a deeper review about once per hour.

The reflection system asks questions like:

Is the current short-term goal still aligned with the long-term mission?
Are we optimizing the wrong thing?
Are workers producing real progress or just activity?
Should we continue, replan, decompose, or switch direction?
What lessons should be written into long-term memory?

This is important because long-running agents can easily become busy without being useful. I wanted Aura to not only “keep working”, but periodically step back and ask whether the work still makes sense.

中文补充：
这个反思系统大概每小时运行一次，可配置。它会检查当前短期执行方向是否还符合长期目标，是否需要重新规划，是否有任务只是看起来很忙但没有真实产出。

Why DeepSeek makes this more practical

One reason this kind of system is becoming realistic now is cost.

A persistent orchestrator wakes up many times, reads state, checks progress, starts workers, kills stuck tasks, and reviews direction. If every cycle is expensive, the architecture becomes hard to justify.

DeepSeek v4 being relatively cheap changes the equation. It makes it much more practical to run a long-horizon supervisor loop instead of treating every agent run as a precious one-shot interaction.

I don’t think cheap models automatically solve autonomy. But they do make it possible to build systems that can afford to inspect, retry, reflect, and iterate.

中文简单说：
幸亏 DeepSeek v4 这种模型价格相对便宜，这种“长期运行 + 周期性检查 + 反思 + 迭代”的系统才真正可行。不然每次 wake、检查、总结、重规划都太贵，最后很难长期跑。

Example experiment

One of my test missions was intentionally aggressive:

>Self-iterate and improve ds-code, compare it against Claude as a baseline, reduce tool failure rate below 5%, and keep iterating on around 10 representative complex coding tasks until the CLI is faster / more accurate / more reliable.

Important: API keys were redacted before publishing.

The original mission asked Aura to:

Optimize a local ds-code directory using DeepSeek v4-pro.
Improve CLI efficiency, accuracy, and speed compared with Claude.
Reduce tool failure rate below 5%.
Build a benchmark loop against Claude on representative complex tasks.
Keep iterating until the metrics are met.

After about 2.5 hours, Aura had produced this progress snapshot:

Wake cycles: 21
Completed tasks: 17
Active tasks: 3
Failed tasks: 0
Blocked tasks: 0
Replans: 0

It decomposed the mission into work like:

define 10 representative complex benchmark tasks
build a comparison runner for ds_code vs Claude
run baseline comparisons
analyze prompt bottlenecks
analyze tool failure modes
profile CLI speed bottlenecks
apply CLI speed quick wins
deploy an optimized DeepSeek system prompt
fix critical tool issues
build a tool reliability test suite
verify tool failure rate

The tool reliability check reported 0% failure in the tested suite, while the full Claude-vs-ds-code benchmark was still running.

So I’m not claiming “it beat Claude” yet. The point is that Aura kept the experiment structured, measurable, and auditable instead of becoming a giant messy chat log.

中文补充：
这个例子里，是 Aura 自动把一个非常大的目标拆成可验证任务，持续运行 worker，记录每次状态变化的原因和证据，并在 worker 卡住时杀掉、重启或换策略。它更像一个长期项目经理 + 自动化执行监督器。

Compared with Hermes / OpenClaw / 小龙虾 style systems

Aura is inspired by self-evolving agent loops and task-ledger systems, but it is more conservative:

Area	Aura approach
Concurrency	Max 2 workers by default, quality over swarm size
State	Persistent task tree + decision log
Completion	Requires evidence from files/logs/artifacts
Watchdog	Monitors workers and wakes the loop early
Reflection	Periodic review of short-term direction vs long-term mission
Cost	Small worker count, cached reads, compact context snapshots
Failure handling	Worker health checks, stuck detection, state backups
Goal	Long-running project completion, not just broad exploration

Glad to hear from you.

GitHub: https://github.com/erickong/aura-agent

u/Civil-Direction-6981