Title Idea: How I used Claude Code + Subagent-Driven Development to ship 2 ML research notebooks in 48 hours
The Project
I’m building the research arm of Parley—AR glasses for real-time two-way conversation between hearing and deaf users. The research question: how much of the signal for isolated-sign recognition is carried by hand shape alone, versus temporal information?
The interesting part for this sub isn't the ASL research—it's the workflow. Claude Code did ~95% of the implementation with me acting as architect and reviewer.
The Workflow: Subagent-Driven Development
I used the pattern from obra/superpowers:
- Detailed Implementation Plan: A ~2000-line markdown file with tasks broken into bite-sized steps, including exact code snippets.
- Fresh Subagents: I dispatched one fresh subagent per task. No session inheritance—every task starts with a clean slate.
- Two-Stage Review: A spec-compliance subagent verifies the diff against the plan, then a code-quality subagent runs a second pass for best practices.
- Parallel Execution: I ran 30 tasks across ~22 dispatches, batching 3 at a time where safe.
Model Selection
- Haiku: Mechanical code (scaffolding, simple functions, test files).
- Sonnet: Implementations requiring judgment (architecture, bug fixes) and final-pass reviews.
3 Bugs the Review Loop Caught (That I Would've Missed)
- The MediaPipe Trap: My `hand_feature_vector` function was silently dropping the right hand. It assumed hand landmarks were contiguous, but MediaPipe places Pose (33 landmarks) between the Left and Right hands, so the slice was grabbing pose data instead of the right hand. A subagent flagged it before I wasted hours on training.
- The Early-Stop Crash: `aggregate_over_seeds()` crashed on non-numeric keys (`"early_stop"`) after 2 hours of training. A subagent wrote a standalone recovery script to re-aggregate from on-disk artifacts, saving a 3-hour retrain.
- Non-Deterministic Kaggle Paths: Different notebooks mounted datasets at different nested levels. After five failed pushes, a subagent added diagnostic `os.walk()` logic to make path detection robust.
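To make the MediaPipe trap concrete, here's a minimal sketch of the corrected slicing. The landmark layout (face 468, left hand 21, pose 33, right hand 21, for 543 points per frame) follows the common Kaggle ISLR convention; the exact counts and the `hand_feature_vector` signature are assumptions, not the author's actual code.

```python
import numpy as np

# Assumed per-frame landmark layout (Kaggle ISLR-style MediaPipe Holistic):
# face: 468 points, then left hand: 21, pose: 33, right hand: 21 -> 543 total.
FACE, LEFT_HAND, POSE, RIGHT_HAND = 468, 21, 33, 21
LEFT_START = FACE                    # 468
POSE_START = LEFT_START + LEFT_HAND  # 489
RIGHT_START = POSE_START + POSE      # 522

def hand_feature_vector(frame: np.ndarray) -> np.ndarray:
    """Concatenate left- and right-hand landmarks from one (543, 3) frame.

    The buggy version assumed the two hands were contiguous and sliced
    frame[LEFT_START : LEFT_START + 42], which grabs 21 pose landmarks
    instead of the right hand.
    """
    left = frame[LEFT_START : LEFT_START + LEFT_HAND]
    right = frame[RIGHT_START : RIGHT_START + RIGHT_HAND]  # skip the pose block
    return np.concatenate([left, right]).ravel()

frame = np.arange(543 * 3, dtype=float).reshape(543, 3)
features = hand_feature_vector(frame)  # shape (42 * 3,) = (126,)
```

The silent part of the bug is what makes it nasty: the wrong slice has the right shape, so nothing crashes and the model simply trains on pose data where the right hand should be.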
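The early-stop crash suggests the aggregator tried to average every key, including string-valued ones. A minimal sketch of the fix, with the function name from the post but an assumed input shape (one metrics dict per seed):

```python
from statistics import mean

def aggregate_over_seeds(runs: list[dict]) -> dict:
    """Average metrics across seeds, skipping non-numeric entries
    (e.g. an "early_stop" string) instead of crashing on them.

    runs: one metrics dict per seed, assumed to share the same keys.
    """
    out = {}
    for key in runs[0]:
        values = [r[key] for r in runs]
        # bool is a subclass of int, so exclude it explicitly
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            out[key] = mean(values)
        else:
            out[key] = values  # keep non-numeric entries as-is, one per seed
    return out

runs = [
    {"val_acc": 0.31, "early_stop": "epoch_12"},
    {"val_acc": 0.33, "early_stop": "epoch_9"},
]
summary = aggregate_over_seeds(runs)
```

Because the per-seed metrics were already on disk, re-running a guarded aggregator like this over the saved artifacts is exactly the kind of recovery that avoids a retrain.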
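For the path issue, the `os.walk()` approach can be sketched as follows; the marker filename and function name are illustrative assumptions, since Kaggle can nest a mounted dataset at varying depths:

```python
import os

def find_dataset_root(base: str, marker: str = "train.csv") -> str:
    """Walk the mounted input directory until the marker file appears,
    so the notebook works regardless of how deeply the dataset is nested."""
    for dirpath, _dirnames, filenames in os.walk(base):
        if marker in filenames:
            return dirpath
    raise FileNotFoundError(f"{marker!r} not found under {base!r}")

# Typical usage in a Kaggle notebook:
# DATA_ROOT = find_dataset_root("/kaggle/input")
```

Raising instead of returning `None` keeps the failure loud and early, which is what you want when the alternative is a notebook that pushes, runs for an hour, and then dies on a missing file.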
The Results (Shipped on Kaggle)
- Notebook 00 — ISLR EDA: Proves that published ASL accuracy is often inflated by identity leakage. Honest signer-holdout accuracy is ~half of what's usually reported.
- Notebook 01 — Hand-Shape Baseline: MLP (31.5%) vs. Temporal 1D-Conv (36.4%). The 4.9 pp gap confirms that hand-shape priors dominate for isolated signs.
Lessons Learned
- What Worked: Fresh context prevents "hallucination drift." Plans written like spec docs (not TODOs) mean subagents don't have to "invent" logic.
- What I'd Change: I was too granular on notebook sections—one subagent could have handled 10 boilerplate cells. I also need a visual dashboard; tracking 30 tasks via `TodoWrite` got chaotic.
The "Why"
Current ASL AI claims ~83% accuracy, but honest evaluation shows ~36%. That 47-point gap is what happens when these products hit the real world. My goal is to publish the honest numbers to build a foundation for Phase 4: a custom, deaf-community co-designed dataset.
Happy to answer questions about the Claude Code workflow, subagent prompts, or the ML side!