u/AdditionalWeb107

Signals: finding the most informative agent traces without LLM judges [R]


Hello peeps - Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company).

Wanted to introduce our latest research on agentic systems called Signals. If you've been building agents, you've probably noticed that there are far too many agent traces/trajectories to review one by one, and using humans or extra LLM calls to inspect all of them gets expensive really fast. The paper proposes a lightweight way to compute structured “signals” from live agent interactions so you can surface the trajectories most worth looking at, without changing the agent’s online behavior. Computing Signals doesn't require a GPU.

Signals are grouped into a simple taxonomy across interaction, execution, and environment patterns, including things like misalignment, stagnation, disengagement, failure, looping, and exhaustion. In an annotation study on τ-bench, signal-based sampling reached an 82% informativeness rate versus 54% for random sampling, which translated to a 1.52x efficiency gain per informative trajectory.
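To make the flavor concrete, here's what one such signal could look like: a looping-style execution pattern can be detected as cheaply as counting repeated tool calls. The function name and threshold below are my own illustration, not the paper's actual detector.

```python
from collections import Counter

def looping_signal(tool_calls, threshold=3):
    """Hypothetical sketch of an execution-pattern signal: flag a
    trajectory when the same (tool, arguments) pair repeats enough
    times to suggest the agent is stuck in a loop."""
    counts = Counter(tool_calls)
    return any(n >= threshold for n in counts.values())

trajectory = [
    ("search_flights", '{"from": "SFO", "to": "JFK"}'),
    ("search_flights", '{"from": "SFO", "to": "JFK"}'),
    ("search_flights", '{"from": "SFO", "to": "JFK"}'),
]
print(looping_signal(trajectory))  # True: one call repeated three times
```

This is the whole point of the "no GPU" claim - a pass over the trace with a counter, not another model call.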

Paper (arXiv:2604.00356): https://arxiv.org/abs/2604.00356
Project where Signals are already implemented: https://github.com/katanemo/plano

Happy to answer questions on the taxonomy, implementation details, or where this breaks down.

u/AdditionalWeb107 — 5 days ago
▲ 0 r/ollama

Signals v2: an LLM-free analyzer that scores live agent trajectories and attaches the results as OpenTelemetry span attributes.

Today, we're shipping Signals v2 in Plano - a model-free behavioral analyzer that scores live agent trajectories and attaches the results as structured, heuristically defined attributes on existing OpenTelemetry spans. No extra model calls, no new infra - just pure signals.

Trajectories in agentic interactions are long, numerous and non-deterministic - you can't hand review them all. Running an LLM-as-a-judge burns tokens without telling you which traces actually deserve attention.

Signals follow the taxonomy of our published research, "Signals: Trajectory Sampling and Triage for Agentic Interactions" (Chen et al., 2026). The taxonomy spans interaction, execution, and environment patterns, covering 20 leaf detectors for misalignment, looping, escalation, tool failures and context exhaustion.
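For intuition, attaching signals to a span boils down to writing namespaced key-value attributes. The sketch below uses a plain dict as a stand-in for `span.set_attribute(...)` on a live OpenTelemetry span, and the attribute names are made up for illustration - Plano's actual schema is in the docs linked below.

```python
def attach_signals(span_attributes, signals, namespace="signals"):
    """Flatten a {taxonomy: {detector: value}} dict into namespaced
    attributes like 'signals.execution.loop'. A stand-in for calling
    span.set_attribute(...) on an existing OpenTelemetry span."""
    for taxonomy, detectors in signals.items():
        for name, value in detectors.items():
            span_attributes[f"{namespace}.{taxonomy}.{name}"] = value
    return span_attributes

span = {"service.name": "my-agent"}  # existing span attributes
attach_signals(span, {
    "execution": {"loop": True, "tool_failure_rate": 0.25},
    "environment": {"context_exhaustion": 0.87},
})
print(span["signals.execution.loop"])  # True
```

Because the values land on spans you're already emitting, any OTEL backend can filter and sample on them with no new infrastructure.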

Plano Signals is available today, and it's computed automatically for traffic proxied through Plano. Get started with the Plano CLI via the links below!

Documentation: https://docs.planoai.dev/concepts/signals.html
Research Paper: https://arxiv.org/abs/2604.00356

u/AdditionalWeb107 — 6 days ago

Hey peeps - I think the hardest thing about building agents is evaluation, especially for scenarios that require multiple tool calls, where the agent can go down a trajectory you haven't manually tested before. Trajectories are voluminous and non-deterministic, and reviewing each one, whether through human review or auxiliary LLMs, is slow and cost-prohibitive.

So I built a signal-based framework for triaging agentic interaction trajectories. The approach computes cheap, broadly applicable signals from live interactions and attaches them as structured OTEL attributes for trajectory triage.

I organize signals into a coarse-grained taxonomy spanning interaction (misalignment, stagnation, disengagement, satisfaction), execution (failure, loop), and environment (exhaustion), designed for computation without model calls.
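As a toy example of what "computation without model calls" can mean for an interaction signal: stagnation can be approximated by lexical overlap between consecutive assistant turns. The heuristic and threshold here are my own illustration, not the paper's detector.

```python
def stagnation_signal(assistant_turns, threshold=0.8):
    """Flag a trajectory when consecutive assistant turns are
    near-duplicates by Jaccard word overlap - no model calls needed."""
    for prev, curr in zip(assistant_turns, assistant_turns[1:]):
        a, b = set(prev.lower().split()), set(curr.lower().split())
        if a and b and len(a & b) / len(a | b) >= threshold:
            return True
    return False

turns = [
    "I could not find that order, can you recheck the ID?",
    "I could not find that order, can you recheck the ID?",
]
print(stagnation_signal(turns))  # True: the agent is repeating itself
```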

In a controlled annotation study on τ-bench, a widely used benchmark for tool-augmented agent evaluation, signal-based sampling achieves an 82% informativeness rate, compared to 74% for heuristic filtering and 54% for random sampling - a 1.52x efficiency gain per informative trajectory.
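The 1.52x figure follows directly from the informativeness rates, since it is just the ratio of informative trajectories surfaced per trajectory reviewed:

```python
# Informativeness rates from the τ-bench annotation study.
signal_rate, heuristic_rate, random_rate = 0.82, 0.74, 0.54

gain_vs_random = signal_rate / random_rate
print(round(gain_vs_random, 2))  # 1.52

# Equivalently: to surface 100 informative trajectories, random sampling
# needs ~185 reviews (100 / 0.54) versus ~122 (100 / 0.82) with signals.
```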

The advantage is robust across reward strata and task domains, confirming that signals provide genuine per-trajectory informativeness gains rather than merely oversampling obvious failures. These results show that lightweight signals can serve as practical sampling infrastructure for agentic systems, and suggest a path toward preference data construction and post-deployment optimization.

Links to the approach and the project where this is implemented are below.

u/AdditionalWeb107 — 12 days ago


Links in the comments below

u/AdditionalWeb107 — 14 days ago
▲ 20 r/Anthropic+3 crossposts

Hey peeps - just shipped Plano 0.4.22 with support for a local TUI so you can view costs and requests by model, and inspect adaptive routing based on a policy-based router as described in this paper: https://arxiv.org/abs/2506.16655.

Of course, Ollama-based models are supported out of the box.

Hope you enjoy the release.

u/AdditionalWeb107 — 12 days ago