

Just compared token usage between GPT-5.4 and GPT-5.5 in Codex across all four reasoning modes (Low, Medium, High, and XHigh), using the exact same prompt and the same project as the baseline.
Takeaways:
- GPT-5.5 scales token usage, turns, and cost much more aggressively at higher reasoning modes.
- GPT-5.4 remains relatively cost-efficient, even in XHigh.
- GPT-5.5 spends significantly more tokens on iterative reasoning and revisiting context - it simply reads more.
- GPT-5.4 feels more compressed (it read less in our example - fewer docs) and more execution-oriented by comparison.
One of the more interesting deltas:
GPT-5.5 XHigh
→ 456.6k input tokens
→ 40 turns
→ $2.58
GPT-5.4 XHigh
→ 296.6k input tokens
→ 30 turns
→ $0.84
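A quick back-of-the-envelope on that delta, using only the numbers reported above, makes the scaling gap concrete:

```python
# Sanity check on the XHigh numbers from the comparison above.
runs = {
    "GPT-5.5 XHigh": {"input_tokens": 456_600, "turns": 40, "cost_usd": 2.58},
    "GPT-5.4 XHigh": {"input_tokens": 296_600, "turns": 30, "cost_usd": 0.84},
}

a = runs["GPT-5.5 XHigh"]
b = runs["GPT-5.4 XHigh"]

token_ratio = a["input_tokens"] / b["input_tokens"]  # how much more GPT-5.5 reads
turn_ratio = a["turns"] / b["turns"]                 # how many more turns it takes
cost_ratio = a["cost_usd"] / b["cost_usd"]           # how much more it costs

print(f"tokens: {token_ratio:.2f}x, turns: {turn_ratio:.2f}x, cost: {cost_ratio:.2f}x")
# → tokens: 1.54x, turns: 1.33x, cost: 3.07x
```

The interesting part: the cost gap (~3x) is roughly double the input-token gap (~1.5x), so per-token pricing or output-token volume, not just extra reading, is driving most of the spend.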