u/Extra-Act2560

Claude Code looked busy. The postmortem showed where I should have stopped.
▲ 4 r/claude

I ran a Claude Code validation session against my KubeAttention project. The goal was:

>verify whether the scheduler and deployment path are production-ready enough to ship

From the terminal, it looked useful. Claude read docs, ran Go tests, inspected the scheduler module, tried kubectl, and rendered the Helm chart.

But the useful answer had already arrived before the checklist finished: stop this session and fix the blockers. The attached image is the actual cc-blackbox terminal postmortem, including its Claude-analysis section, rendered from this session's local JSONL evidence.

  • 15 assistant turns in 31 seconds
  • 321,345 total tokens across the session
  • 299,649 of those were cache reads
  • 13 Bash commands
  • 6 failed commands

In other words, roughly 93% of the token volume was cache reads re-feeding the same context. The concrete blockers were not subtle:

  • go test ./pkg/scheduler failed because the package path was outside the main module
  • cd pkg/scheduler && go test . failed because the nested module was missing go.sum data
  • go test ./pkg/scheduler/... failed for the same module-boundary reason
  • kubectl checks hit localhost:8080 with no cluster available
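
If the nested module is intentional, one plausible fix (an assumption on my part; I only have the error text, not the repo layout) is a Go workspace at the repo root, plus a go mod tidy inside pkg/scheduler to regenerate the missing go.sum entries:

```
// go.work — hypothetical workspace file at the repo root
go 1.22

use (
	.
	./pkg/scheduler
)
```

With that in place, go test ./pkg/scheduler/... should resolve from the root again, and the kubectl checks just need a reachable cluster (even a local kind or minikube one) before rerunning validation.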

That is the pain I am trying to solve. When a Claude Code session is still producing output, I want a local tool that can turn the evidence into a concrete stop/restart decision:

>stop this session, fix the module boundary and go.sum, attach a real cluster, then rerun validation

So I built cc-blackbox: a local flight recorder and guardrail for Claude Code sessions. It runs Claude Code through a local proxy, watches the stream, labels direct versus heuristic evidence, and produces this kind of redacted postmortem when the session is done.
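
To make "local proxy" concrete, here is a minimal sketch of the proxy-recorder idea (not cc-blackbox's actual code; the port, upstream, and field names are assumptions): forward traffic, append request/response metadata to a JSONL file, and never persist prompt or completion bodies.

```go
// Minimal proxy-recorder sketch: metadata in, JSONL out, no bodies on disk.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"time"
)

type evidence struct {
	Time   time.Time `json:"time"`
	Method string    `json:"method"`
	Path   string    `json:"path"`
	Status int       `json:"status"`
	Millis int64     `json:"millis"`
}

type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

func main() {
	upstream, _ := url.Parse("https://api.anthropic.com") // assumed upstream
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	director := proxy.Director
	proxy.Director = func(r *http.Request) {
		director(r)
		r.Host = upstream.Host // make the upstream see its own hostname
	}

	logf, err := os.OpenFile("evidence.jsonl", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		log.Fatal(err)
	}
	enc := json.NewEncoder(logf)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		proxy.ServeHTTP(rec, r) // streaming flush passthrough omitted for brevity
		enc.Encode(evidence{ // metadata only; bodies never touch disk
			Time: start, Method: r.Method, Path: r.URL.Path,
			Status: rec.status, Millis: time.Since(start).Milliseconds(),
		})
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:8787", nil))
}
```

The real tool layers the evidence labeling and postmortem renderer on top of records like these.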

The important part for me: this postmortem did not need the raw source or a full transcript to show the problem. Command names, exit codes, timestamps, token buckets, and redacted file categories were enough.

Question for heavy Claude Code users:

What should make a local guard tell you to stop early?

https://github.com/softcane/cc-blackbox

u/Extra-Act2560 — 3 days ago
▲ 1 r/codex

The problem I’m trying to debug is simple: sometimes a Codex run finishes, or half-finishes, but I still don’t really know what happened.

  • Was the response actually complete?
  • Which model answered?
  • How many tokens went into input vs cached input vs reasoning/output?
  • Did the run hit context pressure?
  • Was there a model mismatch?
  • Is this worth continuing, or should I restart with a cleaner prompt?

So I built codex-blackbox. It runs Codex CLI through a local wrapper/proxy and gives you a postmortem for the run. The report shows things like:

  • completed / failed / incomplete outcome
  • requested model vs served model
  • input, cached input, uncached input, output, and reasoning tokens
  • local cost estimate
  • flags for high context use, model mismatch, and accounting oddities
  • tools the model tried to call
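
To make that concrete, here is my guess at the shape of such a report (a hypothetical Go struct for illustration only; the real output format lives in the repo):

```go
package main

import "fmt"

// RunPostmortem is a hypothetical report shape, for illustration only.
type RunPostmortem struct {
	Outcome          string   // "completed", "failed", or "incomplete"
	RequestedModel   string   // model named in the request
	ServedModel      string   // model that answered; a mismatch is itself a flag
	InputTokens      int
	CachedInput      int
	UncachedInput    int
	OutputTokens     int
	ReasoningTokens  int
	EstimatedCostUSD float64  // local estimate from a static price table
	Flags            []string // e.g. "high-context-use", "accounting-oddity"
	ToolCalls        []string // tools the model tried to call
}

func main() {
	r := RunPostmortem{Outcome: "incomplete", Flags: []string{"high-context-use"}}
	fmt.Printf("%+v\n", r)
}
```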

https://github.com/softcane/codex-blackbox

I’m building it for local debugging when I’m using Codex myself and want to know whether a run was actually useful or just seemed that way.

For people using Codex CLI heavily:

  • What would you actually want in a postmortem?
  • More raw evidence?
  • Better cost/token breakdown?
  • Tool-call timeline?
  • “Continue vs restart” recommendation?
  • Something else?

u/Extra-Act2560 — 6 days ago

I’ve been trying to debug a specific Claude Code failure mode:

The session still looks active. It keeps reading files, editing things, calling tools, and producing output. But something feels off: progress slows, tokens keep burning, and only later do you realize you probably should have stopped or restarted much earlier.

So I built cc-blackbox for my own itch.

https://github.com/softcane/cc-blackbox

It’s a local black box recorder for Claude Code sessions. It runs Claude Code through a local proxy and then gives you a postmortem of what happened.

The goal is to answer questions like:

- When did the session start going sideways?
- Did cache behaviour change?
- Is it looping on the same files or tools?
- Is context pressure getting too high?
- Did the actual model route look wrong?
- Should I keep going, compact, or restart?
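
To make the last question concrete, the kind of rule I have in mind looks roughly like this (a sketch with made-up thresholds, not cc-blackbox's actual heuristics):

```go
package main

import "fmt"

// Turn holds per-turn session stats; the thresholds below are illustrative.
type Turn struct {
	FailedCommands  int
	ContextFraction float64 // fraction of the context window in use
	FilesTouched    []string
}

// shouldStop returns a recommendation plus the evidence behind it.
func shouldStop(turns []Turn) (bool, string) {
	if len(turns) == 0 {
		return false, "no evidence yet"
	}
	failures, seen := 0, map[string]int{}
	for _, t := range turns {
		failures += t.FailedCommands
		for _, f := range t.FilesTouched {
			seen[f]++
		}
	}
	last := turns[len(turns)-1]
	switch {
	case failures >= 5:
		return true, "repeated command failures: fix the environment, then rerun"
	case last.ContextFraction > 0.9:
		return true, "context pressure: compact or restart"
	}
	for f, n := range seen {
		if n >= 4 {
			return true, "looping on " + f
		}
	}
	return false, "no stop signal yet"
}

func main() {
	stop, why := shouldStop([]Turn{
		{FailedCommands: 3, FilesTouched: []string{"pkg/scheduler/scheduler.go"}},
		{FailedCommands: 3, FilesTouched: []string{"pkg/scheduler/scheduler.go"}},
	})
	fmt.Println(stop, why)
}
```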

I’m not trying to build a manager dashboard or another usage chart. I want something more practical for a local developer:

>Is this session still worth continuing, and what is the evidence?

Everything runs locally, and the postmortem is designed around redacted evidence.
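
By "redacted evidence" I mean keeping the category of a touched file while dropping its identity. The idea is roughly this (a sketch; the categories and rules are made up, not the tool's actual redaction logic):

```go
package main

import (
	"fmt"
	"strings"
)

// redactPath keeps a file's category but drops its name and path.
func redactPath(p string) string {
	switch {
	case strings.HasSuffix(p, "_test.go"):
		return "go-test (redacted)"
	case strings.HasSuffix(p, ".go"):
		return "go-source (redacted)"
	case strings.HasSuffix(p, ".yaml"), strings.HasSuffix(p, ".yml"):
		return "config (redacted)"
	default:
		return "other (redacted)"
	}
}

func main() {
	fmt.Println(redactPath("pkg/scheduler/scheduler_test.go")) // go-test (redacted)
}
```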

I’m looking for feedback from people who use Claude Code a lot:

What signal would make you stop a session early?

Cache churn? Repeated tool calls? Context pressure? Cost per turn? Model mismatch? Something else?

u/Extra-Act2560 — 6 days ago

After Opus 4.7 and the recent Claude Code bugs, I wrote a stack to observe what my Claude sessions are doing.

It happened twice this week: Claude Code hallucinated a skill name, and my o11y stack caught it. I ended up writing those skills.
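
The check itself can be small. A sketch of the idea (not clauditor's actual code; the .claude/skills layout is an assumption): diff the skill names a session invokes against the skills that actually exist on disk.

```go
package main

import (
	"fmt"
	"os"
)

// unknownSkills flags invoked skill names with no matching directory on disk.
func unknownSkills(invoked []string, skillsDir string) []string {
	entries, err := os.ReadDir(skillsDir)
	if err != nil {
		return invoked // directory unreadable: surface everything for review
	}
	known := map[string]bool{}
	for _, e := range entries {
		if e.IsDir() {
			known[e.Name()] = true
		}
	}
	var missing []string
	for _, name := range invoked {
		if !known[name] {
			missing = append(missing, name) // hallucinated, or a skill worth writing
		}
	}
	return missing
}

func main() {
	fmt.Println(unknownSkills([]string{"summarize-pr"}, ".claude/skills"))
}
```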

My Claude Code o11y stack:
https://github.com/softcane/clauditor

I remember Boris Cherny mentioning "building ahead of the model" in some talk: you anticipate what the model is trying to do and retrofit. So I watch my Claude sessions carefully, especially when they hallucinate.

How do you discover new skills?

u/Extra-Act2560 — 13 days ago