r/WOZCODE

WOZCODE just showed up on Terminal-Bench 2.0 on Hugging Face
▲ 7 r/WOZCODE+2 crossposts

Our newest Terminal-Bench 2.0 submission, powered by Claude Opus 4.7, reached 80.2% accuracy across 89 tasks with 5 attempts per task (445 total trials), and the run has passed validation.
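As a quick sanity check on those numbers, here is a minimal sketch. It assumes the reported accuracy is simply passed trials divided by total trials, which the post does not spell out, and it back-solves which pass count would round to 80.2%. The pass count is inferred, not taken from the submission.

```python
# Figures from the post.
TASKS = 89
ATTEMPTS_PER_TASK = 5
REPORTED_ACCURACY = 80.2  # percent, rounded to one decimal

total_trials = TASKS * ATTEMPTS_PER_TASK


def pass_rate(passed: int, total: int) -> float:
    """Percentage of trials that passed (assumed scoring rule)."""
    return 100.0 * passed / total


# Every integer pass count that rounds to the reported figure.
candidates = [
    passed
    for passed in range(total_trials + 1)
    if round(pass_rate(passed, total_trials), 1) == REPORTED_ACCURACY
]

print(f"total trials: {total_trials}")
print(f"pass counts consistent with {REPORTED_ACCURACY}%: {candidates}")
```

Under that assumption, only one pass count is consistent with the rounded score.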

We view this result as meaningful for three reasons:

  1. Evaluation depth: the score reflects repeated performance across a broad task set, not a single-pass run.
  2. Execution realism: Terminal-Bench tests agents in terminal-based workflows where success depends on tool use, state management, multi-step reasoning, and reliable completion under realistic constraints.
  3. Validation rigor: passing validation matters because reproducibility and benchmark integrity are critical when evaluating agent systems.

As the space matures, we believe the most important progress will come from systems that are not only capable, but also consistent and dependable in real operating environments. This result is a strong step in that direction for WOZCODE.

Submission details:
https://huggingface.co/datasets/harborframework/terminal-bench-2-leaderboard/discussions/148

u/ChampionshipNo2815 — 1 day ago
▲ 5 r/WOZCODE+1 crossposts

You’re probably overpaying for Claude Code.

https://reddit.com/link/1so881w/video/688b672vfsvg1/player

Last month we realized we’d spent over $100k on Claude Code.

The number was bad enough, but the more frustrating part was seeing how much of that was just wasted on useless tokens.

So we built something to fix it.

It’s called WOZCODE. It plugs into Claude Code, works with the setup we were already using, and doesn’t require switching tools or doing anything weird.

It takes about two commands to install, uses your existing Claude subscription, and works across CLI, Claude Desktop, VS Code, Conductor, and the rest.

After using it, we saw:

  • 25 to 55% lower cost
  • 30 to 40% faster performance
  • about 20% better results on agent benchmarks

Posting in case this is useful to anyone else dealing with the same problem.

WOZCODE is free to install.

u/ChampionshipNo2815 — 4 days ago
▲ 2 r/WOZCODE+1 crossposts

How are you all keeping SaaS costs under control without slowing the team down?

I keep seeing the same problem come up for growing SaaS products. Costs rarely explode from one thing. It is usually a slow pileup across infra, APIs, AI tools, observability, storage, and all the extra software teams add along the way.

The obvious advice is always the same stuff about turning off unused resources or renegotiating vendors, but that feels like the easy part. The harder part is reducing spend without slowing the team down or making devs fight the stack every day.

Lately I’ve been paying more attention to where teams actually lose money, and it seems like a lot of it comes from workflow debt. Things that made sense when moving fast early on, but quietly became expensive once usage grew.

For people here who are a bit further along, what ended up making the biggest difference for you? Better infra discipline, fewer tools, stricter ownership, pricing changes, or something else?

u/ChampionshipNo2815 — 6 days ago
▲ 4 r/WOZCODE+1 crossposts

I benchmarked two coding agents (same model) and one is ~10x cheaper for the same output

Been running nightly CI tests comparing two agents on the same prompts/repos. Both use the same model (Opus), so this isn’t a model diff.

The results are kinda insane.

For a simple color scheme change:

  • Agent A: 44 tool calls, ~$2.42
  • Agent B: 3 tool calls, ~$0.22

Outputs were basically identical.
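To put ratios on that, here's a quick sketch that uses only the two data points quoted for this one task. The per-call cost is a derived average, not something the agents report.

```python
# Figures quoted in the post for the color scheme change.
runs = {
    "Agent A": {"tool_calls": 44, "cost_usd": 2.42},
    "Agent B": {"tool_calls": 3, "cost_usd": 0.22},
}

a, b = runs["Agent A"], runs["Agent B"]

# How many times more expensive / chattier Agent A was on this task.
cost_ratio = a["cost_usd"] / b["cost_usd"]
call_ratio = a["tool_calls"] / b["tool_calls"]

# Average cost of a single tool call for each agent.
cost_per_call = {
    name: run["cost_usd"] / run["tool_calls"] for name, run in runs.items()
}

print(f"cost ratio: {cost_ratio:.1f}x")
print(f"tool call ratio: {call_ratio:.1f}x")
for name, value in cost_per_call.items():
    print(f"{name}: ${value:.3f} per tool call")
```

On this single task the average cost per call is in the same ballpark for both agents (roughly 5.5 vs 7.3 cents), so most of the gap comes from how many calls each one makes.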

The main difference isn’t intelligence, it’s how they use tools:

  • One batches everything into a single edit (multi-file, multi-change)
  • The other does one edit per change… over and over

So you end up with 10–40 tool calls for stuff that could’ve been done in one.
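Here's a toy sketch of the difference. It is not how either agent is implemented; `Change`, `apply_changes`, and the file names are all hypothetical, and each call to `apply_changes` just stands in for one tool call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One search-and-replace edit in one file (hypothetical shape)."""
    path: str
    old: str
    new: str


def apply_changes(files: dict[str, str], changes: list[Change]) -> dict[str, str]:
    """Apply a list of changes in one pass; stands in for a single tool call."""
    updated = dict(files)
    for change in changes:
        if change.old not in updated[change.path]:
            raise ValueError(f"{change.old!r} not found in {change.path}")
        updated[change.path] = updated[change.path].replace(change.old, change.new, 1)
    return updated


def one_edit_per_change(files, changes):
    """One tool call per change."""
    calls = 0
    for change in changes:
        files = apply_changes(files, [change])
        calls += 1
    return files, calls


def batched(files, changes):
    """All changes, across all files, in a single tool call."""
    return apply_changes(files, changes), 1


# Made-up repo contents and a made-up color scheme change.
repo = {
    "theme.css": "--primary: #111; --accent: #222;",
    "button.css": "color: #111;",
}
changes = [
    Change("theme.css", "#111", "#0af"),
    Change("theme.css", "#222", "#fa0"),
    Change("button.css", "#111", "#0af"),
]

per_change_files, per_change_calls = one_edit_per_change(repo, changes)
batched_files, batched_calls = batched(repo, changes)

print(per_change_calls, batched_calls)
print(per_change_files == batched_files)
```

Both paths end with the same file contents. The only thing that changes is the number of round trips, and every round trip costs a turn.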

Also noticed:

  • One agent almost always does Read → Edit (extra turn every time)
  • The other just edits directly most of the time and it… works

No real drop in correctness so far.

There are some tradeoffs:

  • The faster one sometimes guesses file paths wrong at the start and wastes a search
  • The slower one is more “proper” (reads files, smaller edits, etc.)

But overall:

  • cheaper
  • fewer turns
  • faster
  • same result

Feels like most of the gains in coding agents right now aren’t coming from better models, but from better execution strategies (batching, skipping reads, etc.)

Curious if others have tested this or seen similar behavior.

u/ChampionshipNo2815 — 7 days ago

How are people structuring Claude Code sessions on larger repos?

Been using Claude Code on a bigger React and TypeScript repo lately, and I’m realizing the biggest difference is not really prompt quality, it’s task shape. On small projects I can throw a broad request at it and usually get something decent back. On a larger codebase, I get much better results when I break work into tighter chunks and keep each session focused on one area.

What’s been working best for me so far is starting with one concrete target, like a single feature flow or one refactor boundary, then only widening scope after that is stable. I’ve also been using Claude Code with Wozcode on this repo, and the smoother sessions for me are the ones where I’m very explicit about which files matter and what “done” actually means before it starts making changes.

I’m not really talking about rate limits or model quality here. More the day-to-day workflow side. Things like whether you keep one long-running session, restart often, split planning from implementation, or use different habits depending on repo size.

Interested in how other people are handling this on real projects. Are you treating Claude Code more like a pair programmer in one lane at a time, or giving it broader tasks and letting it roam a bit more?

u/ChampionshipNo2815 — 6 days ago

👋 Welcome to r/WOZCODE - Introduce Yourself and Read First!

Welcome to WozCode

Hello everyone,

I’m one of the moderators here at WozCode. Welcome to the community; we’re glad to have you.

WozCode is a space focused on learning, building, and collaboration in software development. Whether you’re a beginner or an experienced developer, the goal is to create an environment where members can grow through meaningful discussion and real-world projects.

Community Guidelines

To maintain a high-quality and productive space, please follow these rules:

  1. Be respectful and professional: Treat all members with respect. Harassment, discrimination, or toxic behavior will not be tolerated.
  2. Stay on topic: Ensure posts are relevant to coding, software development, or closely related technical areas.
  3. Prioritize value: Share insights, resources, or projects that contribute meaningfully to the community. Low-effort or spam content may be removed.
  4. Encourage learning and collaboration: Ask questions, provide constructive feedback, and support others in their learning journey.
  5. No self-promotion without contribution: Promotional content is allowed only if you are actively contributing to the community in other ways.
  6. Follow Reddit’s site-wide policies: All standard Reddit rules apply in addition to the above.

What to Expect

• Project showcases and feedback
• Technical discussions and problem-solving
• Learning resources and guidance
• Opportunities to collaborate

If you have suggestions, concerns, or need assistance, feel free to reach out to the moderation team.

We look forward to building a strong and valuable community together.

— WozCode Moderation Team

u/ChampionshipNo2815 — 7 days ago