▲ 2 r/codex

Are Codex automations cached?

I’m trying to understand how usage is counted for Codex automations.

If an automation runs against a similar repo/context each time, does Codex benefit from cached input tokens the same way normal Codex usage appears to?

Or... is each automation run effectively priced as a fresh task?

I’ve seen Codex pricing reference input, cached input, and output tokens, but I’m not clear whether automations specifically reuse cache across runs.
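To make the question concrete, here is a rough sketch of how much cache hits could matter per run. The rates below are made-up placeholders, not actual Codex pricing, and `Rates` and `run_cost` are hypothetical names. Whether automations get any cached tokens at all is exactly the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Dollars per one million tokens. Placeholder values, not real pricing."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def run_cost(rates: Rates, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one run, where cached_tokens is the part of
    input_tokens that was billed at the cached-input rate."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * rates.input_per_m
        + cached_tokens * rates.cached_input_per_m
        + output_tokens * rates.output_per_m
    )
    return total / 1_000_000


# Hypothetical rates, chosen only to illustrate the arithmetic.
rates = Rates(input_per_m=1.00, cached_input_per_m=0.10, output_per_m=10.00)

# Same automation run, priced as a fresh task vs. with 90% of the input cached.
fresh = run_cost(rates, input_tokens=100_000, cached_tokens=0, output_tokens=5_000)
warm = run_cost(rates, input_tokens=100_000, cached_tokens=90_000, output_tokens=5_000)
print(f"fresh: ${fresh:.3f}  warm: ${warm:.3f}")
```

If automations do reuse cache across runs, the gap between the two numbers is what you would save on each run under these assumed rates.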

reddit.com
u/thehashimwarren — 6 hours ago

OpenClaw and Spotify dinged for AI slop

Neetcode pulls together two stories that happened this week.

One, Anthropic interviewed an engineering leader at Spotify about how much they use AI for coding. But some users have argued that the Spotify app is poor quality.

Two, Peter from OpenClaw released a poorly made app and got user backlash.

youtu.be
u/thehashimwarren — 21 hours ago
▲ 6 r/codex

Those generic skills you installed don't work as well on Codex

Most skills you can install from skills repos were tested with Claude models and the Claude Code harness.

That means they don't work as well as they could on GPT models and Codex, because they ignore Codex's unique features.

For example, Codex has access to a great image model; Claude doesn't. So if you use the popular frontend skill that was originally made for Claude, you'll miss out on having Codex make mocks using images rather than code.

Codex can also create images for the final design, while Claude Code would have to find stock images for you or skip images altogether.

Another unique Codex feature is background computer use. A Claude-focused skill may avoid invoking computer use because it takes over your apps. But with Codex you can write a skill that uses your browser and apps while you still actively work.

Those are just two examples. There are dozens of other overlapping strengths, quirks, features, and oddities that separate Codex and Claude and demand a different approach to skills.

reddit.com
u/thehashimwarren — 1 day ago

How to write great agent skills

This is one of the most useful AI-related talks I've heard in a while. Each minute is packed with a hard-won insight on how to write agent skills.

My two biggest takeaways:

  1. Matt makes the case that skills should be human invoked, not model invoked. It's better for the user, rather than the model, to take on the cognitive load of knowing what skills are available and how to use them.

That's because the model will often get confused by too many skills or overlapping skills, and just choose to skip using a skill.

  2. Matt is also a strong proponent of using leading words, or meaning-packed jargon, that help the model steer itself. He says you know your leading word is working if the model repeats it back to itself in its reasoning.

One thing missing from the talk is how and when to use scripts in your skills.

I would also have liked Matt to address how universal a skill is, or whether he thinks skills need to target a specific model and harness.

youtu.be
u/thehashimwarren — 1 day ago

What is deepsec, Vercel’s security harness?

News dropped today that a new Chinese AI model from Zhipu AI matches Claude Mythos at finding security bugs.

My question is: what sense does it now make to hold back American models like Claude Mythos, GPT-5.6, and Claude Fable?

Guillermo Rauch tweeted that companies must harden systems right NOW. Then he promoted Vercel’s deepsec:

https://vercel.com/blog/introducing-deepsec-find-and-fix-vulnerabilities-in-your-code-base

Deepsec is an open-source security harness powered by coding agents. It runs on your infrastructure, uses your model/API keys, and directs agents at your codebase to expose hard-to-find vulnerabilities.

I’m not even a tech company, and I plan to run this on my side projects.

u/thehashimwarren — 3 days ago
▲ 204 r/codex

This changed my life

My first inclination with any task now is to ask Codex to control my apps and do it for me. Codex has become my new browser.

u/thehashimwarren — 5 days ago

"both the number and share of solopreneurs reaching meaningful income thresholds is rising. AI is filling the capability gaps that once made hiring necessary" (Stripe)

AI is exploding the number of solo business owners reaching meaningful sales numbers, says Stripe.

This part is remarkable:

"We find that there has been a substantial increase in the number of solopreneurs earning over $100,000 in our index, but an even larger increase in the number earning at higher income thresholds, with a clear acceleration since 2023. More than twice as many solopreneurs earned over $1 million in 2025 than in 2023, and close to three times as many crossed $5 million and $10 million.

Perhaps even more interestingly, the share of solopreneurs earning above these income thresholds has also doubled in the last two years, suggesting that—rather than the surge in business applications reflecting low-quality experimentation with a few lucky standouts— the cohorts of new solopreneur businesses might actually be of higher quality than in the past."

stripeeconomics.com
u/thehashimwarren — 6 days ago

Companies should use skills leaderboards instead of token leaderboards

This is a fantastic idea from Guinness Chen.

Companies are incentivizing employees to tokenmaxx as a proxy for being productive with AI.

But in my experience working on a team where we collaborate with agents, the person who makes a skill that others use is the 💎 player.

For example, I use an internal agent to write blog posts. Then I hand edit the copy to include our style guide. Someone on the team, I'm not sure who, added a skill so the agent follows the house style.

That person should get some type of credit for helping me and others save time!

u/thehashimwarren — 8 days ago

Qwen 3.7 Max gets ranked best AI model for front-end design

I love the test to build a Figma homepage clone. Qwen really kills it on that one.

But I wish Steve also tested a more bread and butter design task, like the homepage of a common app.

youtube.com
u/thehashimwarren — 9 days ago

GitHub Copilot makes Impeccable a built-in skill

Impeccable is the most impressive skill I've ever used. It walks me through a design workflow, and makes Codex 100x more useful for front-end design.

Impeccable also opened my mind to how products will be built in the future. I think it will be 95% agent skill, and your agent harness does the rest.

Smart move by the GitHub Copilot team making Impeccable a built-in skill.

u/thehashimwarren — 9 days ago

Investors are not happy about Google losing top AI talent

Alphabet stock fell as much as 7.2% after Google DeepMind VP John Jumper became its second top AI exec to leave in a week.

In addition to talent leaving, I think investors are looking at Google's AI products, particularly coding, and they're not happy with the models lagging behind GLM-5.2.

u/thehashimwarren — 9 days ago

Vercel CEO shocked by GLM-5.2

I wish Rauch would give us examples instead of just a reaction tweet. But this did tip me over into deciding to give GLM-5.2 a try.

Have you used it for coding? Do you plan to?

u/thehashimwarren — 10 days ago

According to leaked financials, OpenAI spent $6 billion on marketing. You've been influenced, even if you don't know it.

u/thehashimwarren — 11 days ago

"7,000 Langflow servers are under attack" (VentureBeat)

"Check Point Research chained a SQL injection in LangGraph’s SQLite checkpointer to full remote code execution. Tenable and VulnCheck tracked a path traversal in Langflow’s file upload endpoint to active, in-the-wild RCE. Cyera documented a path traversal in LangChain-core’s prompt loader that reads your secrets off disk. Two paths to a shell, one to your keys."
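For anyone unfamiliar with the path traversal class of bug mentioned in the quote, here is a minimal, generic sketch of the usual guard: resolve the requested path and refuse anything that lands outside the allowed directory. This is not Langflow's or LangChain's actual code; `safe_join` and the upload directory are hypothetical.

```python
from pathlib import Path


def safe_join(base_dir: str, user_path: str) -> Path:
    """Join a user-supplied path onto base_dir, rejecting escapes.

    Hypothetical helper for illustration. Requires Python 3.9+ for
    Path.is_relative_to.
    """
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below runs against the real final location.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


# A normal upload name stays inside the directory.
print(safe_join("/tmp/uploads", "reports/q1.txt"))

# A traversal attempt is rejected instead of reaching files outside it.
try:
    safe_join("/tmp/uploads", "../../etc/passwd")
except ValueError as err:
    print("blocked:", err)
```

An absolute `user_path` such as `/etc/passwd` is rejected too, because joining it replaces the base entirely and the resolved result falls outside the upload directory.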

venturebeat.com
u/thehashimwarren — 11 days ago
▲ 2 r/codex

Do you use Codex's custom subagents? What for?

I've been defaulting to using skills for specialized work. But I'm intrigued by Codex's feature that lets you define custom subagents with a model, reasoning level, sandbox access, and set of tools.

In the docs they have examples, such as a reviewer or pr_explorer subagent.

My fear, though, is unintentionally handicapping an agent that needs the highest reasoning or a certain tool for a job. That's why skills seem like a better way to go.

Any of you using custom subagents? How and what for?

reddit.com
u/thehashimwarren — 12 days ago

Have Codex write goals and subagents for tasks

I saw Pietro Schirano share this on Twitter. You can add this to any prompt and Codex will design its own goals and use subagents to get the work done.

For this task, write yourself a new goal and spawn agents in parallel — as many as needed to do it better and faster. Split the work into independent pieces, dispatch them concurrently, and synthesize the results as they return. Give each agent its own dedicated /goal.
reddit.com
u/thehashimwarren — 16 days ago

Agent evals - build or buy?

I was watching a great interview with Hamel Husain & Shreya Shankar about LLM evals. They gave some advice to just spin up your own eval system tailored to your needs.

But I also see some startups with output scoring and notes products that seem flexible.

And some agent frameworks have built in eval systems.

Which type of eval platform do you use? Custom, standalone, or part of a framework?

reddit.com
u/thehashimwarren — 1 month ago
▲ 203 r/codex

I only trust agent benchmarks that confirm my bias that Codex > Claude

The team at Datacurve released a new coding agent benchmark, DeepSWE. Supposedly it is better than SWE-bench because the tasks haven't been seen before, and require long runs to complete.

The only thing I care about is confirmation that I made the right choice by going deep with Codex.

u/thehashimwarren — 1 month ago

"Agents need someone who cares about them"

Dan Shipper, the CEO of Every, was on Lenny's Podcast this week explaining why he's shifted from "every employee will have an agent" to "companies will have one universal agent that is managed by one person".

>"In order for an AI agent to be useful right now, it really needs a human who cares about it. It really needs a human personal connection with someone who's watching what it does and making sure that it's doing the right thing and that it's useful for people."

I think Dan is right. It's a much better user experience to use an agent at work that someone else has already configured for you, but is still flexible enough for your own work style.

u/thehashimwarren — 1 month ago

When does Airmega Mighty2 typically go on sale?

I'm interested in the Airmega Mighty2, but it's too pricey, and I'm patient.

Is it usually on sale on Prime Day? Black Friday?

Or do they usually not discount it?

reddit.com
u/thehashimwarren — 1 month ago