r/codex

New 5 Hour limit is a mess!!!
🔥 Hot ▲ 97 r/codex

So after many days I decided to give Codex a test. Usually these are the tasks I give the agent:
Code refactoring
UI/UX Playwright tests
Edge case conditions

For the past week I was messing with GLM-5.1 and to be honest I pretty much liked it.
Today I came back to Codex to see how much the new limits had been toned down, and behold, I hit the limit in approximately 45 minutes.

My weekly limit ironically seems to have improved. Previously, the same amount of 5-hour-session consumption would cost me about 27-30% of the weekly limit. But after the new reset I was able to consume 100% of the 5-hour session while only LOSING ABOUT 25% TOTAL (a win, I guess).
While they drastically tuned down one thing, they seem to have improved the other by a margin!!

Hoping they fix this soon.

u/Impossible-Ad-8162 — 9 hours ago
The 6 Codex CLI workflows everyone's using right now (and what makes each one unique)
🔥 Hot ▲ 237 r/OpenAI+1 crossposts

Compiled a comparison of the top community-driven development workflows for Codex CLI, ranked by GitHub stars.

▎ Full comparison is from codex-cli-best-practice.

u/shanraisshan — 20 hours ago
▲ 46 r/codex

Usage ran out too fast

It looks like GPT may be using ideas from Claude’s leaked code. Now we’re seeing the 5-hour usage limit get burned up by a single message in less than an hour.

Has anyone else noticed this?

u/Aromatic_Cry_6252 — 13 hours ago
▲ 18 r/codex

Out of limits too fast? Use this.

In config.toml:

model_context_window = 220000
model_auto_compact_token_limit = 200000

[features]
multi_agent = false

The new 1,000,000-token context and multi-agent mode just burn through your plan. Learn to deal without them again. 👌

u/Tikilou — 7 hours ago
5.4-mini-high vs 5.4-low (tokens, performance, stability)
▲ 17 r/codex

Here is what I got using GPT-pro extended when asking about using 5.4 vs 5.4-mini to optimize for 5h limits. Feel free to call this AI slop because it's literally a copy-paste:

"My read from the current official material is: GPT-5.4-mini can get surprisingly close to full GPT-5.4 on some coding-style evals, but it is not a blanket substitute. On the published xhigh benchmarks, GPT-5.4-mini is only 3.3 points behind GPT-5.4 on SWE-Bench Pro (54.4% vs 57.7%) and 2.9 points behind on OSWorld-Verified (72.1% vs 75.0%), but the gap is much larger on Terminal-Bench 2.0 (60.0% vs 75.1%) and Toolathlon (42.9% vs 54.6%). OpenAI still positions gpt-5.4 as the default for most important coding work and gpt-5.4-mini as the faster, cheaper option for lighter coding tasks and subagents. (OpenAI)

So to your direct question — can 5.4-mini high perform as well as 5.4-low? On some bounded, explicit, test-backed coding tasks, probably yes. As a general routing rule, I would not assume equivalence. I did not find a public official matrix that directly compares full 5.4 at low against mini at high; the public release material shows xhigh snapshots and says reasoning efforts were swept from low to xhigh, but it does not publish the cross-effort table. The current prompt guidance also says gpt-5.4-mini is more literal and weaker on implicit workflows and ambiguity handling, which is exactly where “maybe mini-high is enough” stops being safe. (OpenAI)

The biggest developer-side insight is that high should not be your default. In the current GPT-5.4 docs, newer GPT-5 models default to none; the reasoning guide says low is for a small reliability bump, medium/high are for planning, coding, synthesis, and harder reasoning, and xhigh should be used only when your evals show the extra latency and cost are justified. The GPT-5.4 prompt guide also explicitly says higher effort is not always better, and that you should often improve completion rules, verification loops, and tool-persistence rules before raising reasoning effort. (OpenAI Platform)

The safest way to think about “hardness” is on three axes rather than one: ambiguity, horizon, and working-set size. Ambiguity: OpenAI says mini is more literal and weaker on implicit workflows. Horizon: full 5.4 keeps a much larger lead on terminal/tool-heavy evals than on SWE-style bugfix evals. Working-set size: full 5.4 has a 1.05M context window versus 400K for mini, and mini’s documented long-context scores drop sharply once the eval moves into the 64K–256K range — for example MRCR v2 is 86.0% vs 47.7% at 64K–128K and 79.3% vs 33.6% at 128K–256K. So once the task needs a big repo slice, many files, or lots of docs/logs in play, mini stops being the “safe” default even if the raw coding gap looked small. (OpenAI Developers)

My quota-preserving routing rule — this is my synthesis, not an official OpenAI taxonomy — would be: use 5.4-mini at none/low for reconnaissance, repo search, code explanation, mechanical edits, and bugfixes with a clear repro or failing test; use 5.4-mini at medium/high for bounded multi-file work with explicit specs or strong acceptance tests; escalate to 5.4 at low when ambiguity, tool/terminal horizon, or working-set size gets high; escalate to 5.4 at medium/high for production migrations, security/auth/concurrency work, sparse-test repos, or after a lower-effort pass misses; and reserve xhigh for the cases where you have evidence it helps. (OpenAI Developers)
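The quoted routing rule can be condensed into a small lookup. This is a sketch of the post's own synthesis, not an official OpenAI policy; `route` and the 0–2 axis scores are my framing of its three "hardness" axes.

```python
# Hypothetical router for the quota-preserving rule quoted above.
# Model/effort names mirror the post; the scoring is my own framing
# of its three hardness axes (ambiguity, horizon, working-set size).

def route(ambiguity: int, horizon: int, working_set: int,
          production_critical: bool = False) -> tuple[str, str]:
    """Each axis is scored 0 (low) to 2 (high); returns (model, effort)."""
    hardness = max(ambiguity, horizon, working_set)
    if production_critical:
        return ("gpt-5.4", "high")       # migrations, auth, concurrency
    if hardness == 0:
        return ("gpt-5.4-mini", "low")   # recon, mechanical edits, clear repros
    if hardness == 1:
        return ("gpt-5.4-mini", "high")  # bounded multi-file work, strong tests
    return ("gpt-5.4", "low")            # ambiguous, long-horizon, or repo-wide
```

Per the quoted guidance, escalation beyond this (full 5.4 at medium/high, then xhigh) would happen only after a lower-effort pass misses or when evals justify it.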

On raw token cost, mini has a very large structural edge. GPT-5.4 is $2.50 / $0.25 cached / $15.00 per 1M input / cached / output tokens, while GPT-5.4-mini is $0.75 / $0.075 cached / $4.50 — basically 3.33x cheaper across all three billed token categories. Reasoning tokens are tracked inside output/completion usage and count toward billing and usage, so high/xhigh costs more mainly because it generates more billable output/reasoning tokens, not because reasoning effort has its own separate surcharge. Rule of thumb: mini-high can still be cheaper than full-low unless it expands billable tokens by roughly more than that 3.3x price advantage. (OpenAI Developers)

For a representative medium-heavy coding turn, if you send about 60k fresh input tokens and get 15k output tokens back, the API cost is about $0.375 on GPT-5.4 versus $0.1125 on GPT-5.4-mini. For a later iterative turn with about 60k cached input, 15k fresh input, and 6k output, it comes out to about $0.1425 on GPT-5.4 versus $0.0428 on mini. Those mixes are just examples, not official medians, but the stable part is the roughly 3.33x raw price gap. (OpenAI Developers)
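The per-turn figures follow directly from the listed per-1M rates; a minimal sketch of the arithmetic, assuming only the prices quoted above:

```python
# Reproduce the per-turn cost figures from the quoted prices
# ($ per 1M tokens: fresh input / cached input / output).
PRICES = {
    "gpt-5.4":      (2.50, 0.25, 15.00),
    "gpt-5.4-mini": (0.75, 0.075, 4.50),
}

def turn_cost(model: str, fresh_in: int, cached_in: int, out: int) -> float:
    """Dollar cost of one turn, given raw token counts."""
    p_in, p_cached, p_out = PRICES[model]
    return (fresh_in * p_in + cached_in * p_cached + out * p_out) / 1_000_000

# Medium-heavy turn: 60k fresh input, 15k output
turn_cost("gpt-5.4", 60_000, 0, 15_000)       # → 0.375
turn_cost("gpt-5.4-mini", 60_000, 0, 15_000)  # → 0.1125
# Iterative turn: 60k cached input, 15k fresh input, 6k output
turn_cost("gpt-5.4", 15_000, 60_000, 6_000)       # → 0.1425
turn_cost("gpt-5.4-mini", 15_000, 60_000, 6_000)  # → ~0.0428
```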

If your main problem is the Codex 5-hour limit rather than API dollars, the current Codex pricing page points in the same direction. On Pro, the documented local-message range is 223–1120 per 5h for GPT-5.4 versus 743–3733 per 5h for GPT-5.4-mini; on Plus, it is 33–168 versus 110–560. OpenAI also says switching to mini for routine tasks should extend local-message limits by roughly 2.5x to 3.3x, and the mini launch post says Codex mini uses only about 30% of GPT-5.4 quota. The docs also note that larger codebases, long-running tasks, extended sessions, and speed configurations burn allowance faster; /status and the Codex usage dashboard show what you have left. (OpenAI Developers)
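The "about 30% of GPT-5.4 quota" figure implies a simple stretch factor for mixed routing. This arithmetic is my extrapolation from the quoted numbers, not an official formula:

```python
# If mini burns ~30% of full-model quota per message, routing a
# fraction f of your messages to mini stretches the 5h budget by:
def extension_factor(f: float, mini_cost: float = 0.30) -> float:
    """f = share of messages sent to mini; returns the budget multiplier."""
    return 1.0 / ((1.0 - f) + f * mini_cost)

extension_factor(1.0)  # all-mini: ~3.33x, the quoted upper bound
extension_factor(0.7)  # routing 70% of messages to mini: ~1.96x
```

Routing everything to mini reproduces the 3.3x end of the quoted 2.5x–3.3x range; partial routing lands in between.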

The highest-leverage protocol for “hours of work without tanking the 5h window” is a planner/executor split: let full 5.4 handle planning, coordination, and final judgment, and let mini handle narrower subtasks. Beyond model choice, OpenAI’s own tips are to keep prompts lean, shrink AGENTS.md, disable unneeded MCP servers, and avoid fast/speed modes unless you really need them, because those increase usage and fast mode consumes 2x credits. If you are driving this through the API, use the Responses API with previous_response_id, prompt caching, compaction, and lower verbosity when possible; the docs say this improves cache hit rates, reduces re-reasoning, and helps control cost and latency as sessions grow. One subtle point: the published 24h extended prompt-cache list includes gpt-5.4, but I did not see gpt-5.4-mini listed there, so for very long iterative sessions with a huge stable prefix, full 5.4 has a documented caching advantage. (OpenAI)

A conservative default would be: mini-low first, mini-high second, full-low for anything ambiguous or repo-wide, full-high only when the task is both important and clearly hard."

u/v1kstrand — 7 hours ago
▲ 2 r/codex

How are you actually running Codex at scale? Worktrees are theoretically perfect and practically painful. What's your setup?

Been running 4 to 6 Codex agents concurrently and I still haven't found a clean architecture. Wanted to ask how others are doing it.

The worktree trap

Worktrees sound ideal. Each agent gets isolation, you're not stomping on each other. But in practice:

Dependencies are missing unless you actively set them up. You have to maintain a mental map of what's merged to main and what isn't. You spot a bug running your main-branch product, but is that bug also present in the worktrees? Who knows. You spot a bug inside a worktree (say, while testing a Telegram bot there) and now you can't branch off main; you have to branch from that worktree, which means the fix has to be merged back through an extra hop before it reaches main.

Scale this to 6 agents and the coordination overhead alone starts eating your throughput. I have a main branch and a consumer branch, so some PRs go to main, some go to consumer, and now it gets genuinely messy.

What I've tried

One orchestrator agent running in a tmux session, inside a worktree. It spawns sub agents into new tmux panes via the CLI, sometimes giving them their own worktrees, sometimes running them in the same one.

Promising in theory. Annoying in practice.

Where I'm converging

One integrator agent in a single worktree. All sub agents it spawns run inside that same worktree. One level of isolation. Ship PRs directly from there to main or consumer. No nested worktree graph to untangle.

Saw Peter Steinberger mention he doesn't use worktrees at all and I'm starting to understand why. With one worktree you get clarity. With six, you spend half your mental cycles just keeping the map in your head and the whole point of running agents is to offload cognitive load, not add it.

The session length problem

Something else I've been wondering about. When Codex finds a bug and fixes it, then immediately surfaces another issue, do you keep going in that same session or do you spin up a fresh one?

My experience is that the longer a session runs the worse the output gets. Context bloat makes the model noticeably slower and dumber. What should be a quick precise fix turns into the agent going in circles or making weird choices. At some point the session just becomes unusable.

So the question becomes: one long session per task, or short focused sessions per bug, even if that means more context setup overhead? And does your answer change depending on whether you're using worktrees or not?

What's your setup?

How are you running multi agent Codex in practice? Pure main branch, worktrees, tmux orchestration, something else entirely? Especially curious if anyone's found a clean solution for concurrent agents plus multiple target branches plus keeping sessions tight enough to stay useful.

u/artemgetman — 3 hours ago
Codex [GPT 5.4] outputs read like a dev buddy talking you through the build
▲ 14 r/codex

Look at this screenshot. Read the language coming out of Codex right now.

https://preview.redd.it/pip7zh1h71tg1.png?width=1568&format=png&auto=webp&s=880e7a2f48f2f2d158026ea05dfc34cc0c9d02c3

This isn't robotic log output. This is a colleague narrating their thought process while they work. It explains what it's doing, why it's waiting, what it validated, and what's next. There's cadence here. There's personality.

And that got me thinking — what if we could convert these outputs into audio format? You're vibecoding, Codex is working in the background, and instead of switching tabs to read status updates, you just hear it. Two voices. One is you (your prompts, your intent), the other is Codex (its reasoning, its decisions, its progress).

Or you're in the kitchen making coffee while a build runs. Instead of walking back to check the screen, Codex just tells you what's happening. Like a pair programmer who doesn't need eye contact.

The writing quality is already there. The narrative structure is already there. Someone just needs to build the bridge between these outputs and a TTS pipeline with two distinct voices.

Vibecoding is about staying in the zone. Reading walls of logs pulls you out. Listening keeps you in.

Where would you listen to your Codex sessions if they were audio? What's the one moment in your workflow where hearing this instead of reading it would change everything?

u/karmendra_choudhary — 19 hours ago
▲ 14 r/codex

Pro Plan Limit

I’ve been working with Codex for months now, and with the Pro plan I never had any problems. I’ve seen several people here say they were hitting the limit very quickly, but since the last reset, with the exact same workflow, mine drops so fast that I can’t effectively work with it anymore. Did something change massively in the last 1–2 days?

u/yippie_kiiyay — 21 hours ago
▲ 6 r/codex

What’s your codex setup for working with external APIs?

Curious what everyone’s workflow looks like. Every time I integrate something like Stripe or Supabase, Codex uses outdated methods and I end up debugging runtime errors for stuff that compiled fine.

How are you feeding it current docs? Pasting into AGENTS.md? Skills? Something else?

u/thecontentengineer — 13 hours ago
▲ 1 r/codex

Where to start the working thread on multi repo projects?

Where/how do you initialize your thread on projects with a multi-repo layout if you need changes in multiple repos?

code/imagestore
code/backend
code/frontend

You work on an issue located in frontend, but it also needs backend changes. Where do you start the thread?

The logical assumption would be to start it at the code folder, but then viewing diffs relies solely on an IDE instead of looking at Codex.

Any experience?

u/AdTop6345 — 7 hours ago
Codex stalls after a few iterations, and I mean it
▲ 1 r/AIAssisted+1 crossposts

After ~2–3 iterations, Codex starts looping for me.

I point out issues, give clear examples, it agrees… but then just circles back with minor tweaks. No real improvement.

If I take the same prompt to Claude or Gemini — boom, it fixes things almost immediately.

Feels like Codex is great for initial architecture / backend setup, but struggles after a few refinement rounds.

Curious — at what point do you guys bring in another model? I feel like I am wasting a lot of time stuck in these iteration loops.

https://preview.redd.it/m776ncxbwzsg1.png?width=2396&format=png&auto=webp&s=02879103829b6e8f32b6f708107090edcf665d2f

u/Beginning_Handle7069 — 24 hours ago
▲ 0 r/codex

Codex App vs CLI: What’s your real workflow?

I mostly work with Codex, mainly the app, not the CLI. I have also used Gemini, Claude Code, Cursor, and others, but I still really like Codex.

Lately, though, I keep seeing a lot of tools, wrappers, and setups built around the CLI, like custom skills, configs, and things such as Oh My Codex. So I am wondering what your actual workflow looks like.

Do you prefer the CLI over the Codex app? And can those CLI focused setups also be adapted for the Codex app, or are they really only worth it if you work in the CLI directly?

I am a full stack developer, and one problem I keep running into is that every time I start a new project, I spend a lot of time setting up agents, config files, and docs before things feel smooth. That makes me think maybe there is a better general Codex setup I am missing.

Curious how you use Codex in practice and whether you get more out of the CLI or the app.

u/LaFllamme — 6 hours ago
▲ 0 r/codex

GPT5.4 ---> dumber of late?

Anecdotal, but I used to run Sonnet 4.6 and GPT 5.4 neck and neck, and they both did great jobs.

Over the last few weeks GPT 5.4 has become consistently dumber, forgetting things it didn't use to forget and making the same mistakes over and over.

Anyone experiencing similar things?

u/oulu2006 — 5 hours ago