
u/py-net

Can someone please confirm that this is the latest version of Codex? "Codex 26.513.20950"
I'm trying to connect Codex macOS to ChatGPT iOS on iPhone. It's not working. I asked ChatGPT what's the latest version number of Codex. What it gave me is not exactly this, but the one in the title is what my Codex on macOS is displaying, so I'm a little confused.
Ex-OpenAI CTO Mira Murati is giving them a serious run for their money. Her new “Interaction Model” makes “GPT-Realtime-2” look like a caveman in terms of current capabilities
thinkingmachines.ai

They added global dictation on Mac, which I love love love ❤️
It shipped with hold-a-keyboard-shortcut-to-talk, release-to-stop. I requested press-a-key-to-talk, press-again-to-stop, which is more suitable for long dictation. The next day it was shipped as a second way to use global dictation. I don’t know if it was my request or someone else’s, but this trend of listening carefully to users and building Codex to serve their needs is gonna get the app to lead the AI race by miles. Kudos to the team at OpenAI
The timeline blew up yesterday when GPT-5.5 dropped on Arena and landed at #9 in Code Arena behind a bunch of Claudes, GLM/Kimi, and even Meta's Muse Spark! People lost it in the comments on X. One dude straight up "fixed it" by posting a fake leaderboard with GPT-5.5 High at #1 😂. People from Arena had to post clarifications twice, which is a first as far as I know.
Benchmarks are useful but they don't always match real-world capabilities. From what I have seen and tried myself, GPT-5.5 is a beast for actual coding work.
The backlash on X was loud as hell because people are actually using the model and it feels like magic compared to the best competition, yet here it is sitting mid-pack on the leaderboard. That tells you the benchmark is missing a ton of context about what people actually value day-to-day.
Arena's clarifications: the first was about reasoning effort levels (medium/high tested, xHigh still cooking); the second admitted that Code Arena right now is basically just frontend/web dev/React tasks. That has been GPT's known weak spot for a while already. Full-stack app dev and better GitHub integration aren't even in yet, supposedly coming in a couple of months.
Still... frontend coding is legitimately an area where OpenAI has been behind the competition for a bit. If the test is focused there and it's "fair" across models, they should probably tighten that up instead of just waiting for the scope to expand. Frontend coding is half of coding.
I'm not dooming on GPT-5.5; it's my main weapon across all my tasks right now. But this whole episode should also be a good reminder for OpenAI that maybe it's time for a frontend CODE RED 🔴
On Arena's side, they know benchmarks lose credibility fast when the rankings feel disconnected from what devs actually experience in the wild. Leaderboards are snapshots, not the full picture. Crowd-sourced Elo from pairwise votes can get weird with naming reveals and preferences.
Real usefulness > cherry-picked arena scores.
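
For anyone wondering what "Elo from pairwise votes" means in practice, here's a rough sketch of the textbook Elo update. This is NOT Arena's actual pipeline (I don't know exactly how they compute rankings); the K-factor of 32, the 1000 starting rating, and the model names are all assumptions made up for illustration.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a, r_b, score_a, k=32.0):
    """Return new (r_a, r_b) after one vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from any leaderboard.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


def rate_votes(votes, start=1000.0, k=32.0):
    """Apply votes in order; each vote is (model_a, model_b, score_a)."""
    ratings = {}
    for a, b, score_a in votes:
        r_a = ratings.get(a, start)
        r_b = ratings.get(b, start)
        ratings[a], ratings[b] = update_elo(r_a, r_b, score_a, k)
    return ratings


# Hypothetical model names, one vote where model_a wins.
print(rate_votes([("model_a", "model_b", 1.0)]))
```

Each vote only nudges the two models involved, and with this sequential version the final numbers depend on the order the votes come in. Another reason to treat a leaderboard as a snapshot.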