

Just compared token usage between GPT-5.4 and GPT-5.5 in Codex across all four reasoning modes (Low, Medium, High, and XHigh), using the exact same prompt and the same project as the baseline.
Takeaways:
- GPT-5.5 scales token usage, turns, and cost much more aggressively at higher reasoning modes.
- GPT-5.4 remains relatively cost-efficient, even in XHigh.
- GPT-5.5 spends significantly more tokens on iterative reasoning and revisiting context - it simply reads more.
- GPT-5.4 feels more compressed (it read less in our example - fewer docs) and more execution-oriented by comparison.
One of the more interesting deltas:
GPT-5.5 XHigh
→ 456.6k input tokens
→ 40 turns
→ $2.58
GPT-5.4 XHigh
→ 296.6k input tokens
→ 30 turns
→ $0.84
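A quick back-of-the-envelope on that delta, using only the numbers reported above, makes the scaling gap concrete:

```python
# Sanity check on the XHigh numbers from the comparison above.
runs = {
    "GPT-5.5 XHigh": {"input_tokens": 456_600, "turns": 40, "cost_usd": 2.58},
    "GPT-5.4 XHigh": {"input_tokens": 296_600, "turns": 30, "cost_usd": 0.84},
}

a = runs["GPT-5.5 XHigh"]
b = runs["GPT-5.4 XHigh"]

token_ratio = a["input_tokens"] / b["input_tokens"]  # how much more GPT-5.5 reads
turn_ratio = a["turns"] / b["turns"]                 # how many more turns it takes
cost_ratio = a["cost_usd"] / b["cost_usd"]           # how much more it costs

print(f"tokens: {token_ratio:.2f}x, turns: {turn_ratio:.2f}x, cost: {cost_ratio:.2f}x")
# → tokens: 1.54x, turns: 1.33x, cost: 3.07x
```

The interesting part: the cost gap (~3x) is roughly double the input-token gap (~1.5x), so per-token pricing or output-token volume, not just extra reading, is driving most of the spend.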