u/Cool_Injury4075

r/codex

So, with OpenAI having dropped GPT-5.5 recently, I decided to give it a test. I've always been quite skeptical of the "game-changing breakthrough" claims OpenAI makes with each new release. Usually, even under ideal conditions, these models fail quite spectacularly when asked to perform high-level competitive programming.

It's fair to say that competitive programming is not realistic engineering at all. Sites such as Codeforces hand you a pre-digested problem statement with all the constraints laid out up front. Engineering, on the other hand, is mostly about figuring out what is wrong in the first place, something Codeforces doesn't aim to assess, since it focuses on logic.

In my own use, though, I usually do present a model with a pre-digested problem statement, to get ideas about how the problem might be solved, optimized, or improved, or even to brainstorm the problem itself. For that kind of use case, CP is a good tool for evaluating pure logic in LLMs, as long as you pick problems that aren't widely known.

Surprisingly enough, GPT-5.5 significantly outperformed Gemini 3.1 Pro on this type of problem. More specifically, it solved rng_58's "hardest problem".

Blog: https://codeforces.com/blog/entry/10690

Problem: https://judge.u-aizu.ac.jp/onlinejudge/description.jsp?id=1164

Link to the conversation with chatgpt.

u/Cool_Injury4075 — 17 days ago