u/UsualOrganization712 — 2 days ago

Claude vs z.ai! Has z.ai nailed GLM 5.1 to be on par with Claude models? Is the price increase justified?

Not a praise post, but the outcome of a genuine case study. Something changed with z.ai in their recent GLM 5.1 model. I have been a GLM user since GLM 4.5 and got their annual legacy plan during last Black Friday's sale. The first few weeks to a month after subscribing were good and I was getting a lot of things done. Then the rush of people (me included, since the whole point of buying was to have a good-enough workhorse) into the low-cost plans led to a huge surge in active users, which dramatically lowered the TPS and throughput of their models. GLM 4.7 came, touted to be as good as Claude models, which it was not. The usage spike might have pushed z.ai to quantize the model in a hurry, and it was clearly felt.

I lost hope in GLM models and moved from Claude's Pro to the 5x plan. To be honest, the difference was night and day with Sonnet 4.5 and 4.6, Opus 4.5, and the early days of Opus 4.6. But recently Opus 4.6 started to feel heavily nerfed: it makes many illogical mistakes and misses things that I catch while reading through the edits. There used to be a time when I could give requirements and read only the summary at the end of what was done. That did not last long with Opus and Sonnet. I started spending more time than before: defining clear requirements, architecture, and updates; establishing checkpoints with validation criteria after each update or edit; carefully reading the code to confirm incremental edits were actually made, corrections were surgical, and dependents were patched up correctly. What used to take days with Claude models started taking weeks, with weekly and hourly limits on top of that. At some point it looked like there was no escape. Earlier, Claude's price was justified by the quality it offered. Lately I have started to feel like I am getting subpar quality at an inflated price.

All this time, my z.ai plan was lying dormant, as I had lost hope in GLM models entirely. Then z.ai increased the price of their subscription, which made me check whether anything had changed to justify the value on offer. I opened VS Code and started exploring the GLM models. I gave it a small task and it was done in a flash, without mistakes. Surprised, I gave it a bigger task and it aced that too. Thought: oh god, this is getting serious. Closed the day on a high note, and when I resumed the next day there was another surprise from z.ai: I wasn't able to use my subscription because "I violated their fair usage policy". What fair usage? My foot. The account was dormant for almost 3 months, and 2 sessions after that led to a temporary suspension! I was frustrated, but still intrigued by the performance improvement, so I waited patiently and got the account working again after 4 days. Meanwhile I deleted all loose API keys and openclaw links.

For the past 3 days, performance and speed have been top notch; the difference between January and April is floor versus ceiling. The workhorse is doing its part well, and I wanted to race it against the Claude models, so I had parallel sessions running side by side in VS Code on the same codebase. I still didn't fully trust it, so for code edits and planning I mostly relied on Sonnet and Opus. Then there was an instance where I asked both Claude (Opus 4.7) and GLM 5.1 to explain the difference between v2 and v3, except I accidentally typed v2 where I meant v3. I typed it in Claude first, hit send, copied it, pasted it into the GLM 5.1 session, and left for a break without noticing the typo. When I came back and read the outputs: Opus interpreted it as me asking about v1 and v2, whereas GLM interpreted it as v2 and v3, and each did its own work. What was my intention? v2 and v3, because the context before that message revolved around v3 only. The information and context were the same for both. GLM got it right. (Image attached.)
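
If anyone wants to reproduce this kind of head-to-head without copy-pasting between sessions, here is a minimal sketch of the harness I was effectively running by hand. It assumes both providers expose an OpenAI-compatible chat completions endpoint; the URLs and model IDs below are placeholders, not confirmed values, so check each provider's current docs before using them.

```python
import os
import requests

# Placeholder endpoints and model IDs: verify against each provider's docs.
BACKENDS = {
    "claude": {
        "url": "https://api.anthropic.com/v1/chat/completions",
        "key": os.environ["ANTHROPIC_API_KEY"],
        "model": "claude-opus-4-6",  # hypothetical model id
    },
    "glm": {
        "url": "https://api.z.ai/api/paas/v4/chat/completions",
        "key": os.environ["ZAI_API_KEY"],
        "model": "glm-5.1",  # hypothetical model id
    },
}

def ask(backend: dict, prompt: str, context: str) -> str:
    """Send the same context and prompt to one backend, return the reply text."""
    resp = requests.post(
        backend["url"],
        headers={"Authorization": f"Bearer {backend['key']}"},
        json={
            "model": backend["model"],
            "messages": [
                {"role": "system", "content": context},
                {"role": "user", "content": prompt},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Both models get identical context, so differences are down to the model.
    context = "Prior discussion in this session has been about v3 only."
    prompt = "Explain the difference between v2 and v3."  # the ambiguous ask
    for name, backend in BACKENDS.items():
        print(f"--- {name} ---")
        print(ask(backend, prompt, context))
```

The point of the harness is just that both models see byte-identical input, which is what made the v2/v3 disambiguation comparison fair in my case.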

Next, I had a working codebase, built in iOS/Swift, plus an architectural handoff document for developing the same thing in Android/Kotlin. In a fresh session I asked both to explore the document and compare it with the codebase. Claude's Sonnet said files were missing and that the architecture document was wrong about that part (there were many gaps, and that document was itself prepared by Opus/Sonnet). But GLM 5.1 correctly identified that the functionality was wrapped in Swift and that architecture.md was wrong. Two instances, and on both counts GLM 5.1 got it right. (Image attached.)
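
For anyone rerunning this second test, the only real work is prompt construction: give the model the handoff document plus an actual listing of the repo, so claims about "missing" files can be checked against reality. A rough sketch of how I'd assemble that prompt; `architecture.md` is from my setup, everything else here is illustrative:

```python
from pathlib import Path

def build_audit_prompt(repo: Path, doc: Path) -> str:
    """Combine the handoff doc with a real file listing so the model can
    check the document's claims against the actual repository tree."""
    listing = "\n".join(
        str(p.relative_to(repo))
        for p in sorted(repo.rglob("*"))
        if p.is_file() and ".git" not in p.parts
    )
    return (
        "Compare this architecture handoff document against the actual "
        "repository listing below. Flag anything the document claims exists "
        "but does not, and anything in the repo the document misses.\n\n"
        f"--- architecture.md ---\n{doc.read_text()}\n\n"
        f"--- repository files ---\n{listing}\n"
    )

# Example (paths are illustrative):
# prompt = build_audit_prompt(Path("ios-app"), Path("ios-app/architecture.md"))
```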

I am planning to take up the Android part of this development with GLM models, since my z.ai token limits are nowhere near exhausted. Right now, Claude models' performance may have dropped while GLM caught up, and that combination may be what produced these results. Whatever the cause, at present legacy plan holders are getting great value. For new subscribers, z.ai has to stay reliable and consistent for them to get comfortable with the price. Hope this sustains.

TL;DR - Ran two comparisons. First, GLM understood my intention and produced the right output, whereas the Claude model made a mistake. Next, I intentionally gave the same task to both, and GLM 5.1 got it right whereas the Claude model got it wrong and needed to be nudged to look at the codebase again to correct its own mistake. So maybe GLM 5.1 is on par with Claude models (better than them in my experience, though the evidence is circumstantial).
