
▲ 3 r/deeplearning
An interesting challenge to squeeze as much juice as possible out of the Qwen2.5 0.5B model
https://www.h2loop.ai/contests/bear-the-tokens
Someone was able to optimize it to get more than 5k tok/s on a T4 GPU 😯
u/ANR2ME — 12 hours ago