Day 3 of building to beat Claude cowork on computer use tasks
The idea is pretty simple. Instead of just generating text, I’m trying to get something that can actually use a device. It looks at what’s on the screen, decides what to do next, and takes actions step by step until it finishes a task
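That look → decide → act loop can be sketched in a few lines. This is just an illustration, not the actual project code: `capture_screen`, `choose_action`, and `perform` are hypothetical stand-ins, wired up here to a toy two-step task so the loop runs end to end.

```python
# Minimal sketch of the observe -> decide -> act loop, with placeholder
# functions standing in for real screen capture and a real model.

def run_agent(task, capture_screen, choose_action, perform, max_steps=10):
    """Look at the screen, pick an action, execute, repeat until done."""
    history = []
    for _ in range(max_steps):
        observation = capture_screen()          # what's on the screen right now
        action = choose_action(task, observation, history)
        if action == "done":                    # the model decides the task is finished
            return history
        perform(action)
        history.append((observation, action))
    return history                              # gave up after max_steps

# Toy environment: a fake "screen" that advances as actions are performed.
screens = ["login page", "dashboard"]
state = {"i": 0}

def capture_screen():
    return screens[state["i"]]

def choose_action(task, obs, history):
    # Hard-coded policy for the demo, in place of a model call.
    return {"login page": "click login", "dashboard": "done"}[obs]

def perform(action):
    state["i"] += 1                             # the UI changes after the click

steps = run_agent("open the dashboard", capture_screen, choose_action, perform)
print(steps)  # [('login page', 'click login')]
```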
This clip is just a small result from today. Nothing crazy yet, but it’s starting to feel a bit more real
One thing that surprised me is that most of the problems aren’t really about intelligence. The agent mostly fails when it misunderstands what’s on the screen. Even small changes in the interface can throw it off completely
I also noticed it works better when it keeps checking what just happened instead of assuming everything worked. Keeping the loop tight and simple seems to help more than trying to plan too far ahead
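The "check what just happened" idea looks roughly like this. Again a hedged sketch with made-up names, not the project's real API: after every action, re-capture the screen and only move on once something visibly changed, retrying otherwise.

```python
# Sketch of act-then-verify: perform an action, then confirm the screen
# actually changed before proceeding. All names here are illustrative.

def act_and_verify(perform, capture_screen, action, retries=2):
    """Perform an action, then confirm the screen changed before moving on."""
    before = capture_screen()
    for _ in range(retries + 1):
        perform(action)
        after = capture_screen()
        if after != before:          # something visibly happened
            return True, after
    return False, before             # action seemingly had no effect

# Toy demo: a flaky button that only registers on the second click.
state = {"clicks": 0, "screen": "form"}

def capture_screen():
    return state["screen"]

def perform(action):
    state["clicks"] += 1
    if state["clicks"] >= 2:         # flaky UI: the first click is dropped
        state["screen"] = "confirmation"

ok, screen = act_and_verify(perform, capture_screen, "click submit")
print(ok, screen)  # True confirmation
```

The retry-until-the-screen-changes check is what keeps the loop grounded: the agent never builds a plan on top of an action that silently failed.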
Right now it feels less like building something smart and more like trying to keep it grounded in what’s actually happening moment by moment
Still early, just sharing progress. Follow along live on GitHub: https://github.com/iBz-04/gloamy
