Hermes and Lemonade on Framework Desktop (trip report by a newbie)
Here's a quick report from an LLM newbie about my experience running Hermes locally.
Me: retired software architect (database internals, Linux sysadmin, containers and Kubernetes) - 40+ years in the biz, retired a year ago. Little knowledge about LLMs and such until a month or so ago.
Hardware: Framework Desktop 128 GB. (Strix Halo)
I am running Hermes on Fedora pointed at my local containerized Lemonade server. It's currently using Qwen3.5-27B-GGUF as the only model.
It took a few false starts to get it set up and running, but it's actually starting to do some useful work.
Problems getting it to work:
• The containerized version of Hermes was causing me trouble (which I cannot describe, might have been user error), so I am now running Hermes non-containerized.
• Lemonade and the model are configured for "thinking". This caused Hermes to generate totally obscure "connection error" messages when it tried to connect to Lemonade. That made me think there was a DNS issue or a firewall problem, but in fact Hermes just didn't like all the "thinking" data that Lemonade was throwing at it. (I would NEVER have figured that out; free Claude actually did.) To fix this I had to add an entry to my Lemonade docker compose file to set LEMONADE_LLAMACPP_ARGS=--chat-template-kwargs '{"enable_thinking":false}'
• The model didn't have a large enough default context size. I had to add the following to Lemonade's /root/.cache/lemonade/recipe_options.json file:
"Qwen3.5-27B-GGUF": {
"ctx_size": 128000
}
• Once these were sorted, Hermes started up and kinda worked. I ran it with a couple of models, including Llama-3.2-3B-Instruct.GGUF and Qwen3-14B-GGUF. The Llama model kinda worked but had some issues. The 14B Qwen model frequently went into crazy loops when I asked it to do some basic tasks - it was incapable of setting up a cron job in Hermes, for example, and spun out trying to do so.
• I am reasonably familiar with systemd, I thought, but had never been exposed to user scoped systemd services and timers and such. Hermes uses them in a way which feels quite cool, but which was a deep mystery to me when I started 48 hours ago. (That's a me problem, not a Hermes problem.)
• Setting up Discord messaging was complicated, but on the Discord side. Trying to figure out how to create the bot and get the necessary IDs was a wee bit tricky but I got it done. (Again, probably a me problem.)
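For anyone who would rather script the two Lemonade tweaks above than hand-edit files, here is a minimal Python sketch. It only reproduces the env line and the ctx_size entry quoted in this post. The function names are made up for illustration, and it assumes recipe_options.json is a single JSON object keyed by model name, which is what the fragment above suggests but which I have not verified against Lemonade's documentation.

```python
import json
import shlex
from pathlib import Path


def llamacpp_args_env(enable_thinking: bool) -> str:
    """Build the LEMONADE_LLAMACPP_ARGS line with shell-style quoting."""
    # Compact separators so the JSON matches the form quoted in the post.
    kwargs = json.dumps({"enable_thinking": enable_thinking}, separators=(",", ":"))
    return "LEMONADE_LLAMACPP_ARGS=--chat-template-kwargs " + shlex.quote(kwargs)


def set_ctx_size(options_file: Path, model: str, ctx_size: int) -> dict:
    """Set ctx_size for one model and keep every other entry in the file.

    Assumption: the file is one JSON object keyed by model name.
    """
    options = json.loads(options_file.read_text()) if options_file.exists() else {}
    options.setdefault(model, {})["ctx_size"] = ctx_size
    options_file.write_text(json.dumps(options, indent=2) + "\n")
    return options


if __name__ == "__main__":
    # Prints the line to paste into the compose file's environment section.
    print(llamacpp_args_env(False))
```

Calling set_ctx_size(Path("/root/.cache/lemonade/recipe_options.json"), "Qwen3.5-27B-GGUF", 128000) inside the container would then write the entry shown above. Going through a JSON parser instead of a text editor makes it harder to leave a stray comma behind.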
What it's doing successfully so far (in less than a day):
Once I got it up and running with 27B I had it do a couple of things:
• I gave it a list of hosts to ping every 30 minutes and instructed it to notify me via Discord if anything goes down or comes back up. It wrote a pretty nice script to do that. It then forgot to put it into a cron job, which I discovered the next day - but when I pointed that out it fixed it easily.
• I gave it another list of hosts to ssh to every 30 minutes and monitor in the same way. It figured that out easily enough, but the script messaged me with the details every 30 minutes whether anything was wrong or not. I pointed that out and it fixed it.
• I asked it if it could monitor ZFS filesystems, and it said that it could. It wrote a script which reported a warning when nothing was actually wrong. I reported that and it fixed it after a few false starts.
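The "only tell me when something changes" behavior from the bullets above can be sketched in a few lines of Python. This is not the script Hermes wrote, just a minimal illustration of the idea. The function names are hypothetical, the ping flags assume Linux iputils, and persisting state between runs and sending the Discord message are left out.

```python
import subprocess


def is_up(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ping gets a reply (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def state_changes(previous: dict, current: dict) -> list:
    """Return one message per host whose up/down state differs from the last run."""
    messages = []
    for host, up in sorted(current.items()):
        # Hosts with no recorded state are treated as previously up,
        # so the first run only reports hosts that are down.
        was_up = previous.get(host, True)
        if up != was_up:
            messages.append(f"{host} is {'UP' if up else 'DOWN'}")
    return messages
```

A cron job or user-scoped systemd timer would call is_up for each host, compare the result with the state saved by the previous run, and send a notification only when state_changes returns something. That comparison is what keeps it from messaging every 30 minutes regardless.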
Performance:
Hermes replies to simple messages in 30 seconds. It replies to more complicated things in a minute or so. It took it maybe 4 or 5 minutes to fix one of the scripts that it wrote, for example.
Lemonade is properly configured to use the Strix Halo GPU (which was another horror story for another day). The LLM server uses about 50% of one CPU while Hermes is using it (GPU is near 100%).
For example, I just asked it "what is the weather forecast for tomorrow for Roseville, California" which I had never asked it before. It nattered around for nearly 2 minutes before giving me the forecast from wttr.in (which is wildly different from what the National Weather Service says, but that's probably not a Hermes problem ... other than why it chose to look at wttr.in in the first place).
Summary
• You can absolutely do some stuff self-hosted with Hermes + Lemonade on Framework Desktop
• It's not fast
• It's not perfect
• In a couple of hours it put together a monitoring solution that I had been meaning to do by hand for a month (yes, I should have just been using Uptime Kuma).
• It'll be a while before I trust it with anything important
• But it seems somewhat useful and fun
Next Steps:
• Point it at my Home Assistant?
• Voice?
• TBD