u/jfowers_amd

vLLM ROCm has been added to Lemonade as an experimental backend
▲ 481 r/StrixHalo+1 crossposts

vLLM can run .safetensors LLMs directly, before they are ever converted to GGUF, and it represents a new engine to explore. I personally had never tried it out until u/krishna2910-amd, u/mikkoph, and u/sa1sr1 made it as easy as running llama.cpp in Lemonade:

    lemonade backends install vllm:rocm
    lemonade run Qwen3.5-0.8B-vLLM
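
Once the server is up, it exposes an OpenAI-compatible API, so any OpenAI client can talk to the model. Here's a minimal sketch with the openai Python package; the base_url and api_key below are assumptions for a default local install, so check the quick start guide for your actual endpoint:

    from openai import OpenAI

    # Assumed default local endpoint; adjust base_url to match your install.
    # Local servers typically ignore the api_key, but the client requires one.
    client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")

    resp = client.chat.completions.create(
        model="Qwen3.5-0.8B-vLLM",
        messages=[{"role": "user", "content": "Hello from vLLM on ROCm!"}],
    )
    print(resp.choices[0].message.content)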

This is an experimental backend for us in the sense that the essentials are implemented, but there are known rough edges. We want the community's feedback on where, and how far, we should take this. If you find it interesting, please let us know your thoughts!

Quick start guide: https://lemonade-server.ai/news/vllm-rocm.html
GitHub: https://github.com/lemonade-sdk/lemonade
Discord: https://discord.gg/5xXzkMu8Zk

u/jfowers_amd — 6 days ago
▲ 272 r/MechKeyboards+1 crossposts

Keyboard is the Keychron Q65 Max. Switches are Keygeek Y2 linears. I love the way this thing sounds, it’s insane how far pre-builds (plus customizations) have come. I got a Keychron because I wanted to go wireless and I’m happy with it so far.

u/jfowers_amd — 10 days ago
▲ 94 r/StrixHalo+1 crossposts

I’ve always liked how if I ask ChatGPT to make or edit an image, it just does it. Local AI should be this convenient! One install, one endpoint. Ask for an image of a cat and it appears. Ask for a hat on the cat, with a narrated story. Now we can easily build immersive experiences.

Lemonade's OmniRouter brings that same pattern to local AI through built-in tools:

  • Image generation/editing through sd.cpp
  • Text-to-speech through kokoros
  • Transcription through whisper.cpp
  • Vision through llama.cpp

Your workflow talks to Lemonade running on your own NPU/GPU through OpenAI-compatible tool calling.

How it works:

  1. Lemonade sets up all these local AI engines for your system.
  2. Add Lemonade’s tool definitions to your workflows.
  3. When your LLM triggers a tool call, it gets routed to the corresponding engine (sd.cpp, whisper.cpp, kokoros); see the sketch after this list.
  4. Feed the result back into your loop.
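
To make steps 2-4 concrete, here's a minimal sketch with the openai Python client. The endpoint and the tool schema are illustrative assumptions: in practice you add Lemonade's published tool definitions, not this hand-written one.

    from openai import OpenAI

    # Assumed default local endpoint; adjust to match your install.
    client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")

    # Hypothetical tool definition for illustration (step 2); use
    # Lemonade's own tool definitions in real workflows.
    tools = [{
        "type": "function",
        "function": {
            "name": "generate_image",
            "description": "Generate an image from a text prompt (sd.cpp)",
            "parameters": {
                "type": "object",
                "properties": {"prompt": {"type": "string"}},
                "required": ["prompt"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="Qwen3.5-0.8B-vLLM",  # any model Lemonade is serving
        messages=[{"role": "user", "content": "Draw a cat wearing a hat."}],
        tools=tools,
    )

    # Inspect whatever tool calls come back, then feed the results into
    # the next turn of your loop (step 4).
    for call in resp.choices[0].message.tool_calls or []:
        print(call.function.name, call.function.arguments)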

That’s it. No custom orchestration layer, no new abstractions to learn. Check it out in this 181-line e2e Python example.

We’ve added support for OmniRouter in our reference web UI (also available as a Tauri app), which is what you’re seeing in the video. But I’m much more excited to see what people build on top.

I know my next project is going to be some kind of TTRPG-style adventure game. It’s already surprisingly fun to ask OmniRouter to be a dungeon master who illustrates and narrates the story, and I think it can be enhanced quite a bit if I build an app/harness around it.

If you find this interesting, please drop us a star and say hi!

u/jfowers_amd — 15 days ago