u/MohamedHeroo

▲ 3 r/ollama

I built an offline multi-modal AI assistant (Voice + Vision) that runs locally on my laptop

Hey guys,

I wanted to share a side project I've been building on my laptop for the past few weeks. It's called HERO ZAN, and it's basically a fully offline, private AI assistant that can speak, listen, and see through the webcam without using any external APIs or cloud services.

I wanted something that supports Arabic natively, has low latency, and doesn't hog system resources. Here is the stack I ended up using to make it work:

Ollama as the backend for the LLM (I'm using qwen2.5-coder:7b since it handles Arabic really well and gives solid reasoning).

Faster-Whisper (medium model) for speech-to-text. It's surprisingly fast on local hardware.

Piper TTS for the voice output. Finding a good, natural-sounding local Arabic TTS was a pain, but Piper ONNX models did the trick.

Moondream (via Ollama) for the vision part. If you ask it "شايف إيه؟" (What do you see?), it grabs a frame from the webcam and describes it.

CustomTkinter for a simple GUI, featuring a small animated cartoon face that changes its expression depending on what the assistant is doing (thinking, listening, talking, etc.).

Everything runs locally on my machine. I'm currently testing it on a standard AMD Ryzen 5 Pro setup with 8GB RAM, and it runs smoothly without choking the system. It also keeps chat history locally and has an optional web search via DuckDuckGo, which needs an internet connection when you turn it on.
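Some rough arithmetic on why a 7b model can fit on an 8GB machine at all. This is a back-of-the-envelope sketch, not a measurement: it assumes about 7 billion parameters and roughly 4 bits per weight (a typical quantization level, not a figure stated above), and it ignores the context cache, the runtime itself, and the other models in the pipeline. The function name `weights_gib` is made up for this example.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Assumed values: 7 billion parameters, 4-bit vs 16-bit weights.
    for bits in (4, 16):
        print(f"{bits}-bit weights: about {weights_gib(7, bits):.1f} GiB")
```

Under those assumptions the 4-bit weights come to a bit over 3 GiB, while unquantized 16-bit weights would already exceed 8GB by themselves, which is why quantization matters so much on this kind of hardware.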

The main reason I built this was to prove to myself that we don't need massive server farms or expensive API subscriptions to have a functional, multi-modal assistant that respects privacy 100%.

The code is fully open-source. If you want to check it out, run it locally, or contribute, here is the repo:

https://github.com/MHR-X/hero-zan

Let me know if you have any questions about the setup, the Piper TTS integration, or the performance!

u/MohamedHeroo — 1 day ago
▲ 2 r/dev+1 crossposts

They said 8GB RAM is impossible for Local AI, but they forgot I use Arch Linux.

Hey everyone,

I wanted to share this project I've been working on. I have a laptop with a Ryzen 5 Pro and only 8GB RAM (with integrated Radeon graphics). When I first told people that I wanted to build a local AI assistant that can talk back smoothly, listen to voice commands, and see my screen/images all at the same time, everyone told me it was absolutely impossible on these specs.

But they didn't know I was running Arch Linux with Hyprland.

Here is Hero-Zan running completely offline. No external APIs, no internet needed.

How I made it work:

OS: Arch Linux + Hyprland WM (keeps the idle RAM usage super low).

Backend: Ollama running highly quantized models.

Pipeline: Integrated STT (Speech-to-Text), TTS (Text-to-Speech), and Vision capabilities.

Optimizations: I wrote some scripts to manage memory overhead and avoid swapping and crashes.
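To give an idea of what memory management around Ollama can look like on Linux, here is a minimal sketch. It is not the actual scripts mentioned above. It reads `MemAvailable` from `/proc/meminfo` and, when memory is tight, sets Ollama's `keep_alive` request field to `0` so the model is unloaded right after it answers. The 1024 MiB threshold and the function names are assumptions picked for illustration.

```python
def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return f.read()


def mem_available_mib(meminfo_text: str) -> int:
    """Extract MemAvailable from /proc/meminfo text and convert kB to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024
    raise ValueError("MemAvailable not found")


def keep_alive_for(available_mib: int, low_water_mib: int = 1024):
    """Pick a keep_alive value for an Ollama request.

    0 asks Ollama to unload the model as soon as the request finishes;
    "5m" keeps it loaded for five minutes. The threshold is an assumed value.
    """
    return 0 if available_mib < low_water_mib else "5m"


def with_keep_alive(payload: dict, available_mib: int) -> dict:
    """Return a copy of an Ollama request body with keep_alive filled in."""
    return {**payload, "keep_alive": keep_alive_for(available_mib)}
```

A caller would do something like `with_keep_alive(payload, mem_available_mib(read_meminfo()))` before each request. The trade-off is that unloading a model means paying the load time again on the next request, so the threshold needs tuning per machine.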

I'm planning to upgrade to 16GB RAM soon to test bigger models, but for now, I'm really proud of how fast this is running on just 8GB.

If you have any suggestions or ideas to improve this setup, I'd love to hear them! Also, let me know if you want to know anything about the scripts or how I optimized the resource management. 🐧🤖

u/MohamedHeroo — 2 days ago