r/SelfHostedAI

OpenClaw + WhatsApp Cloud API

Finally got OpenClaw talking directly to WhatsApp via Meta's official Cloud API. I wrote up the full setup here so others don't have to dig through the scattered docs the way I did.
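For anyone curious what the send side looks like before reading the full writeup: here's a minimal sketch of posting a text message through the Cloud API's `/messages` endpoint. The phone number ID, access token, and API version are placeholders from your own Meta app setup; nothing here is OpenClaw-specific.

```python
import json
import urllib.request

GRAPH_URL = "https://graph.facebook.com/v19.0"  # your API version may differ

def build_text_message(to: str, body: str) -> dict:
    # Minimal payload shape for a plain-text Cloud API message
    return {
        "messaging_product": "whatsapp",
        "to": to,  # recipient number in international digits, e.g. "15551234567"
        "type": "text",
        "text": {"body": body},
    }

def send_message(phone_number_id: str, access_token: str, payload: dict) -> bytes:
    # POST the payload to /<PHONE_NUMBER_ID>/messages with a bearer token
    req = urllib.request.Request(
        f"{GRAPH_URL}/{phone_number_id}/messages",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
        return resp.read()
```

The receive side is the fiddlier half (Meta's webhook verification expects you to echo back the `hub.challenge` query param on a GET before it will deliver message POSTs), but that's in the writeup.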

Would love to hear how others in this community are handling WhatsApp integrations.

u/amrabed — 17 hours ago
r/LocalLLaMA

Running 28B LLMs locally on a ~$550 mini PC (no discrete GPU)

Dense vs MoE models on an iGPU — same stated 28B, over 3x the speed

The formula for memory-bound inference is just:

tok/s ≈ bandwidth ÷ bytes read per token

Running on a Radeon 780M with shared DDR5 RAM (~75-80 GB/s effective bandwidth):

Qwen3-27B at Q4_K — dense, every token reads all 14 GB of weights: 78 GB/s ÷ 14 GB → ~5.6 tok/s (measured: 5.8)

Gemma 4 28B — MoE, each token only activates ~4-5B params out of 28B (~4 GB read per token): 78 GB/s ÷ 4 GB → ~19.5 tok/s (measured: 19.5)

Same stated size, ~3.4x faster in practice, because each token reads ~3.5x less data.

The inactive experts aren't wasted — the router picks the best-matched experts for each token, which is where the quality comes from. You just don't pay the bandwidth cost for the experts that aren't selected.

The pattern holds at 32B too: dense Qwen3-32B at Q8 hits 2.8 tok/s, while the MoE variant (A3B, ~3B active) hits 20.8 tok/s on the same box.
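If you want to sanity-check your own box, the napkin math above is a single division. A tiny sketch (78 GB/s is just my measured bandwidth; substitute yours, and the GB figures are the weights actually read per token, not total model size):

```python
def estimate_tok_s(bytes_read_gb: float, bandwidth_gb_s: float = 78.0) -> float:
    """Memory-bound decode estimate: tok/s ~ bandwidth / bytes read per token."""
    return bandwidth_gb_s / bytes_read_gb

# Dense 27B at Q4_K: every token touches all ~14 GB of weights
print(round(estimate_tok_s(14.0), 1))  # 5.6

# MoE with ~4-5B active params (~4 GB read per token)
print(round(estimate_tok_s(4.0), 1))   # 19.5
```

Prompt processing is compute-bound, so this only predicts generation speed, but for decode on an iGPU it lands within ~10% of what I measure.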

If you're running local models on integrated graphics, MoE is worth understanding.

stochasticsandbox.com
u/Main_Brush_5086 — 15 hours ago