u/Few-Fortune-1251

Most AI projects start with a model. Talki Infra starts with your hardware.

  Hey everyone,

  I’ve been building local LLM clusters for a while, and I got tired of the "trial and error" approach to deployment. We often ask: "Will this model fit?", "Why did the Brain choose this quantization?", or "Why is my Docker container failing to see the GPU again?"

  To solve this, I built Talki Infra: a CLI-first orchestration tool that treats your AI infrastructure like a production-grade system.

  💡 The Philosophy: "Boring Stack, Brilliant Inferences"

  We use a 4-step, Ops-validated workflow (Scan ➔ Recommend ➔ Doctor ➔ Deploy):

   1. 🔍 Talki Scan: Non-intrusive discovery. It doesn't just check VRAM; it captures raw command outputs as Evidence for auditability. Supports NVIDIA (nvidia-smi), AMD (rocm-smi), and Mac.

   2. 🧠 Talki Brain: A decision engine that uses a weighted fit_score (Quality, Perf, Reliability, Compliance, Cost) to map models to specific hardware roles. No "black box" decisions: every recommendation comes with a mathematical rationale.

   3. 🩺 Talki Doctor: A pre-flight gap analysis. It finds "phantom issues" (missing NVIDIA runtimes, port conflicts, insufficient disk for weights) before you start the deployment.

   4. 🛠️ Talki Deploy: Idempotent Ansible orchestration. It sets up the entire stack: Drivers ➔ vLLM ➔ LiteLLM Gateway ➔ Open WebUI ➔ Prometheus/Grafana.
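  To give a feel for step 2, here's a minimal sketch of what a weighted fit_score can look like. The weights and sub-scores below are hypothetical placeholders, not the repo's actual values:

```python
# Hypothetical sketch of a weighted fit_score; the real weights and
# sub-score logic live in the Talki Brain code.
WEIGHTS = {
    "quality": 0.30,
    "perf": 0.25,
    "reliability": 0.20,
    "compliance": 0.15,
    "cost": 0.10,
}

def fit_score(subscores: dict[str, float]) -> float:
    """Combine per-dimension sub-scores (each 0.0-1.0) into one weighted score."""
    assert set(subscores) == set(WEIGHTS), "every dimension must be scored"
    return sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)

# A model that is strong on quality/perf but expensive to run:
score = fit_score({
    "quality": 0.9, "perf": 0.8, "reliability": 0.7,
    "compliance": 1.0, "cost": 0.4,
})
print(round(score, 3))  # 0.9*0.3 + 0.8*0.25 + 0.7*0.2 + 1.0*0.15 + 0.4*0.1 = 0.8
```

  The point of the weighted-sum form is exactly the "no black box" claim: you can print each term and see why one model outranked another.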
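  And for step 3, two of the pre-flight checks can be sketched with the standard library alone. This is an illustrative approximation, not Talki Doctor's actual code (which also covers NVIDIA runtime detection, driver versions, etc.):

```python
# Hypothetical sketch of two Doctor-style pre-flight checks:
# port conflicts and free disk for model weights.
import shutil
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is already listening on the port (e.g. 8000 for vLLM)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0  # connect failing means it's free

def disk_has_room(path: str, needed_gb: float) -> bool:
    """True if the filesystem holding `path` can fit the model weights."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= needed_gb

issues = []
if not port_is_free(8000):
    issues.append("port 8000 already in use")
if not disk_has_room("/", 15.0):  # e.g. a 7B model in fp16 is ~14 GB of weights
    issues.append("not enough free disk for weights")
for gap in issues:
    print("GAP:", gap)
```

  Catching these before Ansible starts pulling drivers is much cheaper than a failed deploy halfway through.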

  🚀 Key Features:

   * Multi-GPU Optimization: Automatically calculates Tensor Parallelism and KV Cache (max_model_len) based on real available VRAM.

   * Unified API Gateway: Routes traffic through LiteLLM with automatic cloud fallbacks (e.g., local Qwen ➔ Cloud Claude 3.5) based on your environment policies (Prod vs. Lab).

   * Post-deploy Smoke Tests: A built-in talki test command to verify JSON output integrity and latency empirically.

   * Enterprise-Ready: Full observability stack included out-of-the-box.
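  As a back-of-envelope illustration of the KV-cache sizing behind the first bullet (the model numbers below are for a Llama-2-7B-like architecture in fp16 without GQA; the tool's real formula may account for paging overhead and more):

```python
# Rough KV-cache sizing sketch with hypothetical model parameters;
# not the tool's exact calculation.
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # 2x for the K and V tensors, per layer, per KV head
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def max_model_len(free_vram_gb: float, weights_gb: float, **model) -> int:
    """Tokens of context that fit in the VRAM left after loading weights."""
    cache_bytes = (free_vram_gb - weights_gb) * 1024**3
    return int(cache_bytes // kv_bytes_per_token(**model))

llama7b = dict(layers=32, kv_heads=32, head_dim=128)  # fp16, no GQA
print(kv_bytes_per_token(**llama7b))         # 524288 bytes, i.e. 0.5 MiB/token
print(max_model_len(24.0, 14.0, **llama7b))  # on a 24 GB card: 20480 tokens
```

  This is why "real available VRAM" matters: a few GB eaten by the desktop or another process directly shrinks the usable context window.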
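  The smoke-test idea is roughly: hit the gateway, then assert on two things, response shape and latency. Here's a guess at what such a check might look like (the function, the OpenAI-style "choices" shape, and the latency budget are all assumptions, not `talki test`'s actual behavior):

```python
# Hypothetical sketch of a post-deploy smoke check: validate JSON
# integrity and a latency budget. Not the actual `talki test` code.
import json
import time

def check_response(raw: str, elapsed_s: float, budget_s: float = 2.0) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    problems = []
    if not body.get("choices"):  # assuming an OpenAI-compatible completion shape
        problems.append("no 'choices' in completion")
    if elapsed_s > budget_s:
        problems.append(f"latency {elapsed_s:.2f}s over {budget_s:.1f}s budget")
    return problems

# Simulated round trip against a canned gateway reply:
start = time.monotonic()
raw = '{"choices": [{"message": {"content": "pong"}}]}'
problems = check_response(raw, time.monotonic() - start)
print("PASS" if not problems else problems)
```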

  🛠️ Tech Stack:

  Python 3.10 (Pydantic v2, Typer, Rich), Ansible, Docker, Prometheus.

  I’ve just made the repo public and I’d love to get your feedback on the fit_score logic and the hardware collectors.

  Check it out here: https://github.com/fossouo/talki-infra

  “Because AI infrastructure shouldn’t be a guessing game.”

u/Few-Fortune-1251 — 12 days ago