u/Few-Fortune-1251

Most AI projects start with a model. Talki Infra starts with your hardware.

  Hey everyone,

  I’ve been building local LLM clusters for a while, and I got tired of the "trial and error" approach to deployment. We often ask: "Will this model fit?", "Why did the Brain choose this quantization?", or "Why is my Docker container failing to see the GPU again?"

  To solve this, I built Talki Infra: a CLI-first orchestration tool that treats your AI infrastructure like a production-grade system.

  💡 The Philosophy: "Boring Stack, Brilliant Inferences"

  We use a 4-step, Ops-validated workflow (Scan ➔ Recommend ➔ Doctor ➔ Deploy):

   1. 🔍 Talki Scan: Non-intrusive discovery. It doesn't just check VRAM; it captures raw command outputs as Evidence for auditability. Supports NVIDIA (nvidia-smi), AMD (rocm-smi), and Mac.

   2. 🧠 Talki Brain: A decision engine that uses a weighted fit_score (Quality, Perf, Reliability, Compliance, Cost) to map models to specific hardware roles. No "black box" decisions: every recommendation comes with a mathematical rationale.

   3. 🩺 Talki Doctor: A pre-flight gap analysis. It finds "phantom issues" (missing NVIDIA runtimes, port conflicts, insufficient disk for weights) before you start the deployment.

   4. 🛠️ Talki Deploy: Idempotent Ansible orchestration. It sets up the entire stack: Drivers ➔ vLLM ➔ LiteLLM Gateway ➔ Open WebUI ➔ Prometheus/Grafana.
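  To give a feel for step 2, here's a minimal sketch of what a weighted fit_score can look like. The weights and sub-scores below are hypothetical placeholders, not the repo's actual values:

```python
# Hypothetical sketch of a weighted fit_score; the real weights and
# sub-score logic live in the Talki Brain code.
WEIGHTS = {
    "quality": 0.30,
    "perf": 0.25,
    "reliability": 0.20,
    "compliance": 0.15,
    "cost": 0.10,
}

def fit_score(subscores: dict[str, float]) -> float:
    """Combine per-dimension sub-scores (each 0.0-1.0) into one weighted score."""
    assert set(subscores) == set(WEIGHTS), "every dimension must be scored"
    return sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)

# A model that is strong on quality/perf but expensive to run:
score = fit_score({
    "quality": 0.9, "perf": 0.8, "reliability": 0.7,
    "compliance": 1.0, "cost": 0.4,
})
print(round(score, 3))  # 0.9*0.3 + 0.8*0.25 + 0.7*0.2 + 1.0*0.15 + 0.4*0.1 = 0.8
```

  The point of the weighted-sum form is exactly the "no black box" claim: you can print each term and see why one model outranked another.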
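  And for step 3, two of the pre-flight checks can be sketched with the standard library alone. This is an illustrative approximation, not Talki Doctor's actual code (which also covers NVIDIA runtime detection, driver versions, etc.):

```python
# Hypothetical sketch of two Doctor-style pre-flight checks:
# port conflicts and free disk for model weights.
import shutil
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is already listening on the port (e.g. 8000 for vLLM)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0  # connect failing means it's free

def disk_has_room(path: str, needed_gb: float) -> bool:
    """True if the filesystem holding `path` can fit the model weights."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= needed_gb

issues = []
if not port_is_free(8000):
    issues.append("port 8000 already in use")
if not disk_has_room("/", 15.0):  # e.g. a 7B model in fp16 is ~14 GB of weights
    issues.append("not enough free disk for weights")
for gap in issues:
    print("GAP:", gap)
```

  Catching these before Ansible starts pulling drivers is much cheaper than a failed deploy halfway through.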

  🚀 Key Features:

   * Multi-GPU Optimization: Automatically calculates Tensor Parallelism and KV Cache (max_model_len) based on real available VRAM.

   * Unified API Gateway: Routes traffic through LiteLLM with automatic cloud fallbacks (e.g., local Qwen ➔ Cloud Claude 3.5) based on your environment policies (Prod vs. Lab).

   * Post-deploy Smoke Tests: A built-in talki test command to verify JSON output integrity and latency empirically.

   * Enterprise-Ready: Full observability stack included out-of-the-box.
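  As a back-of-envelope illustration of the KV-cache sizing behind the first bullet (the model numbers below are for a Llama-2-7B-like architecture in fp16 without GQA; the tool's real formula may account for paging overhead and more):

```python
# Rough KV-cache sizing sketch with hypothetical model parameters;
# not the tool's exact calculation.
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # 2x for the K and V tensors, per layer, per KV head
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def max_model_len(free_vram_gb: float, weights_gb: float, **model) -> int:
    """Tokens of context that fit in the VRAM left after loading weights."""
    cache_bytes = (free_vram_gb - weights_gb) * 1024**3
    return int(cache_bytes // kv_bytes_per_token(**model))

llama7b = dict(layers=32, kv_heads=32, head_dim=128)  # fp16, no GQA
print(kv_bytes_per_token(**llama7b))         # 524288 bytes, i.e. 0.5 MiB/token
print(max_model_len(24.0, 14.0, **llama7b))  # on a 24 GB card: 20480 tokens
```

  This is why "real available VRAM" matters: a few GB eaten by the desktop or another process directly shrinks the usable context window.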
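  The smoke-test idea is roughly: hit the gateway, then assert on two things, response shape and latency. Here's a guess at what such a check might look like (the function, the OpenAI-style "choices" shape, and the latency budget are all assumptions, not `talki test`'s actual behavior):

```python
# Hypothetical sketch of a post-deploy smoke check: validate JSON
# integrity and a latency budget. Not the actual `talki test` code.
import json
import time

def check_response(raw: str, elapsed_s: float, budget_s: float = 2.0) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    problems = []
    if not body.get("choices"):  # assuming an OpenAI-compatible completion shape
        problems.append("no 'choices' in completion")
    if elapsed_s > budget_s:
        problems.append(f"latency {elapsed_s:.2f}s over {budget_s:.1f}s budget")
    return problems

# Simulated round trip against a canned gateway reply:
start = time.monotonic()
raw = '{"choices": [{"message": {"content": "pong"}}]}'
problems = check_response(raw, time.monotonic() - start)
print("PASS" if not problems else problems)
```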

  🛠️ Tech Stack:

  Python 3.10 (Pydantic v2, Typer, Rich), Ansible, Docker, Prometheus.

  I’ve just made the repo public and I’d love to get your feedback on the fit_score logic and the hardware collectors.

  Check it out here: https://github.com/fossouo/talki-infra

  “Because AI infrastructure shouldn’t be a guessing game.”

u/Few-Fortune-1251 — 12 days ago