Built a distributed AI platform with Flask as the backend: task parallelism across multiple machines running local LLMs
I wanted to share a project where Flask is the backbone of a distributed AI computing platform.
The architecture: a Flask API server coordinates work between multiple machines, each running its own local LLM. One machine (the "Queen") receives a complex job through the API, uses its local LLM to decompose it into independent subtasks, and distributes them to worker machines. Each worker processes its subtask independently and submits the result back through the Flask API. The Queen then combines everything into the final answer.
The Flask backend handles user authentication (Flask-Login), CSRF protection (Flask-WTF), role-based access control, a credit/payment system integrated with the PayPal REST API, job queuing and status tracking, and a full REST API that the desktop client communicates with. The database is SQLite via SQLAlchemy.
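The credit and status-tracking pieces boil down to a couple of tables. The schema and column names below are my guesses at the shape of it, written in plain `sqlite3` for brevity; the actual project defines these as SQLAlchemy models.

```python
# Illustrative job/credit schema (assumed columns, not the real models).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    role     TEXT NOT NULL DEFAULT 'user',   -- role-based access control
    credits  INTEGER NOT NULL DEFAULT 0      -- topped up via PayPal
);
CREATE TABLE jobs (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),
    status   TEXT NOT NULL DEFAULT 'queued', -- queued/running/done/failed
    cost     INTEGER NOT NULL DEFAULT 1      -- credits charged per job
);
""")

def start_job(user_id, cost=1):
    # Charge credits and enqueue in one transaction; reject if the
    # balance is too low (the conditional UPDATE matches zero rows).
    with conn:
        cur = conn.execute(
            "UPDATE users SET credits = credits - ? "
            "WHERE id = ? AND credits >= ?", (cost, user_id, cost))
        if cur.rowcount == 0:
            return None  # insufficient credits
        return conn.execute(
            "INSERT INTO jobs (user_id, cost) VALUES (?, ?)",
            (user_id, cost)).lastrowid
```

Doing the balance check inside the same UPDATE keeps the charge atomic, so two concurrent jobs can't both spend the last credit.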
The desktop client is a separate repo — PyQt6 GUI + CLI mode, supports 5 AI backends (Ollama, LM Studio, llama.cpp server, llama.cpp Python, vLLM). Workers poll the Flask API for available subtasks, process them locally, and submit results back.
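The worker side of that poll/process/submit cycle can be sketched as a small loop. I'm injecting `claim`, `process`, and `submit` as callables to keep it backend-agnostic; in the real client those would be HTTP calls to the Flask API plus a call into whichever of the five local LLM backends is configured.

```python
# Sketch of the worker polling loop; claim/process/submit are injected
# stand-ins for the client's HTTP calls and local LLM inference.
import time

def run_worker(claim, process, submit, poll_interval=0.0, max_idle=3):
    """Poll for subtasks until `max_idle` consecutive empty polls."""
    idle = 0
    completed = 0
    while idle < max_idle:
        task = claim()                 # e.g. POST /api/subtasks/claim
        if task is None:
            idle += 1
            time.sleep(poll_interval)  # back off when no work is available
            continue
        idle = 0
        result = process(task["prompt"])            # local LLM inference
        submit(task["job_id"], task["subtask_id"], result)
        completed += 1
    return completed
```

A real deployment would use a longer `poll_interval` and keep looping instead of exiting after a quiet period, but the shape is the same.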
Tested across two Linux machines (RTX 4070 Ti + RTX 5090): 64 seconds on LAN, 29 seconds via Cloudflare over the internet. Built in 7 days, one developer, fully open source, MIT licensed.
I'll share the GitHub link in the comments.