Building a Windows-first orchestration layer for distributed GPU compute using consumer hardware
Over the last few months I’ve been building a backend orchestration system for coordinating distributed GPU workloads across multiple runtime and provider environments.
Current components include:
- workload routing
- telemetry arbitration
- heartbeat/recovery logic
- failover handling
- sandboxed execution
- provider orchestration
- operator HUD tooling
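To make the heartbeat/recovery and failover pieces concrete, here is a minimal sketch of how those two components can interact: a monitor that marks workers dead after a missed-beat timeout, and a failover step that moves their jobs onto live workers. All names (`HeartbeatMonitor`, `reassign`, the timeout value) are illustrative assumptions, not the actual implementation.

```python
import time

class HeartbeatMonitor:
    """Tracks worker liveness; a worker is dead once its last
    heartbeat is older than `timeout_s`. (Hypothetical sketch.)"""

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock       # injectable clock makes this testable
        self.last_beat = {}      # worker_id -> timestamp of last heartbeat

    def beat(self, worker_id):
        self.last_beat[worker_id] = self.clock()

    def dead_workers(self):
        now = self.clock()
        return [w for w, t in self.last_beat.items()
                if now - t > self.timeout_s]

def reassign(jobs, dead, alive):
    """Failover: move jobs off dead workers onto live ones, round-robin."""
    if not alive:
        raise RuntimeError("no live workers to fail over to")
    moved, i = {}, 0
    for job, worker in jobs.items():
        if worker in dead:
            moved[job] = alive[i % len(alive)]
            i += 1
        else:
            moved[job] = worker
    return moved

# Usage with a fake clock: gpu-a misses its beat and its job fails over.
t = [0.0]
mon = HeartbeatMonitor(timeout_s=5.0, clock=lambda: t[0])
mon.beat("gpu-a")
mon.beat("gpu-b")
t[0] = 6.0
mon.beat("gpu-b")                 # gpu-b stays alive, gpu-a times out
dead = mon.dead_workers()         # ["gpu-a"]
jobs = reassign({"job1": "gpu-a", "job2": "gpu-b"}, set(dead), ["gpu-b"])
```

The injectable clock is the main design choice here: it keeps the liveness logic deterministic under test, which matters once recovery behavior has to be verified across providers.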
The long-term goal is to make fragmented GPU resources easier to coordinate and utilize in future compute markets.
Still early, but the orchestration layer is finally starting to behave like a real distributed system instead of isolated components.
Would genuinely love feedback from people with experience in:
- distributed systems
- orchestration
- homelab clusters
- GPU infrastructure
- runtime/container systems