Building a Windows-first orchestration layer for distributed GPU compute using consumer hardware
Over the last few months I’ve been building a backend orchestration system for coordinating distributed GPU workloads across multiple runtime and provider environments.
Current components include:
- workload routing
- telemetry arbitration
- heartbeat/recovery logic
- failover handling
- sandboxed execution
- provider orchestration
- operator HUD tooling
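To make the heartbeat/recovery and failover pieces concrete, here is a minimal sketch of how those two components can interact: a monitor that marks workers dead after a missed-beat timeout, and a failover step that moves their jobs onto live workers. All names (`HeartbeatMonitor`, `reassign`, the timeout value) are illustrative assumptions, not the actual implementation.

```python
import time

class HeartbeatMonitor:
    """Tracks worker liveness; a worker is dead once its last
    heartbeat is older than `timeout_s`. (Hypothetical sketch.)"""

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock       # injectable clock makes this testable
        self.last_beat = {}      # worker_id -> timestamp of last heartbeat

    def beat(self, worker_id):
        self.last_beat[worker_id] = self.clock()

    def dead_workers(self):
        now = self.clock()
        return [w for w, t in self.last_beat.items()
                if now - t > self.timeout_s]

def reassign(jobs, dead, alive):
    """Failover: move jobs off dead workers onto live ones, round-robin."""
    if not alive:
        raise RuntimeError("no live workers to fail over to")
    moved, i = {}, 0
    for job, worker in jobs.items():
        if worker in dead:
            moved[job] = alive[i % len(alive)]
            i += 1
        else:
            moved[job] = worker
    return moved

# Usage with a fake clock: gpu-a misses its beat and its job fails over.
t = [0.0]
mon = HeartbeatMonitor(timeout_s=5.0, clock=lambda: t[0])
mon.beat("gpu-a")
mon.beat("gpu-b")
t[0] = 6.0
mon.beat("gpu-b")                 # gpu-b stays alive, gpu-a times out
dead = mon.dead_workers()         # ["gpu-a"]
jobs = reassign({"job1": "gpu-a", "job2": "gpu-b"}, set(dead), ["gpu-b"])
```

The injectable clock is the main design choice here: it keeps the liveness logic deterministic under test, which matters once recovery behavior has to be verified across providers.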
The long-term goal is to make fragmented GPU resources easier to coordinate and utilize in future compute markets.
Still early, but the orchestration layer is finally starting to behave like a real distributed system instead of isolated components.
Would genuinely love feedback from people with experience in:
- distributed systems
- orchestration
- homelab clusters
- GPU infrastructure
- runtime/container systems