u/Extension-Ad-5912

I built PodPilot — stop babysitting your RunPod GPUs
r/RunPod

Real talk — who else has done this on RunPod?

  1. Refreshed the GPU availability page 15 times waiting for an A100
  2. Created a volume in one region, then realized the GPU you need only exists somewhere else
  3. Left a pod running overnight and woke up $40 poorer
  4. SSH’d into a pod again just to download model weights
  5. Launched a pod, watched it sit in CREATED for 10 minutes, and had no idea whether it was broken or just slow

I got fed up and built PodPilot — a Python wrapper around the RunPod API focused on ML/CV workflows.

Here’s what it actually does that the dashboard + CLI don’t:

1. Smarter GPU selection

Instead of manually comparing cards, PodPilot scores GPUs using:

  • VRAM fit
  • Price
  • Spot pricing
  • Real-time availability

Works across both secure + community cloud.

mgr.recommend(vram_needed=48, budget=2.0)

It returns ranked recommendations with explanations — not just “here are some GPUs”, but why a specific GPU is the best value for your workload right now.
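To make "scoring" concrete, here's a minimal generic sketch of the idea — a toy `Gpu` record and weights I made up for illustration, not PodPilot's actual scoring code:

```python
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    vram_gb: int
    price_hr: float   # on-demand $/hr
    spot_hr: float    # spot $/hr
    available: bool   # real-time availability

def score(gpu, vram_needed, budget):
    """Higher is better. Hard-fail GPUs that can't run the job at all."""
    if not gpu.available or gpu.vram_gb < vram_needed:
        return float("-inf")
    price = min(gpu.price_hr, gpu.spot_hr)  # take spot if it's cheaper
    if price > budget:
        return float("-inf")
    # Reward headroom under budget; lightly penalize over-provisioned VRAM.
    return (budget - price) - 0.01 * (gpu.vram_gb - vram_needed)

def recommend(gpus, vram_needed, budget):
    """Return viable GPUs, best value first."""
    ranked = sorted(gpus, key=lambda g: score(g, vram_needed, budget), reverse=True)
    return [g for g in ranked if score(g, vram_needed, budget) > float("-inf")]
```

With a 48 GB requirement and a $2/hr budget, a cheap 48 GB card outranks a pricier 80 GB one, and anything under 48 GB is dropped entirely.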

2. Volume + data center matching

This is probably the most annoying RunPod issue.

You create a volume in one data center…
then discover the GPU you actually need doesn’t exist there.

PodPilot checks GPU availability across data centers before creating the volume.

mgr.create_volume("my-models", 100, min_vram=48)  # 100 GB volume

The volume gets placed where 48GB+ GPUs are actually available.
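The underlying check is easy to sketch generically — a naive first-fit over an inventory map (my own hypothetical helper and data shape, not PodPilot's real API):

```python
def pick_datacenter(dc_inventory, min_vram):
    """Return the first data center that currently has a GPU with
    at least min_vram GB, or None if nowhere qualifies.

    dc_inventory maps data-center id -> list of available GPU VRAM sizes (GB).
    """
    for dc, vrams in dc_inventory.items():
        if any(v >= min_vram for v in vrams):
            return dc
    return None
```

The real thing would also weigh price and capacity, but the key point is the same: check GPU availability first, create the volume second.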

3. One-command model provisioning

This was the original reason I built the project.

You give PodPilot:

  • a shell script
  • target VRAM requirement
  • volume size

It will:

  • Find the best data center
  • Create the network volume
  • Launch the cheapest temporary pod in that region
  • Upload and run your script
  • Stream logs live
  • Auto-terminate the pod afterward

mgr.provision_volume(
    "sdxl-models",
    size_gb=60,
    download_script="download.sh",
    gpu_vram=48,
)

End result:

  • Models downloaded
  • Volume ready
  • No manual SSH
  • No wasted GPU hours
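The "auto-terminate afterward" step is the one that saves money, so it has to run even when the download script fails. A rough sketch of the flow — the stub method names here are mine, invented for illustration, not the real API calls:

```python
def provision_volume(api, name, size_gb, script, gpu_vram):
    """Orchestrate: pick DC -> create volume -> temp pod -> run -> always clean up."""
    dc = api.best_datacenter(gpu_vram)          # 1. find a DC with fitting GPUs
    vol = api.create_volume(name, size_gb, dc)  # 2. volume placed in that DC
    pod = api.launch_cheapest_pod(dc, vol)      # 3. cheapest temporary worker pod
    try:
        api.run_script(pod, script)             # 4. upload script, stream logs
    finally:
        api.terminate(pod)                      # 5. pod dies no matter what
    return vol
```

The `try/finally` is the whole trick: a crashing script leaves you with a half-filled volume, not a forgotten GPU billing by the hour.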

4. Instant cost visibility

mgr.status()


╭─────────── RunPod Status ───────────╮
│ Balance: $47.23                     │
│ Spend: $0.44/hr                     │
│ Hours left: 107.3h                  │
│ Pods: 2 running, 1 stopped          │
│   ● training    A100 80GB  $1.64/hr │
│   ● inference   RTX 4090   $0.44/hr │
╰─────────────────────────────────────╯

One glance shows:

  • Remaining credits
  • Burn rate
  • Estimated runtime left
  • Every running pod + hourly cost
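The math behind a panel like this is trivial but worth having at a glance — a generic sketch (not `mgr.status()` internals): burn rate is the sum over running pods, and runtime left is just balance divided by burn.

```python
def hours_left(balance, pods):
    """pods: list of (name, hourly_rate, is_running) tuples.
    Returns (burn rate in $/hr, estimated hours of credit remaining)."""
    burn = sum(rate for _, rate, running in pods if running)
    return burn, (balance / burn if burn else float("inf"))
```

Stopped pods contribute nothing to burn, so cleaning them up changes storage cost, not the runtime estimate.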

5. Pod startup that actually gives feedback

0s   Status: CREATED
12s  Status: STARTING
Provisioning GPU... ████████████░░░░░░ 38s / 900s
52s  Status: RUNNING
Pod ready!

  • Live progress
  • Auto-retry on API hiccups
  • Timeout handling
  • Duplicate pod detection

If a pod already exists, it resumes instead of creating another one.
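The generic shape of that behavior is a polling loop with a hard timeout and retry on transient errors — my sketch of the pattern, not PodPilot's code:

```python
import time

def wait_for_running(get_status, timeout_s=900, poll_s=5):
    """Poll get_status() until it reports RUNNING.

    get_status: callable returning e.g. 'CREATED' | 'STARTING' | 'RUNNING' | 'FAILED'.
    Raises TimeoutError if the pod isn't RUNNING within timeout_s.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        try:
            status = get_status()
        except ConnectionError:
            time.sleep(poll_s)      # transient API hiccup: retry, don't die
            continue
        if status == "RUNNING":
            return True
        if status == "FAILED":
            raise RuntimeError("pod failed to start")
        time.sleep(poll_s)
    raise TimeoutError(f"pod not RUNNING after {timeout_s}s")
```

The caller gets exactly three outcomes: running, failed, or timed out — never the silent "is it broken or just slow?" limbo.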

6. Panic buttons

mgr.stop_all()        # stop everything NOW
mgr.cleanup()         # remove stopped pods eating storage
mgr.terminate_all()   # nuke everything

Not a startup. Not a paid tool. Not even on PyPI yet.

Just something I built because I was spending more time managing infrastructure than training models.

git clone https://github.com/anmolduainter/PodPilot.git

Would love feedback from people running ML/CV workloads on RunPod.

What’s the most annoying part of your current setup?

u/Extension-Ad-5912 — 6 days ago