
I built PodPilot — stop babysitting your RunPod GPUs
Real talk — who else has done this on RunPod?
- Refreshed the GPU availability page 15 times waiting for an A100
- Created a volume in one region, then realized the GPU you need only exists somewhere else
- Left a pod running overnight and woke up $40 poorer
- SSH’d into a pod again just to download model weights
- Launched a pod, watched it sit in CREATED for 10 minutes, and had no idea whether it was broken or just slow
I got fed up and built PodPilot — a Python wrapper around the RunPod API focused on ML/CV workflows.
Here’s what it actually does that the dashboard + CLI don’t:
1. Smarter GPU selection
Instead of manually comparing cards, PodPilot scores GPUs using:
- VRAM fit
- Price
- Spot pricing
- Real-time availability
Works across both secure + community cloud.
mgr.recommend(vram_needed=48, budget=2.0)
It returns ranked recommendations with explanations — not just “here are some GPUs”, but why a specific GPU is the best value for your workload right now.
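For intuition, the ranking boils down to something like this. This is a simplified sketch, not PodPilot's actual scoring code — the GPU data, weights, and availability numbers are made up:

```python
# Illustrative GPU scoring sketch (NOT PodPilot's real implementation).
# Hard-filter on VRAM fit and budget, then weight price-value against
# wasted VRAM, scaled by availability.

def score_gpu(gpu, vram_needed, budget):
    """Return a score for a candidate GPU, or None if it can't run the job."""
    if gpu["vram_gb"] < vram_needed or gpu["price_hr"] > budget:
        return None  # hard filters: must fit in VRAM and in budget
    fit = vram_needed / gpu["vram_gb"]          # prefer the smallest card that fits
    value = 1 - gpu["price_hr"] / budget        # prefer cheaper relative to budget
    return (0.5 * fit + 0.5 * value) * gpu["availability"]

# Hypothetical inventory snapshot.
gpus = [
    {"name": "A100 80GB", "vram_gb": 80, "price_hr": 1.64, "availability": 0.9},
    {"name": "RTX 4090",  "vram_gb": 24, "price_hr": 0.44, "availability": 1.0},
    {"name": "L40S",      "vram_gb": 48, "price_hr": 1.03, "availability": 0.7},
]

ranked = sorted(
    (g for g in gpus if score_gpu(g, vram_needed=48, budget=2.0) is not None),
    key=lambda g: score_gpu(g, vram_needed=48, budget=2.0),
    reverse=True,
)
```

With a 48GB requirement, the 4090 is filtered out entirely, and a well-priced 48GB card can out-rank a bigger, pricier one — that's the "why this GPU" part of the explanation.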
2. Volume + data center matching
This is probably the most annoying RunPod issue.
You create a volume in one data center…
then discover the GPU you actually need doesn’t exist there.
PodPilot checks GPU availability across data centers before creating the volume.
mgr.create_volume("my-models", 100, min_vram=48)
The volume gets placed where 48GB+ GPUs are actually available.
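The underlying idea is easy to sketch. The inventory data here is hypothetical — PodPilot does this check against the live RunPod API:

```python
# Sketch of volume/data-center matching: only create the volume in a DC
# that actually has a big-enough GPU in stock. Inventory is made up.

def pick_data_center(inventory, min_vram):
    """Return the first data center with an available GPU of >= min_vram GB."""
    for dc, gpus in inventory.items():
        if any(g["vram_gb"] >= min_vram and g["available"] for g in gpus):
            return dc
    raise RuntimeError(f"No data center has a {min_vram}GB+ GPU available")

inventory = {
    "EU-RO-1": [{"vram_gb": 24, "available": True}],   # only 24GB cards here
    "US-TX-3": [{"vram_gb": 80, "available": True}],   # 80GB cards in stock
}

dc = pick_data_center(inventory, min_vram=48)  # create the volume in this DC
```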
3. One-command model provisioning
This was the original reason I built the project.
You give PodPilot:
- a shell script
- target VRAM requirement
- volume size
It will:
- Find the best data center
- Create the network volume
- Launch the cheapest temporary pod in that region
- Upload and run your script
- Stream logs live
- Auto-terminate the pod afterward
mgr.provision_volume(
    "sdxl-models",
    size_gb=60,
    download_script="download.sh",
    gpu_vram=48,
)
End result:
- Models downloaded
- Volume ready
- No manual SSH
- No wasted GPU hours
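The steps above are roughly this control flow. The `FakeAPI` stub below stands in for real RunPod calls so the shape is easy to see — it's an illustration, not PodPilot's internals:

```python
# Sketch of the one-command provisioning flow. FakeAPI records calls
# instead of hitting RunPod; method names are illustrative.

class FakeAPI:
    def __init__(self):
        self.log = []

    def find_data_center(self, gpu_vram):
        self.log.append("find_dc")
        return "US-TX-3"

    def create_volume(self, name, size_gb, dc):
        self.log.append("create_volume")
        return {"name": name, "dc": dc}

    def launch_cheapest_pod(self, dc, vol):
        self.log.append("launch_pod")
        return "pod-1"

    def run_script(self, pod, script):
        self.log.append("run_script")

    def terminate(self, pod):
        self.log.append("terminate")


def provision(name, size_gb, script, gpu_vram, api):
    dc = api.find_data_center(gpu_vram)          # 1. best DC for the GPU
    vol = api.create_volume(name, size_gb, dc)   # 2. volume lives there
    pod = api.launch_cheapest_pod(dc, vol)       # 3. cheapest temporary pod
    try:
        api.run_script(pod, script)              # 4. upload, run, stream logs
    finally:
        api.terminate(pod)                       # 5. always tear the pod down
    return vol


api = FakeAPI()
vol = provision("sdxl-models", 60, "download.sh", 48, api)
```

The `try/finally` is the important bit: the pod gets terminated even if the download script blows up, which is exactly the "no wasted GPU hours" guarantee.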
4. Instant cost visibility
mgr.status()
╭────────── RunPod Status ───────────╮
│ Balance:    $47.23                 │
│ Spend:      $0.44/hr               │
│ Hours left: 107.3h                 │
│ Pods:       2 running, 1 stopped   │
│ ● training   A100 80GB   $1.64/hr  │
│ ● inference  RTX 4090    $0.44/hr  │
╰────────────────────────────────────╯
One glance shows:
- Remaining credits
- Burn rate
- Estimated runtime left
- Every running pod + hourly cost
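The "hours left" figure is nothing fancy — just balance divided by the current burn rate, using the numbers from the status panel above:

```python
# Runtime estimate: remaining credits / total hourly spend.

balance = 47.23       # remaining credits, USD
burn_rate = 0.44      # total $/hr across running pods

hours_left = balance / burn_rate
print(f"Hours left: {hours_left:.1f}h")  # → Hours left: 107.3h
```

Simple, but seeing it recomputed on every `status()` call is what stops the "woke up $40 poorer" surprises.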
5. Pod startup that actually gives feedback
0s Status: CREATED
12s Status: STARTING
Provisioning GPU... ████████████░░░░░░ 38s / 900s
52s Status: RUNNING
Pod ready!
- Live progress
- Auto-retry on API hiccups
- Timeout handling
- Duplicate pod detection
If a pod already exists, it resumes instead of creating another one.
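The startup wait is a classic poll-with-retry loop. Here's a minimal sketch of the idea — `get_status` is a stub here, and the retry/timeout details are simplified versus the real code:

```python
# Sketch of a startup poll loop: retry transient API errors, report
# status changes, and fail hard on timeout.

import itertools
import time

def wait_until_running(get_status, timeout_s=900, poll_s=1):
    """Poll get_status() until it returns RUNNING, or raise TimeoutError."""
    start = time.monotonic()
    last = None
    while time.monotonic() - start < timeout_s:
        try:
            status = get_status()
        except ConnectionError:
            time.sleep(poll_s)          # transient API hiccup: just retry
            continue
        if status != last:
            print(f"Status: {status}")  # only report changes, not every poll
            last = status
        if status == "RUNNING":
            return True
        time.sleep(poll_s)
    raise TimeoutError("pod never reached RUNNING")

# Simulated pod that comes up after a couple of polls.
states = itertools.chain(["CREATED", "STARTING"], itertools.repeat("RUNNING"))
ok = wait_until_running(lambda: next(states), poll_s=0)
```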
6. Panic buttons
mgr.stop_all() # stop everything NOW
mgr.cleanup() # remove stopped pods eating storage
mgr.terminate_all() # nuke everything
Not a startup. Not a paid tool. Not even on PyPI yet.
Just something I built because I was spending more time managing infrastructure than training models.
git clone https://github.com/anmolduainter/PodPilot.git
Would love feedback from people running ML/CV workloads on RunPod.
What’s the most annoying part of your current setup?