spent the last 6 weeks evaluating sandbox approaches for running AI agents 24/7 and the tradeoffs are way more nuanced than the docs suggest.
docker is the obvious starting point, but the shared kernel breaks down once an agent has sudo or pulls untrusted code. 'restart the container if it goes sideways' stops being good enough at scale; a kernel escape makes the blast radius the whole host.
firecracker boots in around 125ms behind a real kernel boundary; it's the microVM layer aws lambda runs underneath. the management surface is heavier than docker compose, but the isolation is the part you actually want for long-running agent workloads.
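for context, firecracker's entire control plane is a REST API over a unix socket. a minimal boot sequence looks roughly like this; the socket path, kernel image, and rootfs paths are placeholders, and it assumes a firecracker process is already running with `--api-sock /tmp/fc.sock`:

```shell
# point the microVM at a kernel image
curl --unix-socket /tmp/fc.sock -X PUT 'http://localhost/boot-source' \
  -H 'Content-Type: application/json' \
  -d '{"kernel_image_path": "./vmlinux",
       "boot_args": "console=ttyS0 reboot=k panic=1"}'

# attach a root filesystem
curl --unix-socket /tmp/fc.sock -X PUT 'http://localhost/drives/rootfs' \
  -H 'Content-Type: application/json' \
  -d '{"drive_id": "rootfs", "path_on_host": "./rootfs.ext4",
       "is_root_device": true, "is_read_only": false}'

# boot it
curl --unix-socket /tmp/fc.sock -X PUT 'http://localhost/actions' \
  -H 'Content-Type: application/json' \
  -d '{"action_type": "InstanceStart"}'
```

that three-call surface is most of what 'heavier than docker compose' means in practice: you end up writing your own orchestration around it.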
gvisor intercepts syscalls in a user-space kernel without needing a separate vm. the boot overhead is reasonable, but io-heavy workloads take a real throughput hit. ran into this on a logs-shuffling agent and lost about 30% relative to plain docker; ended up moving that one back to docker because the security profile didn't justify the cost.
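trying gvisor on an existing container is cheap, which makes the a/b comparison above easy to reproduce. a rough sketch, assuming `runsc` is installed at `/usr/local/bin/runsc`:

```shell
# register gvisor's runsc as a docker runtime
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{"runtimes": {"runsc": {"path": "/usr/local/bin/runsc"}}}
EOF
sudo systemctl restart docker

# same image, now with syscalls going through gvisor's user-space kernel
docker run --rm --runtime=runsc alpine uname -a
```

swapping `--runtime=runsc` on and off is how we measured the ~30% io hit: same image, same workload, only the runtime changes.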
kata containers gives strong isolation under k8s but the 1-3 second cold start kills any reactive workload. fine for batch jobs that wake up and process a queue, painful for anything user-facing.
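wiring kata into k8s is one RuntimeClass plus one field on the pod spec. a sketch, assuming something like kata-deploy has already installed a `kata` handler on the nodes (names here are placeholders):

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
apiVersion: v1
kind: Pod
metadata:
  name: batch-agent
spec:
  runtimeClassName: kata   # this pod gets its own lightweight vm
  containers:
  - name: agent
    image: alpine
    command: ["sleep", "3600"]
EOF
```

the nice part is that only pods opting in via `runtimeClassName` pay the 1-3s cold start; everything else stays on the default runtime.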
cloud-hypervisor is the underrated one on this list: similar boot times to firecracker and a cleaner config story, but a smaller community, so the documentation is thinner and stack overflow is mostly empty.
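'cleaner config story' mostly means a single CLI invocation instead of a socket API. a minimal boot sketch, with kernel and rootfs paths as placeholders:

```shell
# boot a microVM directly from the command line, no api socket dance
cloud-hypervisor \
  --kernel ./vmlinux \
  --disk path=./rootfs.ext4 \
  --cpus boot=1 \
  --memory size=256M \
  --cmdline "console=ttyS0 root=/dev/vda rw"
```

an api socket mode also exists for runtime changes, but for fire-and-forget agent vms the one-liner is the whole story.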
ended up with firecracker for the production agent workloads where the agent needs sudo or runs arbitrary code, and kept docker for ephemeral one-shot agents that touch nothing sensitive. the 'firecracker for sensitive workloads, docker for everything else' split has held up for 5 weeks.
one thing the docs skip: getting nbd-client plus a real init system inside firecracker that doesn't eat 60mb of ram. that took longer than picking the runtime.
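the shape of the fix, roughly: skip a full init system and run a tiny shell script as PID 1 that mounts the pseudo-filesystems, attaches the nbd device, and then execs the agent directly. a sketch, assuming busybox and nbd-client are in the rootfs; the server address and export name are placeholders:

```shell
#!/bin/sh
# minimal /sbin/init for the guest: no supervisor, no 60mb of init overhead
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev 2>/dev/null

# attach the network block device exported by the host
modprobe nbd
nbd-client -N data 10.0.0.1 /dev/nbd0
mount /dev/nbd0 /mnt/data

# replace PID 1 with the agent so there's no extra process in the vm
exec /usr/local/bin/agent
```

exec-ing the agent as PID 1 is the ram win; the catch is the agent then has to reap zombies and handle signals itself, which is most of where the time went.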