u/anthony-kldload

ZFS instant clones for Kubernetes node provisioning — under 100ms per node
▲ 44 r/zfs

I've been using ZFS copy-on-write clones as the provisioning layer for Kubernetes nodes and wanted to share the results.

The setup: KVM VMs running on ZFS zvols. Build one golden image (cloud image + kubeadm + containerd + Cilium), snapshot it, then clone per-node. Each clone is metadata-only — under 100ms to create, near-zero disk cost until the clone diverges.
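The snapshot-then-clone flow is a couple of ZFS commands. A minimal sketch, with assumed pool/zvol names rather than the project's actual layout:

```shell
# Freeze the golden image zvol once it's fully provisioned
zfs snapshot tank/golden@v1

# Each node is a metadata-only clone of that snapshot -- this is the
# sub-100ms step; no data is copied until the clone diverges
zfs clone tank/golden@v1 tank/worker-1

# The clone appears as a block device the VM can boot from
ls /dev/zvol/tank/worker-1
```

Because the clone shares all blocks with the snapshot at creation time, the cost is independent of the golden image's size.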

Some numbers from a 6-node cluster on a single NVMe:

- Golden image: 2.43G

- 5 worker clones: 400-1200M each (COW deltas only)

- Total disk for 6 nodes: ~8G, vs. ~15G with full copies

- Clone time: 109-122ms per node

- Rebuild entire cluster: ~60 seconds (destroy + re-clone)

Each node gets its own ZFS datasets underneath:

- /var/lib/etcd — 8K recordsize (matches etcd page size)

- /var/lib/containerd — default recordsize

- /var/lib/kubelet — default recordsize
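Creating those per-node datasets might look like this (dataset names are assumptions; the real layout is in the repo):

```shell
# Small recordsize for etcd's small random writes
zfs create -o recordsize=8K tank/worker-1/etcd

# Default recordsize (128K) is fine for image layers and kubelet state
zfs create tank/worker-1/containerd
zfs create tank/worker-1/kubelet
```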

Sanoid handles automated snapshots — hourly/daily/weekly/monthly per node. Rolling back a node is instant (ZFS rollback on the zvol). Nodes are cattle — drain, destroy the zvol, clone a fresh one from golden, rejoin the cluster.
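The cattle workflow is a short sequence of commands. A sketch under the same assumed names (the repo automates this):

```shell
# Replace a node: drain it, destroy the diverged clone, re-clone, rejoin
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-1
virsh destroy worker-1
zfs destroy -r tank/worker-1            # drop the clone and its snapshots
zfs clone tank/golden@v1 tank/worker-1  # fresh metadata-only clone
virsh start worker-1                    # cloud-init runs kubeadm join on boot

# Or, instead of a full rebuild, roll a node back to a Sanoid snapshot:
zfs rollback tank/worker-1@<snapshot-name>
```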

The ZFS snapshot-restore pipeline also works through Kubernetes via OpenEBS ZFS CSI — persistent volumes backed by ZFS datasets with snapshot and clone support.
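Wiring this up is mostly a StorageClass pointing at the pool. A minimal sketch using OpenEBS ZFS-LocalPV's provisioner (pool name and parameter values here are assumptions, not the project's config):

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfs-dataset
provisioner: zfs.csi.openebs.io   # OpenEBS ZFS-LocalPV CSI driver
parameters:
  poolname: "tank"                # assumed pool name
  fstype: "zfs"                   # provision ZFS datasets, not zvols
  recordsize: "128k"
  compression: "on"
EOF
```

PVCs against this class become ZFS datasets on the node, so they get the same snapshot/clone semantics as the node images themselves.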

Built this into an open source project if anyone wants to look at the implementation: https://github.com/kldload/kldload

Demo showing the full flow (6 nodes, 15 mins): https://www.youtube.com/watch?v=egFffrFa6Ss

Curious if anyone else is using ZFS clones for VM provisioning at this scale?

u/anthony-kldload — 11 days ago