
Has anyone else hit this 600ms memory.grow cost on Copilot+ PCs? Why is nobody talking about it?
Month four of building my own embedded Python runtime in Rust: 145 KB release build, runs in-browser via WASM (demo).
Tested it across a bunch of hosts this week and hit something I can't find documented anywhere: on machines with virtualization-based security enabled (HVCI/VBS is on by default on every Copilot+ PC with a Snapdragon X Elite), every memory.grow call in WASM costs ~0.2ms. Not microseconds. The hypervisor validates the second-stage (SLAT) page mappings before handing the new pages to V8, and that round-trip is the whole cost.
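If you want to reproduce just the grow cost without my runtime in the way, a bare export along these lines isolates it (a sketch, names are mine; time the call from JS with performance.now()):

// Sketch: n one-page memory.grow calls, nothing else. Build for wasm32;
// the host times a single call to bench_grow(n).
#[no_mangle]
pub extern "C" fn bench_grow(n: usize) -> usize {
    let mut ok = 0;
    for _ in 0..n {
        // memory_grow returns the previous size in pages, or usize::MAX on failure.
        if core::arch::wasm32::memory_grow(0, 1) != usize::MAX {
            ok += 1;
        }
    }
    ok // returning the count keeps the loop from being optimized away
}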
My demo does ~3,000 allocations per run. I was using LeakingPageAllocator from lol_alloc (the one in every README), which calls memory.grow(1) on every allocation. 3,000 × 0.2ms = ~600ms of pure grow overhead before a single line of Python executes. Edge on a Snapdragon X Elite: ~5s to feel responsive. Same WASM on Linux/macOS: under 50ms. A 100x cold-start difference, same module, same V8.
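If you haven't read the source, the page-per-allocation pattern is roughly this shape (my sketch of the pattern, not lol_alloc's actual code): every alloc rounds up to whole 64 KiB pages and issues its own grow.

use core::alloc::{GlobalAlloc, Layout};

const PAGE: usize = 65536; // WASM page size, 64 KiB

// Sketch (wasm32 only): one memory.grow per allocation, nothing ever freed.
struct PagePerAlloc;

unsafe impl GlobalAlloc for PagePerAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let pages = (layout.size() + PAGE - 1) / PAGE;
        let prev = core::arch::wasm32::memory_grow(0, pages);
        if prev == usize::MAX {
            return core::ptr::null_mut(); // grow failed
        }
        // The fresh region starts on a page boundary, which satisfies
        // any alignment up to 64 KiB.
        (prev * PAGE) as *mut u8
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {} // leaks
}

Fine on hosts where memory.grow is cheap; catastrophic when each grow is a hypervisor round-trip.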
The fix was a one-line swap to LeakingAllocator, a real bump allocator that grabs pages in bulk and bumps a pointer within them: ~50 grows instead of 3,000. Cold start dropped to ~10ms.
use lol_alloc::{AssumeSingleThreaded, LeakingAllocator};

// Safe only because this module runs single-threaded in the browser.
#[global_allocator]
static A: AssumeSingleThreaded<LeakingAllocator> =
    unsafe { AssumeSingleThreaded::new(LeakingAllocator::new()) };
The leak is fine here: the JS host throws away the WASM instance between runs, so the working set is bounded and everything is reclaimed on teardown.
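For contrast, the bump strategy looks roughly like this (again a sketch of the idea, not lol_alloc's implementation; CHUNK_PAGES is my number, and it's single-threaded, which is what the AssumeSingleThreaded wrapper is for):

use core::alloc::{GlobalAlloc, Layout};
use core::cell::Cell;

const PAGE: usize = 65536;
const CHUNK_PAGES: usize = 64; // grow 4 MiB at a time (my choice, tune it)

// Sketch (wasm32, single-threaded): grow in bulk, bump a cursor, never free.
struct Bump {
    next: Cell<usize>,
    end: Cell<usize>,
}

unsafe impl GlobalAlloc for Bump {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // layout.align() is a power of two, so this rounds up correctly.
        let mut start = (self.next.get() + layout.align() - 1) & !(layout.align() - 1);
        if start + layout.size() > self.end.get() {
            // Current chunk exhausted: one bulk grow, not one per alloc.
            let pages = CHUNK_PAGES.max((layout.size() + PAGE - 1) / PAGE);
            let prev = core::arch::wasm32::memory_grow(0, pages);
            if prev == usize::MAX {
                return core::ptr::null_mut();
            }
            start = prev * PAGE; // page-aligned, covers any align up to 64 KiB
            self.end.set((prev + pages) * PAGE);
        }
        self.next.set(start + layout.size());
        start as *mut u8
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {} // intentional leak
}

The grow count now scales with total bytes used rather than allocation count, which is where the 3,000 → ~50 drop comes from.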
What I want to know: is anyone else benchmarking WASM cold-start on Copilot+ PCs? The default allocator everyone copies from lol_alloc's README is genuinely the worst choice for these machines, and Microsoft is pushing them hard. If you've shipped WASM in Edge on a Snapdragon X Elite and measured, what did you see? Is this a lol_alloc gotcha or a general "VBS makes memory.grow expensive" thing?
Happy to answer questions about the runtime: parser, VM, GC, plugin ABI, whatever.