We've been running wasm modules inside a JVM application (Rust's wasmprinter crate, compiled to wasm and embedded via GraalWasm), and the obvious follow-up question was: how does this compare to the alternatives, and when should we actually pick something else?
So I built a small JMH harness that runs the same proxy.wasm artifact through six execution paths and wrote up the results. Sharing here because I couldn't find a head-to-head comparison covering all of these in one place, and I'd genuinely like to hear if anyone has reasons to expect different numbers on different workloads.
The workload
A tiny Rust crate compiled to wasm32-wasip1 exposing one export:
#[no_mangle]
pub unsafe extern "C" fn decode_jpeg(
    in_ptr: *const u8, in_len: usize,   // JPEG bytes, already in linear memory
    out_ptr: *mut u8, out_cap: usize,   // caller-allocated RGB output buffer
) -> i32 { /* jpeg-decoder → RGB8 */ }
Input: a 320×240 JPEG baked into the wasm via include_bytes!. Output: 230,400 bytes of RGB. Steady-state ~1 ms of native CPU — small enough to expose call/dispatch overhead, big enough that the JIT actually kicks in. Cross-variant correctness check: every backend produces byte-identical output (sha256 matches across all six).
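For context, here's roughly what the host side of that contract looks like through a JVM runtime (sketched with Chicory, one of the backends below). Note the harness bakes the JPEG into the module; this sketch passes it in from the host instead, just to show the full pointer dance. The `alloc` export and the memory-API calls are illustrative assumptions, not the harness's actual code:

```kotlin
import com.dylibso.chicory.runtime.Instance
import com.dylibso.chicory.wasm.Parser
import java.io.File

val instance = Instance.builder(Parser.parse(File("proxy.wasm"))).build()
val memory = instance.memory()
val alloc = instance.export("alloc")         // hypothetical guest allocator
val decode = instance.export("decode_jpeg")

val input = File("test.jpg").readBytes()
val inPtr = alloc.apply(input.size.toLong())[0].toInt()
memory.write(inPtr, input)                   // copy JPEG into linear memory

val outCap = 320 * 240 * 3                   // 230,400 bytes of RGB8
val outPtr = alloc.apply(outCap.toLong())[0].toInt()
decode.apply(inPtr.toLong(), input.size.toLong(),
             outPtr.toLong(), outCap.toLong())
val rgb = memory.readBytes(outPtr, outCap)   // copy decoded pixels back out
```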
The six backends
| Backend | What it actually is |
|---|---|
| chicory | Chicory's pure-Java interpreter |
| chicory-aot | Chicory + MachineFactoryCompiler.compile(...) at JVM startup |
| chicory-aot-plugin | Chicory build-time AOT via chicory-compiler-maven-plugin (wasm → JVM .class at mvn compile) |
| graalwasm | GraalWasm with Truffle JIT enabled (libgraal) |
| graalwasm-interp | GraalWasm with engine.Compilation=false |
| native-ffm | Wasmtime/Cranelift in a Rust cdylib, called via Java's FFM API |
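To make two of those rows concrete: graalwasm vs graalwasm-interp differ by a single engine option, and native-ffm is a plain FFM downcall into the Rust cdylib. A minimal sketch (the library and symbol names on the FFM side are placeholders, not the harness's real ones):

```kotlin
import org.graalvm.polyglot.Context
import java.lang.foreign.*

// graalwasm vs graalwasm-interp: the same Context, one engine option apart.
val jit = Context.newBuilder("wasm").build()
val noJit = Context.newBuilder("wasm")
    .option("engine.Compilation", "false")
    .build()

// native-ffm: bind the Rust cdylib's export with the FFM API (Java 22+).
// "wasm_host" and the exported symbol name are hypothetical.
val linker = Linker.nativeLinker()
val lib = SymbolLookup.libraryLookup("wasm_host", Arena.global())
val decode = linker.downcallHandle(
    lib.find("decode_jpeg").orElseThrow(),
    FunctionDescriptor.of(
        ValueLayout.JAVA_INT,
        ValueLayout.ADDRESS, ValueLayout.JAVA_LONG,   // in_ptr, in_len
        ValueLayout.ADDRESS, ValueLayout.JAVA_LONG))  // out_ptr, out_cap
```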
JVM: Oracle GraalVM 25 (25+37-LTS-jvmci-b01), Apple Silicon. JMH 5×1s warmup + 5×2s measurement, 1 fork, single thread.
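In annotation form, that JMH configuration is roughly the following (a sketch of an equivalent setup, not copied from the repo; `WasmBackend` is a hypothetical stand-in for the six paths):

```kotlin
import org.openjdk.jmh.annotations.*
import java.util.concurrent.TimeUnit

interface WasmBackend { fun decodeJpeg(input: ByteArray): ByteArray }

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1)       // 5 × 1 s warmup
@Measurement(iterations = 5, time = 2)  // 5 × 2 s measurement
@Fork(1)
@State(Scope.Thread)
open class DecodeBenchmark {
    lateinit var backend: WasmBackend
    lateinit var inputJpeg: ByteArray

    @Setup
    fun setup() {
        // instantiate the backend under test and load the 320×240 JPEG
    }

    @Benchmark
    fun decode(): ByteArray = backend.decodeJpeg(inputJpeg)
}
```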
Results (µs/op, lower is better)
| Backend | Mean | vs Wasmtime |
|---|---|---|
| nativeFfm — Wasmtime/Cranelift via FFM | 971 ± 10 | 1.00× |
| graalwasm — GraalWasm Truffle JIT | 1,275 ± 332 | 1.31× |
| chicoryAot — Chicory runtime AOT | 9,037 ± 118 | 9.31× |
| chicoryAotPlugin — Chicory build-time AOT | 9,198 ± 131 | 9.47× |
| graalwasmInterp — GraalWasm Truffle no-JIT | 69,992 ± 1,204 | 72.1× |
| chicory — Chicory pure interpreter | 240,707 ± 2,560 | 248× |
A few things worth pulling out
GraalWasm JIT is almost native. 1.31× Wasmtime/Cranelift is genuinely good: I expected a bigger gap given that Truffle goes through partial evaluation while Cranelift compiles wasm → CLIF → assembly directly. After warmup, libgraal produces code competitive with Cranelift's output for this workload. The ±25% CI on graalwasm is the only weak number here, probably tier-promotion noise that more forks would smooth out.
Build-time vs runtime AOT in Chicory is a wash. 9,037 vs 9,198 µs/op, CIs overlap. They run identical bytecode — Chicory's compiler produces the same .class content whether invoked at mvn compile or at JVM startup. Choose based on deployment story, not perf.
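Both wirings, for reference (the runtime path follows Chicory's documented API; the build-time half is only a comment because the generated class name depends on plugin configuration):

```kotlin
import com.dylibso.chicory.compiler.MachineFactoryCompiler
import com.dylibso.chicory.runtime.Instance
import com.dylibso.chicory.wasm.Parser
import java.io.File

val module = Parser.parse(File("proxy.wasm"))

// chicory-aot: wasm → JVM bytecode when the Instance is built.
val runtimeAot = Instance.builder(module)
    .withMachineFactory(MachineFactoryCompiler::compile)
    .build()

// chicory-aot-plugin: the maven plugin emits an equivalent machine-factory
// class at `mvn compile`; you pass its factory method to withMachineFactory
// the same way (the generated names depend on plugin configuration).
```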
The calibration trap. graalwasm-interp at 70,000 µs/op is what you get on stock OpenJDK without JVMCI / libgraal. Truffle prints exactly one warning at startup:
>
…and then runs at interpreter speed. If you benchmark GraalWasm on Temurin or Corretto and conclude it's unusable, you're running it without its compiler. The fix on most platforms is to install Oracle GraalVM 25 (or CE) — the Graal compiler ships in the JDK and Truffle picks it up automatically. If you can't change vendor, the "jargraal" path with org.graalvm.compiler:compiler + org.graalvm.truffle:truffle-compiler on --upgrade-module-path and -XX:+EnableJVMCI works but is fiddly.
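A cheap way to catch that misconfiguration in CI, assuming the Truffle API is on the classpath (the exact runtime names vary by version):

```kotlin
import com.oracle.truffle.api.Truffle

fun main() {
    // Reports a Graal-based runtime when Truffle found an optimizing
    // compiler; on stock OpenJDK it reports the fallback/interpreted
    // runtime and you're in the 70,000 µs/op regime above.
    println(Truffle.getRuntime().name)
}
```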
Pure interpreters aren't competing on throughput. 248× slower means Chicory's interpreter isn't a viable production path for non-trivial workloads. It's still the right default for "run untrusted user wasm with a 100 ms budget" sandbox scenarios: instant startup, no codegen step.
Bonus silliness
While I had the harness open: I compiled Cranelift's codegen library itself to wasm32-wasip1, AOT'd that 2.7 MB wasm artifact via chicory-compiler-maven-plugin into a JVM .class file, and used the resulting Chicory-hosted, JVM-resident Cranelift to emit native machine code for six target triples. Object sizes for an add(i32,i32) -> i32 test function:
| Triple | Object bytes | Format |
|---|---|---|
| aarch64-apple-darwin | 320 | Mach-O |
| aarch64-unknown-linux-gnu | 600 | ELF |
| aarch64-pc-windows-msvc | 126 | COFF |
| x86_64-apple-darwin | 328 | Mach-O |
| x86_64-unknown-linux-gnu | 608 | ELF |
| x86_64-pc-windows-msvc | 130 | COFF |
Six of Cranelift's ~4,000 internal functions exceed the JVM's 64 KB method-size limit and fall back to Chicory's interpreter; the rest AOT cleanly into a single 2.6 MB .class. Not (yet) a wasm-to-CLIF translator inside the sandbox — cranelift-wasm was deprecated at 0.112 and the translator now lives inside Wasmtime, so a real wasm-compiling-wasm pipeline would mean pinning to deprecated 0.112 or hand-rolling it on wasmparser. Separate project.
Caveats
One workload (small JPEG, ~1 ms of native CPU), one platform (Apple Silicon, GraalVM 25), one JMH config. These generalize well for "small to medium pure-compute wasm modules that don't touch WASI on the hot path" but will shift for: large modules (GraalWasm setup cost grows with module size), WASI-heavy workloads (host-call cost differs across runtimes), JIT-cold workloads (you're measuring tier-up, not steady state), and other JVMs (J9, Zing not measured).
Harness
Source: https://github.com/minamoto79/webasm-java-integration-benchmark
Switching backends in the harness is two lines of Kotlin — happy to take PRs adding workloads or runtimes I missed (wasmer-java? wazero-on-JVM via JNI? would love numbers on those if anyone has them). And if you're seeing materially different ratios on a different workload or JDK, please post — would help calibrate where these numbers actually generalize.