We've been running wasm modules inside a JVM application (Rust's wasmprinter crate, compiled to wasm and embedded via GraalWasm), and the obvious follow-up question was: how does this compare to the alternatives, and when should we actually pick something else?
So I built a small JMH harness that runs the same proxy.wasm artifact through six execution paths and wrote up the results. Sharing here because I couldn't find a head-to-head comparison covering all of these in one place, and I'd genuinely like to hear if anyone has reasons to expect different numbers on different workloads.
The workload
A tiny Rust crate compiled to wasm32-wasip1 exposing one export:
#[no_mangle]
pub unsafe extern "C" fn decode_jpeg(
    in_ptr: *const u8, in_len: usize,   // JPEG bytes, already in linear memory
    out_ptr: *mut u8, out_cap: usize,   // caller-allocated RGB output buffer
) -> i32 { /* jpeg-decoder → RGB8 */ }
Input: a 320×240 JPEG baked into the wasm via include_bytes!. Output: 230,400 bytes of RGB. Steady-state ~1 ms of native CPU — small enough to expose call/dispatch overhead, big enough that the JIT actually kicks in. Cross-variant correctness check: every backend produces byte-identical output (sha256 matches across all six).
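For context, here's roughly what the host side of that contract looks like through a JVM runtime (sketched with Chicory, one of the backends below). Note the harness bakes the JPEG into the module; this sketch passes it in from the host instead, just to show the full pointer dance. The `alloc` export and the memory-API calls are illustrative assumptions, not the harness's actual code:

```kotlin
import com.dylibso.chicory.runtime.Instance
import com.dylibso.chicory.wasm.Parser
import java.io.File

val instance = Instance.builder(Parser.parse(File("proxy.wasm"))).build()
val memory = instance.memory()
val alloc = instance.export("alloc")         // hypothetical guest allocator
val decode = instance.export("decode_jpeg")

val input = File("test.jpg").readBytes()
val inPtr = alloc.apply(input.size.toLong())[0].toInt()
memory.write(inPtr, input)                   // copy JPEG into linear memory

val outCap = 320 * 240 * 3                   // 230,400 bytes of RGB8
val outPtr = alloc.apply(outCap.toLong())[0].toInt()
decode.apply(inPtr.toLong(), input.size.toLong(),
             outPtr.toLong(), outCap.toLong())
val rgb = memory.readBytes(outPtr, outCap)   // copy decoded pixels back out
```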
The six backends
| Backend | What it actually is |
|---|---|
| chicory | Chicory's pure-Java interpreter |
| chicory-aot | Chicory + MachineFactoryCompiler.compile(...) at JVM startup |
| chicory-aot-plugin | Chicory build-time AOT via chicory-compiler-maven-plugin (wasm → JVM .class at mvn compile) |
| graalwasm | GraalWasm with Truffle JIT enabled (libgraal) |
| graalwasm-interp | GraalWasm with engine.Compilation=false |
| native-ffm | Wasmtime/Cranelift in a Rust cdylib, called via Java's FFM API |
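To make two of those rows concrete: graalwasm vs graalwasm-interp differ by a single engine option, and native-ffm is a plain FFM downcall into the Rust cdylib. A minimal sketch (the library and symbol names on the FFM side are placeholders, not the harness's real ones):

```kotlin
import org.graalvm.polyglot.Context
import java.lang.foreign.*

// graalwasm vs graalwasm-interp: the same Context, one engine option apart.
val jit = Context.newBuilder("wasm").build()
val noJit = Context.newBuilder("wasm")
    .option("engine.Compilation", "false")
    .build()

// native-ffm: bind the Rust cdylib's export with the FFM API (Java 22+).
// "wasm_host" and the exported symbol name are hypothetical.
val linker = Linker.nativeLinker()
val lib = SymbolLookup.libraryLookup("wasm_host", Arena.global())
val decode = linker.downcallHandle(
    lib.find("decode_jpeg").orElseThrow(),
    FunctionDescriptor.of(
        ValueLayout.JAVA_INT,
        ValueLayout.ADDRESS, ValueLayout.JAVA_LONG,   // in_ptr, in_len
        ValueLayout.ADDRESS, ValueLayout.JAVA_LONG))  // out_ptr, out_cap
```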
JVM: Oracle GraalVM 25 (25+37-LTS-jvmci-b01), Apple Silicon. JMH 5×1s warmup + 5×2s measurement, 1 fork, single thread.
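In annotation form, that JMH configuration is roughly the following (a sketch of an equivalent setup, not copied from the repo; `WasmBackend` is a hypothetical stand-in for the six paths):

```kotlin
import org.openjdk.jmh.annotations.*
import java.util.concurrent.TimeUnit

interface WasmBackend { fun decodeJpeg(input: ByteArray): ByteArray }

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1)       // 5 × 1 s warmup
@Measurement(iterations = 5, time = 2)  // 5 × 2 s measurement
@Fork(1)
@State(Scope.Thread)
open class DecodeBenchmark {
    lateinit var backend: WasmBackend
    lateinit var inputJpeg: ByteArray

    @Setup
    fun setup() {
        // instantiate the backend under test and load the 320×240 JPEG
    }

    @Benchmark
    fun decode(): ByteArray = backend.decodeJpeg(inputJpeg)
}
```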
Results (µs/op, lower is better)
| Backend | Mean | vs Wasmtime |
|---|---|---|
| nativeFfm — Wasmtime/Cranelift via FFM | 971 ± 10 | 1.00× |
| graalwasm — GraalWasm Truffle JIT | 1,275 ± 332 | 1.31× |
| chicoryAot — Chicory runtime AOT | 9,037 ± 118 | 9.31× |
| chicoryAotPlugin — Chicory build-time AOT | 9,198 ± 131 | 9.47× |
| graalwasmInterp — GraalWasm Truffle no-JIT | 69,992 ± 1,204 | 72.1× |
| chicory — Chicory pure interpreter | 240,707 ± 2,560 | 248× |
A few things worth pulling out
GraalWasm JIT is almost native. 1.31× Wasmtime/Cranelift is genuinely good: I expected a bigger gap given that Truffle goes through partial evaluation while Cranelift compiles wasm → CLIF → assembly directly. After warmup, libgraal produces code competitive with Cranelift's output for this workload. The ±25% CI on graalwasm is the only weak number here, probably tier-promotion noise that more forks would smooth out.
Build-time vs runtime AOT in Chicory is a wash. 9,037 vs 9,198 µs/op, CIs overlap. They run identical bytecode — Chicory's compiler produces the same .class content whether invoked at mvn compile or at JVM startup. Choose based on deployment story, not perf.
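Both wirings, for reference (the runtime path follows Chicory's documented API; the build-time half is only a comment because the generated class name depends on plugin configuration):

```kotlin
import com.dylibso.chicory.compiler.MachineFactoryCompiler
import com.dylibso.chicory.runtime.Instance
import com.dylibso.chicory.wasm.Parser
import java.io.File

val module = Parser.parse(File("proxy.wasm"))

// chicory-aot: wasm → JVM bytecode when the Instance is built.
val runtimeAot = Instance.builder(module)
    .withMachineFactory(MachineFactoryCompiler::compile)
    .build()

// chicory-aot-plugin: the maven plugin emits an equivalent machine-factory
// class at `mvn compile`; you pass its factory method to withMachineFactory
// the same way (the generated names depend on plugin configuration).
```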
The calibration trap. graalwasm-interp at 70,000 µs/op is what you get on stock OpenJDK without JVMCI / libgraal. Truffle prints exactly one warning at startup:
>
…and then runs at interpreter speed. If you benchmark GraalWasm on Temurin or Corretto and conclude it's unusable, you're running it without its compiler. The fix on most platforms is to install Oracle GraalVM 25 (or CE) — the Graal compiler ships in the JDK and Truffle picks it up automatically. If you can't change vendor, the "jargraal" path with org.graalvm.compiler:compiler + org.graalvm.truffle:truffle-compiler on --upgrade-module-path and -XX:+EnableJVMCI works but is fiddly.
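A cheap way to catch that misconfiguration in CI, assuming the Truffle API is on the classpath (the exact runtime names vary by version):

```kotlin
import com.oracle.truffle.api.Truffle

fun main() {
    // Reports a Graal-based runtime when Truffle found an optimizing
    // compiler; on stock OpenJDK it reports the fallback/interpreted
    // runtime and you're in the 70,000 µs/op regime above.
    println(Truffle.getRuntime().name)
}
```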
Pure interpreters aren't competing on throughput. 248× slower means Chicory's interpreter isn't a viable production path for non-trivial workloads. It's still the right default for "run untrusted user wasm with a 100 ms budget" sandbox scenarios: instant startup, no codegen step.
Bonus silliness
While I had the harness open: I compiled Cranelift's codegen library itself to wasm32-wasip1, AOT'd that 2.7 MB wasm artifact via chicory-compiler-maven-plugin into a JVM .class file, and used the resulting Chicory-hosted, JVM-resident Cranelift to emit native machine code for six target triples. Object sizes for an add(i32,i32) -> i32 test function:
| Triple | Object bytes | Format |
|---|---|---|
| aarch64-apple-darwin | 320 | Mach-O |
| aarch64-unknown-linux-gnu | 600 | ELF |
| aarch64-pc-windows-msvc | 126 | COFF |
| x86_64-apple-darwin | 328 | Mach-O |
| x86_64-unknown-linux-gnu | 608 | ELF |
| x86_64-pc-windows-msvc | 130 | COFF |
Six of Cranelift's ~4,000 internal functions exceed the JVM's 64 KB method-size limit and fall back to Chicory's interpreter; the rest AOT cleanly into a single 2.6 MB .class. Not (yet) a wasm-to-CLIF translator inside the sandbox — cranelift-wasm was deprecated at 0.112 and the translator now lives inside Wasmtime, so a real wasm-compiling-wasm pipeline would mean pinning to deprecated 0.112 or hand-rolling it on wasmparser. Separate project.
Caveats
One workload (small JPEG, ~1 ms of native CPU), one platform (Apple Silicon, GraalVM 25), one JMH config. These generalize well for "small to medium pure-compute wasm modules that don't touch WASI on the hot path" but will shift for: large modules (GraalWasm setup cost grows with module size), WASI-heavy workloads (host-call cost differs across runtimes), JIT-cold workloads (you're measuring tier-up, not steady state), and other JVMs (J9, Zing not measured).
Harness
Source: https://github.com/minamoto79/webasm-java-integration-benchmark
Switching backends in the harness is two lines of Kotlin — happy to take PRs adding workloads or runtimes I missed (wasmer-java? wazero-on-JVM via JNI? would love numbers on those if anyone has them). And if you're seeing materially different ratios on a different workload or JDK, please post — would help calibrate where these numbers actually generalize.