u/Lightforce_

I built a Rust CLI that detects N+1 SQL/HTTP anti-patterns across services and scores their carbon impact
▲ 0 r/rust

After a few years building Java/Spring microservices in enterprise settings (insurance, agriculture) I kept hitting the same problems: N+1 queries that slip through code review because they span multiple services, and redundant HTTP calls nobody notices until production latency spikes. Every project, same patterns, different stack.

I also have a background in environmental science (I founded and ran a scientific association for 3 years, organizing conferences with climate researchers from the IPCC and IPBES). So when I started thinking about an N+1 detector, I naturally wanted to quantify the environmental cost of wasteful I/O too, not just the latency impact.

Existing tools are runtime-specific (Hypersistence Optimizer only works with JPA), heavy and proprietary (Datadog, New Relic), or don't correlate across services. So I built perf-sentinel: a lightweight Rust CLI that analyzes runtime traces and flags these patterns automatically, regardless of language or ORM.

How it works:

It takes OpenTelemetry traces (or Jaeger/Zipkin exports) and runs them through a pipeline: ingest -> normalize -> correlate -> detect -> score -> report. Detection is protocol-level: it sees the SQL queries and HTTP calls your code produces, not the code itself, so it works the same whether you're using JPA, EF Core, SeaORM, or raw SQL.
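To make the detection rule concrete, here's a minimal sketch of the "same template, different params, tight time window" heuristic described above. This is my reading of the post, not perf-sentinel's actual code; the struct fields and thresholds are assumptions:

```rust
use std::collections::{HashMap, HashSet};

// One I/O event as it might look after the `normalize` stage.
// Field names are illustrative, not perf-sentinel's real schema.
struct SqlEvent {
    template: String, // query text with literals replaced by `?`
    param: String,    // the bound parameter value
    ts_ms: u64,       // timestamp in milliseconds
}

// Flag a template as N+1 when it runs `min_hits`+ times with that many
// distinct params inside a `window_ms` span.
fn detect_n_plus_one(events: &[SqlEvent], min_hits: usize, window_ms: u64) -> Vec<String> {
    let mut by_template: HashMap<&str, Vec<&SqlEvent>> = HashMap::new();
    for e in events {
        by_template.entry(e.template.as_str()).or_default().push(e);
    }
    let mut findings = Vec::new();
    for (template, mut group) in by_template {
        group.sort_by_key(|e| e.ts_ms);
        let distinct: HashSet<&str> = group.iter().map(|e| e.param.as_str()).collect();
        let span = group.last().unwrap().ts_ms - group.first().unwrap().ts_ms;
        if group.len() >= min_hits && distinct.len() >= min_hits && span <= window_ms {
            findings.push(template.to_string());
        }
    }
    findings
}

fn main() {
    // Six lookups of the same template with different ids within 50ms: classic N+1.
    let events: Vec<SqlEvent> = (1u64..=6)
        .map(|i| SqlEvent {
            template: "SELECT * FROM order_item WHERE order_id = ?".into(),
            param: i.to_string(),
            ts_ms: 100 + i * 10,
        })
        .collect();
    println!("{:?}", detect_n_plus_one(&events, 5, 250));
}
```

The interesting property of doing this at the trace level is that the same grouping logic applies unchanged to HTTP spans (template = route pattern, param = path variable).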

Quick demo (30 seconds):

cargo install perf-sentinel
perf-sentinel demo

Output:

=== perf-sentinel demo ===
Analyzed 17 events across 2 traces in 0ms

Found 3 issue(s):

  [WARNING] #1 N+1 SQL
    Service:  order-svc
    Endpoint: POST /api/orders/42/submit
    Template: SELECT * FROM order_item WHERE order_id = ?
    Hits:     6 occurrences, 6 distinct params, 250ms window
    Suggestion: Use WHERE ... IN (?) to batch 6 queries into one

  [WARNING] #2 N+1 HTTP
    Template: GET /api/users/{id}
    Hits:     6 occurrences
    Suggestion: Use batch endpoint with ?ids=...

  [CRITICAL] #3 Slow SQL
    Template: SELECT * FROM order_status WHERE order_id = ?
    Suggestion: Consider adding an index or optimizing query

--- GreenOps Summary ---
  Total I/O ops:     17
  Avoidable I/O ops: 10
  I/O waste ratio:   58.8%
  Est. CO₂:          0.000108 g

Quality gate: FAILED

What it detects:

  • N+1 SQL queries (same template, different params, tight time window)
  • N+1 HTTP calls across services
  • Redundant queries (exact duplicates)
  • Slow recurring queries (with p50/p95/p99 across traces)
  • Excessive fanout (parent span spawning too many children)
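The "same template" grouping above hinges on normalizing queries first. As a toy illustration (not perf-sentinel's normalizer — a real one needs a proper SQL tokenizer, and this version would mangle identifiers containing digits), literals can be collapsed into placeholders like this:

```rust
// Strip literals so `WHERE order_id = 42` and `WHERE order_id = 43`
// collapse into one template. Handles only integers and single-quoted
// strings; purely a sketch of the idea.
fn normalize_sql(sql: &str) -> String {
    let mut out = String::new();
    let mut chars = sql.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\'' => {
                // consume the quoted string, emit a placeholder
                for q in chars.by_ref() {
                    if q == '\'' {
                        break;
                    }
                }
                out.push('?');
            }
            '0'..='9' => {
                // collapse a run of digits into one placeholder
                while matches!(chars.peek(), Some(d) if d.is_ascii_digit()) {
                    chars.next();
                }
                out.push('?');
            }
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    println!("{}", normalize_sql("SELECT * FROM order_item WHERE order_id = 42"));
    // -> SELECT * FROM order_item WHERE order_id = ?
}
```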

What makes it different:

  • Protocol-level, not runtime-level. Works with any language/ORM that emits OTLP traces, unlike Hypersistence Optimizer (JPA only) or Datadog (shows queries in the trace view but doesn't auto-detect N+1 patterns).
  • Built-in GreenOps scoring: every finding includes an I/O Intensity Score and optional gCO2eq estimate, aligned with the SCI model (ISO/IEC 21031:2024).
  • CI-native: perf-sentinel analyze --ci with configurable quality gate and exit codes. If someone introduces an N+1 the pipeline breaks.
  • SARIF export for GitHub/GitLab code scanning integration.
  • Imports Jaeger and Zipkin trace exports directly, no infra changes needed.
  • perf-sentinel explain --trace-id abc123 shows a tree view of a trace with findings annotated inline.
  • perf-sentinel inspect opens a TUI to browse findings interactively.
  • Can cross-reference with pg_stat_statements data for DB-side validation.
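For a sense of what the GreenOps arithmetic looks like, here's a back-of-envelope sketch: the waste ratio from the demo numbers, plus an SCI-style gCO2eq estimate as avoidable ops × energy per op × grid intensity. The two coefficients are placeholders I made up, not perf-sentinel's calibrated values:

```rust
// Waste ratio: what fraction of observed I/O ops were avoidable.
fn waste_ratio_pct(total_ops: u64, avoidable_ops: u64) -> f64 {
    avoidable_ops as f64 / total_ops as f64 * 100.0
}

// SCI-style estimate: ops × kWh/op × gCO2eq/kWh.
// Both coefficients are illustrative assumptions, NOT the tool's values.
fn est_co2_g(avoidable_ops: u64, kwh_per_op: f64, g_co2eq_per_kwh: f64) -> f64 {
    avoidable_ops as f64 * kwh_per_op * g_co2eq_per_kwh
}

fn main() {
    // Demo output above: 17 total I/O ops, 10 avoidable.
    println!("waste: {:.1}%", waste_ratio_pct(17, 10)); // 58.8%
    // Assumed: 3e-8 kWh per I/O op, 400 gCO2eq/kWh grid intensity.
    println!("co2:   {:.6} g", est_co2_g(10, 3e-8, 400.0));
}
```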

Numbers:

  • Single binary: ~4 MB (macOS arm64), ~5 MB (Windows), < 10 MB (Linux static)
  • < 5 MB RSS idle, < 20 MB under load (10k traces)
  • > 100k events/sec throughput

I've been dogfooding it on a personal polyglot microservices project (Java Spring WebFlux + Virtual Threads, Quarkus/GraalVM Native, C# .NET NativeAOT, Rust Actix, all talking to each other) and it caught real N+1s across all stacks without any language-specific configuration.

Still early (v0.2.0) and I would really appreciate feedback on:

  • Detection heuristics: are the defaults sane? False positive rate?
  • CLI UX and output format
  • Anti-patterns you'd want detected that aren't covered

cargo install perf-sentinel or grab a binary from the releases page (Linux amd64/arm64, macOS arm64, Windows amd64).

github.com
u/Lightforce_ — 5 hours ago
WebFlux vs Virtual Threads vs Quarkus: k6 benchmark on a real login endpoint
🔥 Hot ▲ 73 r/java

I've been building a distributed Codenames implementation as a learning project (polyglot: Rust for game logic, .NET/C# for chat, Java for auth + gateway) for about 1 year. For the account service I ended up writing three separate implementations of the same API on the same domain model. Not as a benchmark exercise originally, more because I kept wanting to see how the design changed between approaches.

  • account/ : Spring Boot 4 + R2DBC / WebFlux
  • account-virtual-threads-version/ : Spring Boot 4 + Virtual Threads + JPA
  • account-quarkus-reactive-version/ : Quarkus 3.32 + Mutiny + Hibernate Reactive + GraalVM Native

All three are 100% API-compatible, same hexagonal architecture, same domain model (pure Java records, zero framework imports in domain), enforced by ArchUnit, etc.

Spring Boot 4 + R2DBC / WebFlux

The full reactive approach. Spring Data R2DBC for non-blocking DB operations, SecurityWebFilterChain for JWT validation as a WebFilter.

What's genuinely good: backpressure-aware from the ground up, and it handles auth bursts without holding threads. Spring Security's reactive chain has matured a lot in Boot 4; the WebFilter integration is clean now.

What's painful: stack traces. When something fails in a reactive pipeline the trace is a wall of reactor internals. You learn to read it but it takes time. Also, not everything in the Spring ecosystem has reactive support, so you hit blocking adapters and have to be careful about which scheduler you're on.

Spring Boot 4 + Virtual Threads + JPA

Swap R2DBC for JPA, enable virtual threads via spring.threads.virtual.enabled=true and keep everything else the same. The business logic is identical and the code reads like blocking Spring Boot 2 code.
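As a sketch, the whole switch lives in configuration (the property name is from the post; everything else about the setup is assumed):

```properties
# application.properties — run request handling on virtual threads
spring.threads.virtual.enabled=true
```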

The migration from the reactive version was mostly mechanical. The domain layer didn't change at all (that's the point of hexagonal, ofc); the infrastructure layer just swaps Mono&lt;T&gt;/Flux&lt;T&gt; for plain T. Testing is dramatically easier too: no StepVerifier, no .block(), and standard JUnit just works.

Honestly if I were starting this service today I would probably start here. Virtual threads + JPA is 80% of the benefit at 20% of the complexity for a standard auth service.

Quarkus 3.32 + Mutiny + Hibernate Reactive + GraalVM Native

This one was purely to see how far you can push cold start and memory footprint. GraalVM Native startup is about 50ms vs 2-3s for JVM mode, and the memory footprint is significantly smaller. The dev experience is slower though, because native builds are heavy on CI.

Mutiny's Uni&lt;T&gt;/Multi&lt;T&gt; is cleaner than Reactor's Mono/Flux for simple linear flows; the API is smaller and less surprising. Hibernate Reactive with Mutiny also feels more natural than R2DBC + Spring Data for complex domain queries.

Benchmark: 4 configs, 50 VUs and k6

Since I had the three implementations I ran a k6 benchmark (50 VUs, 2-minute steady state, i9-13900KF + local MySQL) on two scenarios: a pure CPU scenario (GET /benchmark/cpu, BCrypt cost=10, no DB) and a mixed I/O + CPU scenario (POST /account/login, DB lookup + BCrypt + JWT signing). I also tested VT with both Tomcat and Jetty, so four configs total.

p(95) results:

Scenario 1 (pure CPU):

VT + Jetty    65 ms  <- winner
WebFlux       69 ms
VT + Tomcat   71 ms
Quarkus       77 ms

Scenario 2 (mixed I/O + CPU):

WebFlux       94 ms  <- winner
VT + Tomcat  118 ms
Quarkus      120 ms  (after tuning, more on that below)
VT + Jetty   138 ms  <- surprisingly last

A few things worth noting:

WebFlux wins on mixed I/O by a real margin. R2DBC releases the event loop immediately during the DB SELECT. With VT + JDBC, the virtual thread unmounts from its carrier during the blocking call, but the remounting and synchronization add a few ms. BCrypt at about 100ms amplifies that initial gap; at 50 VUs the difference is consistently +20-28% in favor of WebFlux.

Jetty beats Tomcat on pure CPU (-8% at p(95)) but loses on mixed I/O (+17%). Tomcat's HikariCP integration with virtual threads is better tuned for this pattern. Swapping Tomcat for Jetty seems a bit pointless on auth workloads.

Quarkus was originally 46% slower than WebFlux on mixed I/O (137 ms vs 94 ms). Two issues:

  1. The default Vert.x worker pool is about 48 threads vs WebFlux's boundedElastic() at ~240 threads; with 25 VUs simultaneously running BCrypt for ~100ms each, the pool just saturated.
  2. vertx.executeBlocking() defaults to ordered=true, which serializes blocking calls per Vert.x context instead of parallelizing them.

Ofc after fixing both (quarkus.thread-pool.max-threads=240 + ordered=false), Quarkus dropped to 120 ms and matched VT + Tomcat. The remaining gap vs WebFlux is the executeBlocking() event-loop handback overhead, which is structural.
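The pool-size half of that tuning is a one-line config change; a sketch (the ordered=false half happens at the vertx.executeBlocking() call site, not in config):

```properties
# application.properties — match WebFlux's boundedElastic() default of ~240
quarkus.thread-pool.max-threads=240
```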

All four hit a 100% success rate and are within 3% of each other on throughput (about 120 to 123 req/s). Latency is where they diverge, not raw capacity.

Full benchmark report with methodology and raw numbers is in load-tests/results/BENCHMARK_REPORT.md in the repo.

Happy to go deeper on any of this.

gitlab.com
u/Lightforce_ — 3 days ago