r/softwarearchitecture

🔥 Hot ▲ 461 r/softwarearchitecture+1 crossposts

A visual software architecture simulator running entirely in the browser using Rust + WASM

I've been working on a browser-based tool called ArchAlive and wanted to get some feedback on it. It is basically a visual sandbox where you can design backend systems (API gateways, load balancers, servers) and then simulate HTTP traffic routing through them in real time.

You can try it here: https://archalive.com/ (free, no signups)

While the frontend canvas is React/TypeScript, the core simulation engine that calculates routing, handles queue bottlenecks, and tracks individual request states is written entirely in Rust and compiled to WebAssembly. It can simulate a large number of requests smoothly.

Let me know what you think. Not sure where to take this project from here.

streamable.com
u/Antigober — 2 days ago

Event-first CQRS (NOT your typical event sourcing + CQRS)

Hey, I want to explore a somewhat unconventional approach to CQRS to see to what extent it's viable.

Both CQRS with an outbox and CQRS + event sourcing enable you to validate business logic, in the command endpoint, against a synchronous model before you emit the event.

With "CQRS with an outbox" you validate business logic against the write side OLTP DB, and not the read model.

With "CQRS + event sourcing" you validate business logic against the current aggregate state reconstructed from the event stream, typically using snapshots, and again not the read model.

I'd like to know if it's viable to validate potential events against the asynchronous read model. It removes the need for a separate write-side model, reducing duplication and architectural complexity.

I'd like to know if it can be viable to a certain extent. I don't need my event logs to be as flawless as those of a well-designed event-sourced system. As long as state is reconstructed correctly, I don't really care about duplicate events in the log and similar imperfections.
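
To make the question concrete, here is a minimal sketch of the idea, not a recommendation. It assumes a hypothetical "unique username" rule; `ReadModel`, `handle_register`, and `rebuild` are illustrative names. The command side only does a best-effort check against the possibly stale read model, and the projector re-validates every event, so duplicates in the log do not corrupt the reconstructed state.

```python
from dataclasses import dataclass, field


@dataclass
class ReadModel:
    """Asynchronously updated projection; it may lag behind the event log."""
    usernames: set = field(default_factory=set)


def handle_register(username: str, read_model: ReadModel, log: list) -> bool:
    """Command side: best-effort check against the (possibly stale) read model."""
    if username in read_model.usernames:
        return False  # rejected early, no event is emitted
    log.append({"type": "user.registered", "username": username})
    return True


def rebuild(log: list) -> tuple:
    """Projector: re-validates each event while folding the log into state."""
    read_model = ReadModel()
    ignored = []
    for event in log:
        if event["username"] in read_model.usernames:
            ignored.append(event)  # duplicate that slipped past the stale check
        else:
            read_model.usernames.add(event["username"])
    return read_model, ignored


# Two commands arrive before the projection catches up, so both are accepted.
stale = ReadModel()
log = []
handle_register("ana", stale, log)
handle_register("ana", stale, log)
state, ignored = rebuild(log)
```

The trade-off is visible here: the second caller was told "accepted" even though the projector later ignores its event, so this only works for rules where a late rejection is tolerable.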

u/neoellefsen — 23 hours ago

I'm interviewing for solution architect roles this week. I've been an architect for about two years at a really small startup (consulting, ~20 employees), and the interviews are at much bigger places (500-12000). What should I expect? Would love to hear the day to day of architects at big places.

I suspect that my job responsibilities and what I handle are extremely different due to the scale. For context, prior to becoming an architect at this startup two years ago, I was a lead full stack engineer for about 3-4 years and have, give or take, 16 years of experience total in the industry.

I was on track to become a solution architect at my old job before layoffs hit. People from that job created a startup and hired me on as an architect. Unfortunately, that startup is likely failing. We are facing a funding issue, and if our primary client cannot generate funds by 5/1, we're shutting down (this has been an issue for some time, we're just coming up to the deadline, hence my post).

For people working as solution architects at bigger places (in the hundreds to thousands in terms of employees), what does your day to day look like? What should I expect? If you have been a solution architect at both a startup and a large company, what differences did you notice?

u/skyturnsred — 1 day ago

The ropes

Hello guys!

I'm an SE with 10 years of field experience in middleware integration. I've now landed a job at a university spin-off start-up (I'm one of the founders).

However, my architecture knowledge is very limited or non-existent. I only know what I've worked with, which is almost all microservices. What I want out of this job is to either be a successful architect for this company or, if it doesn't work out, have sufficient skills to become one at another company.

What do I need to learn, read, or familiarise myself with to become a good software architect, and what are the pitfalls?

Thanks in advance

u/Infectedinfested — 1 day ago
🔥 Hot ▲ 56 r/softwarearchitecture

Rearchitecting an 8+ year-old OMS, am I about to make the classic mistakes?

I'm working on an Order Management System for a retailer. ~300K transactions/day, 24+ Spring Boot microservices, 8+ years old, inherited. The system makes real money every day, but the coupling has become an existential problem. Before I commit to a rewrite or refactoring, tell me where I'm wrong.

The pain:

- No two services talk directly: everything routes through a central Camunda orchestrator over RabbitMQ, plus a homemade mini BPMN framework for correlations and retries. The orchestrator isn't a coordinator anymore, it's a bus with opinions.

- Every release is a major release. Non-trivial changes touch 50%+ of services, BPMN changes aren't backwards-compatible with in-flight instances, so there's no rollback

- One god model, one database, shared as a "common/core" dependency. Change a field, coordinate 24+ deployments. The shared lib is the de facto API contract.

- Many other shared libs, logging, monitoring, testing, infra connections (rabbit, kafka, couchbase, ES...) all shared by all microservices

- No config management, 90%+ lives in Helm values and env vars. Changing a threshold = commit, pipeline, pod restart.

- Debugging "how did this order end up in this state" is painful. No real audit trail beyond logs. No versioning: one document per order, updated again and again.

- We have a home-grown Python tool wired into CI/CD to coordinate releases: it decides build order, opens MRs across repos to bump the shared/common libs, and sequences deployments. If you need a tool like this to ship, your services aren't independent.

- For years until I joined, multiple teams ran their own environments and infra, you can imagine the release complexity. A single release could take up to a month

Are these separate problems or one problem wearing seven hats?

My plan:

- Event-source the aggregates that actually benefit from it (Order, Payment, Inventory, fulfillment...). Leave CRUD things as CRUD. Don't event-source for the sake of it.

- Drop Camunda. Use a lightweight state machine if needed in code + saga orchestrators for cross-aggregate flows.

- Consolidate to one messaging backbone (probably Kafka).

- Kill the shared libs, kill the god model

- OpenTelemetry + proper tracing

- Strangler fig, not big bang.
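
As a rough illustration of the "lightweight state machine in code" item, here is a minimal sketch. The states, event names, and the `Order` class are hypothetical and not taken from the real system. Transitions live in one table, and every accepted transition is appended to a history list, which also gives a basic answer to "how did this order end up in this state".

```python
from enum import Enum


class OrderState(Enum):
    CREATED = "created"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


# (current state, event) -> next state; anything not listed is rejected.
TRANSITIONS = {
    (OrderState.CREATED, "payment_captured"): OrderState.PAID,
    (OrderState.CREATED, "cancel"): OrderState.CANCELLED,
    (OrderState.PAID, "shipment_dispatched"): OrderState.SHIPPED,
    (OrderState.PAID, "cancel"): OrderState.CANCELLED,
}


class Order:
    def __init__(self, order_id: str) -> None:
        self.order_id = order_id
        self.state = OrderState.CREATED
        self.history = []  # append-only audit trail of accepted transitions

    def apply(self, event: str) -> None:
        """Move to the next state or raise if the transition is not allowed."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state.value}")
        new_state = TRANSITIONS[key]
        self.history.append((self.state, event, new_state))
        self.state = new_state


order = Order("o-1")
order.apply("payment_captured")
order.apply("shipment_dispatched")
```

This covers only the in-process state logic; it does not replace the monitoring and reprocessing that Cockpit provides today.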

So how would you tackle a project like this?

What strategy would you adopt? Is there hope, or is this the kind of system you just keep alive until the business replaces it?

A few things I genuinely don't know:

- Camunda is politically load-bearing. Management is attached to it, and frankly it's the only real monitoring and reprocessing capability we have today. "Just drop Camunda" is easy to say but harder when the devs open Cockpit every morning to unblock orders.

- What did you replace it with that gave you equivalent visibility and reprocessing, not just equivalent orchestration?

- What are the pitfalls I'm not seeing? The ones that only show up 8 months in.

- Strangler fig: where's the first cut?

TL;DR: 8-year-old OMS, 24+ microservices that only talk through a central Camunda orchestrator, one god model, one database, shared libs everywhere, a Python tool to coordinate cross-repo releases. Want to rewrite but Camunda is politically load-bearing. How would you tackle this? What pitfalls am I missing?

u/Ralphoa — 2 days ago
▲ 33 r/softwarearchitecture+2 crossposts

Hexagonal Architecture - Ports

Hi.

I'm learning about Hexagonal Architecture and have some questions about where the in-ports (use case interfaces) and out-ports (repository interfaces) should be placed.

I've read various blogs, articles, and discussions where some people mentioned that ports must be located in the application layer, and others said they must be in the domain layer. I'm confused about the right place to put them.

I'd like to know your opinions and suggestions, please.

Approach 1 - Ports inside application layer

application/
- in/ (use case interfaces)
  - CreateProductUseCase
- out/ (repository interfaces)
  - ProductRepository
- usecase/
  - ProductUseCaseImpl

domain/
- model/
  - Product (plain object)

Approach 2 - Ports inside domain layer

domain/
- in/ (use case interfaces)
  - CreateProductUseCase
- out/ (repository interfaces)
  - ProductRepository
- model/
  - Product (plain object)

application/
- usecase/
  - ProductUseCaseImpl
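
For reference, a minimal sketch of Approach 1, written in Python purely for illustration. The comments mark which package each piece would live in. The method names (`create`, `save`, `exists`), the `sku` field, and `InMemoryProductRepository` are assumptions added for the example. It does not settle which layer is "right"; it only shows the dependency direction: the use case depends on the out-port interface, and the adapter implements it from outside.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# domain/model
@dataclass(frozen=True)
class Product:
    sku: str
    name: str


# application/in (in-port: what callers can ask the application to do)
class CreateProductUseCase(ABC):
    @abstractmethod
    def create(self, sku: str, name: str) -> Product: ...


# application/out (out-port: what the application needs from the outside)
class ProductRepository(ABC):
    @abstractmethod
    def exists(self, sku: str) -> bool: ...

    @abstractmethod
    def save(self, product: Product) -> None: ...


# application/usecase
class ProductUseCaseImpl(CreateProductUseCase):
    def __init__(self, repository: ProductRepository) -> None:
        self._repository = repository

    def create(self, sku: str, name: str) -> Product:
        if self._repository.exists(sku):
            raise ValueError(f"product {sku} already exists")
        product = Product(sku, name)
        self._repository.save(product)
        return product


# adapter, outside the hexagon (hypothetical in-memory implementation)
class InMemoryProductRepository(ProductRepository):
    def __init__(self) -> None:
        self._rows = {}

    def exists(self, sku: str) -> bool:
        return sku in self._rows

    def save(self, product: Product) -> None:
        self._rows[product.sku] = product


repository = InMemoryProductRepository()
use_case = ProductUseCaseImpl(repository)
lamp = use_case.create("sku-1", "Lamp")
```

Moving the two interfaces into `domain/` (Approach 2) would not change any of this code, only the package the imports come from.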
u/Quick-Resident9433 — 2 days ago

Repositories vs Gateways

I've been studying different architectures like Clean Architecture and Hexagonal Architecture. I noticed that use cases define gateways for data access, which helps separate implementation details from business logic. However, in the Clean Architecture codebases I've found on GitHub, most data access is defined by Repositories, which usually live within the Domain. I understand that this is related to Domain-Driven Design principles, but I'd like some guidance on the similarities and differences between the two.

In DDD, do repositories represent data access interfaces, or am I mistaken? Can a CA codebase use both Gateways and Repositories?

See the following screenshots from the book for reference.

https://preview.redd.it/l6yozt6bp0wg1.jpg?width=3024&format=pjpg&auto=webp&s=11efe27b5468ad78dafb6d5a271bd0739cb9d494

https://preview.redd.it/j4kcm250p0wg1.jpg?width=3024&format=pjpg&auto=webp&s=feab2d260e33d30cb83ced888891a45c2b998a02

u/Informal_External_55 — 3 days ago
▲ 3 r/softwarearchitecture+1 crossposts

Looking for architecture review & improvement suggestions for my auth project

I recently completed a home assignment for a company and built this project: https://github.com/pirate329/auth

I’d really appreciate some feedback on the architecture and overall design. I’m also looking for suggestions on improvements or enhancements that could make this a stronger portfolio project.

Any insights, critiques, or ideas are welcome. Thanks in advance!

I used Claude (Sonnet 4.6) to write the README.


Source about solved problems faced by big tech companies

Hi everyone, there was a website that listed major architectural problems faced by big tech companies, along with their solutions. What was it?

u/Open_Channel_2100 — 2 days ago

Calling persistence port from REST controller in hexagonal architecture

I have been using hexagonal architecture for a year and a half, working with a fantastic tech lead who taught me very good fundamentals (at least that's what I believe), but there is one thing I learnt from him that I do not agree with.

In GET, DELETE, and even POST and PATCH endpoints with no business logic (plain CRUD with no validations or other processing), we call persistence ports directly from the REST controller instead of calling the use case port and calling the persistence port from there. For him, such a use case is "dead code", created only to act as a bridge, with no real logic implemented. Although I get his point, I think this violates the separation of concerns (SoC) principle.

In case you want to understand better what I mean, I have this project of my own, in which I violate the SoC principle too xd.

Am I wrong, is my TL wrong, or is this not black and white, and maybe we both have a point?

Thank you for reading! Any comment would be welcome!

github.com
u/DragonIsKuina — 3 days ago
🔥 Hot ▲ 51 r/softwarearchitecture

Event-first architecture

Have you guys ever considered NOT doing database writes in your customer-facing HTTP endpoints, and instead only emitting an event (which you then catch in another endpoint in your app, where you finally do the db write)?

Normally you have an endpoint like POST /api/todo and in that endpoint you'd typically do something like INSERT todo into table. But what if you instead sent an event "todo.created", and then in another endpoint in your app POST /events/todo you inserted into the table.

Because then the db wouldn't be the source of truth anymore. In a regular event driven architecture the db is still the source of truth and events are emitted with an outbox.

With this event-first pattern the db of your application is downstream, just like the DBs of all your other services (that are also interested in the event).

If you persist the events and the order in which they came in, then the obvious benefit is replayability: you can rebuild your db or bootstrap any new service with its own interpretation of the event log data.

Is it possible to keep the db of the source application accurate if you keep it consistent by using events, without fully committing to event sourcing complexity?
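
A minimal in-memory sketch of the pattern, to make the moving parts explicit. `EventLog`, `TodoProjection`, and `post_todo` are hypothetical names, and the `todo.completed` event is an assumption added for the example. The endpoint only appends to an ordered log, and the "db" is a projection that can be rebuilt by replaying that log.

```python
import itertools


class EventLog:
    """Append-only log that assigns each event a sequence number."""

    def __init__(self) -> None:
        self._events = []
        self._seq = itertools.count(1)

    def append(self, event_type: str, payload: dict) -> dict:
        event = {"seq": next(self._seq), "type": event_type, "payload": payload}
        self._events.append(event)
        return event

    def read_from(self, seq: int = 0) -> list:
        return [e for e in self._events if e["seq"] > seq]


class TodoProjection:
    """The application's 'db', kept downstream of the log."""

    def __init__(self) -> None:
        self.todos = {}
        self.last_seq = 0

    def handle(self, event: dict) -> None:
        if event["seq"] <= self.last_seq:
            return  # already applied, so redelivery is harmless
        payload = event["payload"]
        if event["type"] == "todo.created":
            self.todos[payload["id"]] = {"title": payload["title"], "done": False}
        elif event["type"] == "todo.completed":
            todo = self.todos.get(payload["id"])
            if todo is not None:
                todo["done"] = True
        self.last_seq = event["seq"]


def post_todo(log: EventLog, todo_id: str, title: str) -> dict:
    """Stands in for POST /api/todo: emits an event and writes nothing to the db."""
    return log.append("todo.created", {"id": todo_id, "title": title})


log = EventLog()
live = TodoProjection()
live.handle(post_todo(log, "t1", "write docs"))
live.handle(log.append("todo.completed", {"id": "t1"}))

# Bootstrapping a fresh projection is just a replay from the start of the log.
rebuilt = TodoProjection()
for event in log.read_from(0):
    rebuilt.handle(event)
```

What the sketch leaves out is exactly the hard part of the question: validating a command before the event is emitted, when the projection may not have caught up yet.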

u/neoellefsen — 4 days ago
▲ 10 r/softwarearchitecture+1 crossposts

Part 2 & 3: Zero Secrets and Zero Trust on GKE (PCI-DSS follow-up)

Posted Part 1 last week around cluster hardening for a PCI-DSS setup on GKE.

Just finished Parts 2 & 3, this time focusing on two areas that seem to break most “compliant” setups in practice:

  • removing secrets from workloads entirely (workload identity instead of keys/env vars)
  • locking down service-to-service communication (default deny + mTLS + identity-based access)

One thing that stood out while going deeper into this: a hardened cluster doesn’t really mean much if

  • pods still carry credentials
  • or everything inside the cluster can talk freely

That’s usually where the real risk is, not the perimeter.

Trying to map this more to how it would actually be implemented in a real fintech environment, not just audit checklists.

Part 2 & 3 here:
https://medium.com/@rasvihostings/building-a-pci-dss-compliant-gke-framework-for-financial-institutions-1d1f2c003622

Curious how others are approaching this in real setups:

  • Do you enforce default-deny network policies cluster-wide?
  • Anyone running strict mTLS everywhere, or is it usually partial?

Feels like this is where most setups drift away from what zero trust is supposed to be.

u/gringobrsa — 3 days ago
🔥 Hot ▲ 69 r/softwarearchitecture

System Design: Designing an online auction at 50K bids/sec

Online auctions look simple until correctness problems appear.

Two users click "Bid $105" at the same moment when the current price is $100. Both pass the "$105 > $100" check. But both cannot be accepted.

The post covers the areas that needed the most thought:

  • Per-auction serialization using Valkey single-threaded Lua, instead of row locks that can queue under load on a busy auction.
  • Effectively-once settlement using a fencing token and a stable idempotency key at the payment provider. One common mistake is putting the token inside the idempotency key, which can cause duplicate charges during retries.
  • Anti-sniping built into the same atomic script that accepts the bid, because handling it separately creates the race again.
  • Proxy bidding with a jump-to-price algorithm, so two competing auto-bidders can resolve in one write instead of many small increments.
  • WebSocket fan-out for 1M concurrent watchers using Valkey sharded pub/sub, with presence cleanup and sequence-number resume after reconnect.

It also covers the simpler Postgres-only version, which works well up to roughly 500 bids/sec. Many systems may never need the larger scaled-out design.
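
The race described at the top, and the reason anti-sniping belongs in the same atomic step, can be sketched in a few lines. This is not the Valkey Lua script from the post: it swaps in a per-auction `threading.Lock` in a single process as a stand-in for the serialization. The `Auction` class, the cent-based prices, and the 30 s / 60 s extension values are assumptions for illustration.

```python
import threading


class Auction:
    def __init__(self, current_price: int, ends_at: float,
                 extension_window: float = 30.0, extension: float = 60.0) -> None:
        self.current_price = current_price        # in cents
        self.ends_at = ends_at                    # seconds on some shared clock
        self.extension_window = extension_window  # hypothetical anti-sniping window
        self.extension = extension                # hypothetical extension length
        self._lock = threading.Lock()             # one lock per auction

    def place_bid(self, amount: int, now: float) -> bool:
        """Check, accept, and extend in one critical section."""
        with self._lock:
            if now >= self.ends_at or amount <= self.current_price:
                return False
            self.current_price = amount
            # Anti-sniping happens inside the same section as the acceptance,
            # so no other bid can observe the old end time in between.
            if self.ends_at - now <= self.extension_window:
                self.ends_at = now + self.extension
            return True


auction = Auction(current_price=10_000, ends_at=1_000.0)
results = []


def bid_105() -> None:
    results.append(auction.place_bid(10_500, now=900.0))


threads = [threading.Thread(target=bid_105) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Both threads pass a naive "$105 > $100" comparison, but only one acceptance is recorded because the comparison and the write cannot interleave.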

Link: https://crackingwalnuts.com/post/online-auction-system-design

u/Few_Ad6794 — 4 days ago

Event-driven architecture for scalable dashboards — does this approach make sense?

Hey folks, I’d like to get some feedback on an architecture approach for dashboards in systems that need to scale in terms of read load and data volume.

We’re moving away from a model where most things are computed in real time directly on raw data, and considering a more decoupled setup. The idea is roughly:

  1. domain services persist data as usual

  2. in the same operation, they also write an event to an outbox table/collection

  3. a worker publishes those events to a broker (e.g., Kafka)

  4. independent consumers build read models/projections optimized for queries

  5. dashboards read from those projections instead of recalculating everything on demand

We’re also considering reprocessing projections (by scope) in case of bugs or changes in business logic.
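
The steps above can be sketched end to end with SQLite standing in for the real database and a plain callback standing in for the broker. The table names, the `order.placed` event type, and the function names are all assumptions for illustration. The point is only that the domain row and the outbox row are committed in one transaction, and that the worker publishes at least once.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
    """)


def place_order(conn: sqlite3.Connection, order_id: str, amount: int) -> None:
    """Steps 1 and 2: domain write and outbox write commit or roll back together."""
    with conn:
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"id": order_id, "amount": amount})),
        )


def publish_pending(conn: sqlite3.Connection, publish) -> int:
    """Step 3: hand unpublished events to the broker, then mark them published."""
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        # A crash after publish but before the update means a redelivery,
        # so consumers should be idempotent.
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
place_order(conn, "o-1", 500)
place_order(conn, "o-2", 700)

# Step 4: a consumer folds events into a read model the dashboard can query.
revenue = {"total": 0}


def project(event_type: str, payload: dict) -> None:
    if event_type == "order.placed":
        revenue["total"] += payload["amount"]


published = publish_pending(conn, project)
```

Reprocessing a projection by scope then amounts to resetting the read model and replaying the relevant events, provided the events are retained somewhere after publishing.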

Main questions:

  1. Is this kind of setup (outbox + event-driven projections + read models) a common pattern for scalable dashboards?

  2. Does it make sense to separate write and read concerns like this even if the system isn’t super complex yet?

  3. Have you seen simpler approaches work well in similar scenarios?

  4. At what point would you consider this overengineering?

Just trying to sanity check if we’re heading in a reasonable direction or missing something simpler/more standard.

u/Jer3mi4s — 3 days ago

Best way to build offline-first attendance system (PWA + GPS)?

Hey devs,

I’m working on an attendance module for an ERP project and could really use some advice from people who’ve built similar systems.

The idea is to make it offline-first, since a lot of users will have poor or no internet.

What I have so far:

- React PWA with IndexedDB

- Service Worker for background sync

- Django + PostgreSQL backend

What it needs to do:

- Punch in/out with GPS + face photo

- Work fully offline and sync later

- Track location periodically during work hours

Where I’m stuck:

- Offline sync:

Not sure what’s the best way to handle conflicts and retries without creating duplicate records. Also, should I trust device time or server time?

- GPS tracking:

Worried about battery drain and also how to deal with fake/spoofed locations.

- Scaling sync:

What happens when a lot of users come back online at the same time and all try to sync?

- Face capture:

Is storing photos enough for attendance, or is it better to go for face recognition?
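
On the duplicate-records part of the offline sync question, one common approach is to have the device generate an ID once per punch and reuse it on every retry, so the server can treat a repeated upload as a no-op. A minimal sketch follows, with hypothetical names (`Punch`, `AttendanceServer`, `new_punch`) and an in-memory dict standing in for a table with a unique constraint on `client_id`. It also records both clocks rather than choosing one.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Punch:
    client_id: str      # generated once on the device, reused on every retry
    user_id: str
    kind: str           # "in" or "out"
    device_time: float  # what the device clock said; not trusted on its own


def new_punch(user_id: str, kind: str, device_time: float) -> Punch:
    """Called on the device when the user punches, online or offline."""
    return Punch(str(uuid.uuid4()), user_id, kind, device_time)


class AttendanceServer:
    def __init__(self) -> None:
        self.records = {}  # keyed by client_id

    def sync(self, batch: list, server_time: float) -> list:
        """Store unseen punches and acknowledge every ID in the batch."""
        acknowledged = []
        for punch in batch:
            if punch.client_id not in self.records:
                self.records[punch.client_id] = {
                    "punch": punch,
                    "received_at": server_time,  # server clock, for auditing
                }
            acknowledged.append(punch.client_id)
        return acknowledged


server = AttendanceServer()
punch = new_punch("u-42", "in", device_time=1_000.0)

first_ack = server.sync([punch], server_time=5_000.0)
# The response was lost on a bad connection, so the client retries the same batch.
retry_ack = server.sync([punch], server_time=5_030.0)
```

The client removes a punch from IndexedDB only after it sees that punch's ID in an acknowledgement, which makes retries safe regardless of where the connection dropped.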

If you’ve worked on something like this, I’d love to hear what worked (and what didn’t). Any tips or pitfalls to avoid would really help.

Thanks 🙏

u/LastNothing2610 — 1 day ago
🔥 Hot ▲ 213 r/softwarearchitecture

A Better (Beyond CRUD) Architecture

I see this as a good starting style for any project. Unless it’s a trivial brochure website, your software will likely get bigger and will not stay simple as it grows.

I present this, not as anything radically new, but as a foundation so that software folk no longer think too much in terms of CRUD.

In particular, I would like to point out the Infrastructure layer that I think should always be wrapped. It’s the most troublesome layer to deal with when things change. By using interfaces that are specific to your particular business needs you hide away the infrastructure (or included packages).

I should stress that the Domain layer should also only express business needs. In short “if it’s not talking about a specific business need that you’re referencing somewhere, drop it”. Getters and setters are a prime violation of this.

By designing your software in terms of Domain or Infrastructure you’ll have an easier time. Then coordinate it with the Application layer.
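
A small sketch of how the three layers might look under these rules. The subscription example and every name in it are hypothetical. The domain object exposes a business operation instead of setters, the infrastructure is hidden behind an interface named after the business need, and the application function coordinates the two.

```python
from abc import ABC, abstractmethod


# Domain: expresses a business need ("cancel"), not field access.
class Subscription:
    def __init__(self) -> None:
        self._active = True

    def cancel(self) -> None:
        if not self._active:
            raise ValueError("subscription is already cancelled")
        self._active = False

    def is_billable(self) -> bool:
        return self._active


# Infrastructure, wrapped: the interface speaks in business terms,
# so an email library or SMS provider can change behind it.
class CancellationNotifier(ABC):
    @abstractmethod
    def notify_cancelled(self, customer_email: str) -> None: ...


class RecordingNotifier(CancellationNotifier):
    """Stand-in implementation that records calls instead of sending mail."""

    def __init__(self) -> None:
        self.sent = []

    def notify_cancelled(self, customer_email: str) -> None:
        self.sent.append(customer_email)


# Application: coordinates domain and infrastructure.
def cancel_subscription(customer_email: str, subscription: Subscription,
                        notifier: CancellationNotifier) -> None:
    subscription.cancel()
    notifier.notify_cancelled(customer_email)


subscription = Subscription()
notifier = RecordingNotifier()
cancel_subscription("ana@example.com", subscription, notifier)
```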

u/rmb32 — 6 days ago

Seeking advice on chasing a Solutions Architect career path

Hello everyone, I'm a software engineer with about 1 year of experience, working mainly in the Agentic AI space, though my work leans more toward backend and systems than the AI/ML side itself.

Over this past year, I've realized something about myself: I genuinely enjoy researching new ideas, concepts, and thinking about architecture. I look forward to my code review sessions not because of the code itself, but because of the discussions around why things were designed a certain way. That kind of thinking energizes me more than the actual implementation work.

After some research, Solution Architect seems like a role that aligns well with what I enjoy and moving from developer to SA is one recognized path. But I don't have a clear picture of what that journey actually looks like end to end.

So I'd love to hear from those of you with more experience:

  1. Should I focus on deepening my engineering skills first, then wait patiently and look for opportunities at my current company to make architectural decisions and gradually grow into the title?
  2. Or is it better to first target an intermediate, more client-facing role before chasing SA? Do roles like that actually exist?
  3. And roughly, what kind of timeline should I be aiming for at each stage?

I really appreciate any thoughts or advice you're willing to share.

u/LuckHorror8748 — 4 days ago