Designing the data infrastructure for my org - looking for feedback
I’m currently working as a data/analytics engineer at a small/mid-sized manufacturing company, where I’ve more or less been tasked with building out our data platform from scratch. I’m also the first data hire, so I literally have no one in the company to turn to; this sub has been my guidance for everything lol. I don’t have a lot of DE experience either, so I’m learning everything as I go and then implementing it.
We’re still pretty early in our data maturity, with a lot of siloed systems (CRM, ERP, SharePoint lists, etc.), so the goal has been to create something scalable but not overly complex.
Right now, I’ve set things up in Microsoft Fabric using a medallion-style architecture:
- Bronze (central lakehouse): raw data ingested from various APIs with minimal transformation
- Silver (central processing layer): cleaned and standardized using a config-driven pipeline
- Gold (department-level warehouses): business-ready tables in separate workspaces for different teams (Sales, Ops, etc.)
On top of that, I’m using workspace isolation so each department has its own workspace for reporting and access control, while keeping bronze and silver in a central workspace.
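For context, here’s a stripped-down sketch of what I mean by "config-driven" in the silver layer. The table names, config keys, and helper are made up for illustration; the real thing runs as PySpark in a Fabric notebook, but the idea is the same: one generic function, per-table config.

```python
# Hypothetical per-table config driving the bronze -> silver step.
# Each entry says how to rename columns, what to drop, and what must be non-null.
CONFIG = {
    "crm_contacts": {
        "rename": {"CustNm": "customer_name", "Crtd": "created_at"},
        "drop": ["legacy_id"],
        "required": ["customer_name"],
    },
}

def to_silver(table: str, rows: list[dict]) -> list[dict]:
    """Apply the table's config: drop columns, rename, filter bad rows."""
    cfg = CONFIG[table]
    out = []
    for row in rows:
        clean = {
            cfg["rename"].get(k, k): v
            for k, v in row.items()
            if k not in cfg["drop"]
        }
        # Keep only rows where every required column is populated.
        if all(clean.get(col) is not None for col in cfg["required"]):
            out.append(clean)
    return out
```

Adding a new source table is then just a new CONFIG entry rather than a new pipeline, which is most of why I went this route.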
A lot of this is still evolving (e.g., handling schema changes, thinking about incremental loads vs overwrites, optimizing compute usage in Fabric, etc.), and I’m trying to strike a balance between doing things “right” and not over-engineering too early.
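On the incremental loads vs overwrites point, the pattern I’m weighing is a watermark-based upsert instead of full reloads. In Fabric this would really be a Delta `MERGE` on the lakehouse tables; the plain-Python sketch below (all names hypothetical) just shows the logic: take the source row when it’s new or newer than what the target already has.

```python
def incremental_merge(
    target: list[dict],
    source: list[dict],
    key: str = "id",
    ts: str = "modified_at",
) -> list[dict]:
    """Upsert source into target: insert new keys, replace rows
    only when the source row has a newer timestamp."""
    by_key = {row[key]: row for row in target}
    for row in source:
        existing = by_key.get(row[key])
        if existing is None or row[ts] > existing[ts]:
            by_key[row[key]] = row
    return list(by_key.values())
```

The appeal over overwrites is that compute scales with the size of the change, not the size of the table, which seems to matter for Fabric capacity usage.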
Curious to hear from others who’ve built something similar:
- Does this architecture make sense at this stage?
- Anything you’d strongly recommend changing early before it becomes painful later?
- How would you approach scaling this (especially around governance, costs, and team autonomy)?
Appreciate any thoughts or critiques.