New data eng team modernising messy legacy workarounds onto ADF + Databricks + ADLS + Fabric — how do we build this properly from the start?
I'm a data engineer with zero prior experience, on a new data engineering team at our org.
Stack: ADF + Databricks + ADLS Gen2 (medallion architecture), serving into Microsoft Fabric.
Our work primarily focuses on migrating badly built legacy ETL systems to the cloud, as well as onboarding new sources (emailed xlsx/csv files, SQL Server instances, SAP, third-party ads and sales APIs) into our data platform.
The environment I'm working in:
- No proper requirements gathering - most work arrives with half the information, requiring a lot of back-and-forth communication.
- Everything has to be built from scratch - there are no standards or best practices in place.
- No project planning - each project is a single Jira ticket.
- Architecture guidance amounts to "here you go - Databricks and ADF - use them."
- There is one senior DE, but the manager doesn't want him to be visible and impactful, because they want to protect their own management layer.
I want to grow and learn as a data engineer, both technically and on the process side.
Would love advice on:
- How do you deal with unclear requirements and no direct stakeholder access — what do you do before writing a single line of code?
- What standards or practices are worth pushing for early in a new DE team?
- Best practices for this stack and multi-source ingestion (APIs, SAP, SQL, flat files)
- How do you make good architecture decisions when there's no proper design stage?
- Resources that taught you to think like a proper data engineer, not just use the tools
Happy to hear from anyone who's been in a "building the plane while flying it" situation. What helped you most?
u/sathvikchava — 5 hours ago