u/sathvikchava

r/dataengineering

New data eng team modernising messy legacy workarounds onto ADF + Databricks + ADLS + Fabric — how do we build this properly from the start?

I am a data engineer with zero experience, on a new data engineering team at our org.

Stack: ADF + Databricks + ADLS Gen2 (medallion architecture), serving into Microsoft Fabric.

Our work primarily focuses on migrating badly built legacy ETL systems to the cloud, plus onboarding new sources (emailed xlsx/csv files, SQL Server databases, SAP, third-party ads and sales APIs) into our data platform.

The environment I'm working in:

  • No proper requirements gathering - most work arrives with half the information, requiring a lot of back-and-forth communication.
  • Everything has to be built from scratch - there are no standards or best practices in place.
  • No project planning - each project is a single Jira ticket.
  • Architecture guidance amounts to "here you go - Databricks and ADF - use them".
  • There is one senior DE, but management doesn't want him to be visible and impactful, because they want to protect their manager layer.

I want to grow and learn as a data engineer, both technically and on the process side.

Would love advice on:

  • How do you deal with unclear requirements and no direct stakeholder access — what do you do before writing a single line of code?
  • What standards or practices are worth pushing for early in a new DE team?
  • Best practices for this stack and multi-source ingestion (APIs, SAP, SQL, flat files)
  • How do you make good architecture decisions when there's no proper design stage?
  • Resources that taught you to think like a proper data engineer, not just use the tools
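To make the multi-source ingestion question concrete: the pattern I keep reading about is metadata-driven ingestion, where each source is a config entry and one generic pipeline dispatches on source type. Here is a rough Python sketch of what I imagine (every name here is hypothetical, not our actual code) - is this the kind of standard worth setting up early?

```python
# Hypothetical sketch of metadata-driven ingestion: each source is described
# by a config record, and one generic loop picks the right loader for it.
# All names (SourceConfig, loaders, paths) are made up for illustration.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SourceConfig:
    name: str          # logical source name
    kind: str          # "flat_file", "sql", "api", "sap", ...
    landing_path: str  # bronze-layer folder, e.g. in ADLS


def load_flat_file(cfg: SourceConfig) -> str:
    # In practice this would copy an emailed xlsx/csv into the bronze layer.
    return f"copied {cfg.name} to {cfg.landing_path}"


def load_sql(cfg: SourceConfig) -> str:
    # In practice this would run an incremental extract from SQL Server.
    return f"extracted {cfg.name} to {cfg.landing_path}"


# One place to register how each source kind is handled.
LOADERS: Dict[str, Callable[[SourceConfig], str]] = {
    "flat_file": load_flat_file,
    "sql": load_sql,
}


def ingest(configs: List[SourceConfig]) -> List[str]:
    """Run every configured source through its registered loader."""
    results = []
    for cfg in configs:
        loader = LOADERS.get(cfg.kind)
        if loader is None:
            # Unknown source kinds are reported, not silently dropped.
            results.append(f"skipped {cfg.name}: no loader for {cfg.kind}")
            continue
        results.append(loader(cfg))
    return results


configs = [
    SourceConfig("sales_xlsx", "flat_file", "bronze/sales"),
    SourceConfig("erp_orders", "sql", "bronze/erp"),
    SourceConfig("crm_feed", "api", "bronze/crm"),
]
print(ingest(configs))
```

The appeal, as I understand it, is that adding a new source becomes a config change plus (at most) one new loader, instead of a brand-new pipeline each time.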

Happy to hear from anyone who's been in a "building the plane while flying it" situation. What helped you most?
