u/FantasticEquipment69

Hello guys, I have worked on DWH architectures, but I've never worked on a Lakehouse (might be obvious from the question).

This might sound like a dumb question to many of you, but I wanted to ask those of you who have real-life experience with Lakehouses (or even just theoretical knowledge).

In a Lakehouse environment, do you usually schedule your jobs as you would in a DWH environment (daily batch loads) and have your ODS read directly from the source systems (using CDC)? Or do you prefer a real-time Bronze layer, with the ODS reading from it?

My opinion was to have the ODS read from the source (like a normal DWH architecture), since that should mean:

  • less compute (you only load the ODS in real time)
  • less delay (no dependency on a middle layer)
  • if there are any discrepancies in the Silver/Gold layers, you still have the same data in the Bronze layer for validation, fixes, and reloads
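To make the two options concrete, here is a rough sketch of how I picture them, with each architecture written as a list of (upstream, downstream, load mode) edges. The layer names, the "cdc"/"batch" labels, and the helper functions `hops_to` and `realtime_layers` are all made up for illustration and are not tied to any specific platform. It only counts hops and real-time layers, which is what my first two points are about.

```python
# Hypothetical sketch: each option is a list of (upstream, downstream, mode) edges.
# Layer names and modes are illustrative assumptions, not from any real platform.

OPTION_A = [  # ODS reads straight from the source via CDC; Bronze is a daily batch
    ("source", "ods", "cdc"),
    ("source", "bronze", "batch"),
    ("bronze", "silver", "batch"),
    ("silver", "gold", "batch"),
]

OPTION_B = [  # Bronze is fed in real time; ODS reads from Bronze
    ("source", "bronze", "cdc"),
    ("bronze", "ods", "cdc"),
    ("bronze", "silver", "batch"),
    ("silver", "gold", "batch"),
]


def hops_to(target, edges, start="source"):
    """Number of edges on the shortest path from start to target (BFS)."""
    frontier = [(start, 0)]
    seen = {start}
    while frontier:
        node, depth = frontier.pop(0)
        if node == target:
            return depth
        for up, down, _mode in edges:
            if up == node and down not in seen:
                seen.add(down)
                frontier.append((down, depth + 1))
    return None  # target is not reachable from start


def realtime_layers(edges):
    """Layers that have to be kept current continuously (fed by a 'cdc' edge)."""
    return sorted({down for _up, down, mode in edges if mode == "cdc"})


if __name__ == "__main__":
    for name, edges in (("A", OPTION_A), ("B", OPTION_B)):
        print(name, "hops to ODS:", hops_to("ods", edges),
              "| real-time layers:", realtime_layers(edges))
```

In this toy model, option A has one hop to the ODS and one real-time layer, while option B has two of each. It says nothing about cost per layer or about replayability from Bronze, which is exactly the part I'm unsure about.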

The other option (the ODS reading from the Bronze layer) was actually an AI's suggestion, but I thought it might have been based on something previously shared, so I wanted to understand whether there are more advantages to a real-time Bronze layer with the ODS reading from it.

u/FantasticEquipment69 — 16 days ago