Designing a unified lakehouse across five source systems accumulated over fifteen years and three acquisitions. Phased delivery to minimise operational risk during a sensitive integration period.
When five source systems describe the same underlying business in five different ways, the temptation is to pick one as the authority and force the others to conform. This is faster. It is also wrong.
Each source system encodes decisions that mattered to the team that built it. Flattening them into a single canonical view loses information that is irrecoverable.
The harder path: design a curated layer that holds the disagreement explicitly, and force the organisation to make the harmonisation decisions consciously, one at a time, with the people who know what each definition actually means.
There is a kind of integration work where speed is a virtue. There is another kind, the kind we usually do, where speed is the enemy.
We phased this engagement over four releases not because the technology required it, but because the operational risk required it. Every additional month of phasing bought a margin of safety we could not have purchased any other way.
The diagram is simple. The path through it took nine months. Most of that time was not spent in the boxes; it was spent in the arrows between them, making sure each transition was honest.
The work that mattered most in this engagement was the listening in the first six weeks: who built what, why, and what had broken. The architecture was a consequence of that, not a substitute for it.
Every source system is a record of decisions made by people under real pressure. Understanding those decisions, in the language of the people who made them, is the work. The Databricks instance is a downstream artefact.
We do not begin engagements by drawing diagrams.