Manufacturing · Lakehouse · Databricks

A listed automotive components manufacturer.

Designing a unified lakehouse across five source systems accumulated over fifteen years and three acquisitions. Phased delivery to minimise operational risk during a sensitive integration period.

Sector
Automotive components
Scale
Listed (JSE)
Platform
Databricks on Azure
Period
Nine months
i · The integration question
What this engagement asked

When five systems describe the same business in five different ways.

When five source systems describe the same underlying business in five different ways, the temptation is to pick one as the authority and force the others to conform. This is faster. It is also wrong.

Each source system encodes decisions that mattered to the team that built it. Flattening them into a single canonical view loses information that is irrecoverable.

The harder path: design a curated layer that holds the disagreement explicitly, and force the organisation to make the harmonisation decisions consciously, one at a time, with the people who know what each definition actually means.

A canonical schema is an organisational artefact dressed up as a technical one.
ii · On phasing
Where speed becomes the enemy

Some integration work rewards slowness.

There is a kind of integration work where speed is a virtue. There is another kind, the kind we usually do, where speed is the enemy.

We phased this engagement over four releases not because the technology required it, but because the operational risk required it. Every additional month of phasing bought a margin of safety we could not have purchased any other way.

Fig. 01 · ArchitectureERPMESQUALITY DBOT STREAMCRMBRONZELANDSILVERCANONGOLDSERVEBIMLSOURCESBRONZESILVERGOLDCONSUMERSCANONICALSCHEMA

The diagram is simple. The path through it took nine months. Most of that time was not spent in the boxes; it was spent in the arrows between them, making sure each transition was honest.

iii · What this work asks
The discipline beneath

Lakehouse projects fail more often from organisational debt than from technical debt.

The work that mattered most in this engagement was the listening in the first six weeks: who built what, why, and what had broken. The architecture was a consequence of that, not a substitute for it.

Every source system is a record of decisions made by people under real pressure. Understanding those decisions, in the language of the people who made them, is the work. The Databricks instance is a downstream artefact.

We do not begin engagements by drawing diagrams.