Structural Correction

Chapter 12: Data as Outputs, Not a Shared Substrate

Treating data as a shared substrate destroys the autonomy that process ownership requires. Data contracts allow each unit to own its data while publishing a versioned specification of what it provides to the reporting layer.

A data analyst in the central reporting team at a Nordic insurer spent three weeks building the quarterly capital allocation report for the CFO. It required revenue-per-customer figures across Motor, Property, and Commercial; he pulled the numbers from each division's data warehouse, and they did not reconcile. Motor counted a customer as a policyholder, Property as a household, Commercial as a legal entity with an active contract in the trailing twelve months, so a single corporate client with a fleet policy, a building policy, and a liability policy might appear as three customers, one, or zero depending on renewal dates.

He spent four days in meetings with divisional data owners trying to agree a common definition. They could not, because each definition was correct within its own process. He built a reconciliation layer in a spreadsheet that mapped between the three using manual rules, with an “assumptions” tab that ran to forty rows. The report shipped two weeks late, and the CFO used it to allocate € 12 million in discretionary capital across the three divisions. The numbers the CFO compared were built on three incompatible definitions of the unit they measured. Nobody told him, and nobody needed to: the report looked precise because it inherited the precision of the source systems, which agreed on the number of decimal places but not on what they were counting.

The board implication is simple: a shared data model is a shared point of failure. The aspiration is legitimate, because the organisation should be able to answer questions that span multiple processes, but the mechanism is wrong. When data is shared through a common substrate, every unit that reads from or writes to it is coupled to every other, and the “single source of truth” becomes a single point of coupling. Data should flow out of autonomous units as a published product, not pool underneath them as a shared substrate.

Data is at least two structurally different things, and conflating them is the origin of most data governance failures. Operational data lives inside the unit and serves the process it owns; each unit models the entities it works with in the way that serves its process, and the apparent duplication is precision, not waste. Analytical data is the cross-cutting view the organisation needs for reporting and decision-making, where data from multiple units is brought together. The bridge between the two is a data contract, not a shared database. The unit publishes its analytical data through a contract and the reporting layer consumes it; when the unit changes its internal structures it continues to honour the contract, and when the contract itself must change, the change is versioned and communicated, as an API version change would be. The unit's internal data stays private: no other unit reads its database, and no reporting pipeline reaches into its internals. The mechanism is analogous to financial reporting, where each business unit produces its own management accounts and the group consolidation function combines them according to defined rules, with no unit reaching directly into another's ledger.
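
A minimal sketch in Python may make the separation concrete. Everything named here is hypothetical and chosen for illustration: the `Field` and `DataContract` classes, the field names, and the shape of Motor's internal rows are assumptions, not something any particular tool or the insurer in this chapter prescribes. The point it shows is that the reporting layer sees only the published record, while the function that produces it is the one place that knows the unit's internal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field the unit promises to publish (hypothetical structure)."""
    name: str
    type: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    """A versioned specification of what a unit provides to the reporting layer."""
    owner: str
    name: str
    version: str
    fields: tuple

    def violations(self, record: dict) -> list:
        """Return a list of ways the record fails to honour the contract."""
        problems = []
        for f in self.fields:
            if f.name not in record:
                if f.required:
                    problems.append(f"missing required field: {f.name}")
                continue
            if not isinstance(record[f.name], f.type):
                problems.append(f"wrong type for field: {f.name}")
        return problems


# Assumed contract for Motor's analytical output; field names are illustrative.
MOTOR_CUSTOMERS_V1 = DataContract(
    owner="motor",
    name="customer_count",
    version="1.0.0",
    fields=(
        Field("period", str),
        Field("customer_definition", str),
        Field("customer_count", int),
    ),
)


def publish_motor_customers(internal_rows: list, period: str) -> dict:
    """Map Motor's private operational rows to the published record.

    Only this function knows the internal row shape. If Motor restructures
    its tables, this mapping changes and the published record does not.
    """
    policyholders = {
        row["policyholder_id"] for row in internal_rows if row["status"] == "active"
    }
    return {
        "period": period,
        "customer_definition": "policyholder",
        "customer_count": len(policyholders),
    }


rows = [
    {"policyholder_id": "PH-1", "status": "active"},
    {"policyholder_id": "PH-1", "status": "active"},  # second policy, same holder
    {"policyholder_id": "PH-2", "status": "active"},
    {"policyholder_id": "PH-3", "status": "lapsed"},
]
record = publish_motor_customers(rows, "2024-Q1")
print(record, MOTOR_CUSTOMERS_V1.violations(record))
```

The design choice worth noticing is that the definition of a customer travels with the number: the record states that it counts policyholders, so a consumer cannot mistake it for a count of households or legal entities.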

The autonomy test is whether a unit can change its internal data model and publish a new contract version without a meeting. The claims unit at a mid-sized insurer needed to add a fraud risk score to its analytical output. Under the existing governance model, the request went to the data architecture steering group, which requested impact assessments from five consuming teams, two of which required follow-up meetings, and the change was approved nine weeks later with a condition that the field name conform to the enterprise data dictionary. The engineering work itself took half a day. Under the contract model, the unit publishes a new version of its data contract with the field as optional, the previous version continues to be honoured, consumers subscribe on their own schedule, and the change is live by Wednesday afternoon. If a change like this requires a steering group, the organisation has centralised governance described in different language, not data contracts. Centralised models optimise for reporting coherence at the expense of change velocity; distributed ownership without contracts optimises for change velocity at the expense of reporting coherence; distribution with contracts balances both, at the cost of maintaining the contracts, a cost lower than the coordination centralisation requires.
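
One way to replace the steering group with a mechanical check is sketched below in Python. The rule set is an assumption made for illustration: a new version is treated as safe to publish alongside the old one if it removes no field, changes no type, does not weaken a guaranteed field to optional, and, following the example above, introduces new fields only as optional. The contract representation (a dict of field name to type name and required flag) and the claims field names are hypothetical.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Compare two contract versions and list changes that would break consumers.

    Each contract is a dict of field name -> (type name, required flag).
    An empty result means the new version can be published without
    coordinating with consumers (under the assumed rules described above).
    """
    problems = []
    for name, (ftype, required) in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name][0] != ftype:
            problems.append(f"changed type of field: {name}")
        elif required and not new[name][1]:
            problems.append(f"guaranteed field made optional: {name}")
    for name, (_ftype, required) in new.items():
        if name not in old and required:
            problems.append(f"new field must be optional: {name}")
    return problems


# Hypothetical claims contract, version 1.
CLAIMS_V1 = {
    "claim_id": ("str", True),
    "claim_amount": ("float", True),
    "status": ("str", True),
}

# Version 2 adds the fraud risk score as an optional field.
CLAIMS_V2 = dict(CLAIMS_V1, fraud_risk_score=("float", False))

# A version that drops a field consumers rely on, for contrast.
CLAIMS_BAD = {k: v for k, v in CLAIMS_V2.items() if k != "status"}

print(breaking_changes(CLAIMS_V1, CLAIMS_V2))
print(breaking_changes(CLAIMS_V1, CLAIMS_BAD))
```

Run as part of the unit's own release pipeline, a check of this kind lets the unit decide for itself whether a change needs a conversation with consumers or only a version bump.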

Process-owned units and published contracts produce the data architecture as a by-product: if units own processes and publish contracts, the data flows naturally; if the organisation tries to fix its data architecture without fixing its ownership structure, it produces another layer of governance on top of the existing dysfunction.

A year later at the Nordic insurer, the analyst works differently. Each division publishes its customer count through a data contract: Motor policyholders, Property households, and Commercial legal entities with active contracts. The definitions have not changed, because they were never wrong. What has changed is that the mapping between them is published alongside the data: a versioned specification of how the three reconcile to the group reporting definition. The forty-row assumptions tab is now a maintained contract, tested automatically before the quarterly report is assembled. The report ships on time, and the € 12 million allocation is built on numbers whose reconciliation logic is explicit, inspectable, and owned. Data governance is an ownership problem, not a data problem: fix the ownership and the data follows; fix the data without fixing the ownership and the fix does not hold.
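
As a rough illustration of what a maintained, automatically tested reconciliation might look like, the Python sketch below expresses the mapping as a versioned crosswalk from divisional identifiers to a group customer identifier. The chapter does not say how the insurer expresses its mapping, so the crosswalk form, the identifiers, and the function name are all assumptions. The check that runs before the report is assembled is simply that every published identifier has a mapping.

```python
# Hypothetical versioned reconciliation specification. The crosswalk maps
# (division, divisional identifier) to a group-level customer identifier.
RECONCILIATION_V1 = {
    "version": "1.0.0",
    "crosswalk": {
        ("motor", "PH-1001"): "G-1",       # fleet policy
        ("property", "HH-2001"): "G-1",    # building policy, same client
        ("commercial", "LE-3001"): "G-1",  # liability policy, same client
        ("motor", "PH-1002"): "G-2",
    },
}


def group_customer_count(published: dict, spec: dict) -> int:
    """Reconcile divisional customer identifiers to a group customer count.

    `published` maps a division name to the set of identifiers it published
    under its own definition. Raises ValueError if any identifier lacks a
    mapping, so a gap stops the report instead of silently skewing it.
    """
    crosswalk = spec["crosswalk"]
    unmapped = [
        (division, ident)
        for division, idents in sorted(published.items())
        for ident in sorted(idents)
        if (division, ident) not in crosswalk
    ]
    if unmapped:
        raise ValueError(f"unmapped identifiers: {unmapped}")
    group_ids = {
        crosswalk[(division, ident)]
        for division, idents in published.items()
        for ident in idents
    }
    return len(group_ids)


published = {
    "motor": {"PH-1001", "PH-1002"},
    "property": {"HH-2001"},
    "commercial": {"LE-3001"},
}
divisional_total = sum(len(idents) for idents in published.values())
group_total = group_customer_count(published, RECONCILIATION_V1)
print(divisional_total, group_total)
```

In this toy data the divisions publish four customers between them and the group definition yields two, and the difference is explained by rules that are written down, versioned, and owned.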
