
Git for Master Data

Software developers solved this problem 20 years ago. Why are we still treating asset data like it's 1995?

Every infrastructure manager has the same problem: multiple systems, multiple teams, and everybody thinks their data is correct. GIS says the switch is at kilometre 42.3. SAP says 42.5. The maintenance crew says it moved last week. Who's right?

The Old Approach: Last Write Wins

Most asset databases work like a shared spreadsheet. Someone updates a field, and that value replaces the old one. No history. No context. No way to know why it changed.

This creates predictable chaos:

  • Teams don't trust the "official" system, so they keep their own shadow databases.
  • Conflicting updates overwrite each other silently.
  • Nobody can answer: "What was the state of the network on March 15th?"
  • Audit trails are incomplete or missing entirely.

The result: your "single source of truth" is actually a battleground where the last person to hit save wins.


What Software Developers Figured Out

In the 1990s, software development had exactly the same problem. Multiple developers editing the same codebase. Conflicting changes. Lost work. Version control tools emerged to tame it, and in 2005 Linus Torvalds built Git, the system that now dominates the field.

Git's core insight: changes are first-class citizens. Every modification is recorded as a "commit" with a timestamp, an author, and full context. You don't just see "the code"—you see the complete history of how it got there.

Git's Killer Features

  • Full lineage: Every line of code traces back to when, why, and by whom it was written.
  • Merge conflict resolution: When two people edit the same thing, the system flags it instead of silently overwriting.
  • Branching: Work on future changes without breaking the current state.
  • Time travel: Reconstruct any historical state instantly.

Today, no serious software project runs without version control. The idea of 50 developers editing a shared folder seems insane. Yet that's exactly how most infrastructure managers handle their asset data.


Applying Git Principles to Master Data

Asset master data isn't source code. You can't just run "git commit" on your GIS database. But the principles transfer:

1. Record Changes, Not States

Instead of storing "Switch X is at coordinate Y", store "On January 15th, System A reported that Switch X is at coordinate Y." Every observation is a delta—a change record with metadata.

This changes everything. You no longer ask "what is the coordinate?" You ask "what do different sources say about the coordinate, and when did they say it?"
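The idea can be sketched in a few lines of Python. This is a minimal illustration, not a production design; the names (`Observation`, `DeltaLog`, the asset `SW-X`) are hypothetical. The point is the shape: an append-only log where every claim keeps its source and timestamp, and nothing is ever overwritten.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Observation:
    """One delta: a source's claim about an attribute at a point in time."""
    asset_id: str
    attribute: str
    value: object
    source: str
    observed_at: datetime

class DeltaLog:
    """Append-only log of observations; commits are never overwritten."""
    def __init__(self):
        self._log = []

    def commit(self, obs: Observation):
        self._log.append(obs)

    def history(self, asset_id, attribute):
        """All claims about one attribute, in the order they were committed."""
        return [o for o in self._log
                if o.asset_id == asset_id and o.attribute == attribute]

log = DeltaLog()
log.commit(Observation("SW-X", "km", 42.3, "GIS", datetime(2024, 1, 15)))
log.commit(Observation("SW-X", "km", 42.5, "SAP", datetime(2024, 2, 1)))

# Both claims survive, each with its source and timestamp:
for o in log.history("SW-X", "km"):
    print(o.source, o.value, o.observed_at.date())
```

Notice that asking "what is the coordinate?" is no longer a lookup of a single cell; it is a query over the history of claims.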

2. Embrace Multiple Sources

Traditional systems try to "own" the truth. They fight over which database is authoritative. A Git-like approach accepts that truth is distributed.

GIS contributes location data. SAP contributes cost data. The measurement train contributes geometry. Each source commits its observations. The system reconciles them based on clear rules—not whoever updated last.
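One simple reconciliation rule is attribute-level source authority: each source "owns" the attributes it is best placed to report. The sketch below is a hypothetical illustration of that rule (the attribute names and the `AUTHORITY` table are invented for the example), replacing "whoever saved last" with an explicit policy.

```python
from datetime import datetime

# Hypothetical observations: (asset, attribute, value, source, timestamp)
observations = [
    ("SW-X", "location_km", 42.3, "GIS", datetime(2024, 1, 15)),
    ("SW-X", "unit_cost", 8500, "SAP", datetime(2024, 1, 20)),
    ("SW-X", "gauge_mm", 1435, "measurement_train", datetime(2024, 2, 1)),
]

# Each source is authoritative for the attributes it owns -- a rule,
# not a race over who saved last.
AUTHORITY = {
    "location_km": "GIS",
    "unit_cost": "SAP",
    "gauge_mm": "measurement_train",
}

def golden_record(asset_id):
    """Assemble the reconciled view, keeping only authoritative claims."""
    record = {}
    for asset, attr, value, source, _ts in observations:
        if asset == asset_id and AUTHORITY.get(attr) == source:
            record[attr] = value
    return record

print(golden_record("SW-X"))
# {'location_km': 42.3, 'unit_cost': 8500, 'gauge_mm': 1435}
```

Real systems combine authority with recency and manual overrides, but the principle is the same: the rule is written down and applied consistently.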

3. Handle Conflicts Explicitly

When GIS says 42.3 and SAP says 42.5, that's not an error to hide. It's a merge conflict to resolve. Flag it. Investigate it. Document the resolution.

This is uncomfortable at first. You're surfacing disagreements that were previously invisible. But invisible disagreements don't go away—they just cause failures at the worst possible moment.
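Detecting these merge conflicts is mechanical once claims are stored per source. A minimal sketch, assuming each (asset, attribute) pair holds the latest claim from every source (the data here is hypothetical):

```python
# Latest claim per source about each attribute -- hypothetical data.
claims = {
    ("SW-X", "location_km"): {"GIS": 42.3, "SAP": 42.5},
    ("SW-X", "gauge_mm"): {"GIS": 1435, "measurement_train": 1435},
}

def find_conflicts(claims):
    """Surface disagreements between sources instead of silently overwriting."""
    conflicts = []
    for (asset, attr), by_source in claims.items():
        if len(set(by_source.values())) > 1:
            conflicts.append({
                "asset": asset,
                "attribute": attr,
                "claims": by_source,
                "status": "needs_review",
            })
    return conflicts

for c in find_conflicts(claims):
    print(f"CONFLICT on {c['asset']}.{c['attribute']}: {c['claims']}")
```

The agreeing `gauge_mm` claims pass silently; the disagreeing `location_km` claims become a work item with the full evidence attached.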

4. Enable Time Travel

"What was the state of track section Y on the day of the incident?" With traditional databases, this question triggers a forensic investigation. With full lineage, it's a query.

This isn't just useful for audits. It's essential for understanding how your network evolved, validating predictions, and debugging data quality issues.
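With a change log, "state on day X" really is just a query: replay the history up to that date. A minimal sketch with an invented status history for one track section:

```python
from datetime import datetime

# (attribute, value, recorded_at) history for one track section -- hypothetical.
history = [
    ("status", "in_service", datetime(2024, 1, 1)),
    ("status", "speed_restricted", datetime(2024, 3, 10)),
    ("status", "closed", datetime(2024, 3, 20)),
]

def as_of(history, attribute, when):
    """Return the last value recorded at or before `when` -- time travel as a query."""
    value = None
    for attr, val, ts in sorted(history, key=lambda h: h[2]):
        if attr == attribute and ts <= when:
            value = val
    return value

# State on the day of a March 15th incident:
print(as_of(history, "status", datetime(2024, 3, 15)))  # speed_restricted
```

No forensic investigation: the March 15th answer falls out of the log directly.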


The Architecture

Making this work requires a specific architecture:

  • 1
    Delta Streams: Every source system emits changes, not full exports. Each delta carries its source, timestamp, and context.
  • 2
    Identity Management: Cross-system IDs so you know that "Switch 1234" in GIS is the same physical asset as "WP-1234-A" in the maintenance system.
  • 3
    Survivorship Rules: When sources conflict, clear rules determine which value "survives" into the golden record—based on recency, source authority, or manual override.
  • 4
    Bi-temporal Storage: Track both "valid time" (when was this true in the real world?) and "transaction time" (when did we learn this?).
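Bi-temporality is the subtlest piece, so here is a minimal sketch of the two time axes (the record layout and the switch data are hypothetical). The key query is: "what did we believe on date K about the state on date V?"

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class BiTemporalFact:
    asset_id: str
    value: float
    valid_from: date   # when this became true in the real world
    recorded_at: date  # when we learned it

facts = [
    # On Feb 1 we learned the switch had been at km 42.3 since Jan 1.
    BiTemporalFact("SW-X", 42.3, date(2024, 1, 1), date(2024, 2, 1)),
    # On Apr 1 we learned it had actually moved to km 42.5 back on Mar 1.
    BiTemporalFact("SW-X", 42.5, date(2024, 3, 1), date(2024, 4, 1)),
]

def value_as_known(facts, asset_id, valid_on, known_on):
    """What did we believe on `known_on` about the state on `valid_on`?"""
    candidates = [f for f in facts
                  if f.asset_id == asset_id
                  and f.valid_from <= valid_on
                  and f.recorded_at <= known_on]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.valid_from).value

# In mid-March we still believed 42.3, even though the move had already happened:
print(value_as_known(facts, "SW-X", date(2024, 3, 15), date(2024, 3, 15)))  # 42.3
# Today we know the real March 15th position was 42.5:
print(value_as_known(facts, "SW-X", date(2024, 3, 15), date(2024, 5, 1)))   # 42.5
```

The two answers differ because the two time axes differ: the first reconstructs what the system knew at the time, the second reconstructs what was actually true. Audits typically need both.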

The result: a repository that's more than the sum of its parts. Not a static database, but a living record of how your knowledge evolved.


What This Enables

Auditable compliance: Regulators ask what you knew and when. You have the answer.

Trust across teams: When everyone's contributions are tracked and visible, the political fights over "whose data is right" diminish.

Quality feedback loops: You can measure which sources are reliable, which frequently get overridden, and where the real problems are.

Future-proofing: As new data sources emerge (AI detections, IoT sensors, contractor handovers), they plug into the same delta stream architecture.

Ready to Version Your Assets?

We've built this architecture for some of Europe's largest rail networks. The approach works.

Let's Talk