A Graph Theory Approach to Reasoning About Complex Master Data Models
To be more than a database of master data, an MDM architecture should account for the non-obvious and complex interdependencies among master data elements. It should allow an attribute value to be inherited from a predecessor node, propagated to successor nodes, or set independently of both predecessor and successor in the dependency graph. Representing master data as a graph, a set of vertices and edges <V,E>, helps to reason completely and rigorously about the rules that govern how a change in one part of the graph affects the rest of it, and about how two graphs can be merged when they are found to be duplicates of the same real-world entity.
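The three behaviors described above (inherit, propagate, set independently) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference MDM implementation: the class MasterDataGraph, its methods, and the vertex names such as "person:1" are all hypothetical, the graph is assumed to be acyclic, and a vertex with several predecessors simply takes the first inherited value it finds.

```python
from collections import defaultdict, deque


class MasterDataGraph:
    """Hypothetical dependency graph; assumes there are no cycles."""

    def __init__(self):
        self.values = defaultdict(dict)        # vertex -> {attribute: locally set value}
        self.successors = defaultdict(list)    # vertex -> vertices that depend on it
        self.predecessors = defaultdict(list)  # vertex -> vertices it depends on

    def add_edge(self, pred, succ):
        self.successors[pred].append(succ)
        self.predecessors[succ].append(pred)

    def set_value(self, vertex, attr, value):
        # "Set independently": a local value overrides anything inherited.
        self.values[vertex][attr] = value

    def resolve(self, vertex, attr):
        # "Inherit": fall back to the first predecessor that yields a value.
        if attr in self.values[vertex]:
            return self.values[vertex][attr]
        for pred in self.predecessors[vertex]:
            value = self.resolve(pred, attr)
            if value is not None:
                return value
        return None

    def impacted_by(self, vertex, attr):
        # "Propagate": successors that may see a change to attr on vertex.
        # The walk stops at any vertex that sets attr locally.
        impacted, seen = [], set()
        queue = deque(self.successors[vertex])
        while queue:
            v = queue.popleft()
            if v in seen or attr in self.values[v]:
                continue
            seen.add(v)
            impacted.append(v)
            queue.extend(self.successors[v])
        return impacted


g = MasterDataGraph()
g.add_edge("person:1", "account:1")
g.add_edge("person:1", "account:2")
g.add_edge("account:1", "preference:1")
g.set_value("person:1", "country", "US")
g.set_value("account:2", "country", "CA")  # independent override

print(g.resolve("preference:1", "country"))  # inherited through account:1
print(g.resolve("account:2", "country"))     # locally set value wins
print(g.impacted_by("person:1", "country"))  # vertices a change would reach
```

Resolving values lazily means a change to a predecessor is visible to its successors without any copying, while impacted_by gives a conservative list of the vertices that a change might affect.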
The diagram below serves as a kitchen-sink example of how entities such as Person, Account, and Address can be linked by a complex set of edges. It also shows a scenario in which two graphs need to be merged while maintaining the data integrity assumed by the various applications that both mutate and query master data. The same approach can be extended to model even more complex scenarios that include additional vertices, such as individual preferences, and it helps to visualize the combinatorial complexity that grows exponentially as more layers of vertices are added to the model.
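The merge scenario can be made concrete with a small sketch. It assumes, purely for illustration, that a graph is held as a set of (predecessor, successor) edges plus a dictionary of attribute values per vertex. The function merge_vertices and the sample identifiers are hypothetical, and real survivorship rules would be decided by the business rather than by "first value wins".

```python
def merge_vertices(edges, values, survivor, duplicate):
    """Fold `duplicate` into `survivor`.

    Returns (new_edges, new_values, conflicts). Conflicting attribute
    values are reported, not resolved: the survivor keeps its own value.
    """
    def rename(v):
        return survivor if v == duplicate else v

    # Rewire every edge of the duplicate onto the survivor; the set
    # removes edges that become identical, and self-loops are dropped.
    new_edges = {(rename(a), rename(b)) for a, b in edges}
    new_edges = {(a, b) for a, b in new_edges if a != b}

    # Copy the values so the inputs are left untouched.
    new_values = {v: dict(attrs) for v, attrs in values.items() if v != duplicate}
    merged = new_values.setdefault(survivor, {})

    conflicts = {}
    for attr, value in values.get(duplicate, {}).items():
        if attr not in merged:
            merged[attr] = value                        # fill a gap
        elif merged[attr] != value:
            conflicts[attr] = (merged[attr], value)     # needs a business rule
    return new_edges, new_values, conflicts


edges = {
    ("person:A", "account:1"),
    ("person:A", "address:1"),
    ("person:B", "account:2"),
    ("person:B", "address:1"),
}
values = {
    "person:A": {"name": "J. Smith", "email": "js@example.com"},
    "person:B": {"name": "John Smith", "phone": "555-0100"},
}

new_edges, new_values, conflicts = merge_vertices(edges, values, "person:A", "person:B")
print(sorted(new_edges))
print(new_values["person:A"])
print(conflicts)
```

Returning the conflicts instead of silently overwriting them keeps the integrity decision visible, so that applications relying on either original record can be checked against the merged result.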
Having a visual representation that captures all the essential complexity helps to enumerate the different possibilities, such as whether vertex values need to be inferred or maintained independently. It also facilitates an enterprise view of the whole web of master data and helps standardize the business logic for maintaining data dependencies and integrity in the MDM solution.
A common reason for the failure of MDM initiatives is the wrong assumption that tools and technology are a silver bullet and a substitute for business process analysis and harmonization. Developing a solution without a complete understanding of the problem leads to an incoherent system that reproduces the same chaos the MDM solution was meant to resolve in the first place.