An article in CIO today has one good nugget and then sets out to miss the main point entirely. The author correctly says that “Working with data across an enterprise — especially in an SOA environment — requires understanding its context and semantics, not just its format and field attributes”. Spot on. However he then drifts off to bemoan the general state of the metadata repository market. What is missed is that the solution to master data management can never be a passive repository of the type that organizations put in during the 1990s e.g. the Reltech and Brownstone products later bought by CA, or even the IBM attempts prior to this e.g. the IBM Data Dictionary, which I had fun using in the 1980s. Such initiatives fail for several reasons:
- The metadata that is captured tends to be technical metadata, e.g. “CUST is VARCHAR(8)”, rather than business metadata: “Coca-Cola is a product within the product class carbonated drink, which is in turn within the product group beverage”. Technical metadata on its own is of limited use.
- If the metadata is not linked back into real operational systems, it will become as out of date as the repositories of a decade or more ago.
- The tools tend to use arcane conventions which business users have trouble relating to, so this job is left to IT folks, who aren’t in the best position to know what the data rules really are.
- There is no one single definition of almost any master data in a company; instead there are many, and they need to be managed.
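To make the first point concrete, here is a minimal sketch contrasting the two kinds of metadata, using the Coca-Cola example from above. The `Node` class, the `lineage` function and the level names are my own illustrative assumptions, not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Technical metadata: describes how a field is stored, nothing about meaning.
technical_metadata = {"CUST": "VARCHAR(8)"}


@dataclass(frozen=True)
class Node:
    """One entry in a business hierarchy (hypothetical structure)."""
    name: str
    level: str                    # e.g. "product", "product class"
    parent: Optional[str] = None  # name of the parent node, if any


def lineage(nodes: Dict[str, Node], name: str) -> List[str]:
    """Walk from a node up to the top of its hierarchy."""
    path = []
    current: Optional[str] = name
    while current is not None:
        node = nodes[current]
        path.append(f"{node.level}: {node.name}")
        current = node.parent
    return path


# Business metadata: the hierarchy a business user would recognize.
hierarchy = {n.name: n for n in [
    Node("beverage", "product group"),
    Node("carbonated drink", "product class", parent="beverage"),
    Node("Coca-Cola", "product", parent="carbonated drink"),
]}

for step in lineage(hierarchy, "Coca-Cola"):
    print(step)
```

The dictionary at the top tells you how to store a customer code; only the hierarchy tells you where a product sits in the business.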
It is critical for the success of any master data (or metadata) project that the people who actually understand the master data (general ledger structures, customer segmentation, product hierarchies), i.e. the business people who set them up, are put in charge of owning the core definitions, including the process to update this data. With this in mind, a data governance process is required in addition to any software tool. Such software should ideally be able to deal with business modeling in a way that is more intuitive than E/R diagramming, which most business users have trouble with.
Next, there needs to be automated workflow to support this update process, which may be complex, e.g. there may be several draft versions of a new product hierarchy, with different groups of people who have to review and approve them before final publication. If this just happens by email then errors will occur.
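As a rough illustration of what such a workflow has to enforce, the sketch below models a draft hierarchy that cannot be published until every required group has approved it. The state names, the `HierarchyDraft` class and the group names are hypothetical; a real workflow engine would add versioning, audit trails and notifications.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    PUBLISHED = "published"


@dataclass
class HierarchyDraft:
    """A draft version of a hierarchy awaiting sign-off (hypothetical)."""
    name: str
    required_approvers: Set[str]
    state: State = State.DRAFT
    approvals: Set[str] = field(default_factory=set)

    def submit(self) -> None:
        if self.state is not State.DRAFT:
            raise ValueError("only a draft can be submitted for review")
        self.state = State.IN_REVIEW

    def approve(self, group: str) -> None:
        if self.state is not State.IN_REVIEW:
            raise ValueError("approvals are only accepted during review")
        if group not in self.required_approvers:
            raise ValueError(f"{group} is not a required approver")
        self.approvals.add(group)

    def publish(self) -> None:
        if self.state is not State.IN_REVIEW:
            raise ValueError("only a reviewed draft can be published")
        missing = self.required_approvers - self.approvals
        if missing:
            raise ValueError(f"missing approvals: {sorted(missing)}")
        self.state = State.PUBLISHED


draft = HierarchyDraft("product hierarchy v2", {"finance", "marketing"})
draft.submit()
draft.approve("finance")
draft.approve("marketing")
draft.publish()
print(draft.state.value)
```

The point of encoding the rules is that a step cannot be skipped by accident, which is exactly what goes wrong when approval happens by email.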
The master data repository then needs the capability of “semantic integration” i.e. enabling the storage of multiple versions of the various types of master data that actually exist today, so that it can be at the hub of any project to improve the quality of this data, which may involve some rationalization.
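One simple way to picture this is a cross-reference that records how each system's local code relates to a shared master entry, so the different versions can coexist while being mapped together. This is only a sketch under my own assumptions: the `CrossReference` class, the system names and the codes are invented for illustration.

```python
from typing import Dict, Optional, Tuple


class CrossReference:
    """Maps each system's local code to a shared master identifier."""

    def __init__(self) -> None:
        self._to_master: Dict[Tuple[str, str], str] = {}
        self._from_master: Dict[Tuple[str, str], str] = {}

    def map(self, system: str, local_code: str, master_id: str) -> None:
        # Record the link in both directions for quick lookup.
        self._to_master[(system, local_code)] = master_id
        self._from_master[(master_id, system)] = local_code

    def translate(self, source: str, local_code: str,
                  target: str) -> Optional[str]:
        """Return the target system's code, or None if unmapped."""
        master_id = self._to_master.get((source, local_code))
        if master_id is None:
            return None
        return self._from_master.get((master_id, target))


xref = CrossReference()
xref.map("ERP", "P-001", "M1")   # hypothetical ERP product code
xref.map("CRM", "COKE", "M1")    # hypothetical CRM product code

print(xref.translate("ERP", "P-001", "CRM"))
```

Nothing here forces the ERP and CRM systems to agree on a single code; the repository simply knows that both refer to the same thing, which is the starting point for any later rationalization.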
Having understood what is out there, modeled it, mapped together the different versions and defined the workflow needed to deal with updates, the master data project then needs the ability to hook up to messaging technology to actually drive changes made to the master data repository back out into the other operational systems like ERP and CRM. Without such integration it will only be a partial solution.
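The shape of that last step can be sketched with a tiny in-process publish/subscribe class. This is a stand-in for real messaging middleware, not an example of any specific product; the `ChangePublisher` class, the event fields and the two inbox lists are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

ChangeEvent = Dict[str, str]
Handler = Callable[[ChangeEvent], None]


class ChangePublisher:
    """Pushes approved master data changes to every subscribed system."""

    def __init__(self) -> None:
        self._subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def publish(self, event: ChangeEvent) -> None:
        for handler in self._subscribers:
            # Give each subscriber its own copy of the event.
            handler(dict(event))


# Plain lists standing in for the ERP and CRM systems' inbound queues.
erp_inbox: List[ChangeEvent] = []
crm_inbox: List[ChangeEvent] = []

publisher = ChangePublisher()
publisher.subscribe(erp_inbox.append)
publisher.subscribe(crm_inbox.append)

publisher.publish({
    "entity": "product",
    "name": "Coca-Cola",
    "product_class": "carbonated drink",
})

print(len(erp_inbox), len(crm_inbox))
```

In practice each subscriber would be an adapter that translates the change into whatever the target system expects, but the principle is the same: the repository is the source of the change and the operational systems receive it.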
Few vendors today, perhaps none, offer a complete solution to the above set of issues, but some come close, and not one of them is a traditional repository vendor or an ETL tool vendor.