If all you have is a hammer…

Claudia Imhoff raises an important issue in her blog regarding the cleansing of data. When dealing with a data warehouse it is important for data to be validated before being loaded into the warehouse in order to remove any data quality problems (ideally, of course, you would also have a process to go back and fix the problems at source). However, as she points out, in some cases, e.g. for audit purposes, it is important to know what the original data actually was, not just a cleansed version. This gets at the heart of a vital point about master data, and neatly illustrates the difference between a master data repository and a data warehouse.

In MDM it is accepted (at least by those who have experience of real MDM projects) that master data will go through different versions before producing a “golden copy”, which would be suitable for putting into a data warehouse. A new marketing product hierarchy may have to go through several drafts and levels of sign-off before a new version is authorised and published, and the same is true of things like budget plans, which go through various iterations before a final version is agreed. This is quite apart from actual errors in data, which are all too common in operational systems. An MDM application should be able to manage the workflow of such processes, and have a repository that can go back in time and track the various versions of master data as it is “improved” over time, not just hold the finished golden copy. Only the golden copy should be exported to the data warehouse, where data integrity is vital.
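
To make this concrete, here is a minimal sketch (in Python, with purely illustrative names, not any particular MDM product’s API) of a repository that keeps every version of a master record and distinguishes the full history from the golden copy that gets exported to the warehouse:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Illustrative sketch only: a toy versioned master-data store, not any vendor's API.
# Every version of a master record (draft, approved, published) is kept rather than
# overwritten, so earlier states can always be recovered.

@dataclass
class MasterVersion:
    version: int
    status: str                       # "draft", "approved" or "published"
    attributes: dict                  # e.g. {"hierarchy": "Beverages > Soft drinks"}
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"

@dataclass
class MasterRecord:
    key: str                          # business key, e.g. a product code
    versions: list = field(default_factory=list)

    def add_version(self, status: str, attributes: dict, valid_from: date) -> None:
        if self.versions and self.versions[-1].valid_to is None:
            self.versions[-1].valid_to = valid_from      # close the previous version
        self.versions.append(
            MasterVersion(len(self.versions) + 1, status, attributes, valid_from))

    def golden_copy(self) -> Optional[MasterVersion]:
        """The latest published version: the only one exported to the warehouse."""
        published = [v for v in self.versions if v.status == "published"]
        return published[-1] if published else None

    def as_of(self, when: date) -> Optional[MasterVersion]:
        """The version that was in force on a given date, whatever its status."""
        for v in reversed(self.versions):
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
                return v
        return None
```

The point of the sketch is simply that `golden_copy()` and `as_of()` answer different questions: the warehouse load should only ever see the former, while audit and historical reconstruction need the latter.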

People working on data warehouse projects may not be aware of such compliance issues, as they usually care only about the finished state of the warehouse data. MDM projects should always be considering this issue, and your technology selection should reflect the need to track versions of master data over time.

3 thoughts on “If all you have is a hammer…”

  1. Hi Andy – I have a set of thoughts I’ve put together about MDM, search technology and the BI market space. I don’t see a posting of yours where these thoughts would make the most sense as a comment.

    Is it possible to send something directly to you?

  2. Spot on! The transactional data is fact; a transaction happened on a certain date and time and that is that. But the way in which that transaction is classified can change: the product hierarchy it falls under, the segmentation of the customer, the organisational unit that owned the transaction. Hence it is the combination of the transaction and the history of master data that gives a true enterprise “memory”, in a way that ERP systems, which concern themselves with up-to-the-minute data, do not. Only by having the history of how master data changes can you reconstruct the past, which is why just archiving transactions is insufficient.

    This realisation is in fact at the heart of the way that Kalido stores data, making explicit the separation of transaction data from master data, which is never deleted, merely reclassified and time-stamped. Kalido’s design allows the recasting of history using versions of master data, and indeed without such versioning ability most transactional archives are actually pretty useless as a system of record; it is only the combination of transactions with the master data that was in place at the time that gives a complete picture.

    As to the very good question “why?”: this can be for compliance reasons, but that is a cop-out. For other examples, consider comparing the profitability of a subsidiary before and after an acquisition, or the performance of a business unit before and after a reorganisation. There are many situations where you want to compare “apples with apples” but the business situation changed on you partway through the reporting period. If you have the record of that business change then you can recast the data to take account of it (there is a small sketch of this below the comment thread). A marketer may want to compare last summer’s sales with this summer’s, yet there was a reorganisation at year end in between. Anything that requires an analysis of trends over time will typically hit the issue of master data changing during the course of the reporting period.

    Paul, yours was a really perceptive comment.

  3. Hi Andy,

    I’ve read your posting and Claudia’s and am struggling to understand the answer to a basic question: why?

    I can come up with two business reasons why I need to keep the transactional detail:

    1 – an audit may require me to show what “was” from a number of years ago
    2 – the business might change its reporting requirements and I can’t necessarily create new aggregates from the old ones – the only way to do new math is from the transactional information. I think that may be part of the historical perspective of previous “gold masters”.

    How does this let a company either a) make money or b) save money?

    From a technical perspective, maybe what makes sense is to have my transactional information as the only physical content, and the gold master as a logical view on top of it. As gold masters change I can always go back in time and view the data as it “was.”

    This could potentially also allow organizations to position the system with all the compiled transactional data as the true system of record instead of the ERP or CRM systems, turning the system that contains all the transactional data, plus the logical view(s), into a true trusted hub of information. You’d really want to position this system of record as having open, published interfaces and a direct connection to service buses for connectivity back to any other system.

    Anyway, hope you don’t mind the comments/thoughts/feedback.
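
To pull together the recasting point in comment 2 and Paul’s suggestion of the gold master as a logical view over the transactions, here is a minimal Python sketch. The data and names are made up for illustration, and this is not Kalido’s actual implementation or any vendor’s API; it simply shows immutable transactions being reported either against the classification that was in force at the time or against today’s hierarchy.

```python
from datetime import date

# Illustrative sketch only, with made-up data; not Kalido's actual design or API.
# Transactions are immutable facts; the master data that classifies them is kept
# as time-stamped versions. "Recasting history" is just a matter of which version
# of the master data you join the facts against.

transactions = [
    {"product": "P1", "date": date(2006, 7, 10), "amount": 120.0},
    {"product": "P1", "date": date(2007, 7, 12), "amount": 150.0},
]

# Time-stamped master data: product P1 changed owner at a year-end reorganisation.
master_history = [
    {"product": "P1", "business_unit": "Drinks Europe",
     "valid_from": date(2005, 1, 1), "valid_to": date(2007, 1, 1)},
    {"product": "P1", "business_unit": "Global Beverages",
     "valid_from": date(2007, 1, 1), "valid_to": None},
]

def unit_as_of(product: str, when: date):
    """The business unit that owned the product on a given date (the 'as was' view)."""
    for m in master_history:
        if (m["product"] == product and m["valid_from"] <= when
                and (m["valid_to"] is None or when < m["valid_to"])):
            return m["business_unit"]
    return None

def unit_current(product: str):
    """The owner under today's golden-copy hierarchy (the 'as is' view)."""
    return unit_as_of(product, date.today())

for t in transactions:
    print(t["date"], t["amount"],
          "| as booked:", unit_as_of(t["product"], t["date"]),
          "| recast to current hierarchy:", unit_current(t["product"]))
```

Across the year-end reorganisation the same two transactions can be reported “as booked” (one under Drinks Europe, one under Global Beverages) or recast so that both fall under today’s business unit, which is exactly the apples-with-apples comparison discussed above.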
