Tackle master data to achieve SOA dream

Dana Gardner makes a sensible point regarding SOA, which I would amplify. For the promise of SOA to come to fruition, the discussion needs to move beyond the user interface and all those sexy Web 2.0 applications, and towards the barriers to applying SOA to general business applications. The thorny issues of inconsistent data models and of data quality rarely seem to come up in discussion, yet they are critical to the broader success of the SOA dream. It is tricky enough to get one enterprise application up and running, but if you want different applications from different vendors to interact happily with one another then it is not just a matter of application-layer standards; business applications need to be able to share data as well. Unfortunately, the reality in large companies is that there are multiple, incompatible versions of key business definitions such as “product”, “customer”, “asset”, “supplier” and even seemingly unambiguous terms like “gross margin” (according to a survey a couple of years ago, the average large company has 11 systems that each think they are the master source of “product”, for example). All the elegant user interface design in the world is of limited help if two applications cannot agree on what to display when they are supposed to be showing a product hierarchy, or where their definitions of cost allocations differ.

This is why projects and tools that improve the management of key business terms, i.e. “master” data such as “product” and “customer”, are a necessary precursor to broader SOA deployment in mainstream business applications. So are improvements in data quality, another area that is far from sexy but in practice is a major issue whenever applications have to bring together data from multiple sources, as any data warehouse practitioner can testify. The “semantic integration” of the varying business models that exist out there in multiple ERP, CRM and supply chain systems is a major barrier to broad SOA adoption, yet it is scarcely mentioned in the technology media. When those first SOA prototypes start showing data to business users, and it is realized that the data is “wrong”, this topic will become much higher profile.
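To make the point concrete, here is a minimal sketch (entirely hypothetical system names, identifiers and fields) of the sort of cross-reference that has to exist before two applications can even agree that they are talking about the same product:

    # Hypothetical illustration: the "same" product as two different systems see it.
    erp_product = {
        "MATERIAL_ID": "10-4432-EU",
        "DESCRIPTION": "Lubricant 5W-30, 1L",
        "HIERARCHY": ["Lubricants", "Automotive", "Passenger Car"],
    }
    crm_product = {
        "prod_code": "LUB5W30-1L",
        "name": "5W-30 Motor Oil 1 Litre",
        "category": "Car Care > Oils",
    }

    # Before services from the two applications can exchange "product" data,
    # someone has to create and maintain an explicit cross-reference between
    # the local identifiers and an agreed master record.
    product_xref = {
        ("ERP", "10-4432-EU"): "MASTER-000123",
        ("CRM", "LUB5W30-1L"): "MASTER-000123",
    }

    def same_product(sys_a, id_a, sys_b, id_b):
        """True if two local identifiers resolve to the same master record."""
        return product_xref.get((sys_a, id_a)) == product_xref.get((sys_b, id_b))

    print(same_product("ERP", "10-4432-EU", "CRM", "LUB5W30-1L"))  # True

None of the SOA plumbing cares about this mapping; without it, though, the services are just exchanging mutually unintelligible codes.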

Another one bites the dust

I wrote quite recently about the consolidation occurring in the data quality industry. The pace picked up today: First Logic, having escaped the clutches of Pitney Bowes, now looks as if it will be acquired by Business Objects.

This acquirer is a lot more logical for First Logic than Pitney Bowes. Data quality is a major component of any BI/data warehouse implementation, and indeed Business Objects has already been reselling First Logic for over a year, so the two companies know each other well. The bargain-basement price (USD 69M for a company with revenues of over USD 50M) tells you all you need to know about the health of the data quality market.

This move supports my thesis that data quality as an independent market is essentially disappearing, with the capabilities being baked into broader offerings. I believe the same fate awaits the ETL market; more on this later.

Missing the Boat

One of the things that has bewildered me over the last year or two (and there are plenty of things that bewilder me) is how the data quality vendors have seemed oblivious to the emerging trend of master data management (MDM). On the face of it, there are few sectors more in need of a fillip. Data quality, which involves a lot of human issues such as data governance and getting business people involved, is a hard sell. Rooting out errors in data is hardly the sexiest area to be working in, and as the solution is only partially provided by technology, projects and initiatives here are prone to failure (human nature being what it is). The space has seen significant consolidation in recent years: Avellino was bought by Trillium, Evoke by Similarity Systems, Vality by Ascential (now IBM), and Group 1 by Pitney Bowes, which also made an abortive attempt to buy First Logic (if you can figure that strategy out, answers would be gratefully received), while Trillium is owned by Harte Hanks. Now Similarity Systems has in turn been acquired by Informatica. Not the sign of a flourishing sector.

Surely, then, data quality vendors should have seized on MDM like a drowning man grasping at a life-raft? Data quality issues are a significant element of master data management, and while having software that can match up disparate name and address files is a long way from having a true MDM offering, remember that this is the tinseltown world of high-tech marketing, where a product can morph into another field with just a wave of a Powerpoint wand. Data quality vendors certainly ought to have grasped that matching up disparate definitions of things like “product” and “customer” was at least related to what their existing offerings did, and could have launched new MDM-flavored offerings to ride on the coat-tails of the nascent but burgeoning MDM bandwagon. Instead there hasn’t been a peep, and vendors have resigned themselves to being picked off by, in some cases, rather odd acquirers (Pitney Bowes, for example, is a direct mail firm; does it really grasp what it takes to be an enterprise software vendor?). Having avoided the clutches of Pitney Bowes, First Logic is now making progress in talking about MDM, but it is not perceived by the market as an MDM vendor. Elsewhere in the data quality industry, the silence around MDM is deafening.
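The technical leap is smaller than it might look. Here is a rough sketch (using only the Python standard library and invented example records) of how the same matching machinery a data quality tool applies to names and addresses also applies to candidate duplicates in product master data:

    from difflib import SequenceMatcher

    def similarity(a, b):
        """Crude string similarity score between 0 and 1."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    # The classic data quality use case: matching customer name/address records.
    print(similarity("ACME Corp, 1 High St, London",
                     "Acme Corporation, 1 High Street, London"))

    # Much the same machinery underpins the matching step of MDM: are these
    # two "product" entries from different systems really the same thing?
    print(similarity("Lubricant 5W-30, 1 litre bottle",
                     "5W-30 Motor Oil 1L Bottle"))

Real products use far more sophisticated matching than this, of course, but the core competence transfers directly.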

As the data quality market essentially disappears into the portfolios of integration companies like Ascential (now IBM) and Informatica (which at least make logical sense as buyers), and assorted others, the executives of some of these companies surely must be wondering whether they missed a trick.

Data Quality is so retro darling

Mark Smith (CEO of research company Ventana) writes about data governance as an important area for companies. He rightly points out that technology solutions are only part of the answer, and that organizational issues are at least as important. Strikingly, just 26% of large companies include master data management in their data governance initiatives. Perhaps some of this gap is in terminology, but this is somewhat disturbing since it raises the question: exactly what numbers are most companies using to make their decisions?

From my recollection of working in two of the largest companies in the world, it was best not to dig too deeply into the management information figures much of the time; data quality was a perennial problem, as were the endless “my figures are different to your figures” discussions. As Mark Smith points out, a lot of the issue is getting the ownership of data firmly with the business. Shell carried out an excellent initiative in the late 1990s in defining a common business model (at least down to a certain level of detail) and getting some business ownership of that model, but even after this it was still a major challenge. It is certain that other large companies have the same issues. What is clear is that data quality and ownership of definitions cannot be an IT issue; it is critical that business people step up and take control of their data, since they are the ones best placed to spot the inconsistencies and problems that someone with an IT background may overlook.

A good thing about the emerging interest in master data management is that it highlights long-neglected issues in the “data quality” field, which was previously a tough sell internally. Hands up all those volunteers for a data quality project? It was never what you might call a fashionable subject. Yet a lot of issues in MDM are actually data quality issues, so perhaps now that MDM is trendy we can dust off some of those old data quality books and make better progress than occurred in the 1990s.

MDM Trends

In DM Review we see some “2006 predictions”, something that journalists cannot resist doing each January, whatever the subject. In this case the article seems curiously limited to comments about “customer”. Certainly customer is an important example of master data, and indeed there are several products out there that specialize in it (so-called CDI products, like DWL, recently bought by IBM). However it is a common misapprehension that MDM is just about “customer” and “product”. It is not. One of our customers, BP, uses KALIDO MDM to manage 350 different types of master data, of which just two are customer and product. Large companies also have to worry about the definitions of things like “price”, “brand”, “asset”, “person”, “organization”, “delivery point”, etc, and probably don’t want to buy one software product for each one.
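A generic MDM tool gets around this by treating the entity type itself as data rather than as a hard-coded feature. A minimal sketch of the idea (hypothetical structures and values; not how any particular product is built):

    from dataclasses import dataclass, field

    @dataclass
    class MasterRecord:
        """A generic master data record: a type, a business key and attributes."""
        entity_type: str           # e.g. "product", "customer", "brand", "asset"
        code: str                  # business key within that type
        attributes: dict = field(default_factory=dict)

    # A generic repository is keyed by (entity type, code), so supporting the
    # 351st type of master data is a data change, not a new software product.
    repository = {}

    def register(record):
        repository[(record.entity_type, record.code)] = record

    register(MasterRecord("product", "LUB-5W30-1L", {"name": "5W-30 Motor Oil 1L"}))
    register(MasterRecord("brand", "EXAMPLE-BRAND", {"owner": "Example Oil Co"}))
    register(MasterRecord("delivery point", "DP-778", {"country": "NL"}))

    print(len(repository), "records across",
          len({t for t, _ in repository}), "master data types")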

MDM, as an emerging area, is particularly tricky to make predictions about. For what it is worth, I predict that in 2006:

1. There will be several more acquisitions in the space, as large vendors decide that they need to have an offering of some kind, if only to fend off competitive criticism or gaps on RFI checklists. However, caveat emptor here. The better products, like Trigo, have already been snapped up.
2. At least one analyst firm will publish some form of “MDM architecture” diagram that will attempt to classify MDM into different areas, in order to try and elevate that firm’s perceived “thought leadership” on the issue.
3. There will be the first “MDM project disaster” headlines as early adopters begin to move from Powerpoint into real project implementations. Inevitably, some will not go according to plan.
4. SAP MDME will prove as problematic as the original SAP MDM, which is now pushing up the daisies in a software cemetery near Walldorf. A2i was a poor choice as a platform for the general-purpose MDM tool that SAP needs, and this realization will start to sink in when customers start to try it out.
5. Management consultancies, who until mid 2005 could not even spell master data management, will establish consulting practices offering slick Powerpoint slides and well-groomed bright young graduates to deliver “program management” around MDM, with impressive-looking methodologies so hot off the presses that the ink is barely dry. They will purport to navigate a clear path through the maze of MDM technologies and will certainly not, in any way, be learning on the job at the client’s expense.

Hype or Hope?

In an article today Peggy Bocks of EDS asks whether MDM is all hype. I think it is fascinating how some terms in technology catch on, while others wither away. In 2002 MDM was essentially an unknown term. I recall discussing with analysts our plan to call our new Kalido product offering “Kalido MDM” and being greeted with polite derision, since this was “not a market”. Customers seemed to recognize the problem, though, and I discovered that we were not alone when SAP launched its own SAP MDM offering soon afterwards. Though that product is now retired, when a vendor the size of SAP anoints a term you can be sure that the industry will take it more seriously than when a start-up uses it. Three years on there is a much higher level of noise around MDM, with around 60 vendors now claiming to have some sort of MDM offering, at least in Powerpoint form. Further validation has come from Oracle (with its Data Hub), Hyperion (who bought Razza) and IBM, who have bought several technologies related to MDM. IDC reckons the market for MDM will be worth USD 9.7 billion within five years.

However, I do have some sympathy with Ms Bock’s point – a lot of MDM at present is froth and discussion rather than concrete projects. Certainly Kalido has some very real MDM deployments, Razza has some, Oracle presumably does, and SAP managed about 20 deployments before giving up and starting again, buying A2i and retiring its existing offering. Still, outside the tighter niches of product information management and customer data integration (arguably subsets of the broader MDM market), this hardly constitutes a landslide of software deployments.

A skeptic would argue that the industry has got these things wrong before, getting all excited over technology that fizzles out. Remember how “e-procurement” was going to take over the world? Ask the shareholders of the defunct Commerce One about that trend. I recall IDC projecting some vast market for object databases in a report published in 1992; at the time I wrote a paper at Shell arguing that the ODBMS market would likely never get even close to a billion dollars, and indeed it never did. However, object databases always struck me as a solution in search of a problem, whereas the issues around managing master data are very real, and very expensive, for large companies, and they are not well addressed today. There are major costs associated with inconsistent master data, e.g. goods being delivered to the wrong place, duplicate stock being held, etc. Shell Lubricants thought they had 20,000 unique pack-product combinations when in fact they had around 5,000, meaning major savings to be had by eliminating duplication in marketing, packaging and manufacturing, for example.
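The pack-product example is, at heart, a duplicate detection problem. A toy sketch (invented descriptions) of why a raw count of records overstates the number of genuinely distinct items:

    import re

    # Invented pack-product descriptions from different regional systems;
    # the first three describe the same item, written three different ways.
    pack_products = [
        "5W-30 Motor Oil 1L Bottle",
        "MOTOR OIL 5W-30 1L BOTTLE",
        "Motor Oil, 5W-30, 1L, bottle",
        "5W-30 Motor Oil 4L Can",
    ]

    def normalise(description):
        """Crude canonical key: lower-case, keep alphanumeric tokens, sort them."""
        return " ".join(sorted(re.findall(r"[a-z0-9-]+", description.lower())))

    print("raw count:   ", len(pack_products))                          # 4
    print("deduplicated:", len({normalise(p) for p in pack_products}))  # 2

Scale that gap up to tens of thousands of records and the marketing, packaging and manufacturing savings become obvious.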

Because it addresses a real business problem, with the potential for significant hard business savings, I believe that the MDM market will in fact catch light and grow, but there will inevitably be a confusing period while analysts get their heads around the new market and start to segment it, and customers begin to understand the various stages they need to go through in order to run an effective master data project.

MDM Business Benefits

There were some interesting results in a survey of 150 respondents from large companies, conducted by Ventana Research, on where customers see the main benefits of master data management (MDM). The most important areas were:

  • better accuracy of reporting and business intelligence 59%
  • improvement of operational efficiency 27%
  • cost reduction of existing IT investments 8%

It is encouraging that respondents place such a heavy emphasis on business issues compared to IT. Quite apart from this sounding about right (MDM can reduce customer delivery errors, billing problems etc), they will have a much better chance of justifying an MDM project if the benefit case is related to business improvement rather than the old chestnut of reduced IT costs (which so rarely appear in reality – surely IT departments would have shrunk to nothing by now if all the projects promising “reduced IT costs” over the years had actually delivered their promised benefits). A nice example of how to justify an MDM project can be found in a separate article today, in this case specifically about better customer information.

The survey also reflects my experience of the evolution of MDM initiatives, which tend to start in a “discovery” phase where a company takes stock of all its master data and begins to fix inconsistencies, which initially impact analytic and reporting applications. After this phase, companies begin to address the automation of the workflow around updating master data, and finally reach the stage of connecting this workflow up to middleware which will physically update the operational systems from a master data repository. This last phase is where many of the operational efficiency benefits kick in, and these may be very substantial indeed.
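For that final phase, the essential mechanics are that an approved change in the master data repository is fanned out to every system that subscribes to that kind of data. A simplified sketch (hypothetical names throughout; real implementations sit on proper middleware rather than an in-memory queue):

    # The repository holds the agreed "golden" version of each master record.
    master_repository = {
        ("customer", "C-1001"): {"name": "ACME Ltd", "segment": "Industrial"},
    }

    # Which downstream operational systems care about which types of master data.
    subscriptions = {"customer": ["ERP", "CRM", "Billing"]}

    outbound_queue = []  # stands in for the middleware layer

    def update_master(entity_type, code, changes):
        """Apply an approved change, then propagate it to subscribing systems."""
        record = master_repository.setdefault((entity_type, code), {})
        record.update(changes)
        for system in subscriptions.get(entity_type, []):
            outbound_queue.append((system, entity_type, code, dict(record)))

    update_master("customer", "C-1001", {"segment": "Key Account"})
    for message in outbound_queue:
        print(message)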

Based on the rapidly increasing level of interest in MDM, in 2006 I expect to see a lot of the current exploratory conversations turning into more concrete projects, each of which will need a good business case. At present MDM projects tend to be done by pioneering companies, so it will be very interesting to see if the various projections prove accurate and MDM starts to become more mainstream.

The supply chain gang

There is a thoughtful article today by Colin Snow of Ventana in Intelligent Enterprise. In it he points out some of the current limitations in trying to analyze a supply chain. At first sight this may seem odd, since there are well-established supply chain vendors like Manugistics and I2, as well as the capabilities of the large ERP vendors like SAP and Oracle. However, just as with ERP, there are inherent limitations in the built-in analytic capabilities of the supply chain vendors. They may do a reasonable job of very operational-level reporting (“where is my delivery?”) but struggle when it comes to analyzing data from a broader perspective (“what are my fully loaded distribution costs by delivery type?”). In particular he hits the nail on the head as to one key barrier: “Reconciling disparate data definitions”. This is a problem even within the supply chain vendors’ own software, some of which has grown through acquisition and so does not have a unified technology platform or single data model underneath the marketing veneer. We have one client who uses Kalido just to make sense of data within I2’s many modules, for example.

More broadly, in order to make sense of data across a complete supply chain you need to reconcile information about suppliers with that in your in-house systems. These will rarely have consistent master data definitions, i.e. what is a “packed product” in your supply chain system may not be exactly the same as a “packed product” in your ERP system, or within your marketing database. The packaged application vendors don’t control every data definition within an enterprise, and the picture worsens if the customer needs to work more closely with external suppliers, e.g. some supermarkets have their inventory restocked by their suppliers when stocks go below certain levels. Even if your own master data is in pristine condition, you can be sure that your particular classification structure is not the same as any of your suppliers’. Hence making sense of the high-level picture becomes complex, since it involves reconciling separate business models. Application vendors assume that their own model is the only one that makes sense, while BI vendors assume that such reconciliation is somehow done for them in a corporate data warehouse. What is needed is an application-neutral data warehouse in which the multiple business models can be reconciled and managed, preferably in a way that allows analysis over time, e.g. as business structures change. Only with this robust infrastructure in place can the full value of the information be exploited by the BI tools.
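The “over time” requirement matters more than it sounds, because codes and classifications on both sides keep changing. A small sketch (hypothetical codes and dates) of a time-variant mapping between a supplier’s product codes and an in-house packed-product code:

    from datetime import date

    # Hypothetical mapping: the supplier renumbered its products in mid-2005,
    # but both of its codes refer to the same in-house packed product.
    code_mappings = [
        # (supplier code, our code, valid from, valid to)
        ("SUP-889",  "PP-1203", date(2004, 1, 1), date(2005, 6, 30)),
        ("SUP-2041", "PP-1203", date(2005, 7, 1), date(9999, 12, 31)),
    ]

    def to_internal_code(supplier_code, on):
        """Resolve a supplier's code to our packed-product code as of a given date."""
        for sup, ours, start, end in code_mappings:
            if sup == supplier_code and start <= on <= end:
                return ours
        return None

    print(to_internal_code("SUP-889", date(2005, 3, 1)))    # PP-1203
    print(to_internal_code("SUP-2041", date(2006, 1, 15)))  # PP-1203

Keep the history and you can still compare this year’s supply chain costs with last year’s on a consistent basis, even after the supplier’s renumbering.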

Data quality blues

A 2005 research note from Gartner says that more than 50% of data warehouse projects through 2007 will be either outright failures or will achieve only limited acceptance. This does not surprise me in the least. There are several respects in which data warehouse projects are under unusual strain, on top of the normal problems that can beset any significant project. Data warehouses take in data from several separate sources (ERP, supply chain, CRM etc) and consolidate it. Consequently they are dependent upon both the quality of the data and the stability of those source systems: if any of the underlying source systems undergoes a major structural change (e.g. a new general ledger structure or customer segmentation) then it will affect the warehouse. You might think that data quality was a minor problem these days, with all those shiny new ERP and CRM systems, but you’d be wrong. In Kalido projects in the field we constantly encounter major data quality issues, including with data captured in the ERP systems. Why is this?

An inherent problem is that systems typically capture information that is directly needed by the person entering the data, along with other things that are useful to someone, but not directly to that person. I remember doing some work in Malaysia and seeing a row of staff entering invoice data into a J.D. Edwards system. I was puzzled to see them carefully typing in a few fields of data, and then just crashing their fingers at random into the keyboard. After a while, they would resume normal typing. After seeing this a few times my curiosity got the better of me and I asked one of them what was going on. The person explained that there were about 40 fields they were expected to enter, and many of them were unnecessary for their work; they could not move to the next screen without tabbing through each field in turn, unless they entered some gibberish in one of the main fields, at which point the system conveniently jumped them to the last field. So by typing nonsense data into a field that turned out to be quite relevant (though not to them) they could save a lot of keystrokes.

Of course this is an extreme case, but have you ever filled out an on-line survey, got bored or frustrated because it asked for something you didn’t have the data for, and started answering any old thing just to get to the end? The point is that people care about data quality when they are going to get something back. You can be sure they enter their address correctly on the prize draw form. But in many IT systems people are asked to enter information that doesn’t affect them, and human nature says that they will be less accurate with this than with something that matters to them directly. Some data quality issues can be dramatic. In the North Sea one oil company drilled through an existing pipe because, according to the system that recorded the co-ordinates of the undersea pipes, it was not there: this merely cost a few million dollars to fix, and fortunately the pipe was not in active use that day or the consequences would have been much worse. Another company discovered that it was making no profit margin on a major brand in one market due to a pricing slip in its ERP system that had gone undetected for two years.

The reason data warehouses suffer so much from data quality issues is not just that they encounter the data problems of each source system they deal with; it is also that, because they bring all the information together, the problems often only become visible at that point. For example, the pricing problem above became apparent because the data warehouse showed zero gross margin, something that could not be seen inside the ERP system since the margin calculation drew on data from several systems combined. It is the data warehouse that shines a light on such issues, yet it is often wrongly blamed when the project is delayed as a result.
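To see why the problem only surfaces at the warehouse, consider a deliberately simplified sketch (invented figures) in which the selling price lives in the ERP system and the unit cost in a separate costing system; neither looks wrong on its own, but the join tells a different story:

    # Invented figures: price from the ERP system, cost from a costing system.
    erp_prices   = {"Brand X 1L": 4.10}   # selling price per unit
    costing_data = {"Brand X 1L": 4.10}   # fully loaded cost per unit

    # Only when a warehouse joins the two sources does the margin become visible.
    for product, price in erp_prices.items():
        cost = costing_data[product]
        margin_pct = (price - cost) / price * 100
        print(f"{product}: gross margin {margin_pct:.1f}%")   # 0.0% -- the alarm bell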

Data quality problems are one major issue for which there is no magic solution. Data quality tools can help, but this is a people and process issue rather than a technology issue. Another reason data warehouse projects are perceived to fail is that they take a long time to build and cost a lot to maintain. Since it takes 16 months to build an average data warehouse (according to a TDWI survey), it is not surprising that the business changes in that time. The only way to really address this is to use a packaged data warehouse solution, which takes less time to implement (typically less than six months for Kalido). Maintenance costs are another major problem, and here again there are modern design techniques that can be applied to improve the situation. See my blog entry “The data warehouse carousel”.

It is only by making use of the most modern design approaches, iterative implementation methods that show customers early results, and the most productive technologies that data warehouse project success rates will improve. There will always be projects that run into trouble due to poor project management, political issues and lack of customer commitment, but data warehouse projects at least need to stop making life harder for themselves than it needs to be.

A truly sincere form of flattery

It always pays to keep up with what the opposition is up to, so I tuned in to a webinar two days ago run by Hyperion on their MDM solution, which they acquired from Razza. This competes with KALIDO MDM, and although no doubt both vendors would argue about the relative merits of their products, these two technologies are perhaps the most widely deployed general-purpose MDM tools today. The interest in (and hype around) general-purpose MDM is intense, but the number of production instances at real customers is relatively small, as this is a very new and rapidly developing market. It is a market long on Powerpoint and short on deployed product. The session was led by Jane Griffen of Deloittes, an expert in MDM who speaks regularly about it. However, when it came to a case study about using MDM in practice, I was amazed to see a Kalido customer case study slide pop up (though, this being a Hyperion event, Kalido’s name was curiously omitted). This creates a new level of tech marketing illusion, with endless potential. Tell people about your product, but show them customer case studies from someone else. It is a great way of plugging holes in your own customer list, and perhaps also of dealing with any pesky product limitations your own tool has. After all, if there is some area of weakness in your own product, just pick a competitor stronger in that area and discuss one of their customer implementations, implying that it is one of your own.

This has set a precedent in truly original hi-tech marketing. Perhaps one could skip building a product at all, and just present about other people’s customers entirely? It would certainly cut down on R&D costs. Congratulations to Hyperion on inventing an entirely new technique in marketing.