A Halloween Tale

It’s a busy week in the master data management world, with big scary monsters out in the night eating up smaller prey. We have seen Tibco acquire Velosel, and just today SAP acquire moribund EII vendor Callixa, apparently for its “customer data integration efforts”. I’m not quite sure what potion SAP have been imbibing recently, but I could have sworn that they recently abandoned their own MDM offering, which after two years of selling into their massive user base had managed just 20 sites, and bought vendor A2i in order to replace this gooey mess with a new master data management offering based on A2i’s technology. Perhaps those with crystal balls available as part of their costume for their Halloween party this evening could inquire through the mists as to how buying a second vendor in the space matches up with the coherent vision of master data management that SAP is presumably trying to portray? At the moment this seems as clear to me as pumpkin soup.

Every vendor worth its salt now seems to be under the MDM spell, with hardly a week going by without a niche player getting gobbled up by one of the industry giants. Yet I continue to be surprised by the disjointed approach that many have taken, tackling the two most common key areas, customer and product, with separate technologies. Sure, CDI and PIM grew up independently, but there are many, many other kinds of master data to be dealt with in a corporation: general ledgers, people data, pricing information, brands and packaging data, and manufacturing data, to name just a few. One of our customers, BP, uses KALIDO MDM to manage 350 different types of master data. Surely vendors can’t really expect customers to buy one product for CDI, another for PIM, another for financial data, another for HR, etc.? This would result in a witches’ brew of technology, a mess of new master data tools which will themselves need some kind of magic wand waved over them in order to integrate the rival master data technologies. Just such a nightmare is unfolding, with the major vendors each trying to stake out their offering as the one true source of all master data, managing all the other vendors’ offerings. I certainly understand that if any one vendor could truly own all this territory then it would be very profitable for them, but surely history has taught us that this simply cannot be done. What customers want is technology that allows master data to be shared and managed between multiple technology stacks, whether IBM, SAP, Oracle, Microsoft or whatever, rather than being forced into choosing one (which, given their installed base, is just a mirage anyway). Instead the major vendors seem to be lining up to offer tricks rather than treats.


The data warehouse carousel

Rick Sherman wrote a thoughtful article which highlighted a frustration amongst people working in the business intelligence field. He says that “Many corporations are littered with multiple overlapping DWs, data marts, operational data stores and cubes that were built from the ground up to get it right this time” – of course each one never quite achieving this nirvana. This never-ending cycle of replacement occurs because data warehouses built around conventional design paradigms fundamentally struggle to deal with business change. Unlike a transaction system, where the business requirements are usually fairly stable (core business processes do not change frequently) and where the system is usually aimed at one particular part of the business, a data warehouse gathers data from many different sources and its requirements are subject to the whims of management, who change their minds frequently about what they want. Any major business change in one of the source systems will affect the warehouse, and although each source system change may not happen very often, if you have ten or fifty sources, then change becomes a constant battle for the warehouse. One customer of ours had a data warehouse that cost USD 4 million to build (a bit larger than the average of USD 3 million, according to Gartner) and was a conventional star schema, built by very capable staff. Yet they found that the system was costing USD 3.7 million per year in maintenance, almost as much as it cost to build. They found that 80% of this cost was associated with major changes in the (many) source systems that impacted the schema of the warehouse. It is hard to get reliable numbers for data warehouse maintenance, but systems integrators tell me that support costs being as high as build costs is quite normal for an actively used data warehouse (ones with very low maintenance costs tend to get out of sync with the sources and lose credibility with the customers, eventually dying off).

This problem is due to the conventional way in which data models are put together and implemented at the physical level, whereby the models are frequently aimed at dealing with the business as it is today, with less thought to how it might change. For example you might model an entity “supplier” and an entity “customer”, yet one day one of those suppliers becomes a customer. This is a trivial example, but there are many, many traps like this in data models that are then hard-coded into physical schemas. This fundamental issue is what led to the development of “generic modeling” at Shell in the 1990s, which was itself contributed to the ISO process and became ISO 15926. It is very well explained in a paper by Bruce Ottmann, the co-inventor of generic modeling, and is the approach used in the implementation of the KALIDO technology (and that of a few other vendors). The more robust approach to change that this advanced technique allows makes a huge difference to ongoing maintenance costs. Instead of a warehouse costing as much to maintain as to build, the maintenance costs reduce to around 15% of implementation costs, which is much more acceptable. Moreover the time to respond to changes in the business improves dramatically, which may be even more important than the cost.
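To make the idea concrete, here is a minimal sketch in Python of the generic modeling principle – purely my own illustration, not how KALIDO or any other product actually implements it. The point is that entity types become data (roles attached to a party) rather than hard-coded tables, so a supplier that becomes a customer is an ordinary update instead of a schema change.

```python
# A sketch of "generic modeling": roles are data, not separate hard-coded
# tables, so adding a role to an existing party needs no schema change.
from dataclasses import dataclass, field

@dataclass
class Party:
    """One real-world organisation or person, independent of its roles."""
    name: str
    roles: set = field(default_factory=set)   # e.g. {"supplier", "customer"}

parties: dict[str, Party] = {}

def register(name: str, role: str) -> Party:
    """Record that the named party plays the given role."""
    party = parties.setdefault(name, Party(name))
    party.roles.add(role)
    return party

# Acme starts life as a supplier...
register("Acme Ltd", "supplier")
# ...and later becomes a customer too -- no ALTER TABLE required.
register("Acme Ltd", "customer")

assert parties["Acme Ltd"].roles == {"supplier", "customer"}
```

In a conventional model the same event would force new tables or foreign keys and a schema migration; here it is just another row of role data.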

Whatever technology you use for the actual implementation, it would be well worth your while understanding the concepts of the generic approach, which leads to the creation of more robust and higher-quality data models.



Love amongst vendors is fickle

Informatica just announced a partnership with SAP. This illustrates just how difficult it can be for software vendors to develop lasting mutually beneficial relationships, since SAP had exactly such a relationship with Ascential (presumably this is now moot). Reportedly, Ascential got a lot less from this relationship than they had originally hoped, so it remains to be seen whether Informatica will do any better. Just to add to the haze, SAP has been stating recently that its Netweaver brand of technology will do ETL functions itself. So what is a customer to do: continue with Ascential, switch to Informatica, or wait for Netweaver to provide functionality?

I would suggest that the correct response is: “continue whatever you are doing”. If you were using Ascential, then it is perhaps less comforting that SAP no longer promotes this, but the technology clearly works the same both pre and post press release. If you were using Informatica anyway then nothing changes, while waiting for Netweaver to do ETL for you and so obviate the need for a separate tool is perhaps the only dubious choice. SAP’s record with products not at the heart of their application platform expertise is patchy at best. SAP first tried a BI product (actually more than one) in the mid 1990s, then replaced this set with LIS and other tools, then switched yet again to BW and Business Explorer. SAP’s MDM product did not exactly set the world alight. Despite heavy marketing through its vast installed base, it turns out that just 20 customers had deployed SAP MDM by mid 2005 (revealed at an SAP conference). 20 out of an installed base of 18,000 customers is about 0.1% penetration after two years of trying. SAP has since dropped this and bought A2i, and will rebuild a new MDM offering around that, which is another hard lesson to customers that buying from big vendors is by no means always “safe”.

So customers with ETL needs and no committed product yet should just evaluate Informatica and Ascential Datastage (now an IBM product) and let them slug it out. These two vendors emerged as the leaders in the ETL market, with previous pioneers like ETI Extract shrinking to near oblivion, and Sagent disappearing entirely. Only the eccentric and secretive company Ab Initio has retained a niche in high volume ETL, though since customers have to sign an NDA just to peek at the product it is hard to know much about their market progress.

IBM’s relationship with SAP also remains ambivalent. IBM makes a stack of money from services around SAP implementations, and gets some database revenue if the customer runs SAP on DB2, so in principle the relationship should be on a solid footing. Yet SAP’s Netweaver announcement puts it (including its XI technology) smack up against IBM Websphere in the middleware space, a core business for IBM, who have eschewed the business applications market. The path of true love is a rocky one.

A truly sincere form of flattery

It always pays to keep up with what the opposition is up to, and so I tuned in to a webinar two days ago run by Hyperion on their MDM solution, which they acquired from Razza. This competes with KALIDO MDM, and although no doubt both vendors would argue about the relative merits of their products, these two technologies are perhaps the most deployed general purpose MDM tools today. The interest and hype around general purpose MDM is intense, but the number of production instances at real customers is relatively small, as this is a very new and rapidly developing market – one long on Powerpoint and short on deployed product. The session was led by Jane Griffen of Deloittes, an expert in MDM who speaks regularly about it. However when it came to a case study about using MDM in practice, I was amazed to see a Kalido customer case study slide pop up (though, this being a Hyperion event, Kalido’s name was curiously omitted). This creates a new level of tech marketing illusion, with endless potential. Tell people about your product, but show them customer case studies from someone else. This is a great way of plugging holes in your own customer list, and perhaps also of dealing with any pesky limitations your own tool has. After all, if there is some area of weakness in your own product, just pick a competitor stronger in that area and discuss one of their customer implementations, implying that it is one of your own.

This has set a precedent in truly original hi-tech marketing. Perhaps one could skip building a product at all, and just present about other people’s customers entirely? It would certainly cut down on R&D costs. Congratulations to Hyperion on inventing an entirely new technique in marketing.

Road testing software

Fortune 500 companies have surprisingly varied approaches to procurement of software. Of course the sheer size of the project or deal is an important factor, with the chances of professional procurement people being wheeled in rising as the deal value rises. Having been on both sides of the fence now, I do have some observations.

Some customers use an RFI (request for information) as a way of trying to improve their own understanding of their problem, and this approach can lead to problems. If you are not quite sure what you need then you can be certain that a software vendor has even less idea. Moreover, if your needs are vague, you can be sure that every vendor’s product will mysteriously fit these vague needs. It is better to sit down with your business customers and get a very firm grasp of the precise business needs, and then plan out how you are going to assess the software, before you speak to a single vendor. You should plan in advance just how you are going to select the product from the “beauty parade” of vendors that you will eventually pick from. It is important that you think about this PRIOR to talking to vendors, or your process will be tainted.

How are you going to pick a provider? Just bringing vendors in and seeing who presents well is unlikely to be optimal, as you are relying too much on impressions and the skill of the individual sales teams. Do you want the best product, or the one with the slickest salesman? Instead you should define up front the list of functional, technical and commercial criteria that will frame your choice, and agree a set of weightings i.e. which are most important. You then need to think about how you are going to measure these in each case e.g. what a score of “8/10” means for a particular criterion. Some things you can just look up e.g. many commercial criteria can be established from the internet (revenues of public companies) or via things like Dun and Bradstreet risk ratings. Analyst firms can help you short-list options, but be aware that analyst firms take money from vendors as well as customers. A key bit of advice here is not to go mad with the criteria – remember that you are going to have to score each of these somehow. Moreover, do a light check first to get the “long list” of vendors down to a short list before you delve too deeply. I know of a UK NHS trust which has a project going on right now with literally hundreds of criteria, and a “short list” of 22 vendors. How on earth they are planning to score these is a mystery to me. Slowly, presumably. Get it down to three or four vendors via a first pass.
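By way of illustration, a weighted-criteria evaluation is just a small sum; the sketch below uses invented criteria, weights and scores. In practice you would agree the weights with your business customers before seeing a single vendor demo.

```python
# Hypothetical weighted-scoring sketch for a vendor short-list.
# All criteria, weights and scores here are made up for illustration.

weights = {"functional fit": 0.4, "technical fit": 0.3,
           "vendor viability": 0.2, "cost": 0.1}
assert abs(sum(weights.values()) - 1.0) < 1e-9   # weights must total 100%

scores = {   # each criterion scored out of 10, per vendor
    "Vendor A": {"functional fit": 8, "technical fit": 7,
                 "vendor viability": 6, "cost": 5},
    "Vendor B": {"functional fit": 6, "technical fit": 8,
                 "vendor viability": 9, "cost": 8},
}

def weighted_score(vendor_scores: dict) -> float:
    """Sum of (weight x score) across all agreed criteria."""
    return sum(weights[c] * s for c, s in vendor_scores.items())

ranking = sorted(scores, key=lambda v: weighted_score(scores[v]), reverse=True)
for vendor in ranking:
    print(f"{vendor}: {weighted_score(scores[vendor]):.1f}")
```

Note how the weighting changes the outcome: Vendor A wins on raw functional fit, but Vendor B’s stronger viability and cost scores put it ahead overall, which is exactly the kind of trade-off a slick demo obscures.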

Once you have your short-list, a key part of this process is likely to be getting the vendor to actually try the software out on your own data in your own environment. Just because it all works fine in a different industry, platform and size of company to you does not mean it will all go smoothly in your environment, and you should conduct a “proof of value” for each of the short listed vendors. You will learn far more from seeing the software actually operate on your data than via any number of pretty Powerpoint slides and carefully crafted canned demos. Be reasonable here. A vendor selling software for a six figure sum will be prepared to put in a day or two of pre-sales effort, but if you expect a massive multi-week evaluation then you should expect to pay for some consulting time, either from the vendor or a consulting firm who are deeply experienced in the technology. Buying a piece of enterprise software is a major decision, with costs well beyond the basic purchase price of the software, so investing a little up-front in order to be sure you have made the right choice is a good idea. If you choose the proof of value carefully, then you can get a head start on the real business problem by tackling a small subset of it, and you may well learn something about the real implementation issues through a well structured proof of value activity. The vendors, one of which will be your future partner after all, will also be happy since they get to understand your requirement better and hopefully determine any technical horrors at this stage rather than much later on. It is amazing how often you encounter “you want it to run on what database?” type of basic issues at this stage. It is in your interest to make sure that the proof of value is realistic e.g. decent data volumes, and using the actual environment that you plan to deploy on. 
We have recently had problems with a project where the customer did all the testing on one web app server (Tomcat) and then deployed into production on an entirely different one and were surprised when it didn’t work first time (“but both web servers adhere to the same standard so it should work”; yeah, right).

Customer references are more important than they may appear. It may be surprising following a slick sales pitch, but small vendors in particular may have very few real customer implementations, and you can learn a lot about what a vendor is really like from customers who have gone past the pitch and actually implemented the product. Of course the vendor is not going to pick its unhappiest customer to give the reference, but most people are fairly honest about their experiences if you ask. Even large vendors may have very few implementations of this PARTICULAR product, so size is not everything, as with so many things in life. I remember when I was working for a major corporate and a huge computer manufacturer was trying to sell me a big-ticket application, but could not come up with a single reference customer. This told me all I needed to know about the maturity of the technology.

A well structured evaluation process and proof of value does cost some effort up-front, but it will pay dividends in terms of demonstrating whether a technology is likely to actually do what you want it to and deliver value to your project.

Elusive Return on Investment

An article in CIO magazine revealed a fascinating paradox. 63% of IT executives claim that they are required to present a cost justification for IT projects (a survey by Cutter), yet according to Aberdeen Group just 5% of companies actually collect ROI data after the event to see whether the benefits actually appeared. I have to say that my own experience in large companies bears out this “ROI gap”. Most post-implementation reviews occur when a project has gone hideously wrong and a scapegoat is required. There are exceptions – I have been impressed at the way that BP rigorously evaluates its projects, and Shell Australia used to have a world-class project office in which IT productivity was rigorously measured for several years (a sad footnote is that this group was eventually shut down to reduce costs). However, overall I think these apparently contradictory survey findings are right: a lot of IT projects have to produce a cost/benefit case, but hardly ever are these benefits tested.

It is not clear that the failure to do so is an IT problem, but rather a failure of business process. Surely the corporate finance department should be worried about this lack of accountability – it is hardly IT’s problem if the business doesn’t bother to check whether projects deliver value. It really should not be that hard. Project costs (hardware, software, consultants, personnel) are usually fairly apparent or can be estimated (unless, it would seem, you work in government) while benefits are more slippery. This is mainly because they vary by project and so don’t fit into a neat template. However they will usually fall into the broad categories of improved productivity (e.g. staff savings), improved profitability (e.g. reduced inventory), or, more indirect and woollier, improved customer value (e.g. the falling price of PCs over the years). It should be possible to nail down estimates of these by talking to the business people who will ultimately own the project. Once the benefits have been estimated then it is a simple matter to churn out an IRR and NPV calculation – these are taught in every basic finance class, and Excel conveniently provides formulae to make it easy. Of course there are some IT projects that don’t require a cost-benefit case, regulatory compliance being one example (“do this or go to jail”), but the vast majority should be possible to justify.
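For the sake of illustration, here is the arithmetic behind those Excel formulae, sketched in Python with invented cash flows (a USD 1 million project returning USD 400k a year for four years):

```python
# Back-of-envelope NPV and IRR; the cash flows are invented for illustration.

def npv(rate: float, cashflows: list) -> float:
    """Net present value; cashflows[0] is the up-front cost (negative) at t=0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows: list, lo=-0.99, hi=10.0, tol=1e-7) -> float:
    """The discount rate at which NPV = 0, found by simple bisection
    (assumes one sign change in the cash flows, as in a normal project)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid        # NPV still positive: the break-even rate is higher
        else:
            hi = mid
    return (lo + hi) / 2

flows = [-1_000_000, 400_000, 400_000, 400_000, 400_000]
print(f"NPV at 10%: USD {npv(0.10, flows):,.0f}")   # positive, so worth doing
print(f"IRR: {irr(flows):.1%}")
```

At a 10% cost of capital this hypothetical project has an NPV of roughly USD 268,000 and an IRR of around 22% – exactly the sort of numbers a finance department can compare against its hurdle rate, and, crucially, revisit after the event.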

By going through a rigorous analysis of this type, and then checking afterwards to see what really happened, IT departments will build credibility with the business, something that most CIOs could do with more of.

Perception over Reality

A recent article in Infoconomy wonders whether SQL Server 2005 will be “enterprise scale”. Wake-up call – it already is. It is intriguing that analysts and journalists continue to perpetuate the myth that SQL Server is somehow a departmental solution, not really up to serious usage. This is nonsense; even in 1997, when working at Shell, I led a piece of research into the cost of ownership of database platforms and was not surprised to find that the DBA facilities of SQL Server were much easier to use than those of Oracle. What I was surprised to discover was just how large some SQL Server implementations were. Of course SQL Server was originally based on the Sybase code base, which to this day runs many of the systems at giant financial institutions, but somehow the myth of “departmental” had stuck in my mind. Recently one of our customers has been looking seriously at switching from Oracle to SQL Server and has tried running some of its largest applications on it. A trial of a multi-terabyte Kalido data warehouse, for example, showed that SQL Server was slightly slower on some things and faster on others, but broadly speaking there was no performance disadvantage relative to Oracle. SAP runs happily on SQL Server, so why does the myth persist?

I think it comes down to marketing dollars. Microsoft does not market SQL Server heavily to the enterprise, and spends less money with analyst firms than one might expect. By contrast Oracle and IBM are marketing machines who constantly stress how powerful their databases are. Hence a marketing perception, unchallenged, becomes received wisdom. Microsoft is missing a trick here, as Oracle behaves very aggressively towards its customers and will not win many popularity polls amongst its large corporate accounts. Some would be very tempted to switch to an alternative supplier, and while switching costs would be huge, part of the reason for the inertia is this perception of SQL Server as a departmental database. Given that IBM was outmarketed by Oracle when DB2 had a clear technical lead, it would be a shame not to see Microsoft put up more of a fight in this arena – competition is good for everyone except monopolistic vendors.

MDM Market size

An IDC survey puts the MDM market at USD 10.4 billion by 2009, with a compound growth rate of 13.8%. To save you some Excel exercise, that puts the existing MDM market at around USD 5 billion today. This seems quite high, and clearly involves counting the existing sub-markets of CDI (customer data integration) and PIM (product information management), which have been around a lot longer than general purpose MDM tools (e.g. Kalido MDM, Oracle Data Hub, Hyperion Razza and SAP MDM, or whatever it is called this week). Certainly the latter tools are what has stirred up a lot of interest in the MDM market, since they promise to address the general problem of how to manage master data, rather than dealing with the specific point problems of customer and product. This matters, since BP uses Kalido MDM to manage 350 different types of master data, so if we have to have a different software solution for every data type then IT departments are going to be very disappointed indeed! In fact today there are relatively few deployed instances of general purpose MDM tools, which are quite young (KALIDO MDM went on general release in Q3 2004), so it is of great interest to those vendors how quickly the “general purpose MDM” market will pick up from its early beginnings to grow to a serious proportion of this overall market. IDC are the most quantitative and thorough of the industry analysts when it comes to market size data, though as ever caution should be used in projecting the future in a straight line. Still, even at a current market size of around USD 5 billion, it can be seen that this market, which did not even really have a name a year or so ago, is generating a lot of interest.
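If you want to skip the Excel exercise entirely, the back-calculation is a one-liner: discount the 2009 forecast back at the 13.8% compound rate. The baseline year is my assumption (IDC do not state it in the figures quoted above), so the sketch tries both a 2004 and a 2005 starting point:

```python
# Reverse-engineering the implied current market size from the IDC forecast.
forecast_2009 = 10.4   # USD billion, per the IDC survey
cagr = 0.138           # 13.8% compound annual growth rate

for years_back in (4, 5):   # assumed baseline: 2005 or 2004
    today = forecast_2009 / (1 + cagr) ** years_back
    print(f"{years_back} years of growth implies USD {today:.1f} billion today")
```

A 2004 baseline gives about USD 5.4 billion and a 2005 baseline about USD 6.2 billion, so the “around USD 5 billion today” figure holds up, give or take the usual forecasting fog.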

IT Industry perking up

There have recently been a couple of useful indicators of the health of the technology industry. A somewhat lagging indicator is the new Software 500 list, which shows 2004 revenues but for reasons best known to itself comes out in September. This shows overall revenues up 16% in 2004 compared to 2003. The Software 500 is a slightly odd list in that it includes assorted technology companies and systems integrators like Accenture rather than just pure-play software companies, but it is otherwise a useful list. More recently, the FT reported that M&A activity in the European technology industry was at a record high, even higher than in the previous record quarter of Q1 2000 – there have now been ten straight quarters of M&A growth. This would suggest that, while the IPO market remains lacklustre unless you are called Google or have “China” in your name (just 64 technology IPOs in the first nine months of 2005 vs 367 in the same period in 2000), the executives in the industry itself see opportunities in investing in technology companies, or at least see some bargains on offer.

These are welcome signs after the nuclear winter for technology of late 2001 and 2002, with 2003 only a little better. Customer IT spending is still frugal compared to the late 1990s, but at least some projects are now getting financed. Technology executives will hope to see conditions continue to improve after some lean years.

How much data is there?

Bob Zurek rightly questions a statement from Eric Schmidt at Google that “there are 5 million TB of data in the world”. A study by the University of California at Berkeley estimated that there were 1,500 million TB of data produced in the world just in 1999.

Another interesting question is how much structured information there is in large companies. If you restrict the issue dramatically, to structured data on servers (not PCs) in large corporations, it is still a large number. With the largest data warehouses in the world weighing in these days at around 100 TB, and 1 TB data warehouses being quite commonplace, even if each Global 5,000 company had just 10 TB of data in total, that adds up to 50,000 TB. This excludes the vast number of companies in the world that are below the Global 5,000. Very large companies in industries like retail, telco and retail banking will certainly have hundreds of terabytes of structured data each. Those with long memories will recall the puny disk drive sizes that we used to deal with even on mainframes in the 1980s, which does make you wonder, given that all those big companies worked just fine then. As they say about processors, Intel giveth, and Microsoft taketh away… perhaps there should be a similar truism for disk storage?
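The back-of-envelope sum above, spelled out (the 10 TB per company is the same deliberately low estimate used in the text):

```python
# Lower-bound estimate of structured server data across large corporations.
global_companies = 5_000   # the Global 5,000
tb_per_company = 10        # a deliberately conservative figure

total_tb = global_companies * tb_per_company
print(f"{total_tb:,} TB")   # 50,000 TB across the Global 5,000 alone
```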