Software vendors can learn something from Eskimos

Now that master data management is gaining attention as an issue, it is interesting to observe the stances of the industry giants. As perhaps might be expected, each claims to have an all-encompassing solution to the issue (though they curiously had no product offering at all in this area two years ago, so presumably must count as quick learners) – all you have to do is adopt their middleware stack. Oracle have their DataHub, SAP have MDME or whatever it is called this week, IBM have an offering crafted by their acquisitions of DWL, Trigo and Ascential, and Microsoft is, well, Microsoft. All of them seem to be missing a key point. Intent on expanding their own infrastructure footprint at the expense of their rivals, they do not seem to grasp that large enterprise customers simply aren’t in a position to move wholesale to one middleware stack. Large global companies have SAP, IBM WebSphere, Oracle and Microsoft, with a huge base of deployed applications using all of these, so any solution which says “just scrap the others and move to ours” is really not going to fly for the vast majority of customers.

By contrast, what customers need is software that can, in a “stack neutral” way, deal with the semantic inconsistency of their business definitions, whatever technology those definitions reside in. Surely it is clear by now, after the billions of dollars spent on ERP, that “just standardize” is simply a doomed approach for companies of any scale. Large companies can just about manage a common chart of accounts at the highest level, but as soon as you drill down into the lower level details these definitions diverge, even in finance. In marketing (which by definition has to respond to local markets), manufacturing and other functions there is even less chance of coming up with more than a small subset of common high-level business definitions. Just as the Eskimos are said to have fifty-two words for snow, you’d be surprised at how many different ways large companies can describe a product, or even something apparently unambiguous like “gross margin” (26 different ways in one company I worked for). Hence you need technology that can help resolve these semantic differences, and support the workflow required to maintain the definitions. DWL, for example, was strong at customer data integration at the detailed level, but many types of MDM problem require complex human interaction: a new international product hierarchy does not just get issued; there are versions of it, and people need to review, modify and finally publish a golden copy. Most MDM tools today simply ignore this human interaction and workflow issue.
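To make the workflow point a little more concrete, here is a minimal sketch (in Python, and not based on any particular vendor’s product – the class, status and field names are purely illustrative) of what a versioned product hierarchy with a draft/review/publish lifecycle might look like:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    DRAFT = "draft"            # being assembled centrally
    IN_REVIEW = "in review"    # circulated to regional reviewers
    PUBLISHED = "published"    # the agreed "golden copy"


@dataclass
class HierarchyVersion:
    version: int
    status: Status
    effective_from: date
    # mapping of local product codes to the agreed global category
    mappings: dict = field(default_factory=dict)
    reviewer_comments: list = field(default_factory=list)

    def publish(self) -> None:
        """Only a reviewed version may become the golden copy."""
        if self.status is not Status.IN_REVIEW:
            raise ValueError("cannot publish a version that has not been reviewed")
        self.status = Status.PUBLISHED


# A new international product hierarchy is not simply issued: it goes
# through drafts and reviews before a golden copy is published.
v2 = HierarchyVersion(version=2, status=Status.DRAFT, effective_from=date(2006, 1, 1))
v2.mappings["EU-LUB-017"] = "Lubricants > Automotive > Premium"
v2.status = Status.IN_REVIEW
v2.reviewer_comments.append("Asia-Pacific: split Premium into car and truck grades")
v2.publish()   # v2 is now the golden copy
```

The point is simply that the golden copy is the end of a human process, not the start of one; any real MDM tool needs far richer review, authorization and audit facilities than this toy example.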

I think IBM have the best chance of figuring this out of the big four, simply because unlike SAP and Oracle they don’t have an applications business to defend, while Microsoft has never really figured out how to deal with the complex end of large enterprises. IBM’s acquisitions in this area may have been multiple but they have been shrewd. Ascential was the strongest technology of the ETL vendors, while DWL and especially Trigo were well respected. Ironically, IBM may need yet another leg to their strategy, since they too have yet to really address the “semantic integration” problem that is at the heart of MDM.

A big industry, but still a cottage industry

IDC today announced their annual survey results of the size of the data warehousing market. IDC sizes the overall market in 2004 at USD 8.8 billion. The “access” part of the market (e.g. Business Objects, Cognos) was USD 3.3 billion, “data warehouse management tools” (which includes databases like Teradata and data warehouse appliances) was USD 4.5 billion, and data warehouse generation software (which includes data quality) was sized at USD 1 billion. This was 12% growth over 2003, the fastest for years, and IDC expect to see compound annual growth of 9% for the next five years.

One feature of this analysis is how small the “data warehouse generation” part of the market is relative to databases and data access tools. It is in some ways curious how much emphasis has been placed on displaying data in pretty ways (the access market) and on the storage mechanism (the data warehouse management market) rather than on how to actually construct the source of the data that feeds these tools. This is because that central piece is still today at the cottage industry stage of custom-build. Indeed, with an overall market size of USD 35 billion (Ovum), it can be seen that the bulk of spending in this large market still goes to systems integrators. Only a few products live in the “data warehouse generation” space, e.g. SAP BW and Kalido (data quality tools should really be considered a separate sub-market). Hence the bulk of the industry is still locked in a “build” mentality, worrying about religious design wars (Inmon v Kimball), when one would have expected it to move to a “buy” mentality. This will inevitably happen, as it did with financial applications: twenty or so years ago it was entirely normal to design and build a general ledger system, but who would do that today? As large markets mature, applications will gradually replace custom-build, but it is a slow process, as can be seen from these figures.

The average data warehouse costs USD 3 million to build (according to Gartner), and only a small fraction of this is the cost of software and hardware, the majority being people costs. It also takes 16 months to deliver (a TDWI survey), which is an awfully long time for projects that are supposedly delivering critical management information. To take the example of Kalido, the same size of project takes less than six months instead of 16, so for that reason alone people will eventually come around to buying rather than building warehouses. Custom data warehouses also have very high maintenance costs, which is another reason for considering buy rather than build.

The rapid growth in the market should not be surprising. As companies have bedded down their ERP, supply chain and CRM investments it was surely inevitable that they would start to pay attention to exploiting the data captured within those core transaction systems. The diversity of those systems means that most large companies today still have great difficulty answering even simple questions (“who is my most profitable customer?”, “what is the gross margin on product X in France v Canada?”), which causes senior management frustration. Indeed, a conversation I had at the CEFI conference this week with a gentleman from McKinsey was revealing. He explained that in recent conversations with CEOs, McKinsey were struck by how intensely frustrated the CEOs were at the speed of response of their IT departments to business needs, above all in the area of management reporting. 16 month projects will not do any longer, but IT departments are still stuck in old delivery models that are not satisfying their business customers – the ones who actually pay their salaries.

Awash with appliances

It is interesting how success attracts competition. Teradata have built up a billion dollar business from selling high end hardware and proprietary database technology to handle extremely large transaction-based data warehouses, such as those occurring in retail, telcos and retail banking. Netezza has done an excellent job of raising its profile as a start-up in clear competition to Teradata, while there are now even newer start-ups such as DATAllegro (with a new offering out today) and Calpont, offering data warehouse appliances in competition with both. This is a healthy sign in an industry that is undeniably very large (business intelligence is variously estimated at USD 25-35 billion in size, though the vast bulk of this is consulting services) yet has remained extremely fragmented in terms of vendors. Software vendors, other than the DBMS vendors, are few and far between in the data warehouse space, since the industry is mostly locked into a custom build mentality, with Kimball v Inmon design religion wars being the order of the day. SAP have, after some false starts, brought their BW product to a wide (SAP) audience but, other than Kalido, there are few data warehouse software companies. Of course there are ETL vendors such as Informatica and Ascential (now bought by IBM), and the reporting tools of Business Objects, Cognos and Hyperion, but the data warehouse itself has lacked much in the way of software automation.

Teradata have succeeded despite an apparently major obstacle: the highly proprietary nature of their offering. Large companies’ CIO departments generally loathe proprietary infrastructure, especially when they have just spent years trying to (just about) standardize on a particular database or hardware platform, so it is an uphill struggle for the appliance vendors. Red Brick briefly did well selling a database tuned for data warehouse applications, but eventually it could not shake off the idea that Oracle or IBM could just add a “star join” feature to their products and make it redundant. Hence it is to Teradata’s credit that they have maintained clear blue water between themselves and Oracle/IBM/Microsoft at the high end of large data warehouses. This in turn has created a market large enough to attract new entrants such as Netezza and DATAllegro, who can offer an easy to understand “like Teradata, but cheaper” message to customers who have giant transaction datasets to analyze but balk at Teradata’s high price tag and opaque pricing when it comes to maintenance payments. It will be very interesting to see whether IT departments will turn a blind eye to the proprietary nature of these offerings (after all, this objection was essentially what killed off object databases) in the way they have with Teradata, though rumor has it that Netezza at least is making good early progress.

Of course only a small subset of data warehouses have the kind of volumes and processing requirements that require such technology. A TDWI survey showed Teradata at just 3% market penetration of deployed data warehouse databases, but of course this is a very attractive 3%, with typical deals in the million dollar range. Teradata has managed to overcome the proprietary stigma that bedeviled object databases in the 1990s and carved out an attractive high end niche that Oracle et al seem unable to really compete with. Its challenge now is growth, with competitors like Netezza nibbling at its margins and general purpose databases becoming more powerful with each release. However the boom in raw data (e.g. from RFID) seems likely to mean there is plenty of demand yet for raw power.

Information Management Enlightenment

Gartner have recently been using the term “enterprise information management” as a blanket term to describe the technology and processes around a company’s efforts to control and best use its information assets. The term extends beyond structured data into text, and even to digital content such as movies or music. As they identify, a key to making progress in such a potentially monumental task is to resolve semantic inconsistencies across the technical boundaries. The problem will be familiar to anyone who has worked in a large company. A product code used in the ERP system has a different code in the CRM system, and a different one again in the manufacturing system. There are good reasons why such differences have emerged. If we take a physical product, then a manufacturing group will care about the materials that go into that product, its manufacturing process, and perhaps the health and safety information associated with it. From a distribution perspective, its dimensions are important (e.g. how many will fit in a container). From a marketing viewpoint it is important to understand the branding used, the packaging (perhaps the product is marketed in different ways in different countries) and the pricing. Each business unit cares about certain aspects of the product, but has limited direct interest in other aspects, so it is hardly surprising that the ways in which the product is classified differ depending on whether you have a manufacturing, distribution or marketing viewpoint. For example, even something as familiar as a Big Mac actually has quite different recipes in different countries; in some cases it is no longer even made from beef (e.g. in India, where the cow is sacred). A branded automotive lubricant will have quite different technical specifications if you buy a can of it in a hot country like Vietnam than if you buy apparently the same product in, say, Iceland.
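As a rough sketch of what the cross-referencing side of this looks like (the system names and codes below are invented for illustration), the same physical product typically carries a different local identifier in each application, and a master data layer has to hold the mapping back to one canonical identity rather than forcing every system onto a single “standard” code:

```python
# Hypothetical local identifiers for one physical product; each system
# classifies it according to its own concerns.
local_codes = {
    "ERP": "10-4477-EU",        # manufacturing / bill of materials view
    "CRM": "LUBRI-PREM-1L",     # marketing and branding view
    "Logistics": "PAL-0448",    # distribution view: pallets, dimensions
}

# The master data layer keeps a cross-reference to one canonical id.
xref = {code: "GLOBAL-PRODUCT-000123" for code in local_codes.values()}


def canonical_id(system: str) -> str:
    """Resolve a system-specific code to the enterprise-wide identifier."""
    return xref[local_codes[system]]


assert canonical_id("ERP") == canonical_id("CRM") == canonical_id("Logistics")
```

Even this trivial mapping only stays correct for as long as someone maintains it, which is exactly where the workflow issues discussed earlier come back in.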

These different perspectives cause a complex web of differing classifications and semantics to grow up around products in a large enterprise, and it is a similar story for other terms like “customer”, where again the key points of interest vary dramatically depending on whether you are a salesman, are trying to deliver a consignment to that customer, or are trying to collect a payment from them. This is not just of academic interest: according to a Reuters survey, 30% of operational errors (e.g. incorrect deliveries) can be traced back to poor quality data. Anyone who has ever had the joy of trying to change the address on their bank or savings account will be familiar with the issue: at many banks your details do not live in just one system.

Trying to manage the various types of data in a large company is a mammoth task, and one which is on an uncertain footing since brands and customer details do not stay constant forever. This underlying pattern of change means that initiatives which seek to standardize (say) product codes “once and for all” are doomed to failure, because the things themselves are changing. However progress is possible. The first stage is to gain an understanding of the data across the company, then to describe the processes used to update this master data, and finally to bring automation to these processes. There is no single technology silver bullet, since business processes are just as important as integration technology, but a number of technologies do help matters, e.g. data quality tools to help identify issues, emerging master data management products to assist with process automation, data warehouse technology to help understand and classify reference data, and EAI technology to actually link up and automate processes once they are under control. “Think big, start small” is the mantra: start with a manageable scope and work through the steps of identify the data -> capture the processes -> automate the processes. Modern technologies that are better able to deal with change, along with universal access to the internet and so to applications that can automate workflow, are starting to make it possible to begin this enterprise information management journey. Lao Tzu said: “A journey of a thousand miles begins with a single step”, and companies can set about this journey with more confidence than ever before.

MDM Market size

An IDC survey puts the MDM market at USD 10.4 billion by 2009, with a compound growth rate of 13.8%. To save you some Excel exercise, that puts the existing MDM market at around USD 5 billion today. This seems quite high, and clearly involves counting the existing sub-markets of CDI (customer data integration) and PIM (product information management), which have been around a lot longer than general purpose MDM tools (e.g. Kalido MDM, Oracle Data Hub, Hyperion Razza and SAP MDM or whatever it is called this week). Certainly the latter tools are what has stirred up a lot of interest in the MDM market, since they promise to address the general problem of how to manage master data, rather than dealing with the specific point problems of customer and product. This is important: BP uses Kalido MDM to manage 350 different types of master data, so if we had to have a different software solution for every data type then IT departments would be very disappointed indeed! In fact today there are relatively few deployed instances of general purpose MDM tools, which are quite young (Kalido MDM went on general release in Q3 2004), so it is of great interest to those vendors how quickly the “general purpose MDM” market will pick up from its early beginnings to grow to a serious proportion of this overall market. IDC are the most quantitative and thorough of the industry analysts when it comes to market size data, though as ever caution should be used in projecting the future in a straight line. Still, even at the current market size of around USD 5 billion, it can be seen that this market, which did not even really have a name a year or so ago, is generating a lot of interest.
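For anyone who wants to skip the Excel exercise entirely: assuming the 13.8% rate compounds over the five years from 2004 to 2009 (IDC’s exact base year may differ), the back-calculation is roughly

\[
\frac{10.4}{(1.138)^5} \approx \frac{10.4}{1.91} \approx 5.4 \ \text{(USD billion)}
\]

which is consistent with the “around USD 5 billion today” figure above.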

IT Industry perking up

There have recently been a couple of useful indicators of the health of the technology industry. A somewhat lagging indicator is the new Software 500 list, which shows 2004 revenues but for reasons best known to itself comes out in September. This shows overall revenues up 16% in 2004 compared to 2003. The Software 500 is a slightly odd list in that it includes assorted technology companies and systems integrators like Accenture rather than just pure play software companies, but it is otherwise a useful list. More recently, the FT reported that M&A activity in the European technology industry was at a record high, even more than in the previous record quarter of Q1 2000 – there have now been ten straight quarters of M&A growth. This would suggest that, while the IPO market remains lacklustre unless you are called Google or have “China” in your name (just 64 technology IPOs in the first nine months of 2005 v 367 in the same period in 2000), the executives in the industry itself see opportunities in investing in technology companies, or at least see some bargains on offer.

These are welcome signs after the nuclear winter for technology of late 2001 and 2002, with 2003 only a little better. Customer IT spending is still frugal compared to the late 1990s, but at least some projects are now getting financed. Technology executives will hope to see conditions continue to improve after some lean years.

SOA – Sounds-like Objects Again

Judith Hurwitz’s article today on SOA reminded me of at least two previous industry trends. I recall that analysts over a decade ago were predicting that “applets” were the way of the future. These mini-applications would allow customers to pick and choose a pricing routine from one vendor and a cost allocation routine from another, and mix and match with impunity. This would allow a new range of innovative application vendors to bring solutions to market and let a thousand start-ups bloom. What did we get? SAP. For those who think it is different this time, let us all try to remember CORBA, which was another attempt to provide services that were application-neutral and would lead to a new set of standards-based applications. Seen many of those recently?

The difficulty with such things is that the vision is always dragged down by the detail, and by gaps in the offerings. In SOA, everything sounds good until you realise that there are no services for semantic reconciliation of data from these multiple sources, nor seemingly much in the way of a business intelligence layer. Worse, in order for people to actually build composite applications based on SOA services, all the current application vendors would have to meekly open up the guts of their products to allow composite apps to call them. Why exactly would they do this? Of course they will make defensive PR noises about being open, but the goal of application vendors is to sell as much of their own software as possible, not to help someone else. Those with the largest entrenched installed base have the most to lose, so expect these vendors to start to offer their own “superior” form of web services, which will allow calls out from their own applications to “legacy” (i.e. anyone but them) applications, but don’t hold your breath for services going the other way. After all, “legacy” is partly a matter of perspective, rather like freedom fighters and terrorists. If you are SAP then PeopleSoft is legacy, but if you are Oracle then it doesn’t look that way.

When looking at industry trends always ask yourself “who is going to make money here?”. Well, middleware vendors might, which is why IBM is so keen on the idea. As always, hardware vendors win from anything new and complex, and all that extra network traffic will benefit a further set of vendors. Of course the systems integrators will have a field day actually building those composite apps. So in summary, lots of camps will make money, so expect the hype to continue apace. Whether customers will see much real benefit is another matter.

Siebel acquisition may give Oracle indigestion

The applications industry saw further consolidation this week with Oracle’s purchase of Siebel. This is a logical step for Oracle, who need to bulk up in order to feed their struggle with SAP in the applications war, though after PeopleSoft (and hence JD Edwards), Siebel may yet cause some indigestion. It was well known in the industry that Siebel has struggled of late after its meteoric growth in the boom years. Claudia Imhoff amusingly refers to this acquisition as “donkeys can fly” on her blog; I don’t think she intended that as praise. Siebel’s revenues shrank last year and there has been an exodus of management.

While the idea of customer relationship management is a noble one, Siebel was partly a victim of its own aggressive marketing hype, which promised much more than it delivered. Firstly, given the broad landscape of deployed applications in large companies, it was unrealistic to expect one application to “own” the customer. Worse, Siebel was long on marketing slides and short on well-engineered code. A friend working at a bank who spent two years implementing Siebel described it as a “multi million dollar compiler”, since almost every function they required was missing from the core product and needed lengthy and expensive coding from Siebel consultants.

Another friend working at a giant corporation reckoned that their Siebel rollout cost more than their SAP rollout, and that was not meant as a compliment to SAP. Large companies (though not of course systems integrators) had become disillusioned with these massive systems integration projects, while Salesforce.com showed just how much was possible in a relatively simple, hosted environment. When the giant deals dried up after the internet bubble collapsed, Siebel was ill suited to adapt to the more difficult sales climate, and a host of executive changes at the top were just the tip of the iceberg, with an exodus in the last year or so of mid-level staff. However, just as breeding two elephants rarely produces a gazelle, the disappearance of Siebel into Oracle’s maw does not solve the issue of customer integration in large companies. The definition of “customer” is still spread amongst every application that needs to reference it, which includes sales force systems but also billing, marketing and support systems. In time there will be one less definition around as Siebel is absorbed, but this barely scratches the surface of the problem of reducing the complexity of dealing with multiple definitions of master data such as “customer”, as there will still be dozens of sources of this data around. As discussed elsewhere, this requires tools that are not built on the assumption that they are the one and only source of the truth.