Data Quality is so retro, darling

Mark Smith (CEO of research company Ventana) writes about data governance as an important area for companies. He rightly points out that technology solutions are only part of the answer, and that organizational issues are at least as important. Strikingly, just 26% of large companies include master data management in their data governance initiatives. Perhaps some of this gap is in terminology, but this is somewhat disturbing since it raises the question: exactly what numbers are most companies using to make their decisions?

From my recollection of working in two of the largest companies in the world, it was best not to dig too deeply into the management information figures much of the time; data quality was a perennial problem, as were the endless “my figures are different to your figures” discussions. As Mark Smith points out, a lot of the issue is getting ownership of data firmly with the business. Shell carried out an excellent initiative in the late 1990s in defining a common business model (at least down to a certain level of detail) and getting some business ownership of it, but even after this it was still a major challenge. Other large companies certainly have the same issues. What is clear is that data quality and ownership of definitions cannot be an IT issue; it is critical that business people step up and take control of their data, since they are the ones best placed to spot inconsistencies and problems that someone from an IT background may overlook.

A good thing about the emerging interest in master data management is that it highlights issues in the previously neglected “data quality” field, which was always a tough sell internally. Hands up all those volunteers for a data quality project? It was never what you might term a fashionable subject. Yet a lot of issues in MDM are actually data quality issues, so perhaps now that MDM is trendy we can dust off some of those old data quality books and make better progress than occurred in the 1990s.

To host or not to host, that is the question

The rise and rise of has triggered a shift in licensing amongst a number of companies, both startups like RightNow and “me too” defensive offerings from the likes of Oracle and SAP. Siebel missed the boat entirely here, though its problems were by no means confined to its licensing model. The fact that it was massively over-marketed, with surprisingly limited core functionality given its price tag, requiring vast consulting resources to tailor every individual implementation, may also have contributed. A friend who had spent two years implementing Siebel at a bank described it as a “million dollar compiler”, since everything he wanted to do with it required customized programming or consulting.

Leaving Siebel aside, what are the broader implications of hosting and renting software? There are issues both ways. Clearly from a customer viewpoint you don’t have the hassle of installing and doing technical upgrades in-house, so no complaints there. However it is not obvious that a hosted piece of software is any more likely to meet your needs, or to require less tailoring, than one that is installed on-site. There may be (perhaps justified) concerns about the security of your own data, and indeed about related intellectual property. From personal experience I recall needing to back out of a hosted service, and having great difficulty in getting the vendor to export my data into a form that I could easily load elsewhere – er, hello, this was my data after all.

From the vendor viewpoint there are also pros and cons. If you start from scratch then things are easier. Hosting services are much cheaper to provide than many customers realize, so margins can be very attractive. Because you are in control of the implementation, you have fewer issues at particular customers who have installed some wacko combination of middleware (usually the day before some critical business deadline) that means you can’t replicate their problems. On the other hand, it is hard to grow as fast. Recurring revenue is great, but there is a danger that it can become “the software maintenance without the license” unless you are one of the vendors who have managed massive momentum (like This is also a big problem for existing vendors, whose business models and sales forces are geared towards license revenue. Also, as a software vendor you may not want to be in the data centre business.

What is the scale of the hosted software market? In a Red Herring article the hosted software market is reckoned to be 1.5% of the total USD 72 billion global software market in 2004 (according to IDC). This may not seem a big dent overall, but it is still a fair chunk of software, and one that is growing rapidly (doubling by 2009 is IDC’s estimate). Cynics would argue that the whole “ASP” model (remember that?) was going to change the world in the 1990s, but the demise of Exodus and other high-profile companies demonstrated the limits of how far customers were prepared to go in shifting their data centers to a third party.

My instinct is that this is a very real trend, and that the conservatism of corporate buyers may be the major inhibitor at present. Certainly the headaches that big companies have in installing new versions of software are huge – many companies have “standard desktops” that are unpopular with end users and require an army of people to ensure that everything works (usually on some ancient version of MS Office), so removing this problem has undeniable, and large, benefits. For this reason alone, I think many companies will get over their queasiness at having their data stored somewhere off-site, and so this is a trend with real legs.

Ironically this may also prove right those old mainframe die-hards in the 1980s who felt that client/server was lunacy, and that distributing applications around hundreds or thousands of desktops was going to cause far more trouble than it was worth. Certainly more trouble than it ever caused on the good old mainframe, where you knew exactly what version of the packages everyone was using. It is perhaps no accident that IBM is at the forefront of this “on demand” trend, as the desire to centralize is in IBM’s corporate bones.

“Near-shoring” continues apace

For European companies considering outsourcing their IT, there is a nearby alternative to India: Eastern Europe. In 2005 this market was worth 149 million euros in Hungary, 132 million in the Czech Republic and 201 million in Poland, according to a survey by PAC. This represents 16% growth for Hungary, 11% for the Czech Republic and 9% for Poland. Recent high-profile examples have been DHL’s move to Prague and Exxon’s setting up shop in Warsaw and Budapest.

There is much logic to this. The eastern European countries have a fine tradition of education: in standardized tests, for example, Hungary’s students score higher in maths than those in the US and the UK. Budapest IT salaries are around one third of those in London, and although they are rising, they will take many years to get anywhere near Western European levels. Nonetheless, this is still something of an area for pioneers. In terms of maturity, in my own experience the Czech Republic was comfortably the best established, followed by Hungary, with Poland lagging. In Prague and Budapest it is possible to find several companies that have successful operations, and at least a few recruitment agencies etc geared up to service them. This was much tougher in Poland, let alone the “wild east” of Russia or Romania, or even Belarus or Ukraine.

However it remains to be seen whether these countries will grab much market share from India, which has great advantages of scale, many years of success in this area, and a largely English-speaking workforce. Salaries in India are still a fraction of those in Hungary, even in Bangalore (never mind Chennai, Hyderabad etc), so their economic advantage looks secure for years, while the sheer number of large companies that have trodden the path to Bangalore means that, ironically, India may actually be lower risk than Eastern Europe, at least in terms of being “proven”. The greater travel time and time-zone difference are the main drawbacks here, but if India can work for US companies with an 11.5-hour time difference, why not for UK companies with a 5.5-hour one? (Yes, India’s time zone is offset by half an hour, a relic of the English civil service.)

Some companies will worry that the economic benefits of the more advanced/“safer” places like Budapest and Prague may be transitory. Ireland used to be a popular cheap location, but years of EU-fuelled growth at rates of 8% have now brought Dublin close to the UK in terms of IT salaries. This might indicate that, given the setup costs and risks, it is better to go the whole way to India, where the wage differences are so vast that it will take decades to reach Western levels, even at current high rates of salary growth. One interesting recent step was the Indian firm Satyam opening an office in Hungary. The idea is that customers who want to try off-shoring but are nervous can start in Hungary, and then move to India as they grow more confident. Eastern Europe also has one advantage over India – continental language skills, which would be important for markets such as France and Germany.

Certainly the “India effect” is having a structural effect on IT consultancy prices. Even Accenture has been forced to cut daily rates, and indeed Accenture’s consulting revenues are actually in decline, but its overall figures are holding up thanks to a rapid rise in outsourcing deals that is making up for the loss in consulting revenues.

Of course there have been some well-publicized problems, such as Dell’s abortive Indian help-desk, which I can testify from personal experience had big problems, but there seem to have been more successes than failures. After all, it is not as if IT projects in the US or UK all go swimmingly well.

Just as manufacturing has, to a large extent, moved to China, it may be inevitable that more and more IT jobs head off to India, and to a lesser extent Eastern Europe. The economics are compelling, and as more and more companies make it work, it feels less like a pioneering activity and more like a mainstream one, which will fuel further growth.

A data warehouse is not just for Christmas

A brief article by Bill Inmon addresses a key point often overlooked – when is a data warehouse finished? The answer is never, since the warehouse must be constantly updated to reflect changes in the business e.g. reorganizations, new product lines, acquisitions etc.

Yet this is a problem, because today’s main data warehouse design approaches result in extremely high maintenance costs – 72% of build costs annually, according to TDWI. If a data warehouse costs USD 3M to build and USD 2.1M to maintain annually, then over five years you are looking at costs well over USD 11M (let’s generously allow a year to build plus four years of maintenance), i.e. many times the original project cost. These levels of cost are what the industry has got used to, but they are very high compared to maintenance costs for OLTP systems, which typically run at 15% of build costs annually. This high cost level, and the delays in responding to business change when the warehouse schema needs to be updated, contribute to the poor perception of data warehouses in the business community, and to high perceived failure rates. As noted elsewhere, data warehouses built on generic design principles are far more robust to business change, and have maintenance levels of around 15%.
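To make the arithmetic concrete, here is a quick sketch in Python using the figures above (the function, and the one-year-build-plus-four-years-of-maintenance horizon, are just for illustration):

```python
# Five-year cost of ownership for a data warehouse:
# one year to build, then four years of annual maintenance,
# expressed as a fraction of the build cost.

def five_year_cost(build_cost_musd, annual_maintenance_rate, maintenance_years=4):
    """Total cost = build cost + maintenance_years * (rate * build cost)."""
    return build_cost_musd * (1 + maintenance_years * annual_maintenance_rate)

# Conventional design: 72% of build cost per year (the TDWI figure)
conventional = five_year_cost(3.0, 0.72)  # 3.0 * (1 + 4 * 0.72) = 11.64

# Generic design: around 15% per year, comparable to OLTP systems
generic = five_year_cost(3.0, 0.15)       # 3.0 * (1 + 4 * 0.15) = 4.8

print(f"Conventional: USD {conventional:.2f}M, generic: USD {generic:.2f}M")
```

On these assumptions the generic design saves well over half the five-year cost of ownership, which is the point being made here.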

If the data warehouse industry (and the business intelligence industry which feeds on it) is to continue to grow then it needs to grow up also, and address the issue of better data warehouse design paradigms. 72% annual maintenance costs are not acceptable.

Desperate Data Warehouses

A Gartner Group report mentions that at least 50% of data warehouse projects fail. On its own this sounds bad, but just how bad is it, and what is meant by failure – is being one month late a failure, or does it mean complete failure to deliver? How do IT projects in general do? Standish Group run a fairly exacting survey which in 2003 covered 13,522 IT projects, a very large sample indeed. Of these, just 34% were an “unqualified success”. Complete failures to deliver were just 15%. The rest are in the middle, i.e. they delivered but were not perceived to be complete successes in some way; to be precise, 51% had “cost overruns, time overruns, and projects not delivered with the right functionality to support the business”. Unfortunately the Gartner note does not define “failure” as precisely as Standish; it describes the over-50% as having “limited acceptance or be outright failures”. It is also unclear whether the Gartner figure was a prediction based on hard data, or the opinion of one or more of their analysts.

The Standish study usefully splits the success rate by project size: a miserable 2% of projects larger than USD 10M were complete successes, against 46% of projects below USD 750k, 32% up to USD 3M, 23% at USD 3-6M and 11% at USD 6-10M. The average data warehouse project is somewhere around the USD 2-5M range, with USD 3M often quoted, so on this basis it would seem we should only expect around 25% or so to be “unqualified successes”. Unfortunately I don’t have data for the failure rate split by size, which presumably follows a similar pattern, and the rather loose definition that Gartner uses makes it hard to compare like with like.

Even if it turns out that data warehouse projects aren’t any (or at least much) worse than other IT projects, this is not a great advert for the IT industry. The Standish data most certainly gives a clear message: if you can possibly reduce the scope of a project into smaller, bite-sized projects, then you greatly enhance your chance of success. It has long been known that IT productivity drops as projects get larger. This is down to human nature – the more people you have to work with, the more communication is needed, the more complex things become, and the greater the chance of things being misunderstood or overlooked.

It is interesting that even very large data warehouse projects can be effectively managed in bite-sized chunks, at least if you use a federated approach rather than trying to stuff the entire enterprise’s data into a single warehouse. Projects at BP, Unilever, Philips, Shell and others have taken a country-by-country or business-line-by-business-line approach, with individual warehouses feeding up to regional ones, a global one, or indeed both. In this case each project becomes a fairly modest implementation, but there may be many of them. The Shell OP MIS project involved 66 separate country implementations, three regional and one global – overall a USD 50M project, but broken down into lots of manageable, repeatable pieces.

So, if your data warehouse project is not to become desperate, think carefully about a federated architecture rather than a big bang. This may not always be possible, but you will have a greater chance of success.

MDM Trends

In DM Review we see some “2006 predictions”, something that journalists cannot resist doing each January, whatever the subject. In this case the article seems curiously limited to comments about “customer”. Certainly customer is an important example of master data, and indeed there are several products out there that specialize in it (so-called CDI products, like DWL, recently bought by IBM). However it is a common misapprehension that MDM is just about “customer” and “product”. It is not. One of our customers, BP, uses KALIDO MDM to manage 350 different types of master data, of which just two are customer and product. Large companies also have to worry about the definitions of things like “price”, “brand”, “asset”, “person”, “organization”, “delivery point”, etc, and probably don’t want to buy one software product for each one.

MDM, as an emerging area, is particularly tricky to make predictions about. For what it is worth, I predict that in 2006:

1. There will be several more acquisitions in the space, as large vendors decide that they need to have an offering of some kind, if only to fend off competitive criticism or gaps on RFI checklists. However, caveat emptor here. The better products, like Trigo, have already been snapped up.
2. At least one analyst firm will publish some form of “MDM architecture” diagram that will attempt to classify MDM into different areas, in order to try and elevate that firm’s perceived “thought leadership” on the issue.
3. There will be the first “MDM project disaster” headlines as early adopters move from Powerpoint into real project implementations. Inevitably, some will not go according to plan.
4. SAP MDME will prove as problematic as the original SAP MDM, which is now pushing up the daisies in a software cemetery near Walldorf. A2i was a poor choice as the platform for the general-purpose MDM tool that SAP needs, and this realization will start to sink in when customers start to try it out.
5. Management consultancies, who until mid-2005 could not even spell master data management, will establish consulting practices offering slick Powerpoint slides and well-groomed bright young graduates to deliver “program management” around MDM, with impressive-looking methodologies so hot off the presses that the ink is barely dry. They will purport to navigate a clear path through the maze of MDM technologies and will certainly not, in any way, be learning on the job at the client’s expense.

The next generation data warehouse has a name

Bill Inmon, the “father of the data warehouse”, has come up with a new definition of what he believes a next-generation data warehouse architecture should look like. Labeled “DW 2.0” (and trademarked by Bill), the salient points, as noted in an article in DM Review, are:

– the lifecycle of data
– unstructured data as well as structured data
– local and global metadata i.e. master data management
– integrity of integrated data.

These seem eminently sensible points to me, and ones that are indeed often overlooked in first-generation custom-built warehouses. Too often these projects concentrated on the initial implementation at the expense of considering the impact of business change, with the consequence that the average data warehouse costs 72% of implementation costs to support every year, e.g. a USD 3M warehouse would cost over USD 2M a year to support; not a pretty figure. This is a critical point that is remarkably rarely discussed. A data warehouse designed on generic principles will reduce this figure to around 15%.

The very real issue of having to deal with local and global metadata, including master data management, is another critical aspect that has only recently come to the attention of most analysts and media. Managing this, i.e. the process of dealing with master data, is a primary feature of large-scale data warehouse implementations, yet the industry has barely woken up to this fact. Perhaps the only thing I would differ with Bill on here is his rather narrow definition of master data. He classifies it as a subset of business metadata, which is fair enough, but I would argue that it is actually the “business vocabulary” or context of business transactions, whereas he has a separate “context” category. Anyway, this is perhaps splitting hairs. At least master data gets attention in DW 2.0, and hopefully he will expand further on it as DW 2.0 gains traction.

The integrity of “integrated” data addresses the difference between truly integrated data that can be accessed in a repeatable way, and the “interactive” data that needs to be accessed in real time, e.g. “what is the credit rating of customer X?”, which will not be the same from one minute to the next. This distinction is a useful one, as the area has caused much confusion, with EII vendors claiming that their way is the true path, when it patently cannot be in isolation.

I am pleased that DW 2.0 also points out the importance of time-variance. This is something that is often disregarded in data warehouse designs, mainly because it is hard. Bill Inmon’s rival Ralph Kimball calls it the “slowly changing dimension” problem, and has proposed some technical mechanisms for dealing with it, but at an enterprise level these lessons are often lost. Time variance or “effective dating” (no, this is not like speed dating) is indeed critical in many business applications, and is a key feature of Kalido.
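To make the idea concrete, here is a minimal sketch in Python of effective-dated dimension records in the spirit of Kimball’s “Type 2” approach. It is purely illustrative – neither Kimball’s exact design nor Kalido’s mechanism; the record layout and function names are my own:

```python
from datetime import date

# Each dimension row carries validity dates, so history is preserved:
# a query "as of" any date sees the version that was current then.

OPEN_END = date(9999, 12, 31)  # sentinel meaning "still current"

def update_dimension(rows, key, new_attrs, effective):
    """Close the current row for `key` and append a new effective-dated one."""
    for row in rows:
        if row["key"] == key and row["valid_to"] == OPEN_END:
            row["valid_to"] = effective  # close out the old version
    rows.append({"key": key, **new_attrs,
                 "valid_from": effective, "valid_to": OPEN_END})

def as_of(rows, key, when):
    """Return the version of `key` that was current on date `when`."""
    for row in rows:
        if row["key"] == key and row["valid_from"] <= when < row["valid_to"]:
            return row
    return None

# Example: a customer moves sales region in mid-2005; reports run against
# 2004 still attribute its sales to the old region.
dim = [{"key": "C42", "region": "EMEA",
        "valid_from": date(2001, 1, 1), "valid_to": OPEN_END}]
update_dimension(dim, "C42", {"region": "APAC"}, date(2005, 7, 1))

print(as_of(dim, "C42", date(2004, 1, 1))["region"])  # EMEA
print(as_of(dim, "C42", date(2006, 1, 1))["region"])  # APAC
```

The trade-off is that every query must now filter on the validity dates, which is exactly the sort of complexity that gets lost when these designs are scaled up to the enterprise level.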

It would indeed be nice if unstructured data mapped neatly into structured data, but here we are rather at the mercy of the database technologies. In principle Oracle and other databases can store images as “blobs” (binary large objects) but in practice very few people really do this, due to the difficulty in accessing them and the inefficiency of storage. Storing XML directly in the DBMS can be done, but brings its own issues, as we can testify at Kalido. Hence I think that the worlds of structured and unstructured data will remain rather separate for the foreseeable future.

The DW 2.0 material also has an excellent section on “the global data warehouse”, where he lays out the issues and approaches involved in deploying a warehouse on a global scale. This is what I term “federation”, and examples of this kind of deployment can be found at Unilever, BP and Shell, amongst others. Again this is a topic that seems to have entirely eluded most analysts, and yet is key to getting a truly global view of the corporation.

Overall it is good to see Bill taking a view and recognizing that data warehouse language and architecture badly need an update from the 1990s and before. Many serious issues are not well addressed by current data warehouse approaches, and I welcome this overdue airing of them. His initiative is quite ambitious, and presumably he is aiming for the same kind of impact on data warehouse architecture as Ted Codd’s rules had on relational database theory (the latter’s “rules” were based on mathematical theory and were quite rigorous in definition). It is to be hoped that any “certification” process for particular designs or products that Bill develops will be an objective one rather than one based on sponsorship.

More detail on DW 2.0 can be found on Bill’s web site.

Well, there’s a surprise

A research piece shows some facts that will not stun anyone who has had the joy of living through an ERP implementation. According to a new study:

  • one third of users leave large portions of ERP software entirely unused
  • just 5% of customers are using ERP software to its full extent
  • only 12% install ERP “out of the box”
  • over half did not measure return on investment of their IT applications.

The only thing surprising about these figures is how implausibly good they are. According to Aberdeen Group, only 5% of companies regularly carry out post-implementation reviews, so the implication that nearly half measure their return on investment seems wildly optimistic. Moreover, just who are these 12% of companies who install ERP “out of the box” with no modification? Not too many in the real world, I suspect. Similarly, very few companies implement every module of an ERP suite, so the figures on breadth of usage also seem unremarkable.

Many ERP implementations were banged in to try and avert the Y2K catastrophe that never happened, but there were plenty before that, and plenty since, including numerous ERP consolidation projects (though few of these ever look like finishing). I guess the scary thing here is the expectation gap between the people who actually paid the bill for these mega-projects and the reality on the ground. However, as I have written elsewhere, these projects are just “too big to fail”, or at least to be seen to fail, as too many careers are wrapped up in them, so this state of denial seems likely to continue until a new generation of CIOs comes along.

Easier than quantum mechanics

I laughed out loud when I saw an article today with the headline “Oracle Solution – Easier to Implement than SAP”, but that isn’t setting the bar real high, is it? SAP may be lots of things: successful, profitable, large; but no one ever accused its software of being simple and easy to implement. What next? “Accountants less creative than Arthur Andersen at Enron”, or “now, a car more stylish than a Lada”?

This particular piece of marketing spin is supposedly around an “independent” study done on SAP BW and Oracle Warehouse Builder implementations at various sampled customers. I have to say I suspect that the study might just be paid for by Oracle, though that is not stated, given that this same market research firm also brought you articles such as “Oracle is 46% more productive than DB2”. We all await with bated breath further independent research pieces showing that “Oracle solves world hunger” and “Why the world loves Larry”.

However, in this case I don’t doubt the veracity of the material (much). SAP has become a byword for complexity, with up to 45,000 tables per implementation. Business Warehouse is not quite on this scale, but still involves lots of juicy consulting hours and most likely some programming in SAP’s own proprietary coding language ABAP, which I am proud to say I once took a course in (think: a cross between IBM assembler and COBOL). I haven’t got direct coding experience with Oracle’s tools, but I have to assume they can’t get murkier than this.

High tech marketing has come up with some entertaining headlines and slogans over the years, but “easier than SAP” is definitely my favorite in 2006 so far.

Application vendors and SOA

In an article looking forward to trends in 2006, an Oracle executive raises an interesting point. The applications market for large enterprises has now essentially reduced to a field of two giants, SAP and Oracle, with a long gap now in size between them and vendors in particular niches such as supply chain or customer relationship management. Yet CIOs are demanding, as he puts it, “hot pluggable” applications, which is another way of saying easy inter-operability between applications. For example a company might like to be able to call up a specialist pricing application from a small vendor within their SAP or Oracle ERP application.

This creates a tricky dynamic for Oracle and SAP, who would ideally like to expand their own footprint within customers at the expense of each other (and other vendors). If SOA actually works, then they will be enabling customers to easily switch out the bits of their applications that customers dislike in favor of others, which is not in their interest. Of course Oracle also sells a middleware stack, and now SAP has entered the fray with Netweaver. By doing so they hope to shift the ground: if someone is going to call up a non-SAP application from within SAP, then SAP would rather they did it using Netweaver protocols than a rival stack, such as IBM Websphere. Indeed they would really prefer that people didn’t do this at all, but instead just used more and more SAP modules. The same goes for Oracle. Hence these two application vendors need to be seen to be playing the game with regard to inter-operability, yet it is actually more in their interest if this capability does not work properly. IBM, which does not sell applications, is in a much cleaner position here: it can only benefit from genuine application inter-operability via Websphere, whoever the application vendors are, since its vested interest in this context is simply in selling more middleware (and of course the consulting to implement it).

Customers need to be very aware of the desire by the application vendors to lock them into their offerings through their middleware, and should question how genuine the commitment of application vendors to true inter-operability really is. Just as turkeys don’t vote for Christmas, why would a dominant application vendor really want their application to be split into bite-sized pieces that could be each attacked by niche application vendors that would not have the reach to challenge their monolithic applications without this capability?

Of course Oracle and SAP cannot actually say this out loud. IBM (and other independent vendors like Tibco), however, should, and potentially this ought to give them an edge in the coming middleware wars.