Andy on Enterprise Software

The next generation data warehouse has a name

January 16, 2006

Bill Inmon, the “father of the data warehouse”, has come up with a new definition of what he believes a next-generation data warehouse architecture should look like. Labeled “DW 2.0” (and trademarked by Bill), the salient points, as noted in an article in DM Review, are:

– the lifecycle of data
– unstructured data as well as structured data
– local and global metadata i.e. master data management
– integrity of integrated data.

These seem eminently sensible points to me, and ones that indeed are often overlooked in first-generation custom-built warehouses. Too often these projects concentrated on the initial implementation at the expense of considering the impact of business change, with the consequence that the average data warehouse costs 72% of its implementation cost to support every year, e.g. a USD 3M warehouse would cost over USD 2M a year to support; not a pretty figure. This is a critical point that seems remarkably rarely discussed. A data warehouse that is designed on generic principles will reduce this figure to around 15%.
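The article’s percentages make for a stark comparison when worked through. A quick back-of-envelope calculation (the 72% and 15% rates and the USD 3M base are the figures quoted above; nothing else is assumed):

```python
# Annual support cost at the quoted rates, for the article's
# illustrative USD 3M warehouse.
implementation_cost = 3_000_000  # USD

custom_built_support = implementation_cost * 0.72    # traditional custom build
generic_design_support = implementation_cost * 0.15  # generic-design warehouse

print(f"Custom build, per year:   USD {custom_built_support:,.0f}")
print(f"Generic design, per year: USD {generic_design_support:,.0f}")
```

Over a five-year life, that is the difference between spending more than three times the original budget on support and spending well under one times it.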

The very real issue of having to deal with local and global metadata, including master data management, is another critical aspect that has only recently come to the attention of most analysts and media. Managing this, i.e. the process of dealing with master data, is a primary feature of large-scale data warehouse implementations, yet the industry has barely woken up to this fact. Perhaps the only thing I would differ with Bill on here is his rather narrow definition of master data. He classifies it as a subset of business metadata, which is fair enough, but I would argue that it is actually the “business vocabulary” or context of business transactions, whereas he has a separate “context” category. Anyway, this is perhaps splitting hairs. At least it gets attention in DW 2.0, and hopefully he will expand further on it as DW 2.0 gets more attention.

The integrity of “integrated” data addresses the difference between truly integrated data that can be accessed in a repeatable way, and the “interactive” data that needs to be accessed in real-time, e.g. “What is the credit rating of customer x”, which will not be the same from one minute to the next. This is a useful distinction, as the lack of it has caused much confusion, with EII vendors claiming that their way is the true path, when it patently cannot be in isolation.

I am pleased that DW 2.0 also points out the importance of time-variance. This is something that is often disregarded in data warehouse designs, mainly because it is hard. Bill Inmon’s rival Ralph Kimball calls it the “slowly changing dimension” problem, and offers some technical mechanisms for dealing with it, but at an enterprise level these lessons are often lost. Time variance or “effective dating” (no, this is not like speed dating) is indeed critical in many business applications, and is a key feature of Kalido.
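To make the idea concrete, here is a minimal sketch of effective dating in the style of Kimball’s “Type 2” slowly changing dimension: each version of a record carries a validity interval, so a query can see the data as it was on any given date. This is purely illustrative; the customer names and attributes are invented, and it is not Kalido’s or anyone else’s actual implementation:

```python
from datetime import date

# Each row is one version of a dimension record, with the interval
# during which that version was effective (a "Type 2" SCD pattern).
customer_versions = [
    # (customer_id, sales_region, valid_from, valid_to)
    ("C001", "EMEA", date(2004, 1, 1), date(2005, 6, 30)),
    ("C001", "APAC", date(2005, 7, 1), date(9999, 12, 31)),  # current version
]

def region_as_of(versions, customer_id, as_of):
    """Return the region that was effective for this customer on a given date."""
    for cid, region, start, end in versions:
        if cid == customer_id and start <= as_of <= end:
            return region
    return None  # no version was effective on that date

# The same question, two answers, depending on the reporting date:
print(region_as_of(customer_versions, "C001", date(2005, 1, 1)))  # EMEA
print(region_as_of(customer_versions, "C001", date(2006, 1, 1)))  # APAC
```

The point is that a report run “as of” January 2005 correctly attributes the customer to EMEA, even though the customer has since moved; a warehouse without effective dating silently rewrites history every time a dimension changes.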

It would indeed be nice if unstructured data mapped neatly into structured data, but here we are rather at the mercy of the database technologies. In principle Oracle and other databases can store images as “blobs” (binary large objects) but in practice very few people really do this, due to the difficulty in accessing them and the inefficiency of storage. Storing XML directly in the DBMS can be done, but brings its own issues, as we can testify at Kalido. Hence I think that the worlds of structured and unstructured data will remain rather separate for the foreseeable future.

The DW 2.0 material also has an excellent section on “the global data warehouse”, where he lays out the issues and approaches to deploying a warehouse on a global scale. This is what I term “federation”, and examples of this kind of deployment can be found at Unilever, BP and Shell, amongst others. Again this is a topic that seems to have entirely eluded most analysts, and yet is key to getting a truly global view of the corporation.

Overall it is good to see Bill taking a view and recognizing that data warehouse language and architecture badly need an update from the 1990s and before. Many serious issues are not well addressed by current data warehouse approaches, and I welcome this overdue airing of the issues. His initiative is quite ambitious, and presumably he is aiming for the same kind of impact on data warehouse architecture as Ted Codd’s rules had on relational database theory (the latter’s “rules” of relational databases were based on mathematical theory and were quite rigorous in definition). It is to be hoped that any “certification” process for particular designs or products that Bill develops will be an objective one rather than one based on sponsorship.

More detail on DW 2.0 can be found on Bill’s web site.

Well, there’s a surprise

A research piece shows some facts that will not stun anyone who has had the joy of living through an ERP implementation. According to a new study:

  • one third of users leave large portions of ERP software entirely unused
  • just 5% of customers are using ERP software to its full extent
  • only 12% install ERP “out of the box”
  • over half did not measure return on investment of their IT applications.

The only thing surprising about these figures is how implausibly good they are. According to Aberdeen Group, only 5% of companies regularly carry out post-implementation reviews, so the implication that nearly half of companies measure the return on their IT investments seems wildly optimistic. Moreover, just who are these 12% of companies who install ERP “out of the box” with no modification? Not too many in the real world, I suspect. Similarly, very few companies implement every module of an ERP suite, so the figures on breadth of usage seem equally unremarkable.

Many ERP implementations were banged in to try and avert the Y2K catastrophe that never happened, but there were plenty before that, and plenty since, including numerous ERP consolidation projects (though few of these ever look like finishing). I guess the scary thing here is the expectation gap between the people who actually paid the bill for these mega-projects, and the reality on the ground. However, as I have written about elsewhere, these projects are just “too big to fail”, or at least too big to be seen to fail, as too many careers are wrapped up in them, so this state of denial seems likely to continue until a new generation of CIOs comes along.

Easier than quantum mechanics

I laughed out loud when I saw an article today with the headline “Oracle Solution – Easier to Implement than SAP”, but that isn’t setting the bar real high, is it? SAP may be lots of things: successful, profitable, large, but no one ever accused their software of being simple and easy to implement. What next? “Accountants less creative than Arthur Andersen at Enron” or “now, a car more stylish than a Lada”?

This particular piece of marketing spin is supposedly around an “independent” study done on SAP BW and Oracle Warehouse Builder implementations at various sampled customers. I have to say I suspect that the study might just be paid for by Oracle, though that is not stated, given that this same market research firm also brought you articles such as “Oracle is 46% more productive than DB2”. We all await with bated breath further independent research pieces showing that “Oracle solves world hunger” and “Why the world loves Larry”.

However, in this case I don’t doubt the veracity of the material (much). SAP has become a byword for complexity, with up to 45,000 tables per implementation. Business Warehouse is not quite on this scale, but still involves lots of juicy consulting hours and most likely some programming in SAP’s own proprietary coding language ABAP, which I am proud to say I once took a course in (think: a cross between IBM assembler and COBOL). I haven’t got direct coding experience with Oracle’s tools, but I have to assume that they can’t get murkier than this.

High tech marketing has come up with some entertaining headlines and slogans over the years, but “easier than SAP” is definitely my favorite in 2006 so far.