The data warehouse carousel

Rick Sherman wrote a thoughtful article which highlighted a frustration amongst people working in the business intelligence field. He says that “Many corporations are littered with multiple overlapping DWs, data marts, operational data stores and cubes that were built from the ground up to get it right this time” – of course each one never quite achieving this nirvana. This never ending cycle of replacement occurs because data warehouses build around conventional design paradigms fundamentally struggle to deal with business change. Unlike a transaction system, where the business requirements are usually fairly stable (core business processes do not change frequently) and where the system is usually aimed at one particular part of the business, a data warehouse gathers data from many different sources and its requirements are subject to the whims of management, who change their mind frequently about what they want. Any major business change in one of the source systems will affect the warehouse, and although each source system change may not happen very often, if you have ten or fifty sources, then change becomes a constant battle for the warehouse. One customer of ours had a data warehouse that USD 4 million to build (a bit larger than the average of USD 3M according to Gartner) and was a conventional star schema, built by very capable staff. Yet they found that the system was costing USD 3.7 million per year in maintenance, almost as much as it cost to build. They found that 80% of this cost was associated with major changes in the (many) source systems that impacted the schema of the warehouse. It is hard to get reliable numbers for data warehouse maintenance, but systems integrators tell me that support costs being as high as build costs is quite normal for an actively used data warehouse (ones with very low maintenance costs tend to get out of sync with the sources and lose credibility with the customers, eventually dying off).

This problem is due to the conventional way in which data models are put together and implemented at the physical level, whereby the models are frequently aimed at dealing with the business as it is today, with less thought to how it might change. For example you might model an entity “supplier” and an entity “customer”, yet one day one of these suppliers becomes a customer. This is a trivial example, but there are many, many traps like this in data models that are then hard-coded into physical schemas. This fundamental issue is what led to the development of “generic modeling” at Shell in the 1990s, which itself was contributed to the ISO process and became ISO 15926. This is very well explained in a paper by Bruce Ottmann, the co-inventor of generic modeling, and is the approach used in the implementation of the KALIDO technology (and a few other vendors). The more robust approach to change that this advanced technique allows means a huge difference to ongoing maintenance costs. Instead of a warehouse costing as much to maintain as to build, the maintenance costs reduce to around 15% of implementation costs, which is much more acceptable. Moreover the time to respond to changes in the business improved dramatically, which may be even more important than the cost.

Whatever the technology used to actually implement, it would be well worth your while understanding the concepts of the generic approach, which leads to the creation of more robust and higher quality data models.