Diverse data warehouse approaches

There seem to be several debates under way about data warehouse architectures at present, e.g. one on William McKnight’s blog. I think that the growing number of alternative approaches is actually a sign of two things. The first is that the problem data warehousing seeks to address has by no means been solved: people do not have access to the high-quality information they need in a consistent or timely fashion. The second is that there is increasing innovation in the area: witness the rise of packaged data warehouses, EII tools and data warehouse appliances in recent years. It is all a lot more complicated than Inmon v Kimball.

So what have we learnt? Firstly, there are some approaches that just don’t work well. For a while in the 1990s there was a school of thought that data marts were sufficient without a central warehouse, and this seems to be pretty well debunked. Just joining up point-to-point transaction system data via specific data marts results in a potentially vast set of unmanaged data marts, which do nothing to resolve inconsistency between systems at the enterprise level. Related to this, selling “analytic apps”, which are basically data marts with a specific hard-wired data model and some reports on top, does not work either. The data model always needs modifying to fit the specifics of each customer, and as soon as you do that you no longer have a package but a series of custom-built (or at least custom-modified) marts again. Informatica found this out the hard way before sensibly withdrawing from this flawed approach.

I think it is also clear that EII has only a limited place in a BI architecture. The pioneer here, Metamatrix, had flat (and modest) sales last year and, moreover, half its customers use it against only one data source: hardly a wild success. EII does not address issues of data quality or of storing data historically, so at best it can only be a partial solution.

Within the data warehouse approach I feel that it is important to understand the different types of usage patterns. In particular, some types of reporting are very operational in nature, and are best served either by reports directly against an individual transaction system (here EII may have a role) or via an operational data store (ODS), essentially a straight copy of data into a separate database. An ODS avoids queries interfering with operational processing, which is one of the issues with EII. ERP vendors have started to provide ODS solutions, e.g. SAP BW, but don’t confuse these with a full-function enterprise warehouse: they do well in ODS roles, and less well when dealing with a wide set of data sources. The narrower the scope of the report you need, the better suited it is to an ODS (or EII). The broader the scope (or if it needs historical data), the better suited it is to a data warehouse. Having a series of ODSs feeding into an enterprise warehouse is a sensible approach.

However, to satisfy reporting needs that span multiple transaction systems, or which deal with historical data, you really need a data warehouse of some sort. The choice here is widening. You can now buy packaged, or at least semi-packaged, data warehouses from a number of sources. See the report from Bloor on this market, which you can download in full here. It surely makes sense to buy functionality rather than build it, since it will be quicker and ultimately cheaper. Data marts can still be part of the picture, but should be dependent, i.e. generated from the warehouse. In this way they stay in line with changes in the source systems. If you have a very high volume of data, as happens in some industries like retail, telco and retail banking, then you can now choose from a range of data warehouse appliances, even an open source one, if you don’t fancy Teradata, which was the pioneer and is still the leader in this area. An alternative to a single giant warehouse is to have federated data warehouses, each feeding up through one or more layers to regional or global warehouses. This approach is offered by Kalido and deployed at companies like Unilever and BP.
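To make the idea of a dependent data mart a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (wh_sales, mart_sales_by_region) and the functions are hypothetical, invented purely for illustration; a real warehouse would be far richer. The point is that the mart is never loaded from the source systems directly: it is regenerated from the warehouse, so anything that reaches the warehouse flows through to the mart on the next rebuild.

```python
import sqlite3


def build_warehouse(conn: sqlite3.Connection) -> None:
    """Create a toy warehouse fact table (hypothetical schema)."""
    conn.execute("CREATE TABLE wh_sales (region TEXT, product TEXT, amount REAL)")


def load_warehouse(conn: sqlite3.Connection, rows) -> None:
    """Load rows into the warehouse; in practice these come from source systems."""
    conn.executemany("INSERT INTO wh_sales VALUES (?, ?, ?)", rows)


def rebuild_regional_mart(conn: sqlite3.Connection) -> None:
    """Regenerate the dependent mart entirely from the warehouse, never from sources."""
    conn.execute("DROP TABLE IF EXISTS mart_sales_by_region")
    conn.execute(
        """CREATE TABLE mart_sales_by_region AS
           SELECT region, SUM(amount) AS total_amount
           FROM wh_sales
           GROUP BY region"""
    )
    conn.commit()


def mart_totals(conn: sqlite3.Connection) -> dict:
    """Read the mart back as {region: total_amount}."""
    cur = conn.execute(
        "SELECT region, total_amount FROM mart_sales_by_region ORDER BY region"
    )
    return dict(cur)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    load_warehouse(conn, [("EMEA", "soap", 100.0), ("EMEA", "tea", 50.0), ("APAC", "soap", 70.0)])
    rebuild_regional_mart(conn)
    print(mart_totals(conn))  # {'APAC': 70.0, 'EMEA': 150.0}

    # New data arrives in the warehouse; regenerating the mart picks it up.
    load_warehouse(conn, [("APAC", "tea", 30.0)])
    rebuild_regional_mart(conn)
    print(mart_totals(conn))  # {'APAC': 100.0, 'EMEA': 150.0}
```

Rebuilding the mart wholesale is the simplest possible design; a large mart would more likely be refreshed incrementally, but either way it depends on the warehouse rather than on the transaction systems.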

It is also becoming clearer that, in parallel with a data warehouse, you will want your master data to be of as high a quality as possible if you are to make the most of the warehouse. A master data repository can act as the hub for improving data quality across the enterprise, and is complementary to the warehouse (indeed, it can be a source for the warehouse, and also for an enterprise bus in more ambitious deployments). The rise of interest in master data management presents a lifeline to data quality vendors, who have been steadily disappearing. Even here there are new approaches in the form of start-ups like Exeros and Zoomix.

Finally, data warehouses can become as real-time as necessary, given sufficient work. Few BI requirements are truly real-time, but those that are can be satisfied by embedding reporting directly in the transaction system, via EII, via an ODS, or even by drip-feeding data into an enterprise warehouse. For example, Kalido has an interesting drip-feed deployment in a financial services setting, where data appears in the warehouse just ten seconds after changes to the core transaction systems.

The continuing thirst for better information, and a realisation that few companies have got it right yet, is causing increasing innovation in all these areas: packaged warehouses, appliances, EII, MDM, data quality.  This is a long way from a mature market.  
