Philosophy and data warehouses

Database expert Colin White wrote a provocative article the other day:

in which he ponders whether a data warehouse is really needed for business intelligence. This is an interesting question; after all, why did we end up with data warehouses in the first place rather than just querying the data at source, which is surely a simpler idea? There seem to me to be a few reasons:

(a) Technical performance. Early relational databases did not like dealing with mixed workloads of transaction update and queries, as locking strategies caused performance headaches.

(b) Storage issues. Data warehouses typically need several years of data, whereas transaction systems do not, so archiving transactions after a few months has a performance benefit and may allow use of cheaper storage media.

(c) Inconsistent master data between transaction systems (who owns “product” and “customer” and “asset” and the like) means that it is semantically difficult to query systems across departments or subsidiaries. Pulling the data together into a data warehouse and somehow mashing it into a consistent structure fixes this.

(d) You may want to store certain BI-related data, e.g. aggregates or “what-if” information, that is useful purely for analysis and is not relevant to transaction systems. A data warehouse may be a good place to do this.

(e) People have trouble finding existing pre-built reports, so having a single place where these live makes re-use easier.

(f) Data quality problems in operational systems mean that a data warehouse is needed to allow data cleansing before analysis.

I think you can make a case that technology is making some strides in addressing certain of these areas. The application of Google and similar search mechanisms (e.g. FAST) to the world of BI may reduce or eliminate problem (e) altogether. Databases have become a lot more tolerant of mixed workloads, addressing problem (a), and storage gets cheaper, attacking problem (b). It doesn’t seem to me that you necessarily have to store what-if type data in a data warehouse, so maybe (d) can be tackled in other ways. Even problem (f), while a long way from being fixed, at least has some potential now that some data quality tools allow SOA-style embedding within operational systems, holding out the possibility of fixing many data quality issues at source.

If we then take all the master data out of the data warehouse and put it into a master data repository, would this not also fix (c)? Well, it might, but regrettably this discipline is still in its infancy, and it seems to me that plucking data out of transaction systems into specific hubs like a “customer hub” or a “product hub” may not be improving the situation at all, as indeed Colin acknowledges.

Where I differ from Colin is on his view that a series of data marts combined with a master data store may be the answer. Since data marts are subject-specific by definition, they may address a certain subset of needs very well, but they cannot address enterprise-wide analysis. This type of analysis can only be done by something with access to potentially all the data in an enterprise, and capable of resolving master data issues across the source systems. Here a data warehouse in conjunction with a master data store makes more sense to me than a series of marts plus a master data store – why perpetuate the issues? I have no problem if the data marts are dependent, i.e. generated from the warehouse, e.g. for convenience or performance. But as soon as they are maintained outside a controlled environment you come back to problem (c) again.

Sadly, though some of the recent technical improvements point the way to a solution of problems (a) through (f), the reality on the ground is a long way from allowing this. For example, data quality tools could be embedded via SOA into operational systems and linked up to a master data store to fix master data quality issues, but how many companies have done this at all, let alone across more than a pilot system? Master data repositories are typically still stuck in a “hub mentality” that means they are at best, as Colin puts it, “papering over the cracks of master data problems”. Moreover most data warehouses are still poorly designed to deal with historical data and cope with business change.

Hence I can’t see data warehouses going away any time soon. Still, it is a useful exercise to remind ourselves why we built them in the first place. Questioning the meaning of existence is called ontology, which ironically has now been adopted as a term by computer science to mean a data model that represents concepts within a domain and the relationship between those concepts. We seem to have come full circle, a suitable state for the end of the week.
Have a good weekend.

Just Singing The Blues

There was a curious piece of “analysis” that appeared a few days ago in response to IBM’s latest data warehouse announcements:

In this gushing piece, Current Analysis analyst James Kobielus says: “IBM with these announcements becomes the premiere data warehousing appliance vendor, in terms of the range of targeted solutions they provide”. So, what were all these innovative new products that appeared?

Well, IBM renamed the hideous “Balanced Configuration Units” (you what now?) to “Balanced Warehouse”, a better name for sure. Also in the renaming frame was “Enterprise Class” being renamed to “E Class” (hope they didn’t spend too many dollars on that one). In fact the only supposedly “new” software apparent at all is the OmniFind Analytics Edition. The analysis credits this as a new piece of software, which will come as a surprise to those of us with memories longer than a mayfly’s; for example, the following announcement of OmniFind 8.3 is on the IBM website dated December 2005:

In fact the whole release seems to be around repackaging and repricing, which is all well and good but hardly transports IBM to some new level it wasn’t at, say, a week ago.

Let’s not forget the “new services” such as “implementation of an IBM data warehouse” – well, that certainly was something that never crossed the mind of IBM Global Services before last week. Now, I’m not a betting man, but I would be prepared to wager a dollar that IBM have a contract with Current Analysis – any takers against?

The excellent blog “The Cranky PM” does a fine job of poking fun at the supposedly objective views of analyst firms that are actually taking thick wads of dollars from the vendors that they are analysing, e.g.

I wonder what she would make of this particular piece of insight?


Just in case you were in any doubt that the BI software industry is going through a relatively healthy phase, you may like to ponder the latest USD 46M purchase by MicroStrategy. A niche technology company perhaps? Nope – a new private jet (a Bombardier Global Express XRS, to be precise) for CEO Michael Saylor. This was reported recently in, amongst others, the Washington Post:

Saylor has always been a charismatic but eccentric character e.g.:

but even so. This is a company that in December 2006 had USD 79M in cash (past tense). Its 2006 revenues were USD 313M (up from around USD 269M in 2005) and profits were nearly USD 71M (unaudited). This makes it an unusually profitable enterprise software company. However, just what the board of directors was thinking when approving this particular purchase can only be a matter of speculation. Even though Saylor has a large stake in the company, surely this is the kind of thing that boards of directors of public companies are supposed to be for? Perhaps it will encourage some blue-sky thinking.

Still, pity their public relations manager. You have to be impressed with the creativity here: the plane “will facilitate more effective communication and more rapid coordination with its global employees, partners, and customer base” according to their SEC filing. Good one; this wins my “brassiest PR” prize of the week.

Deja vu all over again

There is some good old fashioned common sense in an article by John Ladley in DM Review:

where he rightly points out that although most companies are now on their second or third attempt at data warehouses, they seem not to have learnt from the past and hence seem doomed to repeat prior mistakes. Certainly a common thread in my experience is the desire of IT departments to second-guess what their customers need, ending up making life unnecessarily hard for themselves. If you ask a customer how long he needs access to his detailed data he will say “forever”, and if you ask how real-time it needs to be then of course he would love it to be instantaneous on a global basis. What is often not presented is the trade-off: “well, you can have all the data kept forever, but the project costs will go up 30% and your reporting performance will be significantly worse than if you can make do with one year of detailed data and summaries prior to this”. In such a case the user might well change his view on how critical the “forever” requirement really was.
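Purely as an illustration of the “one year of detail plus summaries prior to this” compromise (the table, dates and figures below are all invented), older detail rows can be rolled up to a monthly grain while recent rows are kept as-is:

```python
from collections import defaultdict
from datetime import date

# Invented daily sales detail rows: (day, product, amount).
detail = [
    (date(2005, 3, 14), "widget", 120.0),
    (date(2005, 3, 21), "widget", 80.0),
    (date(2006, 11, 2), "widget", 95.0),  # recent enough to keep at detail level
]

cutoff = date(2006, 1, 1)  # keep one year of detail, summarise everything older

# Recent rows survive at full detail.
recent = [row for row in detail if row[0] >= cutoff]

# Older rows collapse to a monthly summary: (year, month, product) -> total.
summary = defaultdict(float)
for day, product, amount in detail:
    if day < cutoff:
        summary[(day.year, day.month, product)] += amount

print(len(recent), "detail rows kept,", len(summary), "monthly summary rows")
```

The two older rows merge into a single March 2005 summary row, which is exactly where the storage and query-performance savings come from.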

This disconnect between corporate IT departments and the business continues to be a wide one. I recently did some work for a global company where a strategy session was held to decide the IT architecture to support a major MDM initiative. None of the business people had even bothered to invite the internal IT department, such was the low regard in which it was held. Without good engagement between IT and the business, data warehouse projects will struggle, however up to date the technology used is.

Mr Ladley is also spot on regarding data quality – it is always much worse than people imagine. “Ah, but the source is our new SAP system so the data is clean” is the kind of naive comment that many of us will recognise. At one project at Shell a few years ago it was discovered that 80% of the pack/product combinations being sold in Lubricants were duplicated somewhere else in the Group. At least that could be partially excused by a decentralised organisation. Yet it also turned out that of a commercial customer database of 20,000 records, only 5,000 were truly unique, and this was in one operating company. Imagine the fun with deliveries and invoice payment that could ensue.
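The kind of duplication described above is easy to demonstrate. The sketch below (with invented records and a deliberately naive matching rule) normalises customer names before counting distinct ones; real matching tools go much further, handling abbreviations, synonyms and misspellings:

```python
import re

def normalise(name: str) -> str:
    """Crude canonical form: uppercase, strip punctuation, collapse spaces."""
    name = re.sub(r"[^\w\s]", "", name.upper())
    return re.sub(r"\s+", " ", name).strip()

# Invented customer records illustrating near-duplicates.
records = [
    "Acme Ltd.",
    "ACME LTD",
    "acme ltd",
    "Bloggs & Co",
    "Bloggs and Co",  # 'and' vs '&' defeats even this normalisation
]

unique = {normalise(r) for r in records}
print(f"{len(records)} records, {len(unique)} apparently unique")
```

Even this toy rule collapses five records to three, and the pair it misses shows why the real-world ratios quoted above (20,000 records, 5,000 truly unique) require far more sophisticated matching.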

Certainly data warehouse projects these days have the advantage of more reliable technology and fuller function products than ten years ago, meaning less custom coding is required than used to be the case. However the basic project management lessons never change.

BI Trends – not a happy satisfaction picture

There is a survey this week about current BI issues and trends:

I always tend to be a little sceptical about such surveys, since you are never sure how representative the sample base is, but with that caveat there appear to be a few interesting themes. Broadly, spending on BI is expected to go up by 10%, and this is in line with other surveys.

One opportunity for the industry is that just 4% have a BI tool on a subscription basis, but 30% would be interested in one. The success of suggests that if existing vendors hesitate too long then a competitor could get an advantage here (just ask Siebel).

73% of respondents reckon they are reducing the number of BI tools in their shop, with 29% hoping to get down to one. While some of this may be wishful thinking it certainly reflects the recent trend towards consolidation e.g. Hyperion buying Brio, and then Oracle buying Hyperion.

Just 18% of respondents are “evaluating” open source BI tools, presumably with considerably less having actually deployed one. This is a useful wake-up call to the open source zealots. At least in the BI sphere there seems to be little impact so far.

However I found the most revealing data point to be the satisfaction levels with existing solutions. In eight categories (ETL, data warehousing, various packaged apps) the second highest scoring category (ETL) had just 43% of people claiming to be “successful”, and this feeble rate drops to around 30% or so for analytic apps that are packaged with financial, CRM or supply chain systems. 44% of people claimed success with packaged BI reporting tools, but again this is hardly a number to be crowing over if you are a BI vendor: “we are slightly less dismally unsuccessful than packaged analytical apps” isn’t something you want to put in your advertising campaign. Of course, as with all such surveys, there is the question of how the question was phrased, i.e. what does “successful” mean, but even so it hardly paints a picture of contented customers.

This level of satisfaction does not strike me as odd. Large companies have mostly not solved their basic management information problems. Many of the issues come back to data issues: poor master data and dismal data quality are rife. One large (and highly successful) company I know has been trawling through its supplier data recently: four out of five supplier records turned out to be duplicates, and just in case you assume their problem is historical, one in three even of new suppliers set up in the last 12 months turned out to be duplicates too. With such fundamental issues of data classification and quality, you can have all the pretty reporting tools you like and yet you are still unlikely to be solving many real management information problems. You will get numbers, but they are not good numbers.

I am always surprised by how little attention data quality gets when it so clearly undermines the quality of the reports that management rely on. Maybe people just feel comfortable seeing a pretty chart: it is reassuring. Knowing that the data it is based on is junk is perhaps too uncomfortable for a lot of organisations to confront.

Kalido MDM 8.3 Ships

Although I am no longer with Kalido I do like to keep up with events there, and the developers’ party for Kalido MDM 8.3 was last night in London. This release, announced on February 20th, started shipping this week and is a major version; effectively it is “version 3” for MDM. The MDM vendor landscape is pretty confused, with every rinky-dink data quality tool now claiming to be an “MDM product”, presumably in the hope that someone might actually buy it if it has a catchy label. Mostly these claims are Powerpoint-deep only, but with 56 vendors now reckoning they play as “MDM” tools the potential for confusion is high.

One useful way of distinguishing things is to think about whether the tool is primarily about operational data or analytical data. A CDI hub (like Oracle’s) is very much about taking customer data from various sources, rationalising it and spitting it out to other operational systems. Performance and the ability to deal with high volume are important here, as you are in an operational world.

Products like Kalido MDM instead worry about master data for analytical purposes, and indeed can co-exist happily with hubs. Kalido MDM assumes that there is no single, clean, authoritative source for master data (which is the case in just about every company) and provides workflow capability for assessing the master data sources and allowing business users to view, verify, update and improve master data in a controlled way, ending up with “golden copy” master data at the enterprise level. Ambitious customers can link the resulting master data repository back to operational systems via EAI tools if they wish, driving master data changes back to the operational systems. Even customers religiously committed to SAP or Oracle will have to deal with master data from other sources, whether that be legacy systems or external data from their business partners. This is where Kalido MDM can play an effective role.

In Release 8.3 there were three major areas of improvement. The workflow capability in Kalido MDM was significantly enhanced, allowing customers greater flexibility in configuring workflow. Indeed this flexibility has been taken further with (finally) a complete API to the MDM product, which allows customers to slot in business-specific functionality if they wish, e.g. as a web service. New reporting tables allow customers who don’t have Kalido’s data warehouse product to better query the master data repository.

The second major improvement has been in the user interface, which was always something of a weakness in Kalido MDM. In this release over 100 customer suggestions for user interface improvements have been implemented, and from what I have heard the beta customers have been very pleased with these changes.

Finally the new release has major scale and performance improvements. Of course performance is a slippery thing, but functions such as loading and validating data and the user interface itself have speeded up by an average of two to five times (in some cases more, in some cases less) while the underlying product is capable of realistically handling five times the volume of the previous release.

These enhancements should be welcome news to existing customers like Unilever, BP, GAFRI, Labatt and Nationwide Insurance. It is never easy being a pioneer, and early customers inevitably encounter issues with early software. However, from what I can tell this version of MDM really starts to deliver on the early promise (Kalido MDM has been around since 2003), and should allow Kalido to better tackle the increasing number of MDM competitors who have recently come to market.

Another kind of call centre

A blog that I have had a lot of reaction to was “The Joy of Call Centres”


If you follow the link you will see a trail of similar bad experiences in the comments. Just to present the other side of the story, I will share with you how good a call centre can be when it is well run. Those of you who know me personally will be aware that I am more attached to my Tivo than most people are to their family pets. Tivo’s innovative technology was matched only by the ineptness of their UK marketing, which sadly means that you cannot buy a new Tivo in the UK, only in the US, though the product is still supported via a call centre in Edinburgh. A few days ago the event I had been dreading for years came to pass: my Tivo passed away quietly in the night. It wasn’t immediately apparent that this was the case, since my little darling still had its green light on, the equivalent of its eyes open, staring into space. Anyway, the point of the story is to explain how incredibly un-BT-like the Tivo helpdesk was. It actually took a fair bit of diagnosis to confirm that the Tivo hard drive had indeed snuffed it, and the helpdesk engineer (a gentleman called David Stoke) was extremely patient and helpful as I went through the various steps of diagnosis. What was more impressive was what happened once we were sure that the box was dead. Instead of doing what any regular script-following helpdesk operative would do, i.e. hang up, he went off to the internet, actually found me a Tivo for sale on eBay conveniently near where I live, and emailed me the link so that I could quickly replace it. This worked out well, and within four days I had a working Tivo up and running again. The attitude of David (and of the other people I have spoken to on the Tivo helpdesk) could not be further from the procedure-driven “I’m just reading the script” approach that we have sadly become all too used to in the UK from the likes of BT.

It would seem that I am not unique in this view, and a number of UK companies like Abbey National and Powergen have had enough and are pulling their call centres back to the UK:

I don’t think this is really anything to do with the call centre staff being in India rather than, say, Scotland or Ireland (though obviously a shared culture and language skills matter to some extent). I am sure that BT would be capable of training ineffective, hostile call centre staff anywhere in the world given half a chance. However the key is that companies need to understand that a call centre is actually the primary (maybe the only) touch point that exists for most customers: the impression that the call centre staff leave reflects directly on the impression that the customer has of the company as a whole. In the case of BT’s shambolic Mumbai helpdesk I was left with the view that BT was, well, like it always used to be. By contrast the Tivo helpdesk left a very positive impression of caring, enthusiastic staff who were passionate about their customers and the company’s products.

All the marketing in the world means less than these individual real experiences, and it does seem absurd to spend a fortune on direct mail and advertising and then blow it all by having a lousy call centre. By contrast, investing in high quality front-line staff that care about customers would seem to me something that has a very real return on investment.

A Titan No More

Consolidation in the BI space continues as Oracle snaps up Hyperion for USD 3.3 billion (Hyperion had USD 486M of cash, so effectively the deal is worth USD 2.8 billion). Hyperion’s year end is June, and to year end 2006 its revenues were USD 765M (9% up on 2005). Hence this is a fairly healthy valuation of about 3.7 times trailing revenues.
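For the record, the “3.7 times trailing revenues” figure is simply the cash-adjusted purchase price divided by revenue:

```python
# Back-of-envelope check of the deal multiple (figures in USD millions,
# as quoted in the text above).
deal_price = 3300.0        # headline purchase price
cash = 486.0               # Hyperion's cash on hand
trailing_revenue = 765.0   # revenues to year end 2006

effective_price = deal_price - cash          # cash-adjusted price
multiple = effective_price / trailing_revenue

print(f"Effective price: {effective_price:.0f}M, multiple: {multiple:.1f}x")
```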

The acquisition of Hyperion’s dominant financial consolidation software and excellent Essbase OLAP technology makes sense for Oracle, whose large and aggressive sales channel will be able to exploit Hyperion’s strong technology. What is less clear is what happens to Brio. Brio never quite made it into the same league as Business Objects and Cognos, and given that Oracle already has various reporting software of varying quality, it is unclear whether Oracle will really exploit Brio or just let it quietly shuffle off to that crowded house in the sky for acquired BI software. Certainly Brio customers need to think carefully about their options.

At least this puts an end to the rumour that Oracle was going to buy Business Objects. Or does it?

Appliances are proving popular

There is a useful overview of the growing appliance market in Computer Business Review:

The appliance market is nothing if not growing, with no fewer than ten appliance vendors now identified by analyst Madan Sheina (who, by the way, is one of the smarter analysts out there). Of course, apart from Teradata, many of these are small or very new. Teradata accounts for about USD 1 billion in total revenue (the accounts will become much clearer once it separates from NCR), though this includes services and support, not just licences. The next largest vendor is Netezza, which does not publish its revenue (though I would estimate over USD 50M). Kognitio used to be around USD 5M in revenue, though they seemed perky when I last spoke to them, so may be a little bigger now. DATAllegro will certainly be smaller than Netezza, as will the other new players. It is too early to say how well HP’s Neoview appliance will do, though clearly HP has plenty of brand and channel clout, especially now that it has acquired Knightsbridge.

Still, so many entrants to a market certainly tells you that plenty of people feel that money can be made. So far Teradata and Netezza have had the field pretty much to themselves, but the entrance of HP and the various newer vendors will create greater competition, which ultimately can only be of benefit to customers.

An unlikely source of BI ideas

I fully agree with an article by Steve Miller:

about how the Harvard Business Review is a surprisingly useful resource for people working in business intelligence. One of the recurring themes I have noticed over the years with projects going wrong is that the root cause of problems is more often people and communication than technology. Of course, as technologists we are inevitably drawn to the technical issues around the latest technology – performance, how buggy the software is, etc. – but few pieces of commercial software are so poor that they will directly cause a project to fail (I exempt Commerce One from this generalisation; it was that bad). The useful thing about the Harvard Business Review is that it gives some insight into the kind of issues that are confronting senior management, or at least the kind of issues they are reading about.

However the HBR is rather hard work. There are rarely articles about technology directly (an exception was the November 2006 “Mastering the Three Worlds of Information Technology”), but technology often crops up within other articles, as Steve Miller points out. What I would add is that the HBR can be a rather ponderous read. Its articles tend to be long and in-depth rather than bright and breezy, and there is a politically correct element about HR issues which can seem quite sanctimonious. But for every painfully worded article about the joys of diversity training there are several useful ones about current management trends and hot topics.

Speaking the same language as senior management is a stepping stone on the road to better understanding and communication, and that in turn will help improve the prospects of success for a BI project.