Flexibility need not imply anarchy

An article by Rick Sherman ponders how data marts are finally being supplanted by enterprise data warehouses, at least according to a new TDWI survey. Yet he muddies the waters by pointing out that it is quicker to produce an EDW with no data marts than one with data marts, thereby suggesting that sometimes, maybe, data marts are still the way to go.

Certainly a central enterprise warehouse without data marts, or a decent way of producing them, will do nothing to reduce the plethora of departmental data marts: people will want data that is relevant to them, and they will get it one way or another if the IT department's backlog is too long. But surely this misses the point. Isolated data marts are a major problem – they can never allow the “single view” which many executives need to take across divisional or departmental boundaries, since without something central there are simply too many combinations to ever allow consistency. However it should not be an either/or situation. A modern enterprise data warehouse should be perfectly capable of producing useful data marts on an as-needed basis. The data warehouse cannot afford to sit in glorious isolation or it will fall into disuse, and you will come full circle, with numerous disconnected data marts arising to get around its failings.

An important point not made in the article is that in really large organizations you may not be able to get away with just one warehouse. A company with many country operations may want a summary global warehouse as well as one per country, each with its own dependent data marts. The “local” data warehouses can feed summary data up to the global one. Indeed for truly huge companies this may be the only practical solution; examples of such an architecture can be found at Shell, BP, Unilever and others. A major advantage of such a federated approach is that political control of the local warehouse remains within the subsidiaries, which avoids the “them v us” issues that arise when something is imposed on local operating units by central office. In this scenario the local subsidiary gets a warehouse that suits its needs, and the central office gets its summary information as a side-effect. Such an approach is technically more complicated, but if you use a technology that is capable of being federated (as these companies do) then the overhead is not great, and you retain the key advantage of global v local flexibility.
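To make the “feed up” idea concrete, here is a minimal Python sketch of the federation pattern as I have described it: each local warehouse keeps its full detail and ships only periodic summaries to the global warehouse. The function names, fields and figures are invented for illustration, not taken from any particular product.

    # Minimal sketch of the "federate upwards" idea: each local (per-country)
    # warehouse keeps full detail, and only periodic summaries are pushed to a
    # global warehouse. All names and figures here are hypothetical.
    from collections import defaultdict

    def summarise_local(detail_rows, period):
        """Roll detailed local transactions up to product-level totals for one period."""
        totals = defaultdict(float)
        for row in detail_rows:
            if row["period"] == period:
                totals[row["product"]] += row["revenue"]
        return [{"period": period, "product": p, "revenue": r} for p, r in totals.items()]

    def feed_global(global_store, country, summary_rows):
        """Append a country's summary rows to the global warehouse, tagged by origin."""
        for row in summary_rows:
            global_store.append({"country": country, **row})

    # Each subsidiary runs its own summarisation on its own schedule and ships
    # only the result; the detail (and its local quirks) stays in-country.
    global_store = []
    uk_detail = [{"period": "2005-Q1", "product": "Lubricants", "revenue": 120.0},
                 {"period": "2005-Q1", "product": "Fuels", "revenue": 480.0}]
    feed_global(global_store, "UK", summarise_local(uk_detail, "2005-Q1"))

The design point is that the global warehouse only ever sees summary rows tagged by their origin, so the subsidiaries keep control of their own detail while central office still gets its consolidated view.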

The supply chain gang

There is a thoughtful article today by Colin Snow of Ventana in Intelligent Enterprise. In it he points out some of the current limitations in trying to analyze a supply chain. At first sight this may seem odd, since there are well established supply chain vendors like Manugistics and I2, as well as the capabilities of the large ERP vendors like SAP and Oracle. However, just as with ERP, there are inherent limitations to the built-in analytic capabilities of the supply chain vendors. They may do a reasonable job of very operational reporting (“where is my delivery”) but struggle when it comes to analyzing data from a broader perspective (“what are my fully loaded distribution costs by delivery type”). In particular he hits the nail on the head as to one key barrier: “Reconciling disparate data definitions”. This is a problem even within the supply chain vendors’ own software, some of which has grown through acquisition and so does not have a unified technology platform or single data model underneath the marketing veneer. We have one client who uses Kalido just to make sense of the data within I2’s many modules, for example.

More broadly, in order to make sense of data across a complete supply chain you need to reconcile information about suppliers with that in your in-house systems. These will rarely have consistent master data definitions i.e. what is “packed product” in your supply chain system may not be exactly the same as “packed product” in your ERP system, or within your marketing database. The packaged application vendors don’t control every data definition within an enterprise, and the picture worsens if the customer needs to work more closely with external suppliers e.g. some supermarkets have their inventory restocked by their suppliers when stocks fall below certain levels. Even if your own master data is in pristine condition, you can be sure that your particular classification structure is not the same as any of your suppliers’. Hence making sense of the high-level picture becomes complex, since it involves reconciling separate business models. Application vendors assume that their own model is the only one that makes sense, while BI vendors assume that such reconciliation is somehow done for them in a corporate data warehouse. What is needed is an application-neutral data warehouse in which the multiple business models can be reconciled and managed, preferably in a way that allows analysis over time e.g. as business structures change. Only with this robust infrastructure in place can the full value of the information be exploited by the BI tools.
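A tiny illustrative sketch (in Python, with invented codes) of what that reconciliation amounts to in practice: an explicit cross-reference between a supplier’s classification and your own master data, with anything unmapped flagged for a human to resolve rather than silently guessed at.

    # A minimal sketch of the reconciliation problem: a supplier's product
    # classification rarely lines up one-to-one with your own master data, so an
    # explicit cross-reference has to be maintained. All codes are invented.
    internal_master = {"PP-001": "Packed product - 1L bottle",
                       "PP-002": "Packed product - 5L drum"}

    # Supplier code -> internal code; this mapping is master data in its own right.
    supplier_xref = {"BTL1L": "PP-001", "DRM5L": "PP-002"}

    def reconcile(supplier_rows):
        """Translate supplier classifications to internal ones, flagging gaps."""
        matched, unmatched = [], []
        for row in supplier_rows:
            internal = supplier_xref.get(row["supplier_code"])
            if internal:
                matched.append({**row, "internal_code": internal})
            else:
                unmatched.append(row)   # needs a data steward, not a guess
        return matched, unmatched

    matched, unmatched = reconcile([{"supplier_code": "BTL1L", "qty": 100},
                                    {"supplier_code": "CASE12", "qty": 40}])

The cross-reference itself is the valuable (and perishable) asset here: it changes whenever either party changes its classifications, which is exactly why it belongs in a managed, application-neutral store rather than buried in load scripts.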

Pure as the driven data

In a recent conference speech, IDC analyst Robert Blumstein had some interesting observations about linking business intelligence applications to corporate profitability. Observing how many business decisions are still made on the basis of spurious, incomplete or entirely absent data, he notes that “It’s easier to shoot from the hip, in many ways”. I found this comment intriguing because it echoes similar ones I heard during my corporate career. I remember one of my managers saying that many corporate managers didn’t seek data to support their decisions because they felt that using their “judgment” and “instincts” was mainly what they were being paid for. This syndrome was summarized elegantly by the historian James Harvey Robinson, who said: “Most of our so-called reasoning consists in finding arguments for going on believing as we already do.”

I personally believe that there are very, very few managers who are so gifted that their instincts are always right. The world has always been a complex place, and it is ever more so now, with a greater pace of change in so many ways. Hence I believe that being “data driven” is not only a more rational way of responding to a complex world, but that it will lead to greater success in most cases. As the economist John Maynard Keynes said on being questioned over a change of opinion: “When the facts change, I change my mind — what do you do, sir?”. The most impressive managers I have observed are prepared to modify their decisions in the face of compelling new information, even if it contradicts their “experience”, which was often built up many years ago in quite different situations.

Making business decisions is hard, all the more so in large organizations where there are many moving parts. There are many insights that good quality data can give that contradict “experience”. One customer of ours discovered that some of their transactions were actually unprofitable, something which had never come to light because the true costs of manufacturing and distribution were opaque before they implemented a modern data warehouse system. All the people involved were experienced, but they were unable to see their way through the data jungle. At another customer, what should have been the most profitable product line in one country was being sold at a loss through one channel, but again the true gross margin by channel was opaque prior to their new data warehouse system; in this case the problem was a poorly designed commission plan that was rewarding salesmen on volume rather than profitability. “Data driven” managers will seek to root out such business anomalies through the analysis of hard data: fact rather than opinion.
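To illustrate the kind of analysis involved, here is a toy Python calculation of gross margin by channel, with entirely invented figures; once the true costs are allocated to each transaction, a channel that looks healthy on volume can show up as loss-making.

    # A toy illustration of the "margin by channel" blind spot: once true costs
    # are allocated to each transaction, a loss-making channel becomes visible.
    # All figures are invented.
    from collections import defaultdict

    sales = [
        {"channel": "Retail",    "revenue": 1000.0, "cogs": 600.0,  "distribution": 150.0},
        {"channel": "Wholesale", "revenue": 2500.0, "cogs": 1900.0, "distribution": 700.0},
    ]

    margin_by_channel = defaultdict(float)
    for s in sales:
        margin_by_channel[s["channel"]] += s["revenue"] - s["cogs"] - s["distribution"]

    for channel, margin in margin_by_channel.items():
        status = "LOSS" if margin < 0 else "ok"
        print(f"{channel}: {margin:+.2f} ({status})")   # Wholesale comes out at -100.00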

It is often noted that data warehouse projects have a high failure rate. Of course there are many reasons for this, such as the difficulty most have in keeping up with business change, and the vagaries that beset any IT project. Yet could part of the problem be that, at least in some cases, the people for whom the systems are supposed to provide information would simply prefer to wing it?

A big industry, but still a cottage industry

IDC today announced the results of their annual survey of the size of the data warehousing market. IDC sizes the overall market in 2004 at USD 8.8 billion. The “access” part of the market e.g. Business Objects, Cognos, was USD 3.3 billion; “data warehouse management tools” (which includes databases like Teradata, and data warehouse appliances) was USD 4.5 billion; and data warehouse generation software (which includes data quality) was sized at USD 1 billion. This was 12% growth over 2003, the fastest for years, and IDC expects compound annual growth of 9% for the next five years.

One feature of this analysis is how small the “data warehouse generation” part of the market is relative to databases and data access tools. It is in some ways curious how much emphasis has been placed on displaying data in pretty ways (the access market) and on the storage mechanism (the data warehouse management market) rather than on how to actually construct the source of the data that feeds these tools. This is because that central piece is still at the cottage-industry stage of custom build. Indeed with an overall market size of USD 35 billion (Ovum) it can be seen that the bulk of spending in this large market still goes to systems integrators. Only a few products live in the “data warehouse generation” space e.g. SAP BW and Kalido (data quality tools should really be considered a separate sub-market). Hence the bulk of the industry is still locked in a “build” mentality, worrying about religious design wars (Inmon v Kimball), when one would have expected it to move to a “buy” mentality. This shift will inevitably happen, as it did with financial applications. Twenty or so years ago it was entirely normal to design and build a general ledger system; who would do that today? As large markets mature, applications gradually replace custom build, but it is a slow process, as can be seen from these figures.

The average data warehouse costs USD 3 million to build (according to Gartner), and only a small fraction of this is the cost of software and hardware; the majority is people costs. It also takes 16 months to deliver (a TDWI survey), which is an awfully long time for projects that are supposedly delivering critical management information. To take the example of Kalido, a project of the same size takes less than six months instead of 16, so for that reason alone people will eventually come around to buying rather than building warehouses. Custom data warehouses also have very high maintenance costs, which is another reason to consider buy rather than build.

The rapid growth in the market should not be surprising. As companies have bedded down their ERP, supply chain and CRM investments, it was surely inevitable that they would start to pay attention to exploiting the data captured within those core transaction systems. The diversity of those systems means that most large companies today still have great difficulty answering even simple questions (“who is my most profitable customer”, “what is the gross margin on product X in France v Canada”), which causes senior management frustration. Indeed a conversation I had at the CEFI conference this week with a gentleman from McKinsey was revealing. In recent conversations with CEOs, he explained, McKinsey had been struck by how intensely frustrated they were at the speed of response of their IT departments to business needs, above all in the area of management reporting. Sixteen-month projects will not do any longer, but IT departments are still stuck in old delivery models that are not satisfying their business customers – the ones who actually pay their salaries.

The fullness of time

Supposedly “timing is everything”, yet analysis across time is a surprisingly neglected topic in many data warehouse implementations. If you are a marketer, it is clear that time is a critical issue: you want to be able to compare seasonal sales patterns, for example. A retailer may even be interested in the pattern of buying at different times of the day, and change stock layout in response. Yet in many data warehouse designs, time is an afterthought. For example, in SAP BW you can only analyze a field for date/time reporting if you specify this up-front at implementation, and this carries a performance penalty. Even this is an improvement on many custom-built warehouses, where data is not routinely date-stamped and so even basic time-based reporting is impractical.
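As a trivial illustration of why date-stamping matters, here is a minimal Python sketch with invented figures: once every fact row carries its transaction date, a “last summer v this summer” comparison is just a filter.

    # A minimal sketch of why date-stamping matters: if every fact row carries
    # its transaction date, "last summer v this summer" is a trivial filter.
    from datetime import date

    sales = [{"date": date(2004, 7, 14), "amount": 90.0},
             {"date": date(2005, 7, 10), "amount": 120.0}]

    def summer_total(rows, year):
        """Total sales for June-August of the given year."""
        return sum(r["amount"] for r in rows
                   if r["date"].year == year and 6 <= r["date"].month <= 8)

    print(summer_total(sales, 2004), "v", summer_total(sales, 2005))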

Advanced data warehouse technology should enable you not only to do simple time-based analysis like “last summer’s sales v this summer’s sales” but also to keep track of past business hierarchies. For example, you may want to see sales profitability before and after a reorganization, and so want to look at a whole year’s data as if the reorg had never happened, or as if it had always been in place. One major UK retailer has a whole team of staff who take historic data and manually edit a copy of it in order to make such like-for-like comparisons, yet this type of analysis should be something their data warehouse provides automatically. An example of doing it right is Labatt, where the marketing team now has access to a full range of time-based analysis, enabling it to take more data-driven decisions.
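One simple way to picture this (and it is only a sketch, not a description of any particular product) is to keep dated versions of the hierarchy alongside the facts, so that a year’s sales can be rolled up under whichever version you choose. The Python below uses invented products and divisions.

    # A sketch of keeping hierarchy versions so a whole year can be rolled up
    # "as if the reorg never happened" or "as if it had always happened".
    # The mappings and figures are invented for illustration.
    hierarchy_versions = {
        "pre_reorg":  {"Widgets": "Division A", "Gadgets": "Division A"},
        "post_reorg": {"Widgets": "Division A", "Gadgets": "Division B"},
    }

    sales = [{"product": "Widgets", "amount": 400.0},
             {"product": "Gadgets", "amount": 250.0}]

    def rollup(rows, version):
        """Aggregate sales to division level using the chosen hierarchy version."""
        mapping, totals = hierarchy_versions[version], {}
        for r in rows:
            division = mapping[r["product"]]
            totals[division] = totals.get(division, 0.0) + r["amount"]
        return totals

    print(rollup(sales, "pre_reorg"))   # the year as if the reorg never happened
    print(rollup(sales, "post_reorg"))  # the same year as if it had always happened

The point is not the code but that the mapping itself is versioned data; most physical warehouse schemas overwrite it, which is exactly why that retailer needs a team editing copies of history by hand.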

Another sophisticated user of time-based analysis is Intelsat, which used it to improve its understanding of future satellite capacity. Satellite time is sold in blocks, usually in recurring contracts to news agencies such as CNN or the BBC e.g. “two hours every Friday at 16:00 GMT”. Each of these contracts has a probability of being renewed, and of course there are also prospective contracts that salesmen are trying to land but which may or may not be inked. Hence working out the amount of satellite inventory actually available next Tuesday is a non-trivial task, involving analysis that was previously so awkward that it was only done occasionally. After implementing a data warehouse that inherently understands time-variance, Intelsat was able to identify no less than USD 150 million of additional capacity, and immediately sell USD 3 million of it, a handsome return on investment for a project that was live in just three months and cost less in total than even the immediate savings.
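The underlying arithmetic is easy to sketch, even if the real analysis is far harder: the expected free capacity in a slot is the total saleable time minus the renewal-probability-weighted committed time. The Python below uses entirely invented contracts and probabilities, purely to illustrate the calculation.

    # A back-of-envelope sketch: expected free hours in a slot = total hours
    # minus the renewal-probability-weighted committed hours. All data invented.
    contracts = [
        {"customer": "News agency A", "hours": 2.0, "renewal_prob": 0.9},
        {"customer": "Prospect B",    "hours": 3.0, "renewal_prob": 0.4},
    ]

    total_hours = 24.0   # saleable hours on one transponder for the day in question
    expected_committed = sum(c["hours"] * c["renewal_prob"] for c in contracts)
    expected_free = total_hours - expected_committed
    print(f"Expected free capacity: {expected_free:.1f} hours")  # 24 - (1.8 + 1.2) = 21.0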

If your data warehouse can’t automatically give you sophisticated time-based analysis then you should look at best-practice cases like this. Make time to do it.

The data warehouse carousel

Rick Sherman wrote a thoughtful article which highlighted a frustration amongst people working in the business intelligence field. He says that “Many corporations are littered with multiple overlapping DWs, data marts, operational data stores and cubes that were built from the ground up to get it right this time” – of course each one never quite achieves this nirvana. This never-ending cycle of replacement occurs because data warehouses built around conventional design paradigms fundamentally struggle to deal with business change. Unlike a transaction system, where the business requirements are usually fairly stable (core business processes do not change frequently) and where the system is usually aimed at one particular part of the business, a data warehouse gathers data from many different sources and its requirements are subject to the whims of management, who change their minds frequently about what they want. Any major business change in one of the source systems will affect the warehouse, and although each source system may not change very often, if you have ten or fifty sources then change becomes a constant battle for the warehouse. One customer of ours had a data warehouse that cost USD 4 million to build (a bit larger than the average of USD 3 million according to Gartner) and was a conventional star schema, built by very capable staff. Yet they found that the system was costing USD 3.7 million per year in maintenance, almost as much as it cost to build. They found that 80% of this cost was associated with major changes in the (many) source systems that impacted the schema of the warehouse. It is hard to get reliable numbers for data warehouse maintenance, but systems integrators tell me that support costs as high as build costs are quite normal for an actively used data warehouse (ones with very low maintenance costs tend to get out of sync with their sources, lose credibility with their customers and eventually die off).

This problem is due to the conventional way in which data models are put together and implemented at the physical level, whereby the models are frequently aimed at dealing with the business as it is today, with less thought for how it might change. For example you might model an entity “supplier” and an entity “customer”, yet one day one of those suppliers becomes a customer. This is a trivial example, but there are many, many traps like this in data models that are then hard-coded into physical schemas. This fundamental issue is what led to the development of “generic modeling” at Shell in the 1990s, which was contributed to the ISO process and became ISO 15926. It is very well explained in a paper by Bruce Ottmann, the co-inventor of generic modeling, and is the approach used in the implementation of the KALIDO technology (and a few other vendors’ products). The more robust approach to change that this technique allows makes a huge difference to ongoing maintenance costs. Instead of a warehouse costing as much to maintain as to build, maintenance costs reduce to around 15% of implementation costs, which is much more acceptable. Moreover the time taken to respond to changes in the business improves dramatically, which may be even more important than the cost.
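A highly simplified Python sketch of the generic idea (not the actual ISO 15926 model, nor the KALIDO implementation): instead of hard-coding “supplier” and “customer” as separate entities, model a party that can take on roles over time, so a supplier becoming a customer is just another row rather than a schema change.

    # A minimal, illustrative party/role sketch: roles are data, not schema,
    # so a supplier becoming a customer requires no structural change.
    from dataclasses import dataclass, field

    @dataclass
    class Party:
        name: str
        roles: list = field(default_factory=list)   # list of (role, effective_from)

        def add_role(self, role, effective_from):
            self.roles.append((role, effective_from))

        def has_role(self, role):
            return any(r == role for r, _ in self.roles)

    acme = Party("Acme Ltd")
    acme.add_role("supplier", "2003-06-01")
    acme.add_role("customer", "2005-02-15")   # no schema change needed
    print(acme.has_role("supplier"), acme.has_role("customer"))  # True True

The trade-off, of course, is that constraints you would once have enforced in the physical schema now have to be enforced as rules on the data, which is part of what a generic warehouse engine takes care of for you.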

Whatever technology you use to implement it, it would be well worth your while to understand the concepts of the generic approach, which leads to more robust and higher-quality data models.

Do we really all need Business Intelligence tools?

An article I read in DM Review today highlights Forrester Research saying that somewhere between “25 percent and 40 percent of all enterprise users” would eventually use BI software. I’m not quite sure what they are smoking over at Forrester, but this seems to me like another of those lessons in the danger of extrapolating. You know the kind of thing: “if this product growth of x% continues, within ten years everyone on the planet will have an iPod/Skype connection/blog/whatever.” The flaw with such thinking is that there is a natural limit to the number of people who will ever want a particular thing. In the case of enterprise software that number is, I would suggest, much lower than commonly supposed, for the simple reason that most people are not paid to do ad hoc analysis of data in their jobs. Sure, some finance and marketing analysts spend their days doing this, but how many powerful BI tools does the average sales rep/bank clerk/shelf stacker really need? I’m thinking none at all, since their job is to sell things or do whatever it is that bank clerks do, not to muse on the performance of their company’s products or its customer segmentation.

In my experience of implementing large data warehouse systems at large corporations, there are remarkably few people who need anything more than a canned report, or just possibly a regular Excel pivot table. A salesman needs to work out his commission, a support manager needs to track the calls coming in that week, but these are for the most part regular events, needing a regular report. In a large data warehouse application with 1,000 end users of the reports produced from it, the number of people setting up those reports and doing ad hoc analysis may well be just 10, i.e. around 1% of the total. Once you get past management and the people paid to answer management’s questions, there are just not that many people whose job it is to ponder interesting trends or explore large data sets for a living. For this reason a lot of companies end up procuring many more “seats” of BI software than they really need. In one case I am intimately familiar with, even after five years of rolling out a leading BI product, the penetration rate was always much lower than I had expected, never rising as high as 5%, and much of that usage was not genuine “BI” usage at all.

Of course this is not what the salesmen of BI vendors want to hear, but it is something that IT and procurement departments should be aware of.

Real Time BI – get real

I permitted myself a wry smile when I first heard the hype about “real time” business intelligence, which is being hyped again this week. The vision sounds appealing enough: as soon as someone in Brazil types in a new sales order, the ultra-swish business intelligence system in central office knows and reacts immediately. Those who have worked in large corporations will be entertained by the naivety of this, since most large companies would be grateful just to know who their most profitable global accounts are.

The mismatch between fantasy and reality is driven by two factors. The first is that business rules and structures (general ledgers, product classifications, asset hierarchies, etc.) are not in fact uniform, but are spread out among many disparate transaction system implementations – one survey found that the average Global 2000 company has 38 different sources of product master data alone. Yes, this is after all that money spent on ERP. Large companies typically have dozens or even hundreds of separate ERP implementations, each with a subtly different set of business structures from the next (plus the few hundred other systems they still have around). The second problem is that the landscape of business structures is itself in constant flux, as groups reorganize, subsidiaries are sold and new companies are acquired.

Today’s business intelligence and data warehouse products try to sweep this reality under the carpet, producing tools to convert the source data into a lowest common denominator consistent set that can be loaded into a central data warehouse. This simplification is understandable, but means that local variations are lost, and many types of analyses are not possible. Worse, if the business structures change in the source systems, then the data warehouses and reports built on top of them are undermined, with any changes to the structure of the data warehouse taking typically months to bring about. In these intervening months, what happens to the “real time” business intelligence?

The problem comes down to a fundamental truth: databases do not like having their structure changed. Adding data is fine, but anything which affects the structure of a database (a major reorganization will usually do the trick) will cause pain. If you doubt this, ask a CFO how long it will take him or her to integrate an acquisition just enough to be able to run the management accounts as one combined entity. For some companies acquisitions are a way of life, with several undertaken each year. Such companies are always chasing their tail in terms of trying to get a complete picture of their business performance. This is not just inconvenient but also costly: one company built a large and well-used conventional data warehouse, costing USD 4 million. When they properly accounted for all aspects of maintenance, including business user time (which few companies do), they found it was costing USD 3.7 million per year to maintain. There was nothing wrong with the warehouse design; they were operating in a volatile business environment, with 80% of the maintenance cost caused by dealing with business change.

What is needed, and what the industry has generally failed to deliver, are technology solutions that are comfortable dealing with business change: “smarter” software. Today few IT systems can cope with a change in the structure of the data coming into the system without significant rework. The reason for this lies at the heart of the way databases are designed. They are usually implemented to reflect how the business is structured today, with relatively little regard to how to deal with future, possibly unpredictable, change. Introductory courses on data modeling show “department” and “employee” with a one-to-many relationship between them i.e. a department can have many employees, but a person can be in only one department (and must be in one department). This is easy to understand and typical of the way data models are built up, yet even this most basic model is flawed. I have myself been between departments for a time, and at another time was briefly part of two departments simultaneously. Hence the simple model works most of the time, but not all of the time: it is not resilient to exceptional cases, and IT systems built on it will break and need maintenance when such special cases arise. This is a trivial example, but it underlies the way in which systems, both custom-built and packaged, are generally built today. Of course it is hard (and expensive) to cater for future and hence somewhat unknown change, but without greater “software IQ” we will be forever patching our systems and discovering that each package upgrade is a surprisingly costly process. If you are the CFO of a large company, and you know that it takes years to integrate the IT systems of an acquired company, and yet you are making several acquisitions each year, then getting a complete view of the business performance of your corporation requires teams of analysts with Excel spreadsheets, the modern equivalent of slaughtering a goat and gazing at its entrails for hidden meaning.
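A more resilient version of that textbook model is easy to sketch: treat each department membership as a dated fact, so that gaps (being between departments) and overlaps (two departments at once) are simply data rather than broken constraints. The Python below is purely illustrative, with invented names.

    # A sketch of a change-resilient model for the department example: each
    # assignment is a dated fact, so gaps and overlaps are just data.
    from datetime import date

    # (employee, department, valid_from, valid_to) - valid_to None means "current"
    assignments = [
        ("Alice", "Finance",   date(2004, 1, 1), date(2004, 12, 31)),
        ("Alice", "Marketing", date(2005, 3, 1), None),              # gap Jan-Feb 2005
        ("Bob",   "IT",        date(2004, 1, 1), None),
        ("Bob",   "Projects",  date(2005, 1, 1), None),              # two at once
    ]

    def departments_of(employee, on_date):
        """Return the set of departments an employee belongs to on a given date."""
        return {d for e, d, start, end in assignments
                if e == employee and start <= on_date and (end is None or on_date <= end)}

    print(departments_of("Alice", date(2005, 2, 1)))   # empty set: between departments
    print(departments_of("Bob", date(2005, 6, 1)))     # both 'IT' and 'Projects'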

Some techniques in software are emerging that tackle the problem in a more future-oriented way, but these are the exception today. Unfortunately the vendor community finds it easier to sell appealing dreams than to build software to actually deliver them. “Real-time business intelligence” comes from the same stable as those who brought you the paperless office and executive information systems (remember those?) where the chief executive just touches a screen and the company instantly reacts. Back in reality, where it takes months to reflect a reorganization in the IT systems, and many months more just to upgrade a core ERP system to a new version, “real time” business intelligence remains a pipe dream. As long as people design data models and databases the traditional way, you can forget about true “real-time” business intelligence across an enterprise: the real world gets in the way. It is interesting that the only actual customer quoted in the techworld article, Dr Steve Lerner of Merial, had concluded that weekly data was plenty: “The consensus among the business users was that there was no way they were prepared to make business decisions based on sales other than on a weekly basis”.