The patter of tiny pitfalls

There are some sensible tips from Jane Griffin on MDM pitfalls in a recent article. As she points out, improving your master data is a journey, not a destination, so it makes sense to avoid trying to boil the ocean and instead concentrate on a few high-priority areas, perhaps in one or two business units. It would make sense to me to start by identifying areas where MDM problems are causing the most operational difficulty e.g. misplaced orders. By starting where there is a real problem you will have less difficulty in getting business buy-in to the initiative. Be clear that there are lots of different types of master data: we are involved with a project at BP which manages 350 different master data types, and clearly some of these will be a more pressing issue than others.

I have seen some articles where people are struggling to justify an MDM initiative, yet really such initiatives should be much easier to justify than many IT projects. For a start, IT people can put the issues in business terms. Master data problems cause very real, practical issues that cost money. For example, poor or duplicated customer data can increase failed deliveries and cause invoicing problems. Poor product data can result in duplicated marketing costs, and in some cases even cause health and safety issues. Problems with chart of accounts data can lengthen the time needed to close the books. These are all things that have a cost, and so can be assigned a dollar value to fix.

Successful MDM projects will be heavily business-led, driven by the need to improve operational performance. IT staff need to educate business people that there are now an emerging set of solutions that can help, and get those business people involved in owning the data. It is the lack of data governance in many companies that contributed to the poor state of master data in the first place.

The weakest data link

There is a thoughtful article in McKinsey Quarterly on managing supply chains. It highlights the problem that even if you have perfectly consistent and accessible information within your own company, in many situations e.g. with mobile phones, there is a web of separate companies between the designer and the customer e.g.

components supplier -> distributor -> ODM -> OEM -> distributor -> customer

Each of these is dependent to some extent on the others, so if you want to know how your sales are going or how your product quality is holding up, you will need to work with information from companies further back in the chain. This presents the problem that the systems in those companies will not use the same terminology and coding structures as yours, meaning that you will need to resolve these differences in some way e.g. through a data warehouse project. The article points out that in many cases companies have not built these links and so have no visibility up and down the supply chain. This information is not just nice to have:

“Bridging these gaps pays off. In one case, a leading enterprise-computing company started gathering better data from field services, which gave it information on the incidence of failures and their costs. By feeding that data to design teams, the company developed products that could be serviced and repaired more easily. The result: total costs over the product life cycle fell by 10 to 20 percent.”

Clearly such savings are worth having. The article is an excellent illustration that the issues of dealing with multiple semantics are not confined to internal systems; indeed, across a web of independent companies, standardization is effectively unattainable. Instead, software solutions are required that can map multiple business structures together and make sense of them. Companies that invest in such data warehouse solutions are, as this article shows, getting very tangible results.
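
As a tiny illustration of the kind of mapping involved, here is a minimal sketch in Python (the partner names, codes and cross-reference table are all invented for illustration) of translating two partners' local product codes onto a single canonical code before loading a warehouse:

    # Hypothetical cross-reference table: each (partner, local code) pair
    # maps to the single canonical code used by the warehouse.
    CODE_MAP = {
        ("odm_a", "PX-100"): "PROD-001",
        ("distributor_b", "10-4456"): "PROD-001",  # same product, different coding
        ("distributor_b", "10-9901"): "PROD-002",
    }

    def to_canonical(partner: str, local_code: str) -> str:
        """Translate a partner's local product code to the warehouse code."""
        try:
            return CODE_MAP[(partner, local_code)]
        except KeyError:
            # Unmapped codes are where the real work lies: they should go to
            # a data steward for resolution, not silently into the warehouse.
            raise ValueError(f"no mapping for {local_code!r} from {partner!r}")

    print(to_canonical("odm_a", "PX-100"))  # PROD-001

Building and maintaining that cross-reference table across dozens of partners and thousands of codes is, of course, where the effort goes.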

CDI compared to other master data

There is a good article on CDI by Jill Dyche, a co-founder of Baseline Consulting and someone who has clearly seen a lot of real-world CDI projects. She does a good job of explaining how CDI projects have traditionally been quite transaction-oriented, with hubs serving up customer data via middleware to other applications. CDI hubs are at one end of the MDM spectrum, firmly at the “operational” level. At the other end are “analytic” MDM applications, which enable companies to take a cross-enterprise view of key information like assets, people, products, channels etc. Understanding the differences between the multiple, conflicting definitions embedded in the source systems is a major job in itself, and will usually result in a master data repository. This in turn can feed a corporate warehouse. A few pioneering companies have taken the final logical step and hooked up their master data repositories, via middleware like Tibco or IBM WebSphere, to their operational systems, so that the master data repository becomes the true master source, driving changes as required back down into operational systems like ERP and CRM.

CDI hubs have started at the other end, linking up to systems providing customer data, often in real time. Customer data represents a high-value area of MDM: in the case of consumers the customer data is often quite simple, but it comes in high volumes, and requires fairly simple processing to match a customer record in one system to one in another (e.g. matching “A. Hayler” v “Andy Hayler”). However, this is only part of the answer, as even in the case of “customer” things can get more complex. Suppose you are a company like Shell and you want to treat Unilever as a key global account. Finding all the information about Unilever is not just a simple keyword-matching exercise, since Unilever trades under many different subsidiary names and brands around the world e.g. its main Indian subsidiary is not called Unilever but Hindustan Lever; it also owns a company called Algida, and I defy even the cleverest fuzzy logic algorithm to associate “Algida” with “Unilever” (such examples are why you should always be sceptical about vendors selling matching algorithms). It can be seen that, for more complex situations like this, human intervention is required in order to correctly add up all the elements of Unilever’s business.
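
To see why the easy cases are easy and the hard cases are hopeless, here is a minimal sketch using Python's standard difflib (my illustration, not any vendor's algorithm):

    import re
    from difflib import SequenceMatcher

    def normalise(name: str) -> str:
        # Lower-case and strip punctuation before comparing
        return re.sub(r"[^a-z ]", "", name.lower()).strip()

    def name_similarity(a: str, b: str) -> float:
        # Crude lexical similarity between two customer names
        return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

    print(name_similarity("A. Hayler", "Andy Hayler"))  # ~0.84: plausibly the same
    print(name_similarity("Algida", "Unilever"))        # ~0.14: no lexical clue at all

No amount of tuning a similarity threshold will reveal that Algida belongs to Unilever; that fact has to come from a human, or from externally maintained company hierarchy data.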

This issue can become considerably more complex with things like “asset” or “product”, which can have a whole hierarchy of sub-types. This is why CDI hub technology tends to be used specifically for consumer information. Other types of MDM technology are required to manage more complex data and the workflows that surround updating it e.g. no automated system is going to just create a new brand; this requires numerous approvals and has various knock-on effects on other master data.
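
As a rough sketch of what “more complex data” means in practice (the node kinds and statuses here are invented for illustration), a product master is a tree of typed nodes, and a new node should only become visible to downstream systems once its approval workflow has completed:

    from dataclasses import dataclass, field
    from typing import Optional, List

    @dataclass
    class MasterDataNode:
        name: str
        kind: str                                 # e.g. "brand", "product line", "SKU"
        status: str = "draft"                     # stays unapproved until signed off
        parent: Optional["MasterDataNode"] = None
        children: List["MasterDataNode"] = field(default_factory=list)

        def add_child(self, child: "MasterDataNode") -> None:
            child.parent = self
            self.children.append(child)

    brand = MasterDataNode("New Brand", "brand")
    line = MasterDataNode("Premium Line", "product line")
    brand.add_child(line)
    # Only when every approval is in place would status move to "approved"
    # and the new hierarchy be propagated to ERP, CRM and the warehouse.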

I would argue that, at least at present, you are likely to require one kind of technology to handle general-purpose MDM data, whether customer or asset or whatever, from an analytical viewpoint, and potentially a separate technology to handle operational updates, perhaps in real time. Of course it would be nice if a single product did everything, but at present nobody can truly claim this. What does seem a missed opportunity is the way that vendors have made their technology so very specific to particular types of master data e.g. PIM and CDI. While operational and analytic needs are inherently different, there is no reason at all not to take a generic approach to all types of master data. Customers can hardly be expected to buy a separate hub for every type of master data.

Iteration is the key

Ken Pohl has written a thoughtful article on the project management of data warehouse projects, and how this can differ from other IT projects. As he points out, a data warehouse project is unusual in that it is essentially never finished – there are always new sources to add, new types of analysis the customers want, and so on (at least there are if the project is a success; if it failed then at least you won’t have too many of those pesky customer enhancement requests).

As the article points out, a data warehouse project is ideal for an iterative approach to development. The traditional “waterfall” approach, whereby the requirements are documented at ever greater levels of detail, from feasibility through to requirements through to functional specification etc, is an awkward fit. I have observed that in some companies the IT departments have a rigid approach to project management, demanding that all types of projects follow a waterfall structure. This is unfortunate in the case of data warehouse projects, where end-users are often hazy on requirements until they see the data, and where changing requirements will inevitably derail the neatest functional specification document.
Given a 16 month average elapsed time for a data warehouse project (TDWI) it is almost certain that at least one, and possibly several, major changes will come along that have significant impact on the project, which in a waterfall approach will at the very least cause delays and may put the entire project at risk.

By contrast a data warehouse project that bites off scope in limited chunks, while retaining a broad and robust enterprise business model, can deliver incremental value to its customers, fixing things as needed before the end users become cynical, and gradually building political credibility for the warehouse project. Of course the more responsive to change your data warehouse is the better, but even for a traditional custom build it should be possible to segment the project delivery into manageable chunks and deliver incrementally. The data warehouse projects which I have seen go wrong are very often those which have stuck to a rigid waterfall approach, which makes perfect sense for a transaction processing system (where requirements are much more stable) but is asking for trouble in a data warehouse project. Ken Pohl’s article contains some useful tips, and is well worth reading.

A data warehouse is not just for Christmas

A brief article by Bill Inmon addresses a key point often overlooked – when is a data warehouse finished? The answer is never, since the warehouse must be constantly updated to reflect changes in the business e.g. reorganizations, new product lines, acquisitions etc.

Yet this is a problem, because today’s main data warehouse design approaches result in extremely high maintenance costs – 72% of build costs annually, according to TDWI. If a data warehouse costs USD 3M to build and USD 2.1M to maintain annually, then over five years you are looking at costs well over USD 11M (let’s generously allow a year to build plus four years of maintenance) i.e. many times the original project cost. These levels of cost are what the industry has got used to, but they are very high compared to maintenance costs for OLTP systems, which typically run at 15% of build costs annually. This high cost level, and the delays in responding to business change when the warehouse schema needs to be updated, contribute to the poor perception of data warehouses in the business community, and to high perceived failure rates. As noted elsewhere, data warehouses built on generic design principles are far more robust to business change, and have maintenance levels around 15%.
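
As a quick sanity check on that arithmetic, here is a sketch using only the figures quoted above:

    # Five-year cost of ownership at the two maintenance rates quoted:
    # 72% of build cost per year (typical warehouse) vs 15% (typical OLTP).
    build = 3.0   # build cost, USD millions
    years = 4     # one year to build plus four years of maintenance

    for rate in (0.72, 0.15):
        total = build + years * (build * rate)
        print(f"{rate:.0%} maintenance: USD {total:.1f}M over five years")

    # 72% maintenance: USD 11.6M over five years
    # 15% maintenance: USD 4.8M over five years

At the 15% rate the same warehouse costs less than half as much over five years, despite an identical build budget.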

If the data warehouse industry (and the business intelligence industry which feeds on it) is to continue to grow then it needs to grow up also, and address the issue of better data warehouse design paradigms. 72% annual maintenance costs are not acceptable.

Desperate Data Warehouses

A Gartner Group report mentions that at least 50% of data warehouse projects fail. On its own this sounds bad, but just how bad is it, and what is meant by failure e.g. is being one month late a failure, or does it mean complete failure to deliver? How do IT projects in general do? Standish Group run a fairly exacting survey which in 2003 covered 13,522 IT projects, a very large sample indeed. Of these just 34% were an “unqualified success”. Complete failures to deliver were just 15%. The rest are in the middle i.e. they delivered but were not perceived to be complete successes in some way. To be precise: 51% had “cost overruns, time overruns, and projects not delivered with the right functionality to support the business”. Unfortunately the Gartner note does not define “failure” as precisely as Standish; it says the over-50% will have “limited acceptance or be outright failures”. It is also unclear whether the Gartner figure was a prediction based on hard data, or the opinion of one or more of their analysts.

The Standish study usefully splits the rate of complete success by project size:

  • below USD 750K: 46%
  • USD 750K to 3M: 32%
  • USD 3-6M: 23%
  • USD 6-10M: 11%
  • above USD 10M: a miserable 2%

The average data warehouse project is somewhere around the USD 2-5M range, with USD 3M often quoted, so on this basis we should indeed expect only around 25% or so to be “unqualified successes”. Unfortunately I don’t have the failure rate split by size, which presumably follows a similar pattern, and the rather loose definition that Gartner use makes it hard to compare like with like.

Even if it turns out that data warehouse projects aren’t any (or at least much) worse than other IT projects, this is not a great advert for the IT industry. The Standish data most certainly gives a clear message: if you can possibly break the scope of a project into smaller, bite-sized projects, then you greatly enhance your chance of success. It has long been known that IT productivity drops as projects get larger. This is down to human nature – the more people you have to work with, the more communication is needed, the more complex things become, and the greater the chance of things being misunderstood or overlooked.

It is interesting that even very large data warehouse projects can be effectively managed in bite-sized chunks, at least if you use a federated approach rather than trying to stuff the entire enterprise’s data into a single warehouse. Projects at BP, Unilever, Philips, Shell and others have taken a country-by-country or business-line-by-business-line approach, with individual warehouses feeding up to either regional ones or a global one, or indeed both. In this case each project becomes a fairly modest implementation, but there may be many of them. The Shell OP MIS project involved 66 separate country implementations, three regional and one global. Overall a USD 50M project, but broken down into lots of manageable, repeatable pieces.

So, if your data warehouse project is not to become desperate, think carefully about a federated architecture rather than a big bang. This may not always be possible, but you will have a greater chance of success.

MDM Business Benefits

There were some interesting results in a survey of 150 big-company respondents conducted by Ventana Research as to where customers saw the main benefits of master data management (MDM). The most important areas were:

  • better accuracy of reporting and business intelligence 59%
  • improvement of operational efficiency 27%
  • cost reduction of existing IT investments 8%

It is encouraging that respondents place such a heavy emphasis on business issues compared to IT, since quite apart from this ringing true (MDM can reduce customer delivery errors, billing problems etc), they will have a much better chance of justifying an MDM project if the benefit case is related to business improvement rather than the old chestnut of reduced IT costs (which so rarely appear in reality – surely IT departments would have shrunk to nothing by now if all the projects promising “reduced IT costs” over the years had actually delivered their promised benefits). A nice example of how to justify an MDM project can be found in a separate article today, in this example specifically about better customer information.

The survey also reflects my experience of the evolution of MDM initiatives, which tend to start in a “discovery” phase where a company takes stock of all its master data and begins to fix inconsistencies, which initially impact analytic and reporting applications. Later, companies begin to address the automation of the workflow around updating master data, and finally reach the stage of connecting this workflow up to middleware which will physically update the operational systems from a master data repository. This last phase is where many of the operational efficiency benefits kick in, and these may be very substantial indeed.

Based on the rapidly increasing level of interest in MDM, in 2006 I expect to see a lot of the current exploratory conversations turning into more concrete projects, each of which will need a good business case. At present MDM projects tend to be done by pioneering companies, so it will be very interesting to see if the various projections prove accurate and MDM starts to become more mainstream.

Nine women can’t produce a baby in a month

The software industry is not good at learning from previous lessons and mistakes. We seem to re-invent the wheel at fairly regular intervals, perhaps because a lot of people working in the technology industry are quite young, and perhaps because we assume that anything done ten or more years ago is inherently outdated. One area where I regularly observe this collective blind spot is estimating and project management. Software projects have a poor track record of coming in on time and budget, and this has a number of causes. One is unrealistic expectations. A wise aeronautical engineer once said “better, cheaper, faster: pick any two, but never all three”, yet we all still encounter projects where the end date seems to be set by some surreal remit (“the CEO wants it in time for Christmas” syndrome) without regard to feasibility or the effect on the project deliverable. Moreover, when a project does hit problems, as all too many do, there still seems to be the impression that throwing more resources at it will claw back the time lost. Sadly this is rarely the case.

There is some useful theory on this subject. A number of writers in the 1980s published algorithms to help estimate project duration and team size, based on the observation of many software projects. You can read more about this subject in books such as “Controlling Software Projects” by Tom DeMarco, “Software Engineering Economics” by Barry Boehm and “Measures for Excellence” by Putnam and Myers. These sources agree that the evidence shows that in order to bring a project end-date forward you need to deploy disproportionately more resources. The theory actually gives an equation relating effort to elapsed time:

effort = constant / (elapsed time)^4

which, for the less mathematically inclined, means that the end date (elapsed time) has a BIG effect on the effort needed. For example a project that was estimated at 18 months elapsed (a nice round number selected by IT management) could, if it was extended to 19 months, be done with roughly 20% less effort. That’s right: by extending your elapsed time by about 5% you need some 20% less effort. When I first saw this it seemed almost absurd, until I got involved briefly in project estimating when I worked at Shell and observed two projects in the upstream business. They were that rare thing: the same project, but being done in different Shell subsidiaries. They were independent, and were the same size and scope. Both were estimated at 13 months, but on one of them a decision was taken to bring the elapsed time forward to 12 months in order to fit in with another project. Money was not a major factor on this particular project, and more resource was piled in to bring the date forward. Remarkably, the compressed project took 50% more effort to bring in than the one which ran its “natural” course, something that caused general bewilderment at the time but which actually fits tolerably well with the software equation above.
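
It is easy to check these numbers against the equation; here is a quick sketch using the effort = constant / (elapsed time)^4 form above:

    def effort_ratio(original_months: float, new_months: float) -> float:
        """Relative effort when elapsed time moves from original to new."""
        return (original_months / new_months) ** 4

    # Extending an 18-month project to 19 months:
    print(f"{1 - effort_ratio(18, 19):.0%} less effort")   # 19%, i.e. roughly 20%

    # Compressing a 13-month project to 12 months:
    print(f"{effort_ratio(13, 12) - 1:.0%} more effort")   # 38%, against the 50% observed

The Shell case overshoots the formula somewhat, but given the noise in any real project the fit is, as I say, tolerably good.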

Why is the effect so dramatic? When you add new people to a project, the people already working on it have to stop what they are doing and on-board the newcomers. Now there is a bigger team, communication becomes trickier, as more people need to be involved and the project specification is open to interpretation by more people. As you add more and more people the problem worsens: now people don’t fit into one room any more, so they have to hold a meeting in order to solve a problem rather than just turning to their neighbor, etc.

The message is that if you have a project that is having problems with its schedule, it is much easier to reduce the scope of the project slightly, and deliver the rest of the functionality in a later phase, than to pile on more resources to cram it into the original schedule. If you can’t reduce the scope, then you can only make the people on the project more productive (good luck with that) or add a LOT of resources. At least there is a formula you can look up to tell you how many more resources you need, even if management won’t like the answer.

“You want a system to do what now?”

Tony Lock writes an excellent article in this week’s IT-Director.com newsletter, highlighting the communication gap between IT departments and their customers. A new survey by Coleman-Parkes found that amongst 214 FTSE 750 organizations, only 18 percent held weekly meetings between business managers and the IT teams. The research also indicated that 31 percent of those surveyed claim that they never, or hardly ever, have such meetings. In large corporate IT departments there can be a culture of avoiding contact with “users”, who always seem to have strange and unreasonable demands that don’t fit the perceptions of the IT department. The atmosphere can become quite hostile if IT departments set themselves up as “consultancy” organizations that charge themselves out to their internal customers. The internal customers resent being forced to use an internal service that they often perceive as unresponsive, and can be outraged to find themselves being charged at similar rates to external service providers. Some of this resentment is not entirely reasonable – those same customers are forced to use internal legal counsel and are charged through the nose, whether or not they like it. However there is a peculiar frustration among many business users with their IT departments that can boil over when discussing charge-back mechanisms and service level agreements.

Over-elaborate internal billing systems can cause unnecessary cost and frustration. I recall when I was at Exxon seeing an instructive project to review internal data centre charges. The existing system was extremely elaborate and charged based on mainframe usage, disk storage, communications costs and a whole raft of items. Most users didn’t understand half the items on their bills, or played games to try and avoid hitting arbitrary pricing thresholds. None of this added one iota of revenue to Exxon. The project manager, a very gifted gentleman called Graham Nichols (on his way rapidly up the organization), successfully recommended replacing the entire system with a single charge once per year. This saved a few million pounds in administration and endless arguments, and people’s tempers were much improved all round.

Perhaps some of the problem is that when an organization grows very large, it is difficult to keep perspective. Shell employed around 10,000 IT staff in the 1990s, directly or indirectly, so it is perhaps not surprising that the IT staff concentrated on their own internal targets and objectives, rather than troubling themselves too much to align with the objectives of the core energy business. At a time when the oil industry was struggling with oil prices heading down towards 10 dollars, and so with serious cost-cutting going on all round, the internal IT group, living through the internet boom, was hiring to keep up with demand e.g. dealing with the Y2K problem. With redundancies going on in engineering and marketing at the same time as a hiring boom in internal IT, tempers became frayed, to put it mildly.

Clearly senior internal IT staff do need to spend more time with their business customers, and find out how they can help them achieve their objectives. Moreover they need to communicate this throughout their organizations. How many internal IT staff know the top three business objectives of their company this year? Without even a vague idea of the goals that the business is pursuing, it is hardly surprising that business leaders become frustrated with internal IT groups. Those 31% of internal IT groups who never or hardly ever meet with their customers need to change this attitude or get used to living on a Bangalore salary in the future.

The Need for Real Business Models

One of the perennial issues that dogs IT departments is the gap between customer expectations and the IT system that is actually delivered. There are many causes of this e.g. the long gap between “functional spec” and actual delivery, but one that is rarely discussed is the language of the business model. When a systems analyst specifies a system they will typically draw up a logical data model and a process model to describe the system to be built. The standard way of doing the former is with entity relationship modelling, which is well established but has one major drawback in my experience: business people don’t get it. Shell made some excellent progress in the 1990s at trying to get business people to agree on a common data model for the various parts of Shell’s business, a thankless task in itself. What was interesting was that they had to drop the idea of using “standard” entity relationship modelling to do it, as the business people at Shell just could not relate to it.

At that time two very experienced consultants at Shell, Bruce Ottmann and Matthew West, did some ground-breaking research into advanced data modelling that was offered to the public domain and became ISO standard 15926. One side effect of the research was a different notation to describe the data used by a business, which turned out to be a lot more intuitive than the traditional ER models implemented in tools like Erwin. This notation, and much else besides is described in an excellent whitepaper by Bruce Ottmann (who is now with Kalido).
We use this notational form at Kalido when delivering projects to clients as diverse as Unilever, HBOS and Intelsat, and have found it very effective in communicating between IT and business people. The notation itself is public domain and not specific to Kalido, and I’d encourage you to read the whitepaper and try it out for yourself.