If all you have is a hammer…

Claudia Imhoff raises an important point in her blog regarding the cleansing of data. In a data warehouse, data needs to be validated before it is loaded in order to remove data quality problems (ideally, of course, you would also have a process to go back and fix the problems at source). However, as she points out, in some cases, e.g. for audit purposes, it is important to know what the original data actually was, not just the cleansed version. This gets to the heart of a vital issue surrounding master data, and neatly illustrates the difference between a master data repository and a data warehouse.
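One way to satisfy both needs is for the cleansing step never to overwrite what the source system sent, but to keep the original next to the cleansed value. The Python sketch below is illustrative only: the `CleansedField` record, the `cleanse_country` function and its country mapping are hypothetical, not taken from any product.

```python
from dataclasses import dataclass

# Hypothetical cleansing rule: map free-text country names to ISO 3166 alpha-2 codes.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "usa": "US", "united states": "US"}


@dataclass(frozen=True)
class CleansedField:
    original: str  # exactly what the source system sent, kept for audit
    cleansed: str  # the value loaded into the warehouse
    rule: str      # which rule produced the cleansed value


def cleanse_country(raw: str) -> CleansedField:
    """Cleanse a country value without discarding the original."""
    key = raw.strip().lower()
    if key in COUNTRY_CODES:
        return CleansedField(raw, COUNTRY_CODES[key], "country-name-to-code")
    # Unrecognised values are only trimmed and upper-cased; the rule name flags them for review.
    return CleansedField(raw, raw.strip().upper(), "unmapped")


field = cleanse_country("  United Kingdom ")
print(field.cleansed, repr(field.original))
```

The warehouse can load the `cleansed` value while an auditor can still see exactly what arrived, and which rule changed it.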

In MDM it is accepted (at least by those who have experience of real MDM projects) that master data will go through several versions before producing a “golden copy” suitable for putting into a data warehouse. A new marketing product hierarchy may have to go through several drafts and levels of sign-off before a new version is authorised and published, and the same is true of budget plans, which go through various iterations before a final version is agreed. This is quite apart from outright errors in data, which are all too common in operational systems. An MDM application should be able to manage the workflow of such processes, and should have a repository that can go back in time and track master data through its various versions as it is “improved”, not just hold the finished golden copy. Only the golden copy should be exported to the data warehouse, where data integrity is vital.

People working on data warehouse projects may not be aware of such compliance issues, as they usually care only about the finished state of the warehouse data. MDM projects should always consider this issue, and your technology selection should reflect the need for your MDM technology to track versions of master data over time.
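To illustrate the kind of version tracking I mean, here is a minimal Python sketch, assuming a simple two-state draft/approved workflow (real MDM tools have richer sign-off stages). The class and method names are hypothetical. History is append-only, so earlier drafts remain visible, and only approved versions are eligible to become the golden copy exported to the warehouse.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    number: int
    effective: date
    status: str    # "draft" or "approved" in this simplified workflow
    payload: dict


class VersionedMasterRecord:
    """Append-only history of one master data item, e.g. a product hierarchy."""

    def __init__(self) -> None:
        self._versions: list[Version] = []

    def add_draft(self, effective: date, payload: dict) -> Version:
        v = Version(len(self._versions) + 1, effective, "draft", dict(payload))
        self._versions.append(v)
        return v

    def approve(self, number: int) -> Version:
        # Approval appends a new version instead of editing the draft,
        # so what was proposed, and when, is never lost.
        if not 1 <= number <= len(self._versions):
            raise ValueError(f"no version {number}")
        draft = self._versions[number - 1]
        if draft.status != "draft":
            raise ValueError(f"version {number} is not a draft")
        v = Version(len(self._versions) + 1, draft.effective, "approved", draft.payload)
        self._versions.append(v)
        return v

    def history(self) -> list[Version]:
        return list(self._versions)

    def golden_copy(self, as_of: date) -> Version | None:
        """Latest approved version effective on or before `as_of`."""
        approved = [v for v in self._versions
                    if v.status == "approved" and v.effective <= as_of]
        return max(approved, key=lambda v: (v.effective, v.number), default=None)


rec = VersionedMasterRecord()
rec.add_draft(date(2007, 1, 1), {"Drinks": ["Cola", "Lemonade"]})
second = rec.add_draft(date(2007, 1, 1), {"Beverages": ["Cola", "Lemonade"]})
rec.approve(second.number)
print(len(rec.history()))  # 3: two drafts plus one approval
print(rec.golden_copy(date(2007, 6, 30)).payload)
```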

Has the fizz gone out of Cognos?

Cognos’ latest quarterly results were rather a flat affair. Quarterly revenue was respectable at USD 237M, but licence revenue was up just 3% year over year, and what growth there was came mainly from product support fees (up 13%) and services (up 10%), which is less than ideal for a software vendor. On the positive side, an operating margin of 12.6% is quite good, and up significantly from 9.6% a year earlier.

Less good was that there were 7 deals over USD 1 million, down from 13 the same time last year. Europe and Asia did better than the US. It seems that the financial applications business is doing better than the traditional core BI tools business, which is rumoured to be shrinking.

Overall these are certainly not bad results, but in a fairly healthy BI market they are hardly sparkling, which seems to be reflected by a dip in the share price.

Master data: from jungle to garden in several not so easy steps

I very much liked a succinct article by the ever-reliable Colin White on MDM approaches. Companies still struggle to get to grips with what an MDM roadmap is all about, faced with apparently competing (and incomplete and immature) MDM technologies and with management consultants who are only a few pages ahead of their customers in the manual. This piece neatly sets out the end goal of MDM and the various approaches to getting there (starting with either analytic MDM or operational MDM). It would have been even better had it explained in more detail how the alternatives can be run in parallel, and gone into more depth on the issues raised by each sequence of steps. However, by clearly separating out operational and analytic MDM and showing how these are complementary, he is already doing a significant service.

The issue he mentions with “approach 1”, i.e. the “complexity of maintaining a complete historical record of master data”, can be dealt with if you choose an analytic MDM technology that has built-in support for analysis over time. Colin points out that a key step is to end up with a low-latency master data store as the system of record for the enterprise, acting as a provider of golden copy master data to other systems, both transaction systems and analytical ones such as an enterprise data warehouse. If properly implemented, this will shift the centre of gravity of master data: from today’s situation, where ERP is the system of record, to one where the enterprise master data repository is the system of record, providing data through a published interface (and an enterprise service bus) to all other systems, including ERP. This is a desirable end state, and a key step towards breaking today’s monolithic ERP systems up into more manageable components.
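The “system of record as provider” idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Colin’s design or any vendor’s API: in-process callbacks stand in for the published interface and enterprise service bus, and plain dictionaries stand in for ERP and the warehouse.

```python
from typing import Callable

# A subscriber receives (master key, golden copy record).
Subscriber = Callable[[str, dict], None]


class MasterDataHub:
    """Hypothetical enterprise master data store acting as the system of record."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}
        self._subscribers: list[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        # ERP, CRM and the data warehouse all register as consumers;
        # none of them is the system of record any more.
        self._subscribers.append(callback)

    def publish(self, key: str, golden: dict) -> None:
        """Store a golden copy and push a copy of it to every subscribed system."""
        self._records[key] = dict(golden)
        for notify in self._subscribers:
            notify(key, dict(golden))


erp: dict[str, dict] = {}
warehouse: dict[str, dict] = {}
hub = MasterDataHub()
hub.subscribe(lambda k, rec: erp.__setitem__(k, rec))
hub.subscribe(lambda k, rec: warehouse.__setitem__(k, rec))
hub.publish("CUST-001", {"name": "Acme Ltd", "country": "GB"})
print(erp["CUST-001"] == warehouse["CUST-001"])
```

In a real deployment the bus would deliver asynchronously and each consumer would map the golden copy into its own schema; the point is simply that data flows outwards from the master data store, ERP included.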

I really hope that this paper gets the attention it deserves. Getting most of the key messages into a two-page article is quite an achievement. I would like to see it developed further, and hopefully it will be.

I see a tall dark stranger in your future…

There is an interesting article in CIO Insight by Peter Fader, a professor of marketing at the top-rated Wharton Business School. In it he discusses the limitations of data mining, and anyone contemplating investing in this technology should read it carefully. I set up a small data mining practice when I was running a consulting division at Shell, and found it a thankless job. Although I had an articulate and smart data mining expert, and we invested in what was at the time a high-quality data mining tool, we found time and again that it was very hard to find real-world problems where the benefits of data mining could be demonstrated. Either the data was such a mess that little sense could be made of it, or the insights produced by the data mining technology were, as Homer Simpson might say, of the “well, duh” variety.

Professor Fader argues that in most cases the best you can hope for is to develop simple probabilistic models of aggregate behaviour, and that you simply cannot get down to the level of predicting individual behaviour with the kind of data we typically have, however alluring the sales demonstrations may be. Moreover, such models can mostly be built in Excel, without large investments in sophisticated data mining tools.
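To make the contrast concrete, here is a Python sketch of about the simplest aggregate model there is. It assumes purchases follow a Poisson process with one rate shared by the whole customer base; that is my simplification for illustration, not necessarily the model Professor Fader uses, and the purchase counts are made up. It predicts how many customers will buy 0, 1, 2 or 3 times, yet says nothing about which customer will, and it fits comfortably in a spreadsheet.

```python
import math


def fit_poisson_rate(purchase_counts: list[int]) -> float:
    """Maximum-likelihood Poisson rate: simply the mean purchases per customer."""
    return sum(purchase_counts) / len(purchase_counts)


def expected_customers(rate: float, k: int, population: int) -> float:
    """Expected number of customers making exactly k purchases."""
    return population * math.exp(-rate) * rate**k / math.factorial(k)


counts = [0, 1, 2, 1, 0, 3, 1, 0, 2, 0]  # made-up purchases per customer
rate = fit_poisson_rate(counts)
for k in range(4):
    observed = counts.count(k)
    predicted = expected_customers(rate, k, len(counts))
    print(k, observed, round(predicted, 2))
```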

While I am sure there are some very real examples where data mining can work well e.g. why some groups of people are better credit risks than others, the main point he makes is that the vision of 1-1 marketing via a data mining tool is a fantasy, and that the tools have been seriously oversold. Well, that is something that we in the software industry really do understand. We all want technology to provide magical insights into a messy and complex world that is hard to predict. Unfortunately the technology at present is generally as useful as a crystal ball when it comes to predicting individual behaviour. Yet there is still that urge to go into the tent and peer into the mists of the crystal ball in search of patterns.

Dot Bomb 2.0?

Enterprise software companies are currently being valued at around three times revenues, with premiums of up to about five times revenues for particularly good firms, and less for firms that are not showing much market progress. There have been several deals of this type recently. Enterprise software CEOs could be forgiven for casting an envious eye at the internet software market. On the UK’s AIM market, a company called Blinkx, which offers the ability to search video clips (using technology from Autonomy), recently raised money, and its first day of trading gave it a valuation of GBP 180 million. Any guesses as to the revenues or profitability of this company? Revenues of GBP 60 million perhaps, or maybe as low as GBP 40 million? Nope. Revenues are expected to be just over GBP 2 million in 2007. Profits? “Profitability is not expected until 2010”.
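The gap is easiest to see as simple price/sales arithmetic on the figures above. At the three to five times revenue multiples typical for enterprise software, a GBP 180 million valuation would need revenues of roughly GBP 36 to 60 million, whereas GBP 2 million of revenue implies a multiple of about 90. A trivial Python sketch of the calculation (the function names are just for illustration):

```python
def revenue_multiple(valuation: float, revenue: float) -> float:
    """Price/sales ratio: valuation divided by annual revenue."""
    return valuation / revenue


def revenue_needed(valuation: float, multiple: float) -> float:
    """Annual revenue needed to justify `valuation` at a given price/sales multiple."""
    return valuation / multiple


valuation = 180.0  # GBP million, first-day valuation
print(revenue_multiple(valuation, 2.0))  # 90.0 (revenue is "just over" GBP 2m, so slightly less)
for m in (3.0, 5.0):
    print(m, revenue_needed(valuation, m))  # 3.0 60.0, then 5.0 36.0
```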

How about the teenage scribblers who presumably can explain this kind of valuation? According to an analyst at Dresdner Kleinwort: “It is hard to value because we don’t know what it is going to focus on. There’s no proven management history, and few historical numbers to play with”.

Does this kind of language ring any bells? Does anyone recall a time in the far distant past when companies could not be valued using “old-fashioned” methods like price/sales or price/earnings, since the internet was a new business model? Maybe you were wiser than me, but I admit to buying some shares in basket cases like Commerce One at the height of the bubble, believing the previous generation of teenage scribblers that the internet “changed everything” and that old fuddy-duddies who fretted about irrational exuberance just “didn’t get it”.

Those who cannot remember the past are doomed to repeat it. For the latest lesson we don’t have to think back to the South Sea Bubble or the Amsterdam Tulip fiasco; we just have to cast our minds back six years or so. I for one will not be investing in Blinkx.

Footnote. After writing this I found a thoughtful blog on the same subject.

The other shoe drops

For some time I had been wondering which company Microsoft would buy to enter the MDM market. This is a key area in the broader business intelligence arena in which they aspire to progress, and was a major gap in their offering. Stratature was their choice, and it was a smart one. Stratature plays in the analytical MDM area rather than being an operational transaction hub (like Siperian, say). It had built up a good reputation for flexible hierarchy management, an important feature of most MDM applications. It competed directly with Razza (an excellent tool which Hyperion purchased but Oracle now seems to have buried) and Kalido.
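Hierarchy management sounds simple but has traps; for instance, moving a node under one of its own descendants would create a loop. Here is a minimal Python sketch of a parent-child hierarchy that guards against this. The class and product names are hypothetical, and this is not how Stratature or any other vendor implemented it.

```python
from __future__ import annotations


class Hierarchy:
    """Parent-child hierarchy, e.g. a product or cost-centre tree."""

    def __init__(self) -> None:
        self.parent: dict[str, str | None] = {}

    def add(self, node: str, parent: str | None = None) -> None:
        if node in self.parent:
            raise KeyError(f"{node!r} already exists")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[node] = parent

    def ancestors(self, node: str) -> list[str]:
        """Parents of `node`, nearest first, up to the root."""
        chain = []
        current = self.parent[node]
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain

    def move(self, node: str, new_parent: str) -> None:
        """Re-parent `node`, refusing any move that would create a cycle."""
        if node not in self.parent:
            raise KeyError(f"unknown node {node!r}")
        if new_parent == node or node in self.ancestors(new_parent):
            raise ValueError(f"moving {node!r} under {new_parent!r} would create a cycle")
        self.parent[node] = new_parent

    def path(self, node: str) -> str:
        return " > ".join(reversed([node] + self.ancestors(node)))


h = Hierarchy()
h.add("All products")
h.add("Drinks", "All products")
h.add("Cola", "Drinks")
h.add("Snacks", "All products")
h.move("Cola", "Snacks")
print(h.path("Cola"))  # All products > Snacks > Cola
```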

Stratature is the kind of bite-sized (16 employees) acquisition that Microsoft likes. It prefers to catch a company when it is small so that it can easily absorb the technical staff and mould them into the Microsoft way of doing things. When it has deviated from this rule (Great Plains, Navision) it has discovered why this was a good rule in the first place.

Congratulations to Ian Ahern, who impressed me on the several occasions I met with him. He also supports my (possibly biased) thesis that all the best MDM people are Brits. The terms of the deal are not public, and it would have been interesting to see what valuation a good MDM vendor achieved; I am sure it worked out well for Stratature’s shareholders. This now leaves Kalido as the main remaining independent analytic MDM vendor. This is not necessarily a bad thing for Kalido. Informatica has shown how you can thrive once your competitors get swallowed by the behemoths. Being stack-neutral in data management carries advantages.

The good old days

I attended an interesting talk today by Greg Hackett, who founded the financial benchmarking company Hackett Group before selling it to Answerthink and “going fishing for a few years”. He is now a business school professor, and has been researching company performance and, in particular, company failure. Studying the 1,000 largest US public companies from 1960 to 2004, his research shows:

– company profitability is 40% lower in 2004 than in 1960, with a fairly steady decline starting in the mid 1960s
– the average net income after tax of a company in 2004 was just 4.3%
– half of companies were unprofitable for at least two out of five years
– 65% of those top 1,000 companies in 1965 have since disappeared: half of the companies were acquired, and 15% actually went bankrupt.

He gave four reasons for company failure: missing external changes in the market, inflexibility, short-term management, and failing to use systems that would show warning signs of trouble. What I found most surprising was that the correlation between profitability and stock market performance was zero.

The research suggests that the world is becoming a more competitive place, with pricing pressure in particular reducing profitability despite greater efficiency (cost of goods sold is 67% of turnover, down from 75% in 1960, though SG&A is up from around 13% of turnover to around 18%). All those investments in technology have made companies slightly more efficient, but this has been more than offset by pricing pressure.

I guess this also tells you that holding a single blue chip stock and hanging onto it is a risky business over a very long time; with 15% of companies folding over that 45-year period, it pays to keep an eye on your portfolio.

A key implication is that companies need to get better at implementing management information systems that can react quickly to change and give them insight into competitive risks, rather than just monitoring current performance. Personally I doubt that computer systems will ever provide insight smart enough for companies to make consistently better strategic decisions, e.g. divesting from businesses that are at risk; and even if they did, would management be smart enough to listen and act on that information? Given how many of these companies were acquired, the research does imply that systems which are good at handling mergers and acquisitions should have a prosperous future. That, at least, seems a safe prediction.

Master data initiatives need co-ordination

A generally good article by Colin Beasty about CDI shows a common misconception regarding data warehousing. The article rightly points out that CRM (via Siebel etc.) essentially failed to resolve the “single version of the truth” about customers, with apparently 20-40 systems in a large company holding customer data (this sounds plausible, but he doesn’t quote a source for it). However, he says that data warehouses can’t address this since “data integrity and validity are optional”. Here he seems to be confusing an operational data store with a data warehouse, or at least with a good data warehouse. An operational data store might well be a dump of data straight from a transaction system with no work done on the data (purely for performance reasons), but a data warehouse should definitely not be. A data warehouse is supposed to pull together data from multiple systems and provide a single, consistent view across the enterprise. It cannot do that without a data validation stage that rejects data inconsistent with the company’s business rules; otherwise it is a case of “garbage in, garbage out”. Certainly, if your source of customer data is a well-implemented CDI hub, rather than several sources (an ERP system, a CRM system etc.), then the CDI hub has essentially carried out the validation and resolution stage already, i.e. it is acting as the single system of record for customer data. However, the warehouse cannot relax, since it also has to deal with all the other kinds of transaction and master data. Indeed, I would argue that a hub-based approach carries some dangers. If you implement a CDI hub, then do the same for product using a PIM solution, you will soon realise that you need yet more hubs for employee, asset and so on. CDI hub technology typically does not handle other types of master data, as it is hard-coded around the (important) class of master data called customer.
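The warehouse validation stage described above need not be elaborate to make the point. Here is a minimal Python sketch, assuming business rules can be expressed as named predicates over a record; the rule names, fields and country list are hypothetical. Rejected rows are kept together with their reasons, so they can be routed back to the source system for correction rather than silently dropped.

```python
from typing import Callable

# A rule is a (name, predicate) pair; the predicate returns True if the record passes.
Rule = tuple[str, Callable[[dict], bool]]

# Hypothetical business rules; a real warehouse would have many more.
RULES: list[Rule] = [
    ("customer id present", lambda r: bool(r.get("customer_id"))),
    ("known country code", lambda r: r.get("country") in {"GB", "US", "NL"}),
    ("non-negative credit limit", lambda r: r.get("credit_limit", 0) >= 0),
]


def validate(records: list[dict], rules: list[Rule] = RULES
             ) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split incoming records into accepted rows and rejected rows with reasons."""
    accepted, rejected = [], []
    for record in records:
        failures = [name for name, check in rules if not check(record)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


incoming = [
    {"customer_id": "C1", "country": "GB", "credit_limit": 5000},
    {"customer_id": "", "country": "XX", "credit_limit": -1},
]
good, bad = validate(incoming)
print(len(good), bad[0][1])
```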

The article acknowledges that CDI is a subset of MDM, but does not draw attention to the danger of a piecemeal hub implementation, one data type at a time. What is needed is a master data repository that can act as a system of record for all types of master data, itself feeding both data warehouses and other systems (possibly via SOA, as the article mentions, but that is essentially optional). Without this realisation we risk creating yet another set of master data sources without really getting to the heart of the issue. You can have multiple hubs, but somewhere you need a single repository that at least knows where every version of master data is held in the enterprise, whether in hubs, ERP or elsewhere. Better still if that MDM repository can act as an active provider of master data to other systems, since it will have the enterprise-wide business rules needed to ensure data quality, which systems closer to operational processes may not have. Without a fully integrated approach to master data we are in danger of simply adding unnecessary duplicate sources of master data (since these data are, after all, not going away in the ERP systems). Somewhere a true “master of master data” needs to exist, and it needs to be owned by business people with the authority to resolve inter-department disputes over master data (and not just customer data). Otherwise we are just adding another layer to the spaghetti.
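The minimum such a repository needs is a cross-reference: for each enterprise-wide master data ID, where that entity is held and under what local key. Here is a hypothetical Python sketch, with invented IDs, system names and keys, that deliberately covers more than one type of master data rather than being hard-coded around customer.

```python
from __future__ import annotations


class MasterDataRegistry:
    """Hypothetical 'master of master data': records where each master entity
    is held across the enterprise, and under which local key."""

    def __init__(self) -> None:
        self._entity_type: dict[str, str] = {}            # master ID -> entity type
        self._locations: dict[str, dict[str, str]] = {}   # master ID -> {system: local key}
        self._reverse: dict[tuple[str, str], str] = {}    # (system, local key) -> master ID

    def register(self, master_id: str, entity_type: str, system: str, local_key: str) -> None:
        known = self._entity_type.setdefault(master_id, entity_type)
        if known != entity_type:
            raise ValueError(f"{master_id} is already registered as a {known}")
        self._locations.setdefault(master_id, {})[system] = local_key
        self._reverse[(system, local_key)] = master_id

    def where_held(self, master_id: str) -> dict[str, str]:
        """Every system holding this entity, with its local key there."""
        return dict(self._locations.get(master_id, {}))

    def master_id_for(self, system: str, local_key: str) -> str | None:
        """Resolve a local key in one system back to the enterprise-wide ID."""
        return self._reverse.get((system, local_key))


registry = MasterDataRegistry()
registry.register("CUST-42", "customer", "ERP", "0000104711")
registry.register("CUST-42", "customer", "CRM", "1-ABC9")
registry.register("CUST-42", "customer", "CDI hub", "H-17")
registry.register("PROD-7", "product", "PIM", "SKU-993")
print(registry.where_held("CUST-42"))
print(registry.master_id_for("CRM", "1-ABC9"))
```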