Andy on Enterprise Software

Blowing Bubbles

November 15, 2007

Back in the late 1990s companies filed for IPOs even though they had modest revenues and were losing money. Due to the tulip mentality of the time investors suspended disbelief and bought in anyway, giving way to the crash of 2001. A couple of years after that bankers were telling me that in order to have an IPO you would need “at least a couple of years of solid trading profits”, quarterly revenues of at least $25 million and preferably more, as well as strong growth. Those heady days of the late 1990s were a freak occurrence, like the South Sea Bubble. Certainly technology IPOs dried up almost entirely.

With the recent gloom on Wall Street I was therefore surprised to see Initiate Systems filing for an IPO. They are growing quite rapidly, but not only have they never made a cent of profit, their losses appear to be, if anything, widening slightly, at about a third of their revenues. Throw in an admitted financial misstatement and does this start to feel to you like the late 1990s again? No doubt Initiate is expertly and expensively advised, but this will certainly be one to watch: if the IPO goes ahead and goes well, it will change perceptions of exit strategies for high tech companies.

Pure and chased

November 7, 2007

Purisma has been acquired by Dun & Bradstreet, the business information company that provides, amongst other things, assessment of credit risk of companies and company statistics. On the face of it this is a somewhat peculiar acquisition, since D&B is not a pure provider of enterprise software solutions in the way that, say, Oracle is. However D&B did have its own data quality offering (clearly data quality is a big issue for an information supplier) and Purisma’s customer hub technology is certainly complementary to this data quality offering. It seems possible that D&B has bought Purisma primarily for its own internal purposes, and at this point it is unclear whether Purisma will even continue to be sold as a product in its current form. Rather ironically, Purisma had a product offering allowing integration of D&B into its CDI application. I guess that will come in handy now.

Purisma does not publish public financial data, so it is tricky to tell how good or bad the price paid of USD 48 million for the company was. I believe that Purisma had fewer than 50 employees and I would speculate that its revenues were in the USD 15-20M range. In general it is known that stand-alone CDI and PIM players have been struggling somewhat in the market. This is in part due to a gradual dawning on customers that master data management is a broader topic than just “customer” or “product”, a long term theme of this blog. When customers ask “ah, but what about other kinds of master data” (asset, location, employee etc) then specialist CDI and PIM vendors do not have good answers, however good their offerings in their particular domains are. Even IBM has done an about turn on this topic recently, laying out a roadmap for a single MDM Server that will eventually bring together its menagerie of acquired technologies into a platform that will handle multiple master data domains consistently. For this reason I suspect that D&B did not pay over the odds for Purisma.

D&B has had phases in the past of buying software companies, and then moving away from this business e.g. those with long memories will recall the 4GL Nomad, which it sold off after some years. The press release that is tucked away on the Purisma web site today is not giving anything away. If press releases played poker, this one would be a tough player. Purisma customers need to seek guidance from D&B about its future intentions, and consider their alternatives.

Common sense starts to prevail

October 30, 2007

Regular readers of this blog are probably tired of hearing me advocate that MDM vendors need to move beyond single domain solutions (CDI, PIM) into solutions that can cater for a wide range of master data types. I have spoken at a number of the very useful CDI/MDM Institute (previously CDI Institute) conferences organised by Aaron Zornes, which are pretty much the only MDM conferences out there, and initially (as indicated by its earlier name) Aaron seemed fairly sceptical about this message. It is therefore encouraging to see him starting to lean this way in an article in DM Review. In the article he bases this view on multiple conversations with people responsible for MDM at large enterprises.

This is quite right; perhaps I had this view initially because I used to be a technology strategist at Shell and so was trained to think this way, but it has always seemed blindingly obvious to me that single domain solutions are at best a sticking plaster when it comes to MDM. There are simply too many classes of master data to contemplate fragmenting MDM solutions by domain, each to a potentially different vendor. Large companies don’t like dealing with more vendors than they have to, and common sense tells you that it is easier to get economies of scale in terms of skill sets, never mind software licenses, by using technology that is capable of dealing with all kinds of master data in the same way. Personally I would be cautious about vendors who bolt on wider domain capability to existing technologies that were initially hard coded around a specific domain such as customer or product. It is never easy to re-architect software to do something that its original designers never intended. It will be easier for the pure play generic MDM vendors to add better performance etc than it will be for a CDI vendor to be genuinely able to deal with multiple domains consistently.

Having already changed the name from “CDI Institute” to “CDI/MDM Institute” it’s only three letters away from ending up with the “MDM Institute”.

Data mis-governance

October 22, 2007

I spent this morning at a data governance seminar sponsored by Dataflux, at which Jill Dyche of Baseline Consulting spoke about her experiences of data governance best practice in client organisations, and Philip Howard of Bloor gave his perspective. Data governance seems to be something very much in its infancy despite the long-established issues it addresses, with only a tiny proportion of organisations having made a lot of progress (according to an IBM Global Services 5 point data governance maturity scale, no company is further along than stage three, and only a handful of companies even manage that). There seems little in the way of a silver bullet here, just missionary work to convince the business that data ownership needs to be taken seriously. Sometimes a “burning platform” can stimulate interest. Recently Nationwide Building Society was fined GBP 1 million due to the theft of a laptop on which customer data was stored (albeit in encrypted form). Interestingly, the fine was not directly due to the loss of the data but the fact that they had no processes in place to determine that there was actually customer data on the laptop. Such cases illustrate the risks, at least in regulated industries, of having poor data governance policies.

Another aspect of data governance often overlooked is the proliferation of data in corporate spreadsheets. Apparently Allied Irish Bank have a stunning 185 TB of storage devoted to spreadsheets alone, and who knows how much of this is duplicated. With studies showing that in a spreadsheet with over 200 rows there is a 90% chance of an error, the potential for problems is self evident. When I was at Shell there was a whole group on the corridor opposite me who built spreadsheet models and audited existing ones, some of which were highly important (e.g. financial models used for capital intensive projects). This group paid its way many times over by uncovering flaws in existing operational models. Yet I suspect they only scratched the surface, and how common are such initiatives? This should be a promising area for companies such as Compassoft, which do spreadsheet “discovery and control”. Indeed there is no shortage of scandals related to manipulation of spreadsheets, including the USD 700M one at Allied Irish. And you thought you had enough data quality problems in your corporate systems….

The murky world of market sizing

August 9, 2007

Defining a software segment’s market size is a tricky thing, partly because it is all about what you include and what you exclude. Take MDM as an example. A much quoted IDC figure reckoned the MDM market would be USD 10 billion in 2009, implying a USD 5 billion market size in 2005 given compound growth of 14%. Such figures are regularly bandied about by the computer press, but mean little unless you qualify such statements by explaining what is included or excluded. For example this figure includes an estimate for services business associated with MDM. This is itself hard to pin down, but in my experience an MDM project where the software costs X will spend about 3X on services to implement it. Hence that USD 5 billion market size actually only has about USD 1.6 billion of software sales. Then MDM itself is a broad church, including CDI and PIM as well as generalist MDM solutions such as those from Orchestra Networks and Kalido. I was still puzzled as to why even this USD 1.6 billion figure was so large, but by deduction I think that the IDC figure was including data quality within the picture also. Fair enough, but it needs to be explicitly stated to make sense of the market, and as we will see still does not explain the gap.

Let’s come at this another way. A Gartner figure just released reckoned the CDI market was worth USD 310 million in 2006. This appears to be an estimate for software rather than services. Getting a figure for the product information management market is murkier, but I believe it will be broadly at a similar level. The generalist MDM vendors are these days mostly smaller companies (products like Razza having been swallowed and digested by Oracle, and Stratature by Microsoft for example) and I doubt they would add USD 100 million in software sales to this picture. Hence, adding PIM + CDI + generalist MDM (but excluding data quality) you get a software market of maybe USD 700M (probably a bit less), which is a far cry from the apparent IDC figure of USD 5 billion, or even the likely USD 1.6 billion of software revenues only. I still struggle to bridge the gap here, as the data quality market is not that large. Again you have to be careful about what is in and what is out, but other than leader Trillium, data quality vendors are mostly very small (e.g. Exeros, Datanomic, etc) or are now buried within larger companies through acquisition (e.g. Informatica, Business Objects). However, though I have seen estimates like USD 500M for the data quality market, again I wonder how much of this is services; personally I am unconvinced that the software sales of the data quality market would be much beyond USD 100M or so (companies like FirstLogic were not that large prior to their acquisition). So if we take the USD 700M figure and throw in USD 150M for data quality software sales (let’s be generous) this is still a far cry from the USD 1.6 billion estimate we arrived at earlier. Of all the analyst firms I respect the market size figures from IDC most, as they do actually check with the vendors what their revenues really are (they used to do this every year when I was running Kalido) but as you can see their MDM market size figure is still a mystery to me.
If someone from IDC is reading this and can shed some light on it I would be interested to hear from them.
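The bottom-up sum above can be sketched in a few lines. This is only an illustration of the arithmetic in this post, in USD millions: treating PIM as equal to the Gartner CDI figure is an assumption based on my "broadly at a similar level" guess, the generalist MDM number is a ceiling rather than an estimate, and all the variable names are made up for the example.

```python
# All values in USD millions, taken from the figures quoted in this post.
CDI_GARTNER_2006 = 310          # Gartner CDI software estimate
PIM_ASSUMED = 310               # assumption: PIM roughly the same size as CDI
GENERALIST_MDM_CEILING = 100    # ceiling; the real figure is probably lower

# Upper bound for PIM + CDI + generalist MDM, excluding data quality.
core_upper_bound = CDI_GARTNER_2006 + PIM_ASSUMED + GENERALIST_MDM_CEILING

# The post rounds this down to "maybe USD 700M (probably a bit less)".
CORE_ROUNDED = 700
DATA_QUALITY_GENEROUS = 150     # deliberately generous software-only figure

bottom_up = CORE_ROUNDED + DATA_QUALITY_GENEROUS

# Software-only figure derived earlier from the IDC headline number.
TOP_DOWN_SOFTWARE = 1600
unexplained_gap = TOP_DOWN_SOFTWARE - bottom_up

print(f"core upper bound: {core_upper_bound}")
print(f"bottom-up total:  {bottom_up}")
print(f"unexplained gap:  {unexplained_gap}")
```

Changing any one of the inclusion choices (services, data quality, PIM) moves the total by hundreds of millions, which is the whole point.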

MDM is certainly growing quickly: each analyst firm agrees on this, and it is clear enough from the number of companies entering the market or (more commonly) re-labelling existing products as MDM. However it can be seen that you can take a figure like the IDC USD 5 billion number, and also produce a valid market estimate of under USD 850 million, just based on what you include or exclude, for seemingly the same market. Quite a range. I guess it is hoping too much to expect the IT press to actually mention pesky caveats like what a number includes, since it is more headline inducing to say “MDM market worth $5 billion”, but if you are to actually use these figures to help with a decision then you would be well advised to dig deeper, below the headline numbers.

If all you have is a hammer…

June 29, 2007

Claudia Imhoff raises an important issue in her blog regarding the cleansing of data. When dealing with a data warehouse it is important for data to be validated before being loaded into the warehouse in order to remove any data quality problems (of course, ideally you would have a process to go back and fix the problems at source also). However, as she points out, in some cases, e.g. for audit purposes, it is important to know what the original data actually was, not just a cleansed version. This gets to the heart of a vital issue surrounding master data, and neatly illustrates the difference between a master data repository and a data warehouse.

In MDM it is accepted (at least by those who have experience of real MDM projects) that master data will go through different versions before producing a “golden copy”, which would be suitable for putting into a data warehouse. A new marketing product hierarchy may have to go through several drafts and levels of sign-off before a new version is authorised and published, and the same is true of things like drafts of budget plans, which go through various iterations before a final version is agreed. This is quite apart from actual errors in data, which are all too common in operational systems. An MDM application should be able to manage the workflow of such processes, and have a repository that is capable of going back in time and tracking the various versions, not just the finished golden copy. A good MDM repository should allow you to track back through master data as it is “improved” over time, not just look at the golden copy. Only the golden copy should be exported to the data warehouse, where data integrity is vital.

People working on data warehouse projects may not be aware of such compliance issues, as they usually care only about the finished state warehouse data. MDM projects should always be considering this issue, and your technology selection should reflect the need for your MDM technology to track versions of master data over time.
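To make the idea of tracking versions concrete, here is a minimal sketch of a versioned master data record. It is not how any particular MDM product works; the class names, the two statuses ("draft" and "published") and the sample hierarchy are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    number: int
    payload: dict
    status: str       # hypothetical workflow states: "draft" or "published"
    valid_from: date


@dataclass
class MasterRecord:
    key: str
    versions: list = field(default_factory=list)

    def add_draft(self, payload: dict, valid_from: date) -> Version:
        """Append a new draft; earlier versions are never overwritten."""
        version = Version(len(self.versions) + 1, dict(payload), "draft", valid_from)
        self.versions.append(version)
        return version

    def publish(self, number: int) -> None:
        """Mark one version as signed off, keeping its place in the history."""
        index = number - 1
        self.versions[index] = replace(self.versions[index], status="published")

    def golden_copy(self) -> Optional[Version]:
        """Latest published version: the only thing exported to the warehouse."""
        published = [v for v in self.versions if v.status == "published"]
        return published[-1] if published else None

    def as_of(self, when: date) -> Optional[Version]:
        """Whatever version was current on a given date, draft or not."""
        candidates = [v for v in self.versions if v.valid_from <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda v: (v.valid_from, v.number))


hierarchy = MasterRecord("product-hierarchy")
hierarchy.add_draft({"Lubricants": "Retail"}, date(2007, 1, 10))
hierarchy.add_draft({"Lubricants": "Industrial"}, date(2007, 2, 1))
hierarchy.publish(2)

print(hierarchy.golden_copy().payload)          # the signed-off version
print(hierarchy.as_of(date(2007, 1, 15)).payload)  # what an auditor would see
```

The warehouse load would only ever call something like `golden_copy()`, while an auditor can still ask what the data looked like on a given date.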

Master data: from jungle to garden in several not so easy steps

June 20, 2007

I very much liked a succinct article by the ever-reliable Colin White on MDM approaches. Companies still struggle to get to grips with what a roadmap for MDM is all about, with apparently competing (and incomplete and immature) MDM technologies and management consultants who are only a few pages ahead of the customers in the manual. This piece neatly sets out the end goal of MDM and the various approaches to getting there (via analytic MDM or operational MDM as a start). It would have been even better had it explained in more detail how the alternatives can be run in parallel, and gone into more depth on the issues of each sequence of steps. However by clearly separating out operational and analytic MDM and showing how these are complementary he is already doing a significant service.

The issue he mentions with “approach 1” i.e. the “complexity of maintaining a complete historical record of master data” can be dealt with if you choose an analytic MDM technology which has built-in support for analysis over time. Colin points out that a key step is to end up with a low-latency master data store as the system of record for the enterprise, acting as a provider of golden copy master data to other sources, both transaction systems and analytical ones such as an enterprise data warehouse. If properly implemented, this will result in a change of the centre of gravity of master data, from the current situation where the system of record is ERP to a situation where the enterprise master data repository is actually the system of record, providing data through a published interface (and an enterprise service bus) through to all other systems, including ERP. This is a desirable end state, and is a key step to starting to unlock the monolithic ERP systems that companies use today into more manageable components.

I really hope that this paper gets the attention that it deserves. Getting most of the key messages into a two-page article is quite an achievement. I would like to see this developed further, and hopefully it will be.

The other shoe drops

June 8, 2007

For some time I had been wondering which company Microsoft would buy to enter the MDM market. This is a key area in the broader business intelligence arena that they aspire to progress in, and was a major gap in their offering. Stratature was their choice, and it was a smart choice. Stratature plays in the analytical MDM area rather than being an operational transaction hub (like Siperian, say). It had built up a good reputation for flexible hierarchy management, an important feature of most MDM applications. They competed directly with Razza (an excellent tool which Hyperion purchased but Oracle seems to have now buried) and Kalido.

Stratature is the kind of bite-sized (16 employees) acquisition that Microsoft likes. It prefers to catch a company when it is small so that it can easily absorb the technical staff and mould them into the Microsoft way of doing things. When it has deviated from this rule (Great Plains, Navision) it has discovered why this was a good rule in the first place.

Congratulations to Ian Ahern, who impressed me on the several occasions I met with him. He also supports my (possibly biased) thesis that all the best MDM people are Brits. The terms of the deal are not public, and it would have been interesting to see what valuation a good MDM vendor achieved; I am sure it worked out well for Stratature’s shareholders. This now leaves Kalido as the main remaining independent analytic MDM vendor. This is not necessarily a bad thing for Kalido. Informatica has shown how you can thrive once your competitors get swallowed by the behemoths. Being stack-neutral in data management carries advantages.

Master data initiatives need co-ordination

June 3, 2007

A generally good article by Colin Beasty about CDI shows a common misconception regarding data warehousing. The article rightly points out that CRM (via Siebel etc) essentially failed to resolve the “single version of the truth” about customer, with apparently 20-40 systems in a large company having customer data (this sounds plausible but he doesn’t quote a source for this). However he says that data warehouses can’t address this since “data integrity and validity are optional”. Here he seems to be mixing up an operational data store and a data warehouse, or at least a good data warehouse. An operational data store might well be a dump of data straight from a transaction system without work being done on the data (purely for performance reasons) but a data warehouse should definitely not be. A data warehouse is supposed to be pulling together data from multiple systems and providing a single, consistent view across the enterprise. It cannot do that without having a stage of validation of data, rejecting data that is inconsistent with the company’s business rules. If not, it is a case of “garbage in, garbage out”. Now certainly, if you have a source of customer data that is a well implemented CDI hub, rather than several sources (an ERP system, a CRM system etc) then essentially the CDI hub has carried out the validation and resolution stage already i.e. it is acting as a single system of record for customer data. However the warehouse cannot relax, since it also has to deal with all the other kinds of transaction and master data as well. Indeed, I would argue that a hub-based approach carries with it some dangers. If you implement a CDI hub, then do the same for product using a PIM solution, then you will realise that you need another hub for employee, asset, etc. CDI hub technology typically does not handle other types of master data as it is hard coded around the (important) class of master data called customer.

The article acknowledges that CDI is a subset of MDM, but does not draw attention to the danger of a piecemeal hub implementation one datatype at a time. What is needed is a master data repository that can act as a system of record for all types of master data, itself feeding both data warehouses and other systems (possibly via SOA as the article mentioned, but that is essentially optional). Without this realisation we are in danger of creating yet another set of master data sources without really getting to the heart of the issue. You can have multiple hubs, but somewhere you need a single repository which at least knows where every version of master data is in the enterprise, whether in hubs, ERP or elsewhere; better still if that MDM repository can act as an active provider of master data elsewhere, since it will have the enterprise-wide business rules needed to ensure data quality, which systems closer to operational processes may not have. Without a fully integrated approach to master data we are in danger of just adding unnecessary duplicate sources of master data (since these data are, after all, not going away in the ERP systems). Somewhere a true “master of master data” needs to exist, and that needs to be owned by business people with the authority to resolve inter-department disputes over master data (and not just customer data). Otherwise we are just adding another layer to the spaghetti.

MDM and risk

May 31, 2007

It is not often that I even bother to read articles written by vendors, but there were some good points made in an article by a practice manager for Siperian regarding MDM and regulation. The point being made was how increased regulation, both in the US with its Sarbanes-Oxley and Patriot Act, but also elsewhere with things such as Basel 2 in financial services, should be a significant external “push” for MDM to complement internal “pull” by corporations. In order to measure the overall risk levels at a bank you need to know the total aggregate positions taken with counter-parties, and be able to see whether there are any high exposures with particular clients (the case of Enron springs to mind). In order to do this you need to know exactly who you are doing business with, including subsidiaries of that company, and yet how well do companies really know this?

Many MDM projects set out to get a better understanding of the total picture of either customers or suppliers, since their multiple source systems and classifications of these make it very hard to get a single consistent picture. Certainly many years ago Shell realised that it had no idea how much business it did with, say, Ford or Unilever, since quite apart from internal classification overlap, it was not clear exactly what “Ford” or “Unilever” consists of. This was a key reason why it invested heavily in an enterprise data warehouse project. Multinational companies have so many subsidiaries, often with different trading names (for example Shell owns companies like Bharat Petroleum, Unilever is known as “Hindustan Lever” in India) that it is unlikely that individual operating units have carefully checked the Dun & Bradstreet numbers of all these companies and classified them correctly.

This is important enough when dealing with a global account, but can be critical when dealing with financial trades. I know of one MDM initiative at a financial services organisation that started off as a direct result of Enron, when it transpired that the organisation thought it knew how much exposure it had with Enron, but rapidly discovered that it did not when Enron collapsed. I certainly know of one famous financial institution where a former VP admitted to me that the bank had “no clue” how much business it did with a large, complex beast like Deutsche Bank, for all the usual MDM reasons.
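The underlying calculation is not hard once the master data is right, which is rather the point. As a minimal sketch, assuming you already have a trustworthy map of which legal entity belongs to which parent (the entity names, amounts and function names below are invented for illustration), total exposure per group is just a roll-up:

```python
from collections import defaultdict

# Hypothetical ownership map: child legal entity -> immediate parent.
PARENT_OF = {
    "Alpha Trading Ltd": "Alpha Holdings",
    "Alpha Energy GmbH": "Alpha Holdings",
    "Beta Finance": "Beta Group",
}


def ultimate_parent(entity, parent_of):
    """Walk up the ownership chain until an entity with no parent is reached."""
    seen = set()
    while entity in parent_of:
        if entity in seen:
            raise ValueError(f"cycle in ownership hierarchy at {entity!r}")
        seen.add(entity)
        entity = parent_of[entity]
    return entity


def exposure_by_group(positions, parent_of):
    """Sum positions per ultimate parent rather than per trading name."""
    totals = defaultdict(float)
    for counterparty, amount in positions:
        totals[ultimate_parent(counterparty, parent_of)] += amount
    return dict(totals)


# Positions as the source systems see them, one per trading name.
positions = [
    ("Alpha Trading Ltd", 40.0),
    ("Alpha Energy GmbH", 35.0),
    ("Alpha Holdings", 10.0),
    ("Beta Finance", 20.0),
]

totals = exposure_by_group(positions, PARENT_OF)
print(totals)
```

No single source system shows a position above 40 here, yet the group exposure is more than double that. The hard part in practice is the ownership map itself, which is exactly the master data that most organisations do not have in a reliable form.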

The thing I find curious is that these regulations are all pretty much in place now, and although companies have spent money on compliance, it is clear from these two cases that the problems are far from solved. The next time an Enron-like event happens (and it will) companies will not only be nursing losses from their exposed positions, but may also have regulatory problems if it turns out that they actually did not truly know the extent of their exposure. Given the state of data quality and master data in most large organisations, I wonder whether companies are being complacent or regulators simply sleepy in checking the effectiveness of the systems at companies. Having a report that tells you your exposure level is all very well, but how reliable are the numbers that make that up? My experience of working with data warehouse and MDM applications tells me that they are likely to be a lot less reliable than many people think.

If you find all this talk of banks rather abstract, consider this: the average hospital has 25 systems that record patient information. If you are one of those patients, how confident are you that these will all tie up?