Andy on Enterprise Software

The Other Shoe Drops

January 29, 2010

I have been speculating for years now about when Informatica would finally enter the MDM marketplace. It is a leader in the integration space, and has good data quality offerings via its acquisitions of Similarity Systems, and more recently Identity Systems and Address Doctor. I am not sure why Informatica held off so long from gaining exposure to the fast growing MDM market, but its purchase of an MDM platform hub has seemed almost inevitable.

I most recently speculated about the eventual target being Siperian. Today the two finally tied the knot, with Informatica buying Siperian for around $130 million in cash. I estimate that this is a revenue multiple of perhaps four times. This is a very logical purchase for both companies. Siperian has solid technology based on a flexible business model, and will be very complementary to Informatica’s data quality and integration offerings.

This acquisition further confirms the attractiveness of the MDM market, and is good news for the other MDM platform vendors. Market rumours are swirling as we speak about the purchase of Initiate (perhaps by IBM) and if this happens then it could be a gold rush to pick off the better quality independent companies. Despite many purchases already in this space, there are still plenty of potential further acquirers out there.

A New Year Beckons

January 5, 2010

I would like to wish you all a happy New Year. Some people have inquired about why my postings here have become less regular than they were in the past. Well, it is not that I have run out of opinions (you wish), but I now have some other outlets for the material that used to appear entirely in this blog. Some of the longer-style posts now appear in the form of a monthly column in CIO magazine:

http://www.cio.co.uk/author/andy-hayler/ (which has a link to the articles)

The more news-related items, which typically talk about an event in the software industry such as an acquisition or a significant new software release, appear as articles on the web sites IT Analysis and IT Director. For example:

http://www.it-director.com/about/author/14100/andy_hayler.php (with link to articles)
http://www.it-analysis.com/about/author/14100/andy_hayler.php (this has a link also).

I will use this blog to cover miscellaneous items that don’t fit logically into the above forums.

I am looking forward to 2010, which I hope will see at least a partial recovery for the economy and hence better times for the enterprise software industry. Have a great year!

Sunlight is the best disinfectant

December 15, 2009

I read a very interesting article today by independent data architecture consultant Mike Lapenna about ETL logic. Data governance initiatives, MDM and data quality projects all need business rules of one kind or another. Some of these may be trivial, and as much technical as business e.g. “this field must be an integer of at most five digits, and always less than the value 65000”. Others may be more clearly business-oriented e.g. “customers of type A have a credit rating of at most USD 2,000” or “every product must be part of a unique product class”. Certainly MDM technologies provide repositories where such business rules may be stored, as (with a different emphasis) do many data quality repositories. Some basic information is stored within the database system catalogs e.g. field lengths and primary key information. Databases and repositories are generally fairly accessible, for example via a SQL interface, or some form of graphical view. Data modeling tools also capture some of this metadata.

Yet there is a considerable source of rules that are obscured from view. Some are tied up within business applications, while there is another class that is also opaque: those locked up within extract/transform/load (ETL) rules, usually in the form of procedural scripts. If several source files need to be merged, for example to load into a data warehouse, then the logic which defines what transformations occur amounts to a set of important rules in its own right. Certainly these rules are subject to change, since source systems sometimes undergo format changes, for example if a commercial package is upgraded. Yet these rules are usually embedded within procedural code, or at best within the metadata repository of a commercial ETL tool. Mike’s article proposes a repository that would keep track of the applications, data elements and interfaces involved, the idea being to get the rules as (readable) data rather than buried away in code.
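To make the "rules as data" idea concrete, here is a minimal sketch in Python. It is my own illustration, not the repository design from Mike’s article: the rule kinds, field names and the `violations` helper are all hypothetical, and the rules themselves are simply the kind quoted earlier in this post.

```python
# Hypothetical rules held as plain data, so they can be listed and reviewed
# without reading any procedural code. Field names are made up for illustration.
RULES = [
    {"field": "code", "kind": "int_max_digits", "limit": 5},
    {"field": "code", "kind": "less_than", "limit": 65000},
    {"field": "credit_rating", "kind": "at_most", "limit": 2000,
     "when": {"customer_type": "A"}},
]


def applies(rule, record):
    """A rule applies if its field is present and every 'when' condition matches."""
    conditions = rule.get("when", {})
    return rule["field"] in record and all(
        record.get(key) == value for key, value in conditions.items()
    )


def passes(rule, value):
    """Evaluate one rule kind against one value."""
    if rule["kind"] == "int_max_digits":
        return isinstance(value, int) and len(str(abs(value))) <= rule["limit"]
    if rule["kind"] == "less_than":
        return value < rule["limit"]
    if rule["kind"] == "at_most":
        return value <= rule["limit"]
    raise ValueError(f"unknown rule kind: {rule['kind']}")


def violations(record):
    """Return the rules (as data) that the record breaks."""
    return [
        rule for rule in RULES
        if applies(rule, record) and not passes(rule, record[rule["field"]])
    ]


if __name__ == "__main__":
    record = {"code": 70000, "customer_type": "A", "credit_rating": 2500}
    for rule in violations(record):
        print("broken:", rule)
```

The point of the sketch is that the list of rules can be printed, queried or handed to a business user, while the small amount of code that interprets them rarely needs to change.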

The article raises an important issue: rules of all kinds concerning data should ideally be held as data and so be accessible, yet ETL rules in particular tend not to be. It is beyond the scope of the article, but for me there is a question of how the various sources of business rules (ETL repository, MDM repository, data quality repository, database catalogs etc.) can be linked together so that a complete picture of the business rules can be seen. Those with long memories will recall old fashioned data dictionaries, which tried to perform this role, but which mostly died out since they were always essentially passive copies of the rules in other systems, and so easily became out of date. Yet the current trend towards managing master data actively raises questions about just what the scope of data rules should be, and where they should be stored. Application vendors, MDM vendors, data quality vendors, ETL vendors and database vendors will each have their own perspective, and will inevitably each seek to control as much of the metadata landscape as they can, since ownership of this level of data will be a powerful position to be in.

From an end user perspective what you really want is for all such rules to be stored as data, and for some mechanism to access the various repositories and formats in a seamless way, so that a complete perspective of enterprise data becomes possible. This desire may not necessarily be shared by all vendors, for whom control of business metadata is power. An opportunity for someone?

MDM and Spaghetti

September 14, 2009

When looking at the business case for MDM it is normal to look at the kind of business initiatives that can be enabled by better master data. For example with higher quality, consistent customer data it is possible to run more efficient marketing campaigns, or by having a complete picture of a customer it is possible to cross-sell products effectively or better manage an account. However such things tend to rely on having MDM as a piece of infrastructure, so it is hard to claim all the benefits directly for an MDM project. Perhaps it is time to take a look at some of the more murky and less sexy areas that can benefit from MDM, specifically by lowering the cost of maintaining application interfaces.

Large companies have hundreds of applications, even after they have finished implementing an ERP system (and then re-implementing it to reduce the number of ERP instances). One company I work with owns up to 600 applications, another to 3,500. In many cases data needs to be shared across applications, and of course the very fact of having so many systems can cause master data issues to occur, since each application frequently generates and maintains at least some master data that it needs rather than being fed such data by a consistent enterprise-wide master data repository.

One key difference between MDM hubs and a data warehouse is that a warehouse needs to have clean, pure data; this is achieved by an extensive process of data cleansing and validation that is conducted outside the warehouse prior to data being loaded, perhaps through a mix of data quality tools and ETL processing. Indeed one major issue is that in order to come up with high quality data for the warehouse, business rules end up being embedded in sometimes complex ETL scripts, which are opaque and hard to get business people to engage with. A good master data hub should be able to take on much of this burden of storing and managing business rules, and may be a more productive place to carry this out. For example it may be more effective to use probabilistic techniques to help determine matches amongst potential data sources (say, multiple product masters) rather than needing to hard-code business rules, as usually happens with ETL scripts. If this is the case then you may be able to get away with a much smaller set of business rules in an MDM hub than were typically necessary in ETL scripts. In turn this reduction of complexity may allow a lot of the effort needed to maintain such scripts to go away.
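As a rough illustration of the difference, the sketch below pairs up entries from two product masters using a similarity score rather than a hand-written rule per case. It is only a sketch under stated assumptions: it uses the `difflib.SequenceMatcher` ratio from the Python standard library as a simple stand-in for a real probabilistic matcher, and the function names, sample product names and the 0.85 threshold are all my own inventions.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Score between 0.0 and 1.0 for how alike two product names are."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_matches(left, right, threshold=0.85):
    """Pair entries from two product masters whose names score at or above the threshold."""
    pairs = []
    for l_name in left:
        for r_name in right:
            score = similarity(l_name, r_name)
            if score >= threshold:
                pairs.append((l_name, r_name, round(score, 2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical product names from two source systems.
    master_a = ["Widget Blue 500ml"]
    master_b = ["widget  blue 500ML", "Garden Hose"]
    print(candidate_matches(master_a, master_b))
```

One threshold replaces what might otherwise be a long list of special-case rules in an ETL script, although a real hub would use something considerably more sophisticated than a string ratio, and borderline scores would still need human review.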

I have not seen any quantitative analysis out there of the relative productivity of MDM hubs v ETL processing for storing business rules, or the potential effect that this could have on the support effort needed to maintain interfaces. However it has always been the case that a high proportion of overall support effort in an enterprise is associated with interfaces, so even a small effect here could yield a significant saving in terms of IT costs. I think there is an opportunity here for someone to do some serious research into this area, getting hard data rather than making hand-waving benefits statements. If followed through, it would not surprise me to see this as an area where properly implemented MDM could have a significant effect on IT support costs. Given that so many people claim that making a business case for MDM is one of the biggest obstacles, this would seem to me a fruitful area of further research.

Something for Nothing

August 31, 2009

I have now completed the second of my on-line courses on master data management for eLearning Curve. This one goes into considerable detail on how to evaluate an MDM vendor, based around an in-depth MDM functionality model which I have developed (and which has been through a significant review process by some serious MDM experts). The course also looks at the MDM market and talks about the current vendor landscape in some depth, and finally goes through a suggested process for software procurement, including some tips and hints I have learnt by being on both sides of the negotiating fence.

The course can be accessed here:

http://ecm.elearningcurve.com/The_MDM_Market_How_to_Select_a_Vendor_p/mdm-03-a.htm

Its price seems to me almost absurdly cheap (eLearning Curve is new and they are trying to promote things), but as a reader of this blog you can take advantage of a special offer as well. When buying the course just quote the following voucher code:

AHDisc11R

and you will get a further discount of 20% off the already amusingly low list price. Seriously, this is a real bargain. Over five hours of chunky, in-depth material, to absorb at your leisure.

As the old Derek Bok saying goes, if you think education is expensive, try ignorance.

No Data Utopia

August 11, 2009

The data warehouse appliance market has become very crowded in the last couple of years, in the wake of the success of Netezza, which has drawn in plenty of venture money to new entrants. The awkwardly named Dataupia had been struggling for some time, with large-scale redundancies early in 2009, but now appears to have pretty much given up the ghost, with its assets being put up for sale by the investors.

If nothing else, this demonstrates that you need to have a clearly differentiated position in such a crowded market, and clearly in this case the sales and marketing execution could not match the promise of the technology. However it would be a mistake to think that all is doom and gloom for appliance vendors, as the continuing recent commercial success of Vertica demonstrates.

To me, something that vendors should focus on is how to simplify migration off an existing relational platform. If you have an under-performing or costly data warehouse, then an appliance (which implies “plug and play”) sounds appealing. However although appliance vendors support standard SQL, it is another thing to try and migrate a real-life database application, which may have masses of proprietary application logic locked up in stored procedures, triggers and the like. This would seem to me the thing that is likely to hold back buyers, but many vendors seem to focus entirely on their price/performance characteristics in their messaging. It actually does not matter if a new appliance has 10 times better price performance (let’s say, saving you half a million dollars a year) if it takes several times that to actually migrate the application. Of course there are always green-field applications, but if someone could devise a way of dramatically easing migration effort from an existing relational platform then it seems to me that they would have cracked the code on how to sell to end-users in large numbers. Ironically, this was just the kind of claim that Dataupia made, which suggests that there was a gap between its claims and its ability to convince the market that it was really that easy, despite accumulating a number of named customer testimonials on its web-site.
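The arithmetic behind that point is simple enough to sketch. The half a million dollars a year is the figure from the paragraph above; taking "several times that" to mean three times is purely my assumption for illustration, as is the function name.

```python
def payback_years(migration_cost: float, annual_saving: float) -> float:
    """Years of running savings needed to recover a one-off migration cost."""
    return migration_cost / annual_saving


annual_saving = 500_000             # saving quoted in the post, in dollars per year
migration_cost = 3 * annual_saving  # "several times that", assumed here to be 3x

print(payback_years(migration_cost, annual_saving))  # 3.0
```

On those assumed numbers the buyer waits three years just to break even, which is a hard sell however impressive the price/performance figures look on a slide.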

Even having the founder of Netezza (Foster Hinshaw) did not translate into commercial viability, despite the company attracting plenty of venture capital money. The company has no shortage of marketing collateral; indeed a number of industry experts who have authored glowing white-papers on the Dataupia website may be feeling a little sheepish right now. Sales execution appears to have been a tougher nut to crack. I never saw the technology in action, but history tells us that plenty of good technology can fail in the market (proud owners of Betamax video recorders can testify to that).

If anyone knows more about the inside story here then feel free to contact me privately or post a comment.

New MDM Research

August 7, 2009

My company has now completed its latest assessment of the MDM market, which we represent via our “Landscape” diagram. The research is quite time-consuming, and involves surveying vendors for factual information (then sifting out their more blatantly optimistic assertions; I am a suspicious soul), looking at product demonstrations and asking pesky questions about features of the products, and carrying out a survey of reference customers. The results are amalgamated into a high level view of each vendor in the market in the dimensions of “market strength”, “technology” and “customer base”. Much more detail of the breakdown of the various elements that contribute to these scores is held in our vendor profiles.

You can see the diagram and some accompanying text on the front page of our web site.

Training without travelling

July 31, 2009

I have spent some time recently in building up an MDM course. Normally such things are done at conferences, so unless you happen to be at some distant venue, they pass you by. However this one is different. It is an on-line course, marketed by a new company called eLearning Curve (who have some well-known founders). The company reckons that it is harder and harder for people to justify trips to conferences for education, and this is certainly true at the moment from what I have observed and heard about technology conferences. Hence its model is to sell courses on-line.

The course is “MDM Fundamentals and Best Practice”, and you can see more about it here. It is actually quite a lot of work to put together such a course, but I am pleased with the result, and you can now get over three hours of my views on MDM for a very fair price indeed, all from the comfort of your desk or living room, and at a pace that you can control. Of course you do miss out on the trip to Las Vegas or similar, but you can’t have everything.

The State of Data

July 17, 2009

We have now completed our survey of data quality. Based on 193 responses from IT and business staff from around the world, there were some very interesting findings. Amongst these was that 81% of respondents felt that data quality was much more than just customer name and address, which is the focus of most of the vendors in the market. Moreover, customer name and address data ranked only third in the list of data domains which survey respondents found most important. Both product and financial data were felt to be more important, yet product data is the focus of barely a handful of vendors (Silver Creek, Inquera, Datactics) while of all the dozens of data quality vendors out there, few indeed focus on financial data. Name and address is of course a common issue and conveniently is well structured and has plenty of well-established algorithms out there to attack it. Yet surely the vendor community is missing something when customers rate other data types as higher in importance?

Another recurring theme is the lack of attention given to measuring the costs of poor data quality. Lots of respondents fail to make any effort to measure this at all, and then complain that it is hard to make a business case for data quality. “Well duh”, as Homer Simpson might say. Estimates given by survey respondents seemed very low when compared to our experience, and also to anecdotes given in the very same survey. One striking one was this: “Poor data quality and consistency has led to the orphaning of $32 million in stock just sitting in the warehouse that can’t be sold since it’s lost in the system.” This company at least has no difficulty in justifying a data quality initiative. The survey had plenty of other interesting insights too.

The full survey and analysis, all 33 pages of it, can be purchased from here.

Data Quality Survey

June 15, 2009

As part of our ongoing research program, we are conducting a major survey into the state of data quality today. If you have a few minutes it would be great if you could participate in this (all participants get a free summary of the survey results).

The survey can be found here:

In addition your e-mail address will be entered for a prize draw offering you the chance to win one of ten free annual subscriptions to The Information Difference website (worth USD 550).

Thanks in advance.