XML is not enough

I just read a particularly clear explanation of how XML contributes to helping with, but does not really solve, the problem of data integration. This is a major issue as companies begin to deploy applications in the form of services, since as you bring elements of an application together via web services you also have to worry about how the data used by one application is going to be passed to another. There are just too many versions of XML, and insufficient semantic integration support, to say “ah, we don’t need to worry about that – we are XML compliant”, yet this is exactly the marketing position of some vendors. As the article points out, a higher degree of semantic integration is needed. Master data management applications seek to provide this by establishing a repository of trusted information with the necessary level of understanding to map the various definitions of “customer”, “product”, “fixed asset”, “location” etc together.

Whether you deploy such an application in “co-existence” mode or “operational” mode is less important than going through the process of mapping together the competing definitions of master data strewn throughout any large company. Having a dial tone on my telephone enables me to phone someone in Argentina, but does not mean that we can communicate unless we also speak the same language. In the same way, XML is a useful but insufficient building block on the path to data reconciliation in the enterprise. Only higher-level semantic models are going to achieve that, and they will be hard work to implement given the amount of human interaction needed between different departments and company subsidiaries to resolve the differences that have built up over time.
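The dial-tone analogy can be made concrete. Here is a minimal sketch (the schemas, tag names and country-code table are hypothetical, purely for illustration) of why two perfectly valid XML documents still need a hand-built semantic mapping before they describe the same customer:

```python
import xml.etree.ElementTree as ET

# Two hypothetical systems, both "XML compliant", describing the same customer.
erp_doc = """<customer>
  <cust_name>Acme Corp</cust_name>
  <cust_country>AR</cust_country>
</customer>"""

crm_doc = """<Account>
  <Name>Acme Corp</Name>
  <Country>Argentina</Country>
</Account>"""

# The semantic layer has to be built by hand: which tag means "name",
# which means "country", and how country codes relate to country names.
FIELD_MAP = {
    "erp": {"name": "cust_name", "country": "cust_country"},
    "crm": {"name": "Name", "country": "Country"},
}
COUNTRY_CODES = {"AR": "Argentina"}  # illustrative subset

def extract(doc: str, system: str) -> dict:
    """Pull a canonical customer record out of a system-specific document."""
    root = ET.fromstring(doc)
    record = {k: root.findtext(tag) for k, tag in FIELD_MAP[system].items()}
    # Normalise country codes to full names
    record["country"] = COUNTRY_CODES.get(record["country"], record["country"])
    return record

print(extract(erp_doc, "erp") == extract(crm_doc, "crm"))  # True, but only via the mapping
```

Both documents parse without complaint; it is only the hand-built FIELD_MAP and code table, i.e. the semantic model, that makes them comparable, which is exactly the work XML compliance alone does not do for you.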

Psst, wanna buy some software?

At various times in this blog I have discussed the buying process that enterprise software buyers go through, and suggested some tips and things to avoid. I have just come across an entertaining and useful “mini book” by ex software salesman Doug Mitchell, which you can read here.
Some of the anecdotes are funny, but many of the points being made are serious, discussing the things that software buyers should really be asking the salesman, as distinct from what often happens. Points such as the utterly self-defeating “savings” made by buyers in skimping on training for the software they are buying ring very true: I can remember exactly such conversations when I was at Kalido. I hope at some point Doug manages to expand this into a full book, as there is very little out there written about the interaction between software vendors and their buyers.

Clearing a migration path

One of the issues often underestimated by new vendors attacking an entrenched competitor is the sheer cost of platform migration. For example, in the database world, if someone comes out with a new, shiny DBMS that is faster and cheaper than the incumbents, why would customers not just switch? After all the new database is ANSI compliant and so is the old one, right? Of course this view may look good in a glossy article in a magazine or the fevered fantasies of a software sales person, but in reality enterprises have considerable switching costs for installed technology. In the case of databases, SQL is just a small part of the story. There are all the proprietary database extensions (stored procedures, triggers etc), all the data definition language scripts, systems tables with assorted business rules implicitly encoded, and the invested time and experience of the database administrators, a naturally conservative bunch. I know, as I was a DBA a long time ago; there is nothing like the prospect of being phoned up in the middle of the night to be told the payroll database is down and asked how many minutes it will take you to bring it back up, to give you a sceptical perspective on life. New and exciting technology is all well and good, but if it involves rewriting a large suite of production batch jobs that you have just spent months getting settled, you tend to just push that brochure back across the table to the software sales person. Installed enterprise software is notoriously “sticky”.

Hence when attacking something that is already in production, you have to go further than just saying “ours is x times cheaper and faster”. An example of this is DATAllegro and their assault on the mountainous summit that is the Teradata installed base. They have just, very sensibly, brought out a suite of tools that will actually help convert an existing Teradata account, rather than just hoping someone is going to buy into the speed and cost story. This new suite of utilities will:

– convert BTEQ production jobs with the DATAllegro DASQL batch client
– convert DDL from Teradata to DATAllegro DDL
– connect to the Teradata environment and extract table structures (schema) and data and import them into DATAllegro.
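To give a feel for the kind of mechanical translation the DDL step involves, here is my own rough sketch; it is not DATAllegro’s actual tooling, which naturally handles far more than two constructs:

```python
import re

def strip_teradata_ddl(ddl: str) -> str:
    """Illustrative only: rewrite a couple of Teradata-specific DDL
    constructs into more generic SQL."""
    # MULTISET/SET table qualifiers are Teradata-specific
    ddl = re.sub(r"\bCREATE\s+(?:MULTISET|SET)\s+TABLE\b", "CREATE TABLE", ddl)
    # PRIMARY INDEX is a Teradata physical-design clause, not ANSI SQL
    ddl = re.sub(r"\s*PRIMARY\s+INDEX\s*\([^)]*\)", "", ddl)
    return ddl.strip()

src = """CREATE MULTISET TABLE sales (
  order_id INTEGER,
  amount DECIMAL(10,2)
) PRIMARY INDEX (order_id)"""

print(strip_teradata_ddl(src))
```

Even this toy version shows why a conversion suite matters: every production DDL script contains dozens of such vendor-specific clauses, and nobody wants to rewrite them by hand.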

This is the right approach to take. What they need to do next is get some customers who have actually been through this conversion to go public, talking about the benefits and also realistically about the effort involved. If they can do that then they will be in a credible position to start eating away at the Teradata crown jewels: the seriously high-end databases of 100TB or more.

Let the sunlight in

I have recently noticed a pronounced division in the mentality of software vendors towards disclosing information. On the credit side of the column are vendors like Kognitio, whose CEO happily discussed the company strategy, their revenues, profitability and customer deployments. At the other end of the spectrum was a data quality vendor who would not even tell me how many employees they had (actually, at the far, far end of the spectrum is Ab Initio, who won’t demo their software to a customer without a non-disclosure agreement). What I am curious about is what these so-shy vendors think they are achieving by hiding information. To either prospects or analysts, if a vendor looks shiftily to one side and says “ah, as a policy we don’t disclose xxx” (substitute: employees, revenues, customers, profitability, whether they have a working product, etc) then do they think the prospect is going to be (a) reassured or (b) more nervous than before?

If a vendor is a small start-up then we all know that it is likely to be loss-making, have just a few customers so far and be fairly small. That is what software start-ups are. The reason we are talking to them is that they (hopefully) have something interesting to offer that the big brand vendors do not. It is OK if there are only a handful of customers if the product is fairly new, and it is OK to not be profitable if the company strategy is rapid growth. As a customer, it is often much easier to deal with smaller vendors who actually care about you, rather than some vast marketing machine where raising a software bug is as useful as dropping a message in a bottle in the ocean.

However anyone contemplating purchasing software is at some point going to want to get a sense of whether they are customer #32 or customer #2, and how this element of risk stacks up against what the company has to offer. Incidentally, as I have written previously, it is far from clear that it is always safe to buy from large vendors. I have personally been stung a couple of times as an enterprise buyer when giant vendors decided that their product was not doing well enough, so just dropped it. If a small company has just the one product you can be pretty sure they will care about it. Presumably vendors who are coy are thinking “let’s not scare them off now, we’ll demo the software, get it installed and then they will be so impressed that they won’t notice we are small/unprofitable/early stage/etc”. Well, I have news for you: at some point they will care, and if this moment comes after they have invested a lot of time with you then it is a lot more painful than if it came out right at the start, when at least they could redirect their attentions elsewhere.

So in my view, openness is the best policy. If a customer is terrified by the idea of dealing with an early stage company, better to find out at the beginning of the conversation than after you have invested scarce pre-sales time in doing demos and perhaps even a software trial. IBM or Oracle can afford a few wasted sales calls, but for a small start-up every conversation and pre-sales demo is precious. If a company is serious about a purchase and has concerns about the vendor, they will find out about your dark secret during their due diligence (when they will run one of those pesky Dun & Bradstreet reports), and they will not thank you if you have glossed over an important issue due to your “policy” of not talking openly about your company.

MDM In Savannah – Day 2

The conference continued today with a string of customer case studies, plus some panel discussions and a couple of vendor presentations that just about managed to avoid being too blatant in their product plugs. I enjoyed a case study from a transport company called Pitt Ohio Express, who had implemented a customer-oriented MDM hub for the practical reason that they needed to know where their trucks have to turn up to deliver things. This seems a more pressing reason to sort out customer name and address than a bit of duplicated direct mail. Also, they had actually measured things properly before and after their project, and had achieved a 2% overall company improvement in operating margin due to the initiative. A proper view of customer spend has enabled targeted customer pricing rather than blanket price lists, to give one example of a real benefit seen.

I also enjoyed a lively presentation by Brian Rensing, a data architect at Procter and Gamble. There must be marketing in the blood there, as he was an entertaining speaker, and how many data architects can you say that of? He explained how they had managed to get buy-in to their MDM initiative, working one business unit at a time and relying heavily on iterative prototyping to ensure that business people could see short-term benefits, rather than laying out a grandiose multi-year initiative. Their project covers both customer and product initially, both at the corporate level and (gradually) country level, using KALIDO MDM. They see this MDM initiative as leading into better data warehousing and analytics in the future, since there will be a sounder data foundation on which to work.

In general I am surprised at the number of companies contemplating (and actually doing) MDM projects using entirely in-house technology. One company even devised its own matching algorithms. Surely this is the kind of thing that off-the-shelf data quality products can do much better? I suppose MDM is still in relative infancy in terms of market size (Rob Karel of Forrester reckoned USD 1 billion in 2006, of which only a third was software, a very different number from IDC estimates, but expecting over 50% compound growth over the coming years). The big systems integrators do not yet seem to have caught on to this fast growth, with Baseline Consulting almost the only SI represented at this conference (and they are a specialist boutique). It will be interesting to see at what point PWC, Accenture and Bearing Point start turning up to such conferences.

I should relate a conversation with one vendor at the exhibit last night. “So, what kind of revenues do you guys do?”. “We don’t disclose that”. Fair enough, some companies are shy. “How many customers do you have?”. “We don’t disclose the number of customers we have”. “Er, OK, do you have any customers?”. “Oh yes”. Uh huh. “Who are your investors?” “That is private.” “How many employees do you have?”. “We can’t share that information”. So we have here a vendor, at a trade show, unwilling to talk about how big it is, who has invested in it, or how many customers or even employees it has. Short of putting a puzzle on its web site in order to find the contact address, it is hard to imagine how they could do more to make a prospect nervous. Surreal. I guess they are going for the “dark and mysterious” marketing approach pioneered by Ab Initio.

Although many case studies were about customer, over half the respondents in a recent TDWI survey said that their MDM initiative had enterprise-wide scope, and there were certainly examples here of case studies around product information, as well as financial information. I still had the sense that a lot of companies were treading gingerly into the MDM world, but there were enough case studies of completed projects to suggest that the growth in the market which Forrester (and others) predict is plausible based on the level of interest shown here.

Perhaps the most entertaining moment of the conference was watching Todd Goldman, VP of marketing for Exeros, doing a (quite impressive) conjuring trick at the beginning of his presentation. It turns out that he is an amateur magician, a skill that must come in very useful in his career as a software marketer. This was not the first time I have seen clever illusions in software marketing, nor will it be the last.

MDM In Savannah – Day 1

For the first time TDWI has arranged an MDM conference, running this week in Savannah, and they kindly invited me to speak at the event. It is quite well attended, and is unusual in that customer attendees had to apply to attend in order to minimise “tyre kickers” (but qualified attendees had some travel expenses reimbursed). There are around 100 project managers and the like involved with MDM projects, plus the usual vendors and assorted hangers-on (like me).

The highlight of day 1 was a presentation by Barry Briggs, CTO of Microsoft, about Microsoft’s internal MDM project. Since they did not use their own MDM technology for the project, it came across quite credibly. Microsoft have customer records on 80 million enterprises, and a billion consumer records, but had considerable difficulty in getting a consolidated view of a given enterprise due to the multiple systems used to input customer records. In 2005 they found they had 37 systems that claimed to be the system of record for customer (this is pretty average for a large company, by the way). Starting with Dun & Bradstreet data they mapped the various competing customer records and consolidated these into a repository called MIO (which uses Initiate’s CDI hub technology). Apart from its scale (a project team of 40) there were some interesting aspects to the project.

First was that they did actually measure ROI, which was over 500% of the project cost. The savings were mainly due to reduced time spent by sales staff in managing customer information and related tasks such as arguing over sales commissions; consolidated views of customers also saved time, and in some cases gave new sales opportunities. Sales reps can no longer enter new customer account information without the data being checked against the repository first, e.g. a “new” customer might turn out to already exist, based on a match of its address.

One key point discussed was the level at which matching should be imposed. The technology used assigns a probability of a match between two customer records. The project found that records with a probability of over 85% were almost certainly matches, and let the system assign these automatically. Below 65% they are rarely matches and are assumed to be genuinely new, but those records in between still require manual intervention, since the consequences of a “false positive”, i.e. matching up two records incorrectly, are worse than those of missing a match. This seemed to me an important consideration for all such projects using matching algorithms. The project encountered a lot of issues not initially expected, e.g. even a list of country codes became controversial since, for example, Taiwan is either a province of China or an independent state depending on your viewpoint, and the wrong “view” could have considerable political and commercial consequences if displayed to customers.
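The three-way split described above amounts to a simple triage rule. A sketch (the 85%/65% thresholds are the figures quoted in the talk; the function and labels are my own, and any real project would tune the thresholds against its own data):

```python
def triage(match_probability: float) -> str:
    """Route a candidate pair of customer records by match confidence.

    The asymmetry is deliberate: a false positive (wrongly merging two
    customers) costs more than a missed match, so the grey zone goes
    to a human rather than being auto-merged.
    """
    if match_probability > 0.85:
        return "auto-merge"      # almost certainly the same customer
    if match_probability < 0.65:
        return "auto-new"        # rarely a match; treat as genuinely new
    return "manual-review"       # grey zone: a data steward decides

print(triage(0.92))  # auto-merge
print(triage(0.50))  # auto-new
print(triage(0.75))  # manual-review
```

The interesting design decision is not the code but where the two thresholds sit: narrowing the manual-review band saves steward time but raises the risk of bad merges, which is exactly the trade-off the Microsoft team described.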

Another point often missed was how the system itself is very much an operational system. Since this feeds, for example, the CRM system, the MDM application needs the same level of robustness. Indeed as more and more systems are hooked up to it, it could become a single point of failure. This is a point rarely mentioned by vendors, and it seems to me an important architectural consideration: the more all-encompassing the MDM repository, the scarier its operational requirements if it is providing real-time links to OLTP systems.

Another case study was from Royal Bank of Canada, which has 10 million customer records. In their case it was important to have a single view of customer to allow cross-selling, e.g. someone with a bank account may also want a credit card or insurance policy. Moreover Canada is about to institute a “do not call” system for consumers to avoid pestering marketing calls, and a failure to correctly implement this across the enterprise could result in fines. In this case the MDM system was an in-house-built repository (on DB2) but using the QualityStage data quality technology to help with matching up and sorting out duplicate customer records. A later audit found just 135,000 possible duplicate records in a database of 10 million (about 1.35%), which is in fact excellent. The speaker pointed out that at a certain point it becomes uneconomic to chase down the last few dodgy records. There is a team of 60 people, half business, half IT, dedicated to data quality, which interestingly reports into marketing, not into IT.

Other than that there were a number of panels and presentations, and a tradeshow with the usual suspects which is about to start as I write this. SAP and Exeros are the biggest spenders at this particular show, but Siperian, Kalido and IBM are amongst the sponsors also. So far logistics have been very good, with admirable time-keeping from the organisers. The 70F, sunny weather in February has helped the spirits of the attendees, as has the free ice cream.