For the first time TDWI has arranged an MDM conference, running this week in Savannah, and they kindly invited me to speak at the event. It is quite well attended, and is unusual in that customer attendees had to apply to attend in order to minimise “tyre kickers” (but qualified attendees had some travel expenses reimbursed). There are around 100 project managers and the like involved with MDM projects, plus the usual vendors and assorted hangers-on (like me).
The highlight of day 1 was a presentation by Barry Briggs, CTO of Microsoft, about Microsoft’s internal MDM project. Since they did not use their own MDM technology for the project, it came across quite credibly. Microsoft have customer records on 80 million enterprises, and a billion consumer records, but had considerable difficulty in getting a consolidated view of a given enterprise due to the multiple systems used to input customer records. In 2005 they found they had 37 systems that claimed to be the system of record for customer data (this is pretty average for a large company, by the way). Starting with Dun & Bradstreet data they mapped the various competing customer records and consolidated these into a repository called MIO (which uses Initiate’s CDI hub technology). Apart from its scale (a project team of 40) there were some interesting aspects to the project.
First was that they actually measured ROI, which was over 500% of the project cost. The savings were mainly due to reduced time spent by sales staff in managing customer information and related activities such as arguing over sales commissions; consolidated views of customers also saved time, and in some cases opened up new sales opportunities. Sales reps cannot enter new customer account information without the data being checked against the repository first e.g. a “new” customer might turn out to already exist, based on a match of its address.
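The “check before create” rule can be sketched roughly as follows. This is purely illustrative: the function names, the repository structure and the crude address normalisation are my own assumptions, not details from the talk.

```python
# Illustrative sketch of checking a proposed new account against the
# repository before allowing it to be created. All identifiers and the
# normalisation rules here are assumptions for illustration only.

def normalise_address(addr: str) -> str:
    """Crude canonical form so trivially different addresses compare equal."""
    return " ".join(addr.lower().replace(",", " ").split())

def find_existing(new_addr: str, repository: dict) -> str:
    """Return the ID of an existing customer at the same address, or None."""
    key = normalise_address(new_addr)
    for cust_id, addr in repository.items():
        if normalise_address(addr) == key:
            return cust_id  # the "new" customer already exists
    return None

# Usage: a sales rep tries to create an account that is already on file.
repo = {"C001": "1 Microsoft Way, Redmond"}
print(find_existing("1 Microsoft  Way,  Redmond", repo))  # -> C001
```

In a real hub the comparison would of course use proper probabilistic matching rather than exact string equality, but the workflow (look up first, create only if nothing matches) is the point.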
One key point discussed was the level at which matching should be imposed. The technology used assigns a probability of a match between two customer records. The project found that records with a probability of over 85% were almost certainly matches, and let the system assign these automatically. Below 65% they are rarely matches and are assumed to be genuinely new, but records in between still require manual intervention, since the consequences of a “false positive” (matching up two records incorrectly) are worse than those of missing a match. This seemed to me an important consideration for all such projects using matching algorithms. The project encountered a lot of issues not initially expected e.g. even a list of country codes became controversial since, for example, Taiwan is either a province of China or an independent state depending on your viewpoint, and the wrong “view” could have considerable political and commercial consequences if displayed to customers.
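The three-band policy described above is easy to express in code. The 85% and 65% thresholds come from the talk; the function name and return labels are my own illustrative assumptions.

```python
# Sketch of the three-band matching policy: auto-merge above 85%,
# assume genuinely new below 65%, and queue the grey zone for a human.
# Thresholds are as reported in the talk; everything else is illustrative.

AUTO_MATCH = 0.85   # above this: almost certainly the same customer
ASSUME_NEW = 0.65   # below this: rarely a match, treat as genuinely new

def triage(match_probability: float) -> str:
    """Route a candidate pair of customer records by match probability."""
    if match_probability > AUTO_MATCH:
        return "auto-merge"       # system links the records automatically
    if match_probability < ASSUME_NEW:
        return "new-record"       # accept as a genuinely new customer
    return "manual-review"        # grey zone: a data steward decides

print(triage(0.92))  # -> auto-merge
print(triage(0.70))  # -> manual-review
print(triage(0.40))  # -> new-record
```

The asymmetry in the bands reflects the cost asymmetry: wrongly merging two customers is harder to undo than leaving a duplicate for later review.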
Another point often missed is that the system itself is very much an operational system. Since it feeds, for example, the CRM system, the MDM application needs the same level of robustness. Indeed, as more and more systems are hooked up to it, it could become a single point of failure. This is a point rarely mentioned by vendors, and it seems to me an important architectural consideration. The more all-encompassing the MDM repository, the more scary its operational requirements if it is providing real-time links to OLTP systems.
Another case study was from Royal Bank of Canada, which has 10 million customer records. In their case it was important to have a single view of customer to allow cross-selling e.g. someone with a bank account may also want a credit card or insurance policy. Moreover Canada is about to institute a “do not call” system for consumers to avoid pestering marketing calls, and a failure to correctly implement this across the enterprise could result in fines. In this case the MDM system was a repository built in-house (on DB2), but using the QualityStage data quality technology to help with matching up and sorting out duplicate customer records. A later audit found just 135,000 possible duplicate records in a database of 10 million, which is in fact excellent. The speaker pointed out that at a certain point it becomes uneconomic to chase down the last few dodgy records. There is a team of 60 people, half business, half IT, dedicated to data quality, which interestingly reports into marketing, not into IT.
Other than that there were a number of panels and presentations, and a tradeshow with the usual suspects which is about to start as I write this. SAP and Exeros are the biggest spenders at this particular show, but Siperian, Kalido and IBM are amongst the sponsors also. So far logistics have been very good, with admirable time-keeping from the organisers. The 70F, sunny weather in February has helped the spirits of the attendees, as has the free ice cream.