Never mind the quality, feel the width

Frank Buytendijk (ex-Gartner analyst, now with Oracle) makes an important point about data quality on his blog: it is inherently dull. This in itself causes problems both for people within organisations who care about data quality (there must be a few of you out there) and for data quality vendors, who struggle to sell their products at a decent price point in sufficient numbers. I have written about this before, pointing out a couple of real-life cases of poor data quality that I have personally encountered, each of which cost many millions of dollars.

The reason that data quality is generally excellent in the area of salary and expense processing is that people care deeply about what they get paid, and you can be pretty sure that any clerical errors get spotted and complained about very quickly. In most other cases, however, poor data quality arises because people are asked to enter or maintain data for which they see no personal, or even obvious company, benefit. Data that is useful for “some other department” is never going to receive the same care and attention as your own personal expense claims.

As Frank says, in order to move data quality higher up the enterprise priority list, it needs to widen its perspective and move beyond talking about customer names and addresses. Yes, these matter if you are doing mailshots, and certainly poor customer name and address management can have more serious consequences, but most executives have better things to do than worry about whether their mailshots are being duplicated.

Despite numerous acquisitions over the years (First Logic, Similarity, Vality, …) there are still plenty of small data quality vendors out there, some with very interesting technology. Yet aside from Trillium, few have managed to reach even tens of millions of dollars in revenue. This is not for lack of a real problem to address.

Some data quality vendors rightly see master data management as a way of repositioning their offerings in a more fashionable area, but they need to realise that data quality is just one feature of a complete MDM solution. Hence they need to partner with broader-based MDM repository vendors, who themselves often lack proper data quality technology, rather than pretending that they are a complete solution on their own. They should also do a better job of highlighting quantified customer dollar benefits achieved from the use of data quality technology. This should not be hard to do, since data quality projects usually have excellent payback. Yet time after time the examples used in data quality collateral are the tired name-and-address cleanup, followed by an esoteric discussion of whether probabilistic or deterministic matching is better (paying customers don’t care; they are interested in the benefits they will see). Far too few data quality case studies mention hard-dollar benefits to the customer.
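For the curious, the distinction the collateral obsesses over is simple enough. Here is a minimal, illustrative sketch in Python, not a description of how any vendor’s matching engine actually works: deterministic matching declares two records the same only when their normalised keys agree exactly, while probabilistic matching computes a similarity score and accepts anything above a tuned threshold. The use of SequenceMatcher and the 0.85 cutoff below are my own assumptions for the sake of the example.

    from difflib import SequenceMatcher

    def deterministic_match(a: str, b: str) -> bool:
        # Deterministic matching: two records refer to the same entity
        # only if their normalised keys are exactly equal.
        return a.strip().lower() == b.strip().lower()

    def probabilistic_match(a: str, b: str, threshold: float = 0.85) -> bool:
        # Probabilistic (fuzzy) matching: score the similarity of the two
        # values and accept anything above a tuned threshold. SequenceMatcher
        # is a stand-in for the weighted scoring a real engine would use;
        # the 0.85 threshold is an arbitrary illustrative choice.
        score = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
        return score >= threshold

    # "Jon Smith" vs "John Smith": the deterministic rule misses the
    # duplicate, the probabilistic rule catches it.
    print(deterministic_match("Jon Smith", "John Smith"))   # False
    print(probabilistic_match("Jon Smith", "John Smith"))   # True

Which is precisely the point: whichever approach is used under the hood, the paying customer only cares whether the duplicate invoice or the misdirected shipment goes away.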

Data quality should have much going for it: the problem is very real, the state of data in most large organisations is horrible (and far worse than generally realised), and the costs are significant, causing genuine and in some cases very serious operational problems. Yet the industry as a whole has done a poor job of explaining itself to the people with the cheque books in enterprises.