Did data quality just get more interesting?

The data quality market has been a strange beast. The problem is huge: just about every company in the world has data quality issues. Duplicate customer information is the most familiar example, but the problem can be a lot more serious than a few bits of duplicated junk mail. More than a decade ago, a Shell exploration drill bored neatly into part of an existing drilling operation because the co-ordinates of the existing pipe were recorded incorrectly in a database. Fortunately no oil was flowing through the pipe at the time, or a major spillage would have occurred; even so, the rig was off-line for some time, at a cost of about half a million dollars a day even then. I can certainly testify that a large chunk of every data warehouse project is spent dealing with data quality issues, even when the data comes from supposedly authoritative sources such as ERP systems.

Yet despite a real problem with real dollars attached to it, the data quality software market is something of a minnow. Forrester estimates it at USD 1 billion, and that includes consulting; the software part of the market is perhaps USD 200 million. In recent years pure-play vendors have been bought up and incorporated into broader offerings: Vality by Ascential (now IBM), FirstLogic by Business Objects, Similarity Systems by Informatica, Trillium by Harte Hanks. The price of FirstLogic, USD 69 million for a business with USD 50 million in revenue, was hardly something to make investors salivate (to be fair, Similarity's price tag of USD 49 million was at a much better multiple, and there were some peculiarities around FirstLogic).

Why should this be? Part of the problem is that data quality remains a resolutely human problem, which typically means that when validation rules are defined in a data quality tool, only part of the problem surfaces. Business processes around who actually owns the data and who can fix problems are as big an issue as finding the errors in the data itself.

It is encouraging to see a couple of start-ups take a new approach to this old chestnut. Though the technologies used differ somewhat, both try to discover relationships between existing data by analysing the datasets themselves rather than relying on building up a top-down rules profile. Exeros has just secured a USD 12 million series B funding round, and has high-quality venture backing from Globespan Capital Partners and Bay Partners. A rival company with a similar approach is Zoomix. Based on technology developed in the Israeli military, Zoomix uses data mining techniques to seek out relationships among existing data, presenting its first attempt to a business analyst and then learning from the responses so as to improve its algorithms in future iterations.
Zoomix also has an interesting new product that can apply these rules in real time to an existing transaction system, called up as an SOA service; this effectively acts like a firewall, but for data quality. Zoomix has customers in Israel and Europe, and has set up a European HQ in London. These newcomers present fresh competition to the largest vendor (Trillium) as well as to other start-ups such as Datanomic (based in Cambridge – the original one rather than the Massachusetts one) and to more specialist quality tools such as Silver Creek, which has taken a tightly targeted approach, in its case dealing with complex product data.
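To make the bottom-up idea concrete, here is a minimal sketch of discovering candidate "column A determines column B" rules from the data itself rather than writing them top-down. Neither vendor has published its algorithms, so the function, column names, and support threshold below are purely illustrative assumptions, not anyone's actual product:

```python
# Illustrative sketch of bottom-up rule discovery: scan a dataset for
# candidate functional dependencies (e.g. "postcode determines city")
# instead of hand-writing validation rules. All names are invented.

from collections import defaultdict

def discover_dependencies(rows, min_support=0.9):
    """Return (determinant, dependent, support) triples where each
    determinant value predicts a single dependent value in at least
    min_support of the rows."""
    columns = list(rows[0].keys())
    candidates = []
    for det in columns:
        for dep in columns:
            if det == dep:
                continue
            # For each determinant value, count the dependent values it maps to.
            mapping = defaultdict(lambda: defaultdict(int))
            for row in rows:
                mapping[row[det]][row[dep]] += 1
            # A row is "consistent" if it carries the majority dependent
            # value for its determinant value.
            consistent = sum(max(counts.values()) for counts in mapping.values())
            support = consistent / len(rows)
            if support >= min_support:
                candidates.append((det, dep, round(support, 2)))
    return candidates

rows = [
    {"postcode": "SW1A", "city": "London"},
    {"postcode": "SW1A", "city": "London"},
    {"postcode": "CB2",  "city": "Cambridge"},
    {"postcode": "CB2",  "city": "Cambrige"},   # likely typo in the data
    {"postcode": "SW1A", "city": "London"},
]
print(discover_dependencies(rows))
```

On these five toy rows the sketch finds that city reliably determines postcode, while the misspelt "Cambrige" row keeps the postcode-to-city rule below the support threshold — exactly the kind of candidate rule and exception that this style of tool would present to a business analyst for confirmation.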

Investors in these companies have clearly seen that, while data quality may be more of a component in a broader solution (BI, MDM, whatever), there is enough of a real problem here to allow room for innovative approaches. The exits for these companies may well be acquisition by larger players, just as with Similarity Systems, but it is good to see a fresh approach being taken to this very real, if not exactly sexy, business problem.