Creating a burning data quality platform

I read a blog post by Forrester today that rang true. Its point is that data quality is a hard sell unless some crisis forces the issue. The evidence seems to bear this out: the data quality market is small, and yet the problems are large. I have encountered several shocking data quality problems in my time that were costing millions of dollars. In one case, an undetected pricing error in a regional SAP system meant that a well-known brand was being sold at zero profit margin. In another, a data quality error in a co-ordinate system caused an exploration bore to be drilled into an existing oil well; luckily that well was not in production at the time, so the mistake “only” cost a few million dollars. More generally, every dollar spent on data quality should save you four. Yet the examples I mentioned (and there are plenty more) turned up not in data quality projects but in data warehouse or master data projects, which in principle were supposed to be taking “pure” data from the master transaction systems. If the supposedly clean systems are in this state, it does not inspire confidence in the corporate systems that nobody claims are “clean”.

I am not sure why this sorry state of affairs exists, other than to note that in most companies data quality is regarded as an IT problem, when in fact IT staff are the people least able to judge data quality. Responsibility lies firmly in the business camp. Moreover, as I have mentioned, justifying a data quality project should not be hard: it delivers real dollar savings, quite apart from other benefits such as reduced reputational risk. I suspect that part of the problem is that the subject is embarrassing (“no problems with our company data, no sirree”) and, let’s face it, pretty dull. Would you rather work on a new product launch or be buried away reviewing endless reports checking whether data is what it should be?

For people toiling away in corporate IT, the best way to get attention might be to take a modern data quality tool, find a sympathetic business analyst and poke around some corporate systems. These days the tools do a lot of the discovery automatically, so finding anomalies is far less manual than it used to be. If you turn over a few stones in corporate systems, you may be surprised at what turns up. Chances are that at least one of the issues you find will prove expensive, which may raise the profile of the work and win sponsorship to dig around in other areas.
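Profiling tools vary, but one common discovery technique is pattern profiling: reduce each value to its "shape" (upper-case letters, lower-case letters, digits, punctuation) and count how many distinct shapes and distinct values a column holds. A column that should contain a handful of codes but turns out to contain dozens of variants is exactly the kind of stone worth turning over. What follows is a minimal sketch in plain Python under that assumption; `value_pattern`, `profile_column` and the sample column are hypothetical names for illustration, not the API of any particular data quality product.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: upper-case letters -> 'A',
    lower-case letters -> 'a', digits -> '9', anything else kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Profile one column (hypothetical helper): row count, blank count,
    frequency of each distinct value and of each value pattern."""
    # Treat missing values (None) as empty strings so they are counted as blanks.
    cleaned = ["" if v is None else str(v) for v in values]
    return {
        "rows": len(cleaned),
        "blank": sum(1 for v in cleaned if not v.strip()),
        "distinct_values": Counter(cleaned),
        "patterns": Counter(value_pattern(v) for v in cleaned),
    }


if __name__ == "__main__":
    # Hypothetical sample: a "gender" column extracted from one system.
    genders = ["M", "F", "male", "Female", "f", "U", "", None, "1"]
    report = profile_column(genders)
    print(report["rows"], report["blank"], len(report["distinct_values"]))  # 9 2 8
    # List the shapes found, most frequent first.
    for pattern, count in report["patterns"].most_common():
        print(repr(pattern), count)
```

Commercial tools typically go much further (for example inferring data types or checking relationships between columns), but even simple counts like these can be enough to start a conversation with the business.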

3 thoughts on “Creating a burning data quality platform”

  1. The Data Quality issues of the Amazon product catalog are a good starting point for raising attention for the topic. Why? Because Data Quality is no longer a topic hidden somewhere in the “backend”. The Data Quality issues in Amazon’s product catalog “transport” the issue of Data Quality directly to the desk of every consumer (example and details in this post: ). So Data Quality issues are “visible” and will become even more so in the future. Is Amazon investing in sorting out the Data Quality issues in its product catalog? For sure, but what is interesting (see my post) is that it looks like they are opting to “build” a solution themselves rather than “buy” one.

  2. Andy,

    I think a big part of it is that it’s (a) just not visible, and (b) people don’t know where to start.

    Organizations need to shine a light on their data quality problems, and force the business to pay attention. Rather than investing directly in fixing data quality, organizations should first consider an investment in data profiling tools. Using techniques such as counting the number of patterns in the data, these tools make it easy to figure out just how bad the data is in different systems, and generate figures that can be used in anecdotes (“did you know that our customers have seventeen different genders?!”) and for building the business case…

    I think Andy Bitterer of Gartner summarized it well during his keynote at the BI Conference in London earlier this year (quoting from memory): “There isn’t a company on the planet that doesn’t have a data quality problem… you should invest in data profiling: it’s not expensive”…

    Regards, Timo
    BI Questions Blog
