As Rick Sherman quite rightly points out in his commonsense article in DM Review, data quality is not something that you can realistically fix before you build a data warehouse. Data quality in operational systems is often shockingly poor, yet frequently the only way this comes to light is when the data is brought together in a data warehouse. People often assume that the data inside their ERP systems is somehow sacrosanct and immune to data quality issues, and it often comes as a big disappointment when they discover that this is not so.
In one example, it turned out that a product had been mis-priced in the SAP system in one region, resulting in the product being sold at cost price. This anomaly went undetected for over a year before a data warehouse project brought it to the surface (by comparing gross margin by product by country). Initially everyone assumed it was a bug in the warehouse software, but it was not. Indeed, this insight alone pretty much paid for the project.
If a data warehouse can be implemented rapidly and in an iterative fashion, then it can quickly highlight business issues such as this one, which may arise from poor data quality or from new insight that was previously unavailable: the "wood for the trees" problem. Eventually data quality needs to come out of the closet and be treated as a serious business issue, dealt with by a corporate business function that has the political clout to fix the problems at source. Some progressive companies have set up such organisations, which may report into finance or another corporate function, but never into the CIO.
However, to show just how long a journey data quality can be, I can recall working with such a function at Esso UK in the mid 1980s. The issue is only now dawning on many companies, and has yet to surface in most.