<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Andy on Enterprise Software &#187; data quality</title>
	<atom:link href="http://andyonsoftware.com/category/data-quality/feed/" rel="self" type="application/rss+xml" />
	<link>http://andyonsoftware.com</link>
	<description>Andy Hayler, founder of Kalido and The Information Difference, gives his views on the enterprise software market. Issues covered include data warehousing, master data management, business intelligence and data quality.</description>
	<lastBuildDate>Thu, 08 Jul 2010 10:49:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Governing Data</title>
		<link>http://andyonsoftware.com/2010/06/governing-data/</link>
		<comments>http://andyonsoftware.com/2010/06/governing-data/#comments</comments>
		<pubDate>Sat, 05 Jun 2010 15:48:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Data Governance]]></category>
		<category><![CDATA[Master data management]]></category>
		<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://andyonsoftware.com/?p=434</guid>
		<description><![CDATA[This week I will be delivering the keynote speech at the IDQ Data Governance Conference in San Diego (funny how they never hold technology conferences in Detroit or Duluth).  This promises to be an excellent event, with over 350 registered attendees, and plenty of movers and shakers in this emerging field.  Data governance [...]]]></description>
			<content:encoded><![CDATA[<p>This week I will be delivering the keynote speech at the IDQ Data Governance <a href="http://www.debtechint.com/dg2010/">Conference</a> in San Diego (funny how they never hold technology conferences in Detroit or Duluth).  This promises to be an excellent event, with over 350 registered attendees, and plenty of movers and shakers in this emerging field.  Data governance is the business-led strand that is beginning to bring together the hitherto curiously separate worlds of MDM and data quality, and it will be interesting to see what leading end-user companies are doing in this field.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyonsoftware.com/2010/06/governing-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Something for nothing</title>
		<link>http://andyonsoftware.com/2010/02/something-for-nothing-2/</link>
		<comments>http://andyonsoftware.com/2010/02/something-for-nothing-2/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 09:09:16 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Data Governance]]></category>
		<category><![CDATA[Master data management]]></category>
		<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://andyonsoftware.com/?p=429</guid>
		<description><![CDATA[Get a discount to the upcoming data governance conference in San Diego.]]></description>
			<content:encoded><![CDATA[<p>In early June there is the annual Data Governance Conference:</p>
<p>http://www.debtechint.com/dg2010/</p>
<p>which this year is in the attractive setting of San Diego (the place with perhaps the best climate in the USA). Naturally as a conference delegate you will be influenced solely by the agenda and the speaker quality rather than the prospect of a sunny location, but I just thought I&#8217;d mention it. </p>
<p>There will be some excellent speakers, and also me giving the keynote.  As a reader of this blog I am happy to offer you a discount should you be able to attend.  Just quote the following code when booking:  IDDG100 &#8211; please be aware that this code expires on May 7th.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyonsoftware.com/2010/02/something-for-nothing-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sunlight is the best disinfectant</title>
		<link>http://andyonsoftware.com/2009/12/sunlight-is-the-best-disinfectant/</link>
		<comments>http://andyonsoftware.com/2009/12/sunlight-is-the-best-disinfectant/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 17:29:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Master data management]]></category>
		<category><![CDATA[Metadata]]></category>
		<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://andyonsoftware.com/?p=416</guid>
		<description><![CDATA[I read a very interesting article today by independent data architecture consultant Mike Lapenna about ETL logic.  Data governance initiatives, MDM and data quality projects are all projects which need business rules of one kind or another.  Some of these may be trivial, and as much technical as business e.g. &#8220;this field must [...]]]></description>
			<content:encoded><![CDATA[<p>I read a very interesting <a href="http://www.information-management.com/newsletters/extract_transform_load_etl-10016721-1.html?ET=informationmgmt:e1254:2109134a:&#038;st=email">article</a> today by independent data architecture consultant Mike Lapenna about ETL logic.  Data governance initiatives, MDM and data quality projects are all projects which need business rules of one kind or another.  Some of these may be trivial, and as much technical as business e.g. &#8220;this field must be an integer of at most five digits, and always less than the value 65000&#8221;.  Others may be more clearly business-oriented e.g. &#8220;customers of type A have a credit rating of at most USD 2,000&#8221; or &#8220;every product must be part of a unique product class&#8221;.  Certainly MDM technologies provide repositories where such business rules may be stored, as (with a different emphasis) do many data quality repositories.  Some basic information is stored within the database system catalogs e.g. field lengths and primary key information.  Databases and repositories are generally fairly accessible, for example via a SQL interface, or some form of graphical view.  Data modeling tools also capture some of this metadata.</p>
<p>Yet there is a considerable source of rules that are obscured from view. Some are tied up within business applications, while another class is just as opaque: those locked up within extract/transform/load (ETL) rules, usually in the form of procedural scripts.  If several source files need to be merged, for example to load into a data warehouse, then the logic that defines which transformations occur is an important set of rules in its own right.  Certainly these rules are subject to change, since source systems sometimes undergo format changes, for example if a commercial package is upgraded.  Yet they are usually embedded within procedural code, or at best within the metadata repository of a commercial ETL tool.  Mike&#8217;s article proposes a repository that would keep track of the applications, data elements and interfaces involved, the idea being to get the rules as (readable) data rather than buried away in code.</p>
<p>The article raises an important issue: rules of all kinds concerning data should ideally be held as data and so be accessible, yet ETL rules in particular tend not to be.  It is beyond the scope of the article, but for me there is a question of how the various sources of business rules (ETL repository, MDM repository, data quality repository, database catalogs etc.) can be linked together so that a complete picture of the business rules can be seen.  Those with long memories will recall old-fashioned data dictionaries, which tried to perform this role, but which mostly died out since they were always essentially passive copies of the rules in other systems, and so easily became out of date.  Yet the current trend towards managing master data actively raises questions about just what the scope of data rules should be, and where they should be stored.  Application vendors, MDM vendors, data quality vendors, ETL vendors and database vendors will each have their own perspective, and will inevitably each seek to control as much of the metadata landscape as they can, since ownership of this level of data will be a powerful position to be in.</p>
<p>From an end user perspective what you really want is for all such rules to be stored as data, and for some mechanism to access the various repositories and formats in a seamless way, so that a complete perspective of enterprise data becomes possible.  This desire may not necessarily be shared by all vendors, for whom control of business metadata is power.  An opportunity for someone?</p>
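<p>As a rough illustration of what &#8220;rules as data&#8221; might look like, here is a minimal sketch in Python.  The Rule class, the field names (quantity, credit, customer_type) and the two sample rules are assumptions made up for this example, loosely based on the rules quoted earlier; they are not taken from Mike&#8217;s article or from any vendor&#8217;s product.  Each rule is a row in a table that can be listed, queried or exported, while one small generic function applies it.</p>

```python
from dataclasses import dataclass, field
import operator

# Comparison operators are looked up by name, so a rule row holds only data.
OPS = {"lt": operator.lt, "le": operator.le}


@dataclass
class Rule:
    name: str        # readable statement of the rule
    field_name: str  # record field the rule checks (hypothetical names)
    op: str          # key into OPS
    limit: float     # value the field is compared against
    when: dict = field(default_factory=dict)  # rule applies only if these match


# Illustrative rule table; in practice this might be loaded from a repository.
RULES = [
    Rule("quantity is less than 65000", "quantity", "lt", 65000),
    Rule("type A credit is at most USD 2,000", "credit", "le", 2000,
         {"customer_type": "A"}),
]


def violations(record, rules=RULES):
    """Return the names of the rules that the given record breaks."""
    broken = []
    for rule in rules:
        # A rule applies only when every condition in `when` matches the record.
        applies = all(record.get(k) == v for k, v in rule.when.items())
        if applies and rule.field_name in record:
            if not OPS[rule.op](record[rule.field_name], rule.limit):
                broken.append(rule.name)
    return broken
```

<p>Calling violations() on a type A customer record with a credit value of 2500 would report the credit rule by name.  Because the rules live in a plain table rather than in procedural script, the same rows could in principle be shared between an ETL job, an MDM hub and a data quality tool, which is the kind of visibility argued for above.</p>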
]]></content:encoded>
			<wfw:commentRss>http://andyonsoftware.com/2009/12/sunlight-is-the-best-disinfectant/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The State of Data</title>
		<link>http://andyonsoftware.com/2009/07/the-state-of-data/</link>
		<comments>http://andyonsoftware.com/2009/07/the-state-of-data/#comments</comments>
		<pubDate>Fri, 17 Jul 2009 14:00:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://andyonsoftware.com/?p=386</guid>
		<description><![CDATA[We have now completed our survey of data quality.  Based on 193 responses from IT and business staff from around the world, there were some very interesting findings.  Amongst these was that 81% of respondents felt that data quality was much more than just customer name and address, which is the focus of [...]]]></description>
			<content:encoded><![CDATA[<p>We have now completed our survey of data quality.  Based on 193 responses from IT and business staff from around the world, there were some very interesting findings.  Amongst these was that 81% of respondents felt that data quality was much more than just customer name and address, which is the focus of most of the vendors in the market.  Moreover, customer name and address data ranked only third in the list of data domains which survey respondents found most important.  Both product and financial data were felt to be more important, yet product data is the focus of barely a handful of vendors (Silver Creek, Inquera, Datactics) while of all the dozens of data quality vendors out there, few indeed focus on financial data.  Name and address is of course a common issue and conveniently is well structured and has plenty of well-established algorithms out there to attack it.  Yet surely the vendor community is missing something when customers rate other data types as higher in importance?</p>
<p>Another recurring theme is the lack of attention given to measuring the costs of poor data quality.  Lots of respondents fail to make any effort to measure this at all, and then complain that it is hard to make a business case for data quality.  &#8220;Well duh&#8221;, as Homer Simpson might say.  Estimates given by survey respondents seemed very low when compared to our experience, and also to anecdotes given in the very same survey.  One striking example was this:  “Poor data quality and consistency has led to the orphaning of $32 million in stock just sitting in the warehouse that can’t be sold since it’s lost in the system.”  This company at least has no difficulty in justifying a data quality initiative.  The survey had plenty of other interesting insights too.</p>
<p>The full survey and analysis, all 33 pages of it,  can be purchased from <a href="http://www.informationdifference.com/product_catalog.html">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyonsoftware.com/2009/07/the-state-of-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Doctoring addresses</title>
		<link>http://andyonsoftware.com/2009/06/doctoring-addresses/</link>
		<comments>http://andyonsoftware.com/2009/06/doctoring-addresses/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 22:52:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://andyonsoftware.com/?p=375</guid>
		<description><![CDATA[Informatica buys Address Doctor, sowing uncertainty among those who license its data.]]></description>
			<content:encoded><![CDATA[<p>Most data quality vendors have their roots in name and address checking, even if their software can go beyond this.  What is less well known is that the actual business of getting street level address data (to verify postal codes etc) is a tedious business that varies dramatically by country (the UK post office database covers almost every address in the UK, but Eire has no post code system, for example).  Software vendors do not typically want to be in the business of updating street address databases, and there is a patchwork of local information providers that fill the gaps.  If you have any international aspirations, though, just discovering who does what by country, and licensing the various data sources is in itself a non-trivial task, and so companies exist that do this.  One was a UK company called Global Address, bought some time ago by Harte Hanks (who market Trillium), while the other was Address Doctor.  Many data quality vendors use Address Doctor, including some that might superficially appear to compete.  These include Dataflux, IBM, and even QAS. Some MDM platform vendors also use Address Doctor, who provide at least basic name and address data for 240 countries and territories. </p>
<p>The cat was put firmly amongst the pigeons this week when Informatica bought Address Doctor. From their viewpoint this secures a key provider of address data, and follows their prior acquisitions of Similarity Systems and, more recently, Identity Systems. Informatica, via these purchases, has established itself as one of the major data quality vendors.  Given its competitive position, the data quality vendors who use Address Doctor will, at the least, be feeling nervous.  I spoke to an executive from Informatica this week and was told that Informatica intended to honour the existing arrangements, but who knows how long this state will last?  As Woody Allen said, the lion may lie down with the lamb, but the lamb won&#8217;t get much sleep.</p>
<p>The problem for the other vendors is that there is no obvious place to go.  Global Address is already in the hands of Harte Hanks, and while Uniserv in particular has its own name and address data, it is mainly strong in this area in Europe.  Address Doctor was a convenient neutral player and is now in the hands of a major market competitor, and other vendors may have little choice but to look at building up their own networks of address data providers if they are to sleep easy.  Of course it is not clear that they have to worry; for example Pitney Bowes Business Insight (who have what was Group 1 software) use Global Address, and this arrangement has continued without incident despite Harte Hanks Trillium&#8217;s ownership of them.  </p>
<p>It will be interesting to see what measures the current Address Doctor users take, or whether they will just cross their fingers and hope Informatica plays nice. </p>
]]></content:encoded>
			<wfw:commentRss>http://andyonsoftware.com/2009/06/doctoring-addresses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What lurks within</title>
		<link>http://andyonsoftware.com/2009/03/what-lurks-within/</link>
		<comments>http://andyonsoftware.com/2009/03/what-lurks-within/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 15:42:03 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[data quality]]></category>

		<guid isPermaLink="false">http://andyonsoftware.com/?p=364</guid>
		<description><![CDATA[I have recently been spending some time looking at the data quality market, and a few things seem to pop up time and again.  The first thing is, in talking with customers, just how awful the quality of data really is within corporate systems.  One major UK bank found 8,000 customers whose age [...]]]></description>
			<content:encoded><![CDATA[<p>I have recently been spending some time looking at the data quality market, and a few things seem to pop up time and again.  The first thing is, in talking with customers, just how awful the quality of data really is within corporate systems.  One major UK bank found 8,000 customers whose age was over 150 according to their systems. All seemingly academic (if you are taking money out of your account, who cares what your age is?) until some bright spark in marketing decided that selling life insurance to these customers would be a fine idea.  </p>
<p>Story after story confirms some really shocking data errors that lurk beneath most operational systems.  These are the same operational systems that are used to generate data for the end-year accounts which senior executives happily sign off on pain of jail-time these days.  I hope no one shows these same execs the data inside some of these systems, or they might start to get very nervous indeed.</p>
<p>Yet in a survey we did last year, only about a third of companies in the survey have invested in data quality tools at all! Does anyone else find this in any way scary?  Do you have any entertaining data quality stories you can share?</p>
]]></content:encoded>
			<wfw:commentRss>http://andyonsoftware.com/2009/03/what-lurks-within/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
