Andy on Enterprise Software

Oracle continues its shopping spree

April 12, 2011

A further element of consolidation in the data management market occurred when Oracle purchased Datanomic, a data quality company based in Cambridge (for a change, the original one in England rather than the one near Boston). Datanomic has been an interesting story: set up in 2001, it brought to market a well-rounded data quality product. This is a crowded market, and in the dreadful conditions for enterprise software that followed the market crash of 2001 the company initially struggled. There were, after all, an awful lot of data quality products out there that people had already heard of. Then Datanomic did a very smart thing and repositioned itself to focus on a business rather than a technical issue: compliance, especially in financial services.

This turned out to be an inspired change of marketing strategy, and the company went from layoffs to hiring again, growing rapidly over the last three years, far in excess of the 9% annual growth seen recently in the general data quality market. Datanomic has had positive customer references in our regular annual surveys, and it seems to me a well-architected solution. From Oracle's point of view, this complements their purchase of Silver Creek, a specialist product data quality tool. These two acquisitions suggest that Oracle is changing its view of data quality: previously they relied on partner arrangements with companies such as Trillium for their data quality solution. Now it would appear that they see data quality as a more integral issue. The price of the deal was not disclosed, but given Datanomic's rapid recent growth, it will doubtless have been at a healthy premium.

Governing Data

June 5, 2010

This week I will be delivering the keynote speech at the IDQ Data Governance Conference in San Diego (funny how they never hold technology conferences in Detroit or Duluth). This promises to be an excellent event, with over 350 registered attendees, and plenty of movers and shakers in this emerging field. Data governance is the business-led strand that is beginning to bring together the hitherto curiously separate worlds of MDM and data quality, and it will be interesting to see what leading end-user companies are doing in this field.

Something for nothing

February 25, 2010

In early June there is the annual Data Governance Conference:

http://www.debtechint.com/dg2010/

which this year is in the attractive setting of San Diego (the place with perhaps the best climate in the USA). Naturally as a conference delegate you will be influenced solely by the agenda and the speaker quality rather than the prospect of a sunny location, but I just thought I’d mention it.

There will be some excellent speakers, and also me giving the keynote. I am happy to offer readers of this blog a discount should you be able to attend. Just quote the following code when booking: IDDG100 – please be aware that this code expires on May 7th.

Sunlight is the best disinfectant

December 15, 2009

I read a very interesting article today by independent data architecture consultant Mike Lapenna about ETL logic. Data governance initiatives, MDM and data quality projects all need business rules of one kind or another. Some of these may be trivial, and as much technical as business, e.g. "this field must be an integer of at most five digits, and always less than the value 65000". Others may be more clearly business-oriented, e.g. "customers of type A have a credit rating of at most USD 2,000" or "every product must be part of a unique product class". Certainly MDM technologies provide repositories where such business rules may be stored, as (with a different emphasis) do many data quality repositories. Some basic information is stored within the database system catalogs, e.g. field lengths and primary key information. Databases and repositories are generally fairly accessible, for example via a SQL interface or some form of graphical view. Data modeling tools also capture some of this metadata.
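
To make the "rules as data" idea concrete, here is a minimal sketch in Python (entirely my own illustration, with invented field names and rule types) of what it looks like when rules of this sort are held as data that can be queried and applied, rather than hard-coded:

    # Hypothetical business rules expressed as data rather than code.
    RULES = [
        {"field": "order_qty", "check": "max_digits", "value": 5},
        {"field": "order_qty", "check": "less_than", "value": 65000},
        {"field": "credit_usd", "check": "max_for_type", "customer_type": "A", "value": 2000},
        {"field": "product_class", "check": "required"},
    ]

    def violations(record, rules=RULES):
        """Return a description of every rule the record breaks."""
        found = []
        for r in rules:
            v = record.get(r["field"])
            if r["check"] == "required" and v in (None, ""):
                found.append(f"{r['field']} is missing")
            elif r["check"] == "max_digits" and not (isinstance(v, int) and len(str(abs(v))) <= r["value"]):
                found.append(f"{r['field']} must be an integer of at most {r['value']} digits")
            elif r["check"] == "less_than" and isinstance(v, int) and v >= r["value"]:
                found.append(f"{r['field']} must be less than {r['value']}")
            elif r["check"] == "max_for_type" and record.get("customer_type") == r["customer_type"] \
                    and v is not None and v > r["value"]:
                found.append(f"{r['field']} exceeds {r['value']} for customer type {r['customer_type']}")
        return found

    print(violations({"order_qty": 70000, "customer_type": "A", "credit_usd": 2500, "product_class": ""}))

The point is not the code itself, but that the rules live in a structure that a repository, or indeed a report, could read.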

Yet there is a considerable source of rules that is obscured from view. Some are tied up within business applications, while another class is equally opaque: those locked up within extract/transform/load (ETL) rules, usually in the form of procedural scripts. If several source files need to be merged, for example to load a data warehouse, then the logic that defines which transformations occur constitutes important rules in their own right. Certainly they are subject to change, since source systems sometimes undergo format changes, for example when a commercial package is upgraded. Yet these rules are usually embedded within procedural code, or at best within the metadata repository of a commercial ETL tool. Mike's article proposes a repository that would keep track of the applications, data elements and interfaces involved, the idea being to hold the rules as (readable) data rather than leaving them buried away in code.
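
As an aside, here is the kind of thing I have in mind (a sketch of my own, not Mike's proposed design, and with invented system and field names): the transformation logic is recorded as mapping rows that can be read, reported on and changed as data, with only the primitive transformations living in code:

    # Hypothetical ETL mappings held as data: each row records source system,
    # source field, target field and the transformation applied.
    ETL_MAPPINGS = [
        {"source_system": "crm",     "source_field": "cust_name", "target_field": "customer_name", "transform": "trim_upper"},
        {"source_system": "billing", "source_field": "amt_cents", "target_field": "amount_usd",    "transform": "cents_to_dollars"},
        {"source_system": "billing", "source_field": "cntry_cd",  "target_field": "country_iso",   "transform": "lookup_iso_country"},
    ]

    TRANSFORMS = {
        "trim_upper": lambda v: str(v).strip().upper(),
        "cents_to_dollars": lambda v: round(int(v) / 100.0, 2),
        "lookup_iso_country": lambda v: {"UK": "GB", "USA": "US"}.get(str(v).strip(), str(v)),
    }

    def load_row(source_system, row):
        """Apply the declared mappings for one source row and return the warehouse record."""
        out = {}
        for m in ETL_MAPPINGS:
            if m["source_system"] == source_system and m["source_field"] in row:
                out[m["target_field"]] = TRANSFORMS[m["transform"]](row[m["source_field"]])
        return out

    print(load_row("billing", {"amt_cents": 129900, "cntry_cd": "UK"}))
    # {'amount_usd': 1299.0, 'country_iso': 'GB'}

When a source system changes format, the mapping rows change; the rules remain visible rather than being rediscovered by reading scripts.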

The article raises an important issue: rules of all kinds concerning data should ideally be held as data and so be accessible, yet ETL rules in particular tend not to be. It is beyond the scope of the article, but for me there is a question of how the various sources of business rules (ETL repository, MDM repository, data quality repository, database catalogs and so on) can be linked together so that a complete picture of the business rules can be seen. Those with long memories will recall old-fashioned data dictionaries, which tried to perform this role but mostly died out, since they were always essentially passive copies of the rules in other systems and so easily became out of date. Yet the current trend towards managing master data actively raises questions about just what the scope of data rules should be, and where they should be stored. Application vendors, MDM vendors, data quality vendors, ETL vendors and database vendors each have their own perspective, and will inevitably each seek to control as much of the metadata landscape as they can, since ownership of this level of data will be a powerful position to be in.

From an end-user perspective what you really want is for all such rules to be stored as data, and for some mechanism to access the various repositories and formats in a seamless way, so that a complete view of enterprise data becomes possible. This desire may not necessarily be shared by all vendors, for whom control of business metadata is power. An opportunity for someone?
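
If such a mechanism existed, I imagine it would look something like the following sketch (purely speculative on my part; the sources shown are stubs standing in for real connectors to an MDM hub, a data quality tool or a database catalog):

    class RuleSource:
        name = "abstract"
        def rules_for(self, element):
            raise NotImplementedError

    class CatalogSource(RuleSource):
        name = "database catalog"
        def rules_for(self, element):
            # stub: in reality this would read column lengths and keys from the system catalog
            return ["customer_name: varchar(100), not null"] if element == "customer_name" else []

    class DQSource(RuleSource):
        name = "data quality repository"
        def rules_for(self, element):
            # stub: in reality this would query a data quality tool's rule store
            return ["customer_name: must be non-blank after trimming"] if element == "customer_name" else []

    def all_rules(element, sources):
        """Collect every known rule for a data element, labelled with where it came from."""
        return [(s.name, rule) for s in sources for rule in s.rules_for(element)]

    for owner, rule in all_rules("customer_name", [CatalogSource(), DQSource()]):
        print(owner, "->", rule)

The hard part, of course, is not the facade but persuading each vendor's repository to expose its rules in the first place.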

The State of Data

July 17, 2009

We have now completed our survey of data quality. Based on 193 responses from IT and business staff around the world, there were some very interesting findings. Amongst these was that 81% of respondents felt that data quality was about much more than just customer name and address, which is the focus of most of the vendors in the market. Moreover, customer name and address data ranked only third in the list of data domains which survey respondents found most important. Both product and financial data were felt to be more important, yet product data is the focus of barely a handful of vendors (Silver Creek, Inquera, Datactics), while of all the dozens of data quality vendors out there, few indeed focus on financial data. Name and address is of course a common issue, and it is conveniently well structured, with plenty of well-established algorithms out there to attack it. Yet surely the vendor community is missing something when customers rate other data types as higher in importance?

Another recurring theme is the lack of attention given to measuring the costs of poor data quality. Many respondents fail to make any effort to measure this at all, and then complain that it is hard to make a business case for data quality. "Well, duh", as Homer Simpson might say. Estimates given by survey respondents seemed very low when compared to our experience, and also to anecdotes given in the very same survey. One striking example was this: "Poor data quality and consistency has led to the orphaning of $32 million in stock just sitting in the warehouse that can't be sold since it's lost in the system." This company at least has no difficulty in justifying a data quality initiative. The survey had plenty of other interesting insights too.

The full survey and analysis, all 33 pages of it, can be purchased from here.

Doctoring addresses

June 5, 2009

Most data quality vendors have their roots in name and address checking, even if their software can go beyond this. What is less well known is that actually obtaining street-level address data (to verify postal codes etc.) is a tedious business that varies dramatically by country (the UK post office database covers almost every address in the UK, but Eire has no postcode system, for example). Software vendors do not typically want to be in the business of updating street address databases, and there is a patchwork of local information providers that fill the gaps. If you have any international aspirations, though, just discovering who does what in each country and licensing the various data sources is in itself a non-trivial task, and so companies exist that do this. One was a UK company called Global Address, bought some time ago by Harte Hanks (who market Trillium), while the other was Address Doctor. Many data quality vendors use Address Doctor, including some that might superficially appear to compete with each other, such as Dataflux, IBM and even QAS. Some MDM platform vendors also use Address Doctor, which provides at least basic name and address data for 240 countries and territories.

The cat was put firmly amongst the pigeons this week when Informatica bought Address Doctor. From Informatica's viewpoint this secures a key provider of address data, and follows its prior acquisitions of Similarity Systems and, more recently, Identity Systems. Informatica, via these purchases, has established itself as one of the major data quality vendors. Given its competitive position, the data quality vendors who use Address Doctor will, at the least, be feeling nervous. I spoke to an executive from Informatica this week and was told that Informatica intended to honour the existing arrangements, but who knows how long this state of affairs will last? As Woody Allen said, the lion may lie down with the lamb, but the lamb won't get much sleep.

The problem for the other vendors is that there is no obvious place to go. Global Address is already in the hands of Harte Hanks, and while Uniserv in particular has its own name and address data, it is mainly strong in Europe. Address Doctor was a convenient neutral player and is now in the hands of a major market competitor, so other vendors may have little choice but to build up their own networks of address data providers if they are to sleep easy. Of course, it is not clear that they need to worry; for example, Pitney Bowes Business Insight (home of what was Group 1 Software) uses Global Address, and that arrangement has continued without incident despite Global Address being owned by Harte Hanks, whose Trillium product is a direct competitor.

It will be interesting to see what measures the current Address Doctor users take, or whether they will just cross their fingers and hope Informatica plays nice.

What lurks within

March 11, 2009

I have recently been spending some time looking at the data quality market, and a few things pop up time and again. The first, in talking with customers, is just how awful the quality of data really is within corporate systems. One major UK bank found 8,000 customers whose age was over 150 according to their systems. All seemingly academic (if you are taking money out of your account, who cares what your age is?) until some bright spark in marketing decided that selling life insurance to these customers would be a fine idea.
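
For what it is worth, checks of this kind are not hard to write. A trivial sketch (with made-up sample data and column names) that would have caught those 150-year-old customers might look like this:

    from datetime import date

    # Invented sample records standing in for a customer table.
    customers = [
        {"id": 1, "name": "A N Other", "date_of_birth": date(1857, 3, 1)},
        {"id": 2, "name": "J Smith", "date_of_birth": date(1975, 6, 12)},
    ]

    def implausible_ages(rows, max_age=120):
        """Flag customers whose recorded date of birth implies an impossible age."""
        today = date.today()
        flagged = []
        for row in rows:
            age = (today - row["date_of_birth"]).days // 365
            if age > max_age:
                flagged.append((row["id"], age))
        return flagged

    print(implausible_ages(customers))  # the 1857 record is flagged

The difficulty is rarely the check itself; it is that nobody is given the job of running such checks and acting on the results.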

Story after story confirms some really shocking data errors lurking beneath most operational systems. These are the same operational systems that are used to generate data for the year-end accounts which senior executives happily sign off on, on pain of jail time these days. I hope no one shows these same execs the data inside some of these systems, or they might start to get very nervous indeed.

Yet in a survey we did last year, only about a third of the companies surveyed had invested in data quality tools at all! Does anyone else find this in any way scary? Do you have any entertaining data quality stories you can share?