Ploughing a new data furrow

As there was some interest in the last blog I thought it might be useful for people to know a little more about Exeros. The company was set up by an ex-founder of ACTA, and did a series A funding round in 2004. The company has some innovative technology which essentially reverse-engineers the structure of data by looking at the data values of database tables and files. This is different from some other profiling approaches, which often examine metadata e.g. column headings etc rather than data values. In this way it “discovers” business rules inherent in the data, and as a by-product then also discovers how well the data adheres to those rules. For example, one customer gave Exeros a sample dataset and the product whirred away and discovered its structure. All well and good, but it also pointed out that in one case there was only a 98% match of the data to the structure, which caused the customer to say: “that’s impossible, that is a mandatory field”. Well, perhaps it was, but the data was still in error! In my own experience of MDM projects there are plenty of such moments; customers have an amusingly naive view of how good their data quality really is.
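To make the idea concrete, here is a minimal sketch of value-based profiling in Python. This is purely illustrative and not Exeros' actual algorithm: it guesses a simple format rule for a column from its values, then reports what percentage of rows actually adheres to that rule (the "mandatory field" surprise above corresponds to nulls counting as misses).

```python
# Illustrative sketch (not Exeros' actual technique): infer a simple
# format rule from a column's values, then measure adherence to it.
import re

def infer_format(values):
    """Guess which candidate pattern the most non-null values follow."""
    candidates = {
        "digits": re.compile(r"^\d+$"),
        "date":   re.compile(r"^\d{4}-\d{2}-\d{2}$"),
        "text":   re.compile(r"^.+$"),
    }
    best_name, best_hits = "text", 0
    for name, pattern in candidates.items():
        hits = sum(1 for v in values if v is not None and pattern.match(v))
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name, candidates[best_name]

def adherence(values, pattern):
    """Percentage of rows matching the discovered rule; nulls are misses."""
    ok = sum(1 for v in values if v is not None and pattern.match(v))
    return 100.0 * ok / len(values)

# A supposedly mandatory order-id column that is not as clean as believed:
column = ["1001", "1002", "1003", None, "1005"] * 10
name, pattern = infer_format(column)
print(name, adherence(column, pattern))  # digits 80.0
```

Real discovery tools go much further (cross-column dependencies, join keys across files, and so on), but the principle is the same: the rules come from the data values, not from the declared metadata.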

Other companies that purport to do data discovery are Sypherlink and Zoomix, but Sypherlink in particular seems to use more conventional metadata-based profiling. The functionality that Exeros provides is useful for situations like master data integration projects, or data consolidation projects. It could also be used to help in building staging areas for ETL builds, where multiple sources of data often throw up all sorts of issues that have to be resolved manually. Exeros does not have a repository as such, and generates its analysis as output in either XML form or as feeds into tools like Business Objects or ETL tools such as Informatica and IBM/Ascential.

The company started selling commercially in 2006, and already has a dozen or so customers in production. So far this has been mainly in financial services, which has plenty of data issues and stiff compliance reporting needs, but there is no reason why the technology should not be applied to any industry as far as I can see. There seems to have been some pretty serious R&D here, with a product team of 40 people, and the company seems to have kept to an admirably tight focus so far rather than trying to claim it solves the world’s problems on its own. Over time I would expect to see it having opportunities to partner with MDM vendors, especially those who take a generic MDM approach rather than, say, CDI-only vendors. The broader the range of data involved, the more complex the data issues that emerge.

Marketing the company as “data discovery” rather than “data quality” is a good idea, as the approach is genuinely different, and avoids the company being pigeon-holed alongside more established companies. The drawback is that they are essentially trying to carve out a new market, never an easy thing, and will encounter the usual emerging-company issues with conservative buyers and analysts who prefer to neatly drop them into an existing slot. However, in my view the problem they are tackling is very real, and the approach seems innovative, so they should continue on this path. If they make enough customers happy then the analysts will soon come around to their view.

Discovering Data Quality

For those following the mixed fortunes of the data quality vendors, one of the more interesting recent developments has been a company called Exeros. After getting a hefty series B round a year ago, Exeros has just landed a partnership with BI behemoth Business Objects. This is potentially very good news for Exeros. Any BI project involves a significant element of data quality, and so the fit is logical, and Exeros’ cunning “discovery” slant to its marketing will give a fresh-sounding label to the otherwise rather dowdy data quality market. What is curious is that Business Objects already owns not one but two data quality vendors, First Logic and the entertainingly Germanic Fuzzy Informatik (which sounds to me like a Kraftwerk single). The press release was the usual partnership waffle, so it was unclear exactly how the joint proposition would be brought to market, but it does make you wonder why Business Objects needs a new tool, unless the existing acquired technology is not doing quite what it was supposed to.

Exeros has been very good in its marketing execution so far, and this partnership is another example of it. As far as I can see there is little reason why other data quality vendors (e.g. Datanomic) could not have latched onto this “discovery” label, which makes an old subject sound new and interesting, as their technology seems to do pretty much the same thing, but they have chosen not to. I am not sure what Exeros’ sales have been like so far, but this partnership is certainly a useful step for them.

Cognos splashes out

Cognos has just bought Applix, whose TM1 product was a pioneer in the in-memory database market. Applix has been highly profitable, aiming at volume sales to the mid-market financial analysis market, with a typical sale price well under USD 100k. It had revenues of USD 61M in 2006, and is likely to show modest growth on that in 2007 (maybe USD 70M revenues) but with a yummy operating profit margin of 24%. Cognos now has an amusing range of OLAP engines: the one within Powerplay, the one that came with Adaytum, and now this one; at this stage it is unclear whether these will all continue or whether some sort of consolidation will occur in the long term.

The interesting thing about this purchase was the price: a hefty USD 339M (USD 306M if you strip out the cash in Applix’s bank account that Cognos will acquire). At five times trailing revenues and four and a half times forward revenues this is a very healthy premium indeed. The backers of Qliktech in particular, which is based on a similar engine but has far stronger growth than Applix, must be rubbing their hands in glee.
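The multiples quoted above can be checked on the back of an envelope, using only the figures already given in the text (all in USD millions):

```python
# Sanity check of the acquisition multiples, using figures from the text.
price = 339.0            # headline purchase price (USD M)
net_of_cash = 306.0      # price after stripping out Applix's cash
trailing_rev = 61.0      # Applix 2006 revenues
forward_rev = 70.0       # estimated 2007 revenues

trailing_multiple = net_of_cash / trailing_rev
forward_multiple = net_of_cash / forward_rev
print(round(trailing_multiple, 1), round(forward_multiple, 1))  # 5.0 4.4
```

So roughly five times trailing and four and a half times forward revenues, as stated.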

Searching for meaning

I have written before about how many industry surveys can be almost meaningless due to the way that they are phrased or the way that the audience is selected or encouraged to participate. Sometimes the survey itself can miss the point, as in a recent one about the percentage of data that is structured or unstructured. An article about this agonises about whether unstructured data is 31% of all enterprise data, or 50-odd percent, rather than the “80% claimed by other research organisations”. It seems to me that this misses the point. It is less relevant what proportion of data is unstructured (and by the way, does that mean the storage volume, or the number of sources, or something else? The article blithely skips over this) than what the value and usage of this data is. The context here is the use of search technology for BI, with people who sell this technology presumably wanting to make the point that most data is there in emails and spreadsheets, so therefore search technology can mostly replace that pesky BI business. This seems to me a flawed argument. In the context of business information, we typically know what we want e.g. the monthly sales figures and, unlike when we search the web using Google, we also have a fair idea where it is e.g. the company financial systems. The difficulty is not in finding the information but in making it meaningful, which is what the vast majority of effort in data warehousing and BI is all about. Unlike a search for a video clip of an episode of “Heroes”, or finding a particular book on Amazon, the difficulty is that many ambiguous answers exist. Books are a nice analogy, as the world discovered long ago the sense of putting a unique (it turns out not quite unique, but for most purposes it is true) ISBN number on books to avoid ambiguity.

This is not the case in large enterprises with “sales figures”, which in fact will exist not only in the official corporate finance system, but in several other “proper” systems, and in endless spreadsheets to which the information has been downloaded and possibly manipulated for various purposes. Indeed trying to make a meaningful and useful classification scheme around data is what master data management, and much of data modelling, is all about.

Imagine the fun Amazon would have in finding a book in a world with no ISBN numbers, and where rival publishers regularly published identical titles, some even from the same author. This is more like the world that BI deals with. Indeed, if there were one single place where “sales data” lived, and if everyone agreed on exactly which sales data that was (the whole company’s, just Europe’s, with or without indirect sales?) then the world of BI would be a simple place and data warehouse developers could pack up and learn a new trade. This reality seems to have eluded some vendors plugging BI search, and indeed some of the industry writers. It is almost irrelevant what “percentage” of data is unstructured, semi-structured, or structured. In the imperfect world of enterprise data a high proportion of the important data suffers from the persistent problem of ambiguous classification and multiple copies, with processes that do not perfectly control replication of that data. It is a world Google Search can shake an uncomprehending stick at all it likes, but to me it is likely to have only a limited impact. Until enterprises get a real grip on the life cycle of information management and put processes in place to properly classify and allow for update and distribution of master data (don’t hold your breath), the world of BI won’t be replaced by a search icon.