Applying Benchmarks

There is a particularly well-written article on data warehouse appliances by Doug Henschen in Intelligent Enterprise, which usefully summarises the market today and gives a very clear explanation of why columnar databases are well suited to certain applications.

Above all, it contains an excellent piece of advice on benchmarks: do not trust them, and insist that vendors test their systems on your own data. Benchmarks have a place, but vendors go to considerable lengths to optimise their systems for the particular benchmark. For example, many TPC-H tests end up being executed entirely in memory, on hardware that bears little resemblance to the server sitting under your desk in the office. Vendors will heavily tune their systems to ensure that their TPC numbers look good, but the benchmark data may have very different characteristics to yours, so the same system may perform very differently on your workload.
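To make the "test on your own data" point concrete, here is a minimal sketch of the kind of timing harness you might run during a proof of concept. It is an illustration only: SQLite stands in for whichever appliance is under test, and the `sales` table, the query and the helper names (`time_query`, `summarise`) are all hypothetical. In a real trial you would load an extract of your own data and run your own queries.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, runs=5):
    """Run `sql` several times and return each elapsed time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the whole query is measured
        timings.append(time.perf_counter() - start)
    return timings


def summarise(timings):
    """Report the median and worst case rather than a single best run."""
    return {
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "worst_s": max(timings),
    }


# Stand-in data (an assumption): replace with an extract of your own tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 2 else "south", i * 1.5) for i in range(10000)],
)

result = summarise(
    time_query(conn, "SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(result)
```

Reporting the median and the worst run, rather than the best one, is a small guard against exactly the kind of flattering numbers that tuned benchmark runs tend to produce.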

Fortunately, the explosion of appliance vendors in recent years means that there is now serious competition, so you should have no trouble convincing vendors to carry out proper proofs of concept using your own data.

The Information Difference

Today sees the launch of the Information Difference, a boutique market research and analyst firm specialising in the master data management (MDM) market. This reflects the increasing interest in this fast-growing area. The company has developed detailed profiles of all the vendors in the MDM space, as well as of some of the major and most interesting players in the related data quality space. The company will shortly announce its first piece of primary research (into MDM adoption) and will produce white papers on key issues in the MDM market.

Its principals are Dave Waddington (ex Chief Architect at Unilever Foods) and myself, with some part-time assistance from a number of other talented individuals. It is nice to see some positive reactions from some serious industry luminaries (see press release).

We hope to bring a more in-depth perspective to this emerging market than is common today, and have some exciting research in preparation.

For more information see the company website.

Software and the Nature of Being

Semantic integration is something I wrote about some time ago, and it is definitely getting more attention than it used to. This week sees the launch of expressor, a start-up with some interesting features which, amongst other things, plays in the semantic integration field. There are also products such as DataXtend from Progress, Contivo (bought by Liaison), Software AG’s Information Integrator, 42 Objects and Pantero, while early pioneer Unicorn was bought some time ago by IBM. Arguably, the technology used by certain data quality vendors such as Exeros and SilverCreek also qualifies.

Given the scale of the SOA bandwagon, I am a little surprised that semantic integration does not get even more attention. Perhaps it is partly the name: “semantic” and “ontology” are hardly the terms that a marketer would come up with in trying to sell this technology to a mass audience. Moreover, the problem is quite a deep one, and it is going to be a clever technology indeed that can browse through a company’s applications and derive a meaningful business model that captures all the implied meaning that is currently embedded within data models, database stored procedures and application code in all its guises.

Still, at least now there are a number of technologies starting to address the problem, and the market will decide which ones work and which ones are just marketing fluff. As SOA rumbles on, I expect to see more activity in this space, and more M&A activity as the larger vendors wake up to the importance of this area. However, it would be really nice if someone managed to come up with some decent names for this market. I had thought that “ontology” was a term that I could safely bury away in the recesses of my mind after I completed my philosophy subsidiary course at university. I can’t see it making it to the mass media, can you? “Link: The new semantic integration software with its own ontology endorsed by David Beckham” isn’t likely to be wending its way to a TV advert any time soon.

Opening up data quality

There is an interesting web forum which seeks to bring an open source approach to the world of data management. Of interest are topics involving the creation of open source de-duplication, profiling, matching and cleansing tools (hat tip to CW for pointing this out).
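For readers unfamiliar with what a de-duplication or matching tool does at its simplest, here is a minimal sketch using only the Python standard library. It is not taken from any of the tools on the forum; the function names, the sample customer list and the 0.85 threshold are assumptions chosen purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case the name, drop commas and full stops, collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a score between 0 and 1 for how alike two names are."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(records, threshold=0.85):
    """Compare every pair of records and keep those scoring at or above the threshold."""
    pairs = []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical sample data.
customers = ["Acme Ltd.", "ACME Ltd", "Globex Corporation", "Initech"]
matches = likely_duplicates(customers)
print(matches)
```

Comparing every pair like this is fine for a few thousand records but scales quadratically, and a real matching tool would typically narrow down the candidate pairs first and apply far richer rules than a single string-similarity score.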

No doubt the tools here are at an early stage and won’t directly compare in broad functionality with a major data quality vendor. However, for many people with less sophisticated requirements that may not matter. The rise of products like MySQL has shown how influential an open source product can become given the right circumstances.

I would be very interested to hear whether any readers of the blog have experience with the tools here, or any views on the merits or otherwise of an open approach to data quality and data integration.