Software and the Nature of Being

Semantic integration is something I wrote about some time ago, but it is definitely getting more attention than it used to. This week sees the launch of expressor, a start-up with some interesting features that, amongst other things, plays in the semantic integration field. There are also products such as DataXtend from Progress, Contivo (bought by Liaison), Software AG’s Information Integrator, 42 Objects and Pantero, while early pioneer Unicorn was bought some time ago by IBM. Arguably, the technology used by certain data quality vendors such as Exeros and SilverCreek also qualifies.

Given the scale of the SOA bandwagon, I am a little surprised that semantic integration does not get even more attention. Perhaps it is partly the name: “semantic” and “ontology” are hardly the terms that a marketer would come up with in trying to sell this technology to a mass audience. Moreover, the problem is quite a deep one, and it will take a clever technology indeed to browse through a company’s applications and derive a meaningful business model that captures all the implied meaning currently embedded within data models, database stored procedures and application code in all its guises.

Still, at least there are now a number of technologies starting to address the problem, and the market will decide which ones work and which ones are just marketing fluff. As SOA rumbles on, I expect to see more activity in this space, and more M&A activity as the larger vendors wake up to the importance of this area. However, it would be really nice if someone managed to come up with some decent names for this market. I had thought that “ontology” was a term I could safely bury away in the recesses of my mind after I completed my philosophy subsidiary course at University. I can’t see it making it to the mass media, can you? “Link: The new semantic integration software with its own ontology endorsed by David Beckham” isn’t likely to be wending its way to a TV advert any time soon.

Psst, want a free business modelling tool?

Regular readers of this blog may recall that I mentioned the Kalido business modelling tool that shipped with Kalido’s new software release. At TDWI Las Vegas yesterday Kalido launched this formally, and made it available for free download. There is also an on-line community set up to support it, in which, as well as tool discussion, participants can share and collaborate on business models.

This seems a smart move to me: by making the tool available for free, Kalido will get publicity for it that it would otherwise not get, and of course if people get hooked on the tool then they might wonder “hey, maybe I could try connecting it up and building a warehouse”, at which point, as the saying goes, a sales person will call. This follows the well-proven drug-dealer technique of giving away a free hit in order to lure you on to something more powerful and even more addictive in due course.

Business modelling does not get the attention it deserves, so the on-line forum could prove very interesting. The ability to share and improve models with others could turn out to be very appealing to those involved with projects of this nature; after all, essentially it is a source of free consultancy if the forum develops.

Visit to download a copy of the tool.
To join the community visit

Peeking at Models

With the latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc.) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps PowerPoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, validating, as rules are added to the model, whether the relationships defined in the drawing make sense and are valid.

Once the model is developed and validated, it can be applied directly to a Kalido warehouse, and the necessary physical schemas are built (for example, a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and in one or many data marts). There is no intermediate stage of definition required any more. Crucially, this means there is no need to keep design diagrams in sync with the model; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from it. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases involving hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool. Some screenshots can be seen here:
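To make the CASE-style rule checking concrete, here is a toy sketch of a model whose diagrams carry validation rules, in this case that rollup relationships must not form a cycle, since a cyclic hierarchy cannot be built into a warehouse. This is purely illustrative: the class, entity names and the single rule are all invented, and bear no relation to Kalido’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class BusinessModel:
    """A toy business model: entities plus parent->child rollup relationships."""
    entities: set = field(default_factory=set)
    rollups: list = field(default_factory=list)  # (parent, child) pairs

    def add_rollup(self, parent, child):
        self.entities.update([parent, child])
        self.rollups.append((parent, child))

    def validate(self):
        """Return rule violations, e.g. a cycle in the rollup graph,
        which would make the dimension hierarchy impossible to build."""
        errors = []
        children = {}
        for parent, child in self.rollups:
            children.setdefault(parent, []).append(child)

        def has_cycle(node, seen):
            # Depth-first search; a repeat visit means a cycle.
            if node in seen:
                return True
            return any(has_cycle(c, seen | {node}) for c in children.get(node, []))

        if any(has_cycle(e, set()) for e in self.entities):
            errors.append("rollup relationships contain a cycle")
        return errors

model = BusinessModel()
model.add_rollup("Product Group", "Product SKU")
model.add_rollup("Brand", "Product Group")
print(model.validate())            # → [] (model is valid)
model.add_rollup("Product SKU", "Brand")   # this edge closes a loop
print(model.validate())            # → ['rollup relationships contain a cycle']
```

The point of the sketch is the workflow: the drawing is rejected the moment an invalid relationship is added, rather than the error surfacing later during implementation.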

In addition to the business modeller Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has for a long time had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards also into the ETL and data modelling world.

The 8.4 release has a lot of core data warehouse enhancements; indeed, this is the largest functional release of the core technology for years. There is now automatic staging area management, which simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido always had an ELT, rather than ETL, philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. Once a warehouse is deployed, new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido’s great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator simply to roll the warehouse back to a prior date, rather as you would rewind a recording of a movie on your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don’t try this at home with your custom-built warehouse or SAP BW.
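One way such a rewind can work, at least conceptually, is to record every load and structural change as a timestamped event: rolling back to a prior date then just means discarding the events after that date and rebuilding state from what remains. A minimal sketch of that idea, with invented names, and no claim that this is Kalido’s actual mechanism:

```python
import datetime

class VersionedWarehouse:
    """Toy event-sourced warehouse: state is derived from a log of
    timestamped events, so rewinding is just truncating the log."""

    def __init__(self):
        self.events = []  # (date, kind, payload) tuples

    def apply(self, date, kind, payload):
        self.events.append((date, kind, payload))

    def state_at(self, as_of):
        """Rebuild the warehouse state as of a given date by replaying events."""
        state = {"loads": [], "hierarchies": {}}
        for date, kind, payload in sorted(self.events, key=lambda e: e[0]):
            if date > as_of:
                break
            if kind == "load":
                state["loads"].append(payload)
            elif kind == "hierarchy":
                state["hierarchies"][payload["name"]] = payload["levels"]
        return state

    def rewind(self, to_date):
        """Roll data loads and structural changes back together."""
        self.events = [e for e in self.events if e[0] <= to_date]

wh = VersionedWarehouse()
d = datetime.date
wh.apply(d(2008, 1, 5), "load", "january sales")
wh.apply(d(2008, 2, 3), "hierarchy", {"name": "Product", "levels": ["Brand", "SKU"]})
wh.apply(d(2008, 2, 10), "load", "february sales")
wh.rewind(d(2008, 1, 31))  # February's load and the new hierarchy both disappear
print(wh.state_at(d(2008, 12, 31)))
```

The appeal of the event-log design is exactly what the release notes claim: data, structure and derived models roll back as one, because they all hang off the same log.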

This is a key technology release for Kalido, a company with a track record of innovative technology that has in the past pleased its customers (I know; I used to run the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.

Data quality whining

The data quality market is a paradoxical one, as I have discussed before. There is a plethora of vendors, yet few have revenues over USD 10 million. Despite this track record of marginalisation, more are popping up all the time. I am aware of 26 separate data quality vendors today, and this excludes the data quality offerings that have been absorbed into larger vendors such as SAS (DataFlux), Informatica (Similarity Systems), IBM (Ascential Quality Stage) and Business Objects (First Logic). Assuming that you care about data quality at all (and too few do) then how do you go about selecting one?

Well, one thing over which the industry has done itself no favours is its confusing and technical terminology (if you don’t think that terminology the buyer understands matters, ask French and German wine producers why Australian and other wine producers are drinking their lunch). A data quality tool may cover several stages:


and let’s just take one stage: matching. Vendors with data matching technology use a variety of techniques to match up candidate data records. These include:

heuristic matching (based on experience)
probabilistic (rules based)
deterministic (based on templates)
empirical (using dictionaries)

and this is not a comprehensive set. I saw an interesting technology today from Netrics, which uses a different (patented) matching approach based on “bipartite graphs”, and which in fact looked very impressive. How is an end-user buyer to make any sense of this maze? Certainly different data classes may demand different approaches: customer name and address data is highly structured and may suggest a different approach from much less structured or more complex data (such as product data or asset data).
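For a flavour of the bipartite-graph idea, here is a toy sketch that treats candidate record pairs as weighted edges between two datasets and greedily keeps the strongest one-to-one matches. Everything here is my own stand-in: the similarity measure is just Python’s standard library difflib, and a real product such as Netrics would use a far more sophisticated (and optimal) matching algorithm.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def bipartite_match(left, right, threshold=0.7):
    """Greedy one-to-one matching: every (left, right) pair is an edge
    weighted by similarity; take the strongest edges first, never reusing
    a record on either side."""
    edges = sorted(((similarity(l, r), l, r) for l in left for r in right),
                   reverse=True)
    used_l, used_r, matches = set(), set(), []
    for score, l, r in edges:
        if score < threshold:
            break  # edges are sorted, so nothing further can qualify
        if l not in used_l and r not in used_r:
            matches.append((l, r, round(score, 2)))
            used_l.add(l)
            used_r.add(r)
    return matches

crm = ["Acme Corp", "Globex Ltd"]
erp = ["ACME Corporation", "Globex Limited", "Initech"]
print(bipartite_match(crm, erp))
```

Even this crude version shows why the framing is attractive: one-to-one matching across two record sets falls out of the graph structure for free, rather than needing ad hoc de-duplication afterwards.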

I am not sure of the merits of introducing something like a TPC-A benchmark for data quality (such benchmark exercises are tricky to pin down, and vendors make great efforts to “game” them). However, it would not seem that hard to take some common data quality issues, set up a set of common errors (transposed letters, missing letters or numbers, spurious additional letters, or common misspellings) and try to match these against a sample dataset in a way that compared the various algorithmic approaches, or indeed directly compared the effectiveness of vendor products. By ensuring that different data types (not just customer name and address) are covered, such an approach might not crown a single “best” approach or product, but would show where certain approaches shine and others are less well suited. This in itself would be useful information for potential buyers, who at present must set up such bake-off comparisons themselves.
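Such a benchmark harness really is not hard to sketch: inject the common errors listed above into known values, then measure how often a matcher recovers the original. The reference data and the difflib-based matcher below are my own illustrative stand-ins, not any vendor’s algorithm; a real benchmark would swap different matchers into the same harness and compare the scores.

```python
import random
from difflib import get_close_matches

def corrupt(word, rng):
    """Inject one of the common errors a benchmark would cover:
    transposed, missing or spurious letters."""
    i = rng.randrange(len(word) - 1)
    kind = rng.choice(["transpose", "delete", "insert"])
    if kind == "transpose":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if kind == "delete":
        return word[:i] + word[i + 1:]
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i:]

def benchmark(matcher, reference, trials=500, seed=1):
    """Score a matcher: the fraction of corrupted names it resolves
    back to the correct original."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        original = rng.choice(reference)
        hits += matcher(corrupt(original, rng), reference) == original
    return hits / trials

reference = ["birmingham", "manchester", "liverpool", "sheffield", "bristol"]

def difflib_matcher(candidate, reference):
    """Stand-in matcher: pick the closest reference entry by edit similarity."""
    close = get_close_matches(candidate, reference, n=1, cutoff=0.0)
    return close[0] if close else None

print(benchmark(difflib_matcher, reference))
```

Plugging competing matchers into `benchmark` with varied reference datasets (names, product codes, addresses) is essentially the bake-off that each buyer currently has to improvise.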

In the absence of any industry-wide benchmarks, each potential customer must set up their own benchmark and navigate the maze of arcane terminology, approaches and vendors themselves each time. Such complexity must lengthen sales cycles and make the data quality industry less appealing to buyers, who may simply give up and wait for a larger vendor to add data quality as a feature (possibly in a manner that is sub-optimal for their particular needs).

Consider the wine analogy. If you buy a French wine you must navigate the subtleties of region, village, grower and vintage. For example, I am looking right now at a bottle with the label “Grand Vin de Leoville Marquis de Las Cases St Julien Medoc Appellation St Julien Controlee 1975” (it is from Bordeaux, but actually omits this from the label). Alternatively I can glance over to a (lovely) Italian wine from Jermann with the label “Where Dreams have No End”. Both are fine wines, but which is more likely to appeal to the consumer? Which is more inviting? The data quality industry has something to learn about marketing, in my view, just as the French wine industry has.

Orchestrating MDM Workflow

France is rarely associated with enterprise software innovation (test: name a French software company other than Business Objects) but in MDM there are two interesting vendors. I have already written about Amalto, but the more established French MDM player is Orchestra Networks. Founded in 2000, this company has been selling its wares in the French market since 2003, and has built up some solid customer references, mainly in the financial services arena but also with global names such as Sanofi Aventis and Kraft.

The great strength of their EBX technology is the elaborate support for complex business process workflow, an area neglected by most MDM vendors. For example a customer may have an international product code hierarchy, and distribute this to several regions. Each of the regional branches may make local amendments to this, so what happens when a new version of the international hierarchy is produced? EBX provides functionality to detect differences between versions or branches and to allow for merging of these versions, supporting both draft “project” master data and the production versions, keeping track of all changes and supporting the workflow rules to support the full life-cycle of master data creation and update.
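The version-comparison idea can be illustrated with a toy diff over hierarchies stored as parent maps, reporting what a regional branch has added, removed or moved relative to the international version. This is my own sketch of the concept, not EBX’s actual functionality or data model.

```python
def diff_hierarchy(old, new):
    """Compare two versions of a hierarchy, each a {node: parent} map,
    and report additions, removals and re-parented ("moved") nodes."""
    added = {n: p for n, p in new.items() if n not in old}
    removed = {n: p for n, p in old.items() if n not in new}
    moved = {n: (old[n], new[n])
             for n in old.keys() & new.keys() if old[n] != new[n]}
    return {"added": added, "removed": removed, "moved": moved}

# Invented example: a regional branch amends the international product hierarchy.
international = {"Widgets": "All Products",
                 "Gadgets": "All Products",
                 "Widget Mini": "Widgets"}
regional = {"Widgets": "All Products",
            "Gadgets": "Seasonal",           # moved under a local grouping
            "Widget Mini": "Widgets",
            "Widget XL": "Widgets"}          # locally added product
print(diff_hierarchy(international, regional))
```

A diff like this is the raw material for the merge step: when a new international version arrives, each reported difference becomes a decision (keep the local amendment, take the new version, or escalate) in the approval workflow.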

Typically such functionality is delivered only by PIM vendors (Kalido is an exception), yet EBX is fully multi-domain by design, so it is not restricted to any one class of master data. This will give it an advantage in competitive situations against vendors who historically designed their technology around one type of master data (customer or product) and are only now realising the need to support multiple domains.

So far Orchestra Networks has confined itself to France, but opens its first overseas office in London soon. The company has taken the time to build out its technology to a solid level of maturity, and has productive partnerships with Informatica (for data quality and ETL) and Software AG, who OEM EBX and sell it globally at the heart of their own MDM offering.

In my own experience of MDM projects, the handling of the business processes around creating and updating master data is a key issue, yet most hub vendors have virtually ignored it, assuming it is somehow “out of scope”. Hub vendors typically focus on system-to-system communication, e.g. validating a new customer code by checking a repository, and perhaps suggesting possible matches if a similar name is found. This is technically demanding, as it is near real-time. However, human-to-system interaction is also important, especially outside the customer domain, where business processes can be much more complex. By providing sophisticated support for this workflow, Orchestra Networks can venture into situations where CDI vendors cannot easily go, and as I have written previously there are plenty of real business problems in MDM beyond customer.
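A minimal sketch of the human-to-system side: a master data creation request moving through a simple draft/review/approved workflow, with a duplicate check run on submission so the reviewer sees possible matches. The states, transitions and the prefix-based matching rule are all invented for illustration; no vendor’s design is being described.

```python
class MasterDataRequest:
    """Toy workflow for creating a master data record with human approval."""

    TRANSITIONS = {
        ("draft", "submit"): "in_review",
        ("in_review", "approve"): "approved",
        ("in_review", "reject"): "draft",     # sent back to the requester
    }

    def __init__(self, name, repository):
        self.name = name
        self.repository = repository          # existing approved records
        self.state = "draft"

    def submit(self):
        """Move to review; flag possible duplicates for the reviewer
        (here, a crude shared-prefix check stands in for real matching)."""
        suggestions = [r for r in self.repository
                       if r.lower().startswith(self.name[:4].lower())]
        self.state = self.TRANSITIONS[(self.state, "submit")]
        return suggestions

    def approve(self):
        """Human sign-off: the record enters the production repository."""
        self.state = self.TRANSITIONS[(self.state, "approve")]
        self.repository.append(self.name)

repo = ["Acme Corporation"]
req = MasterDataRequest("Acme Corp", repo)
print(req.submit())      # reviewer sees the possible duplicate
req.approve()
print(req.state, repo)
```

The system-to-system check is one line inside `submit`; the rest is the workflow scaffolding that, as argued above, hub vendors tend to leave out of scope.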

It will be interesting to see how Orchestra Networks fares as it ventures outside of France in 2008.

Santa comes early for HP

In a surprise move HP has snapped up Knightsbridge to bolster its technology services business. Knightsbridge had carved out a strong reputation for handling large data warehouse and BI projects for US corporations, and had grown to over USD 100M in revenue. It stood alongside IBM as one of the two leading data warehouse consulting organisations. This in itself makes it clear why it was attractive to HP, which does not have anything like as strong a reputation in this area. Knightsbridge was growing strongly in 2006, and although the financial terms of the deal are not public, one would assume HP paid a full price for such a strong business. This will no doubt provide a happy retirement for the Knightsbridge founders, but it is less clear how well the Knightsbridge culture, which was fiercely vendor-independent, will sit within a behemoth like HP, which has its own technology offerings. It was revealing that Knightsbridge CEO Rod Walker had dismissed services company acquisitions in an interview just a year ago, for reasons which sounded pretty sensible. No doubt this will present an interesting spin challenge for the Knightsbridge PR staff, but perhaps they will have other things on their minds, such as dusting off resumes.

“If the cultures of the two companies are not a near-perfect match, people will leave, and services is a people business.” I couldn’t have put it better myself, Rod.

Truth and myth

Malcolm Chisholm has penned a thoughtful article arguing that there will essentially never be a “single version of the truth” in an organisation of any size. As he rightly points out, beyond a single related group of users, e.g. in accounts or marketing, it is very difficult indeed to come up with a definition of a business term that is unambiguous and yet also satisfies everyone. Which costs are actually counted in “gross margin”? Is a “customer” someone who has signed a contract, been shipped goods, been invoiced, or has paid? These questions become vastly more difficult in a global enterprise operating in many countries. If it is hard to get production, marketing and finance to agree on a definition within the same office, what are your chances of getting agreement across 50 countries? A TDWI survey some time ago showed how far companies are from fixing this, and that survey covered US companies rather than multinational ones.

This issue is at the heart of master data management, and is why MDM is a lot more than putting in a “customer hub”. Managing the inevitable diversity of business definitions, ensuring that they are synchronised between systems, dealing with changes to them and providing processes to improve the quality of master data is what an MDM project should address. A technical solution like an MDM repository or series of hubs is part of providing a solution, but only a part. Significant organisational resources and processes need to be constructed and applied to this issue. Even then, it is a journey rather than a destination: data quality will never be perfect, and there will always be business changes that throw up new challenges to maintaining high-quality, synchronised master data. However, the sooner this message gets through, the sooner organisations can really begin to improve their master data rather than just plugging in the latest technological silver bullet.



Wilde Abstraction

Eric Kavanagh makes some very astute points in an article on TDWI regarding abstraction. As he rightly points out, a computer system that models the real world has to deal with business hierarchies, such as general ledgers and asset hierarchies, that are complex in several ways. To start with, there are multiple valid views. Different business people have different perspectives on “Product”, for example: a marketer will be interested in the brand, price and packaging, but from the point of view of someone in distribution the physical dimensions of the product are what matter: what size container it comes in, how it should be stacked, etc. Moreover, as Eric points out, many hierarchies are “ragged” in nature, something that not all systems are good at dealing with.

The key point he makes, in my view, is that business people should be presented with a level of abstraction that can be put in their own terms. In other words, the business model should drive the computer system, not the other way around. Moreover, as the article notes, if you maintain this abstraction layer properly then historical comparison becomes possible, e.g. comparing values over time as the hierarchies change. Indeed, the ability to reconstruct past hierarchies is something I believe is increasingly important in these days of greater regulatory compliance, yet it is often neglected, in packaged and custom-built systems alike. The key points he makes on the value of an abstraction layer:

– the abstraction layer shields the application from business change
– business-model driven, with the ability to have multiple views on the same underlying data
– time variance built in
– the layer can be a platform for master data management

neatly sum up the key advantages of the Kalido technology, and indeed why I set up Kalido in the first place, since I felt that existing packages and approaches failed in these key areas. It is encouraging to me that these points are starting to gain wider acceptance as genuine issues that the industry needs to address better if it is to give its customers what they really need. To quote Oscar Wilde: “There is only one thing in the world worse than being talked about, and that is not being talked about.” I hope these key issues, which most designers of computer systems seem not to grasp, get talked about a lot more.
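The time-variance point can be illustrated with a toy hierarchy whose parent assignments carry valid-from dates, so that both “as is” and “as was” questions are answered from the same structure. This is a sketch of the general technique only, with invented names, not a description of any particular product.

```python
import datetime

class TemporalHierarchy:
    """Toy time-variant hierarchy: each node keeps a dated history of its
    parent assignments, so past hierarchies can be reconstructed."""

    def __init__(self):
        self.assignments = {}  # node -> sorted [(valid_from, parent)]

    def assign(self, node, parent, valid_from):
        self.assignments.setdefault(node, []).append((valid_from, parent))
        self.assignments[node].sort()

    def parent_as_of(self, node, as_of):
        """Return the node's parent on a given date, or None if the node
        did not yet exist in the hierarchy."""
        result = None
        for valid_from, parent in self.assignments.get(node, []):
            if valid_from <= as_of:
                result = parent  # latest assignment not after as_of wins
        return result

h = TemporalHierarchy()
d = datetime.date
h.assign("Widget Mini", "Widgets", d(2007, 1, 1))
h.assign("Widget Mini", "Premium Widgets", d(2008, 1, 1))  # a reorganisation
print(h.parent_as_of("Widget Mini", d(2007, 6, 30)))  # "as was" reporting
print(h.parent_as_of("Widget Mini", d(2008, 6, 30)))  # "as is" reporting
```

Because the hierarchy is never overwritten, only appended to, the 2007 view remains reconstructible after the reorganisation, which is precisely the compliance-friendly property discussed above.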