Andy on Enterprise Software

Software and the Nature of Being

May 12, 2008

Semantic integration is something I wrote about some time ago, but it is definitely getting more attention than it used to. This week we see the launch of expressor, a start-up with some interesting features which, amongst other things, plays in the semantic integration field. There are also products such as DataXtend from Progress, Contivo (bought by Liaison), Software AG’s Information Integrator, 42 Objects and Pantero, while early pioneer Unicorn was bought some time ago by IBM. Arguably, the technology used by certain data quality vendors such as Exeros and SilverCreek also qualifies.

Given the scale of the SOA bandwagon, I am a little surprised that semantic integration does not get even more attention. Perhaps it is partly the name: “semantic” and “ontology” are hardly the terms that a marketer would come up with in trying to sell this technology to a mass audience. Moreover, the problem is quite a deep one, and it is going to be a clever technology indeed that can browse through a company’s applications and derive a meaningful business model that captures all the implied meaning currently embedded within data models, database stored procedures and application code in all its guises.

Still, at least now there are a number of technologies starting to address the problem, and the market will decide which ones work and which ones are just marketing fluff. As SOA rumbles on, I expect to see more activity in this space, and more M&A activity as the larger vendors wake up to the importance of this area. However, it would be really nice if someone managed to come up with some decent names for this market. I had thought that “ontology” was a term that I could safely bury away in the recesses of my mind after I completed my philosophy subsidiary course at University. I can’t see it making it to the mass media, can you? “Link: The new semantic integration software with its own ontology, endorsed by David Beckham” isn’t likely to be wending its way to a TV advert any time soon.

Psst, want a free business modelling tool?

February 20, 2008

Regular readers of this blog may recall that I mentioned the Kalido business modelling tool that shipped with Kalido’s new software release. At TDWI Las Vegas yesterday Kalido launched this formally, and made it available for free download. There is also an on-line community set up to support it, in which, as well as discussing the tool, participants can share and collaborate on business models.

This seems a smart move to me, as by making the tool available for free Kalido will get publicity that it would otherwise not get, and of course if people get hooked on the tool then they might wonder: “hey, maybe I could try connecting it up and building a warehouse”, at which point, as the saying goes, a sales person will call. This follows the well-proven drug-dealer technique of giving away a free hit of something in order to lure you on to something more powerful and even more addictive in due course.

Business modelling does not get the attention it deserves, so the on-line forum could prove very interesting. The ability to share and improve models with others could turn out to be very appealing to those involved with projects of this nature; after all, if the forum develops it is essentially a source of free consultancy.

Visit http://www.kalido.com/bmcf to download a copy of the tool.
To join the community visit http://groups.google.com/group/bmcf

Peeking at Models

February 7, 2008

With the latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps Powerpoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, validating whether the relationships defined in the drawing make sense as rules are added to the model.

Once the model is developed and validated, it can be applied directly to a Kalido warehouse, and the necessary physical schemas are built (for example a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and in one or many data marts). There is no intermediate stage of definition required any more. Crucially, this means there is no need to keep design diagrams in sync with the implementation; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from it. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases involving hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool. Some screenshots can be seen here: http://www.kalido.com/resources-multimedia-center.htm
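To give a flavour of the kind of checking a rules-driven modeller performs, here is a minimal sketch in Python. It is purely illustrative (Kalido’s actual rule engine is not public and is certainly far richer): it simply verifies that every relationship drawn in a model refers to a defined entity and that the roll-up structure contains no cycles.

```python
# Illustrative sketch only -- not Kalido's rule engine. Validate that a drawn
# business model refers only to defined entities and that its roll-up
# structure (child -> parent relationships) contains no cycles.

def validate_model(entities, relationships):
    """entities: set of entity names; relationships: list of (child, parent)."""
    errors = []
    for child, parent in relationships:
        for name in (child, parent):
            if name not in entities:
                errors.append(f"Relationship refers to undefined entity '{name}'")

    parents = {}
    for child, parent in relationships:
        parents.setdefault(child, set()).add(parent)

    def has_cycle(node, seen):
        if node in seen:
            return True
        return any(has_cycle(p, seen | {node}) for p in parents.get(node, ()))

    if any(has_cycle(e, set()) for e in entities):
        errors.append("Cycle detected in the roll-up structure")
    return errors


model_entities = {"Product SKU", "Brand", "Product Group"}
model_relationships = [("Product SKU", "Brand"), ("Brand", "Product Group")]
print(validate_model(model_entities, model_relationships))  # [] -> model is valid
```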

In addition to the business modeller, Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has long had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards into the ETL and data modelling world as well.

The 8.4 release has a lot of core data warehouse enhancements; indeed this is the largest functional release of the core technology in years. There is now automatic staging area management, which simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido always had an ELT, rather than an ETL, philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. Once a warehouse is deployed, new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido’s great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator to simply roll the warehouse back to a prior date, rather as you would rewind a recording of a movie using your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don’t try this at home with your custom-built warehouse or SAP BW.
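The “rewind” idea is easier to see with a toy example. The sketch below is my own illustration, not Kalido’s implementation: it shows why rollback becomes straightforward once every load and structural change is stored with the date it was recorded, since viewing the warehouse as of a prior date is simply a matter of ignoring everything recorded after it.

```python
# Illustrative sketch (not Kalido's implementation) of why "rewind" is possible
# when every change to a warehouse carries the date on which it was recorded.

from datetime import date

warehouse = [
    {"entity": "Region", "value": "EMEA",       "recorded": date(2008, 1, 10)},
    {"entity": "Region", "value": "EMEA-North", "recorded": date(2008, 2, 5)},
    {"entity": "Sales",  "value": 1200,         "recorded": date(2008, 2, 20)},
]

def as_of(records, cutoff):
    """Return the warehouse content as it stood on the given date."""
    return [r for r in records if r["recorded"] <= cutoff]

# "Rewind" to the end of January: the February hierarchy change and the
# February data load simply disappear from view.
print(as_of(warehouse, date(2008, 1, 31)))
```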

This is a key technology release for Kalido, a company with a track record of innovative technology that has in the past pleased its customers (I know; I used to do the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.

Data quality whining

January 14, 2008

The data quality market is a paradoxical one, as I have discussed before. There is a plethora of vendors, yet few have revenues over USD 10 million. Despite this track record of marginalisation, more are popping up all the time. I am aware of 26 separate data quality vendors today, and this excludes the data quality offerings that have been absorbed into larger vendors such as SAS (DataFlux), Informatica (Similarity Systems), IBM (Ascential QualityStage) and Business Objects (First Logic). Assuming that you care about data quality at all (and too few do), how do you go about selecting a vendor?

Well, one area in which the industry has done itself no favours is its confusing and technical terminology (if you don’t think terminology that the buyer understands matters, ask French and German wine producers why Australian and other wine producers are drinking their lunch). A data quality tool may cover several stages:

discovery
profiling
matching
enrichment
consolidation
monitoring

and let’s just take one stage: matching. Vendors with data matching technology use a variety of techniques to match up candidate data records. These include:

heuristic matching (based on experience)
probabilistic (rules based)
deterministic (based on templates)
empirical (using dictionaries)

and this is not a comprehensive set. I saw an interesting technology today from Netrics which uses a different (patented) matching approach based on “bipartite graphs” (and which in fact looked very impressive). How is an end-user buyer to make any sense of this maze? Certainly different data classes may demand different approaches: customer name and address data is highly structured and may suggest a different technique from much less structured or more complex data (such as product data or asset data).

I am not sure of the merits of introducing something like a TPC-A benchmark for data quality (such benchmark exercises are tricky to pin down, and vendors make great efforts to “game” them). However, it would not seem that hard to take some common data quality issues, set up a set of common errors (transposed letters, missing letters or numbers, spurious additional letters or common misspellings) and try to match these against a sample dataset in a way that compares the various algorithmic approaches, or indeed directly compares the effectiveness of vendor products. By ensuring that different data types (not just customer name and address) are covered, such an approach might not produce a single “best” approach or product, but it would show where certain approaches shine and where others are less well suited. This in itself would be useful information for potential buyers, who at present must try to set up such bake-off comparisons themselves.
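As a crude illustration of such a bake-off, the sketch below generates the common error types listed above against a handful of made-up clean names and measures how often one simple similarity measure (Python’s built-in difflib ratio, standing in here for a vendor algorithm) matches each corrupted record back to the right original. A real benchmark would plug in the vendors’ own matching engines and a far larger, more varied dataset.

```python
# Toy bake-off sketch: corrupt clean names with common error types, then see
# how often a simple similarity measure recovers the right original. The names
# and the matching measure are illustrative stand-ins, not a real benchmark.

import difflib
import random

clean = ["Acme Industries", "Jones Consulting", "Smith & Sons", "Global Widgets"]

def corrupt(name, rng):
    """Apply one common error type: transposition, deletion or spurious insertion."""
    i = rng.randrange(len(name) - 1)
    kind = rng.choice(["transpose", "delete", "insert"])
    if kind == "transpose":
        return name[:i] + name[i + 1] + name[i] + name[i + 2:]
    if kind == "delete":
        return name[:i] + name[i + 1:]
    return name[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + name[i:]

def best_match(candidate, references):
    """Pick the reference record most similar to the candidate."""
    return max(references,
               key=lambda r: difflib.SequenceMatcher(None, candidate, r).ratio())

rng = random.Random(42)
trials, hits = 200, 0
for _ in range(trials):
    original = rng.choice(clean)
    dirty = corrupt(original, rng)
    if best_match(dirty, clean) == original:
        hits += 1

print(f"Matched {hits}/{trials} corrupted records back to the right original")
```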

In the absence of any industry-wide benchmarks, each potential customer must set up their own benchmarks and attempt to navigate the maze of arcane terminology, approaches and the large number of vendors themselves each time. Such complexity must lengthen sales cycles and make the data quality industry less appealing to buyers, who may just give up and wait for a larger vendor to add data quality as a feature (possibly in a manner that is sub-optimal for their particular needs).

Consider the wine analogy. If you buy a French wine you must navigate the subtleties of region, village, grower and vintage. For example I am looking right now at a bottle with the label “Grand Vin de Leoville Marquis de Las Cases St Julien Medoc Appellation St Julien Controlee 1975” (it is from Bordeaux, but actually omits this from the label). Alternatively I can glance over to a (lovely) Italian wine from Jermann with the label “Where Dreams have No End”. Both are fine wines, but which is more likely to appeal to the consumer? Which is more inviting? The data quality industry has something to learn about marketing, in my view, just as the French wine industry has.

Orchestrating MDM Workflow

December 28, 2007

France is rarely associated with enterprise software innovation (test: name a French software company other than Business Objects) but in MDM there are two interesting vendors. I have already written about Amalto, but the more established French MDM player is Orchestra Networks. Founded in 2000, this company has been selling its wares in the French market since 2003, and has built up some solid customer references, mainly in the financial services arena but also with global names such as Sanofi Aventis and Kraft.

The great strength of their EBX technology is its elaborate support for complex business process workflow, an area neglected by most MDM vendors. For example a customer may have an international product code hierarchy, and distribute this to several regions. Each of the regional branches may make local amendments to it, so what happens when a new version of the international hierarchy is produced? EBX provides functionality to detect differences between versions or branches and to merge them, supporting both draft “project” master data and production versions, keeping track of all changes, and providing the workflow rules needed to manage the full life-cycle of master data creation and update.
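A toy example makes the version problem concrete. The sketch below is purely illustrative (it is not EBX’s mechanism): it compares a new version of an international product hierarchy with a regional branch that has made local amendments, producing the differences that a merge and approval workflow would then have to resolve.

```python
# Illustrative sketch only -- not EBX's actual mechanism. Detect the
# differences between a new international product hierarchy and a regional
# branch with local amendments, the first step before any merge.

international_v2 = {
    "SKU-100": "Beverages/Soft Drinks",
    "SKU-200": "Beverages/Juices",        # new in version 2
    "SKU-300": "Snacks/Crisps",
}

regional_branch = {
    "SKU-100": "Beverages/Soft Drinks",
    "SKU-300": "Snacks/Savoury",          # local amendment
    "SKU-400": "Snacks/Local Delicacy",   # local addition
}

def diff(parent, branch):
    added   = {k: branch[k] for k in branch.keys() - parent.keys()}
    missing = {k: parent[k] for k in parent.keys() - branch.keys()}
    changed = {k: (parent[k], branch[k])
               for k in parent.keys() & branch.keys() if parent[k] != branch[k]}
    return added, missing, changed

added, missing, changed = diff(international_v2, regional_branch)
print("Local additions:", added)
print("Not yet in branch:", missing)
print("Conflicting placements:", changed)
```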

Typically such functionality is delivered only by PIM vendors (Kalido is an exception), yet EBX is fully multi-domain by design, so it is not restricted to any one class of master data. This will give it an advantage in competitive situations against vendors who have historically designed their technology around one type of master data (customer or product) and are only now realising the need to support multiple domains.

So far Orchestra Networks has confined itself to France, but opens its first overseas office in London soon. The company has taken the time to build out its technology to a solid level of maturity, and has productive partnerships with Informatica (for data quality and ETL) and Software AG, who OEM EBX and sell it globally at the heart of their own MDM offering.

In my own experience of MDM projects, the handling of the business processes around creating and updating master data is a key issue, yet most hub vendors have virtually ignored it, assuming it is somehow “out of scope”. Hub vendors typically focus on system-to-system communication, e.g. validating a new customer code by checking a repository, and perhaps suggesting possible matches if a similar name is found. This is technically demanding as it is near real-time. However, human-to-system interaction is also important, especially outside the customer domain, where business processes can be much more complex. By providing sophisticated support for this workflow Orchestra Networks can venture into situations where CDI vendors cannot easily go, and as I have written previously there are plenty of real business problems in MDM beyond customer.

It will be interesting to see how Orchestra Networks fares as it ventures outside of France in 2008.

Santa comes early for HP

December 13, 2006

In a surprise move HP has snapped up Knightsbridge to bolster its technology services business. Knightsbridge had carved out a strong reputation for handling large data warehouse and BI projects for US corporations, and had grown to over USD 100M in revenue. It was up there with IBM as one of the two leading data warehouse consulting organisations. This in itself makes it clear why it was attractive to HP, who do not have anything like such a strong reputation in this area. Knightsbridge was growing strongly in 2006, and while the financial terms of the deal are not public, one would assume HP paid a healthy price for such a strong business. This will no doubt provide a happy retirement for the Knightsbridge founders, but it is less clear how well the Knightsbridge culture, which was quite fiercely vendor-independent, will sit within a behemoth like HP, which has its own technology offerings. It was revealing that Knightsbridge CEO Rod Walker had dismissed service company acquisitions in an interview just a year ago, and for reasons which sounded pretty sensible. No doubt this will present an interesting spin challenge for the Knightsbridge PR staff, but perhaps they will have other things on their minds, such as dusting off resumes.

“If the cultures of the two companies are not a near-perfect match, people will leave, and services is a people business.” I couldn’t have put it better myself, Rod.

Truth and myth

December 6, 2006

Malcolm Chisholm has penned a thoughtful article which argues that there will essentially never be a “single version of the truth” in an organisation of any size. As he rightly points out, beyond a single closely related group of users, e.g. in accounts or marketing, it is very difficult indeed to come up with a definition of a business term that is unambiguous and yet also satisfies everyone. Which costs are actually counted in “gross margin”? Is a “customer” someone who has signed a contract, has been shipped goods, been invoiced or has paid? These examples become vastly more difficult when considering a global enterprise operating in lots of countries. If it is hard to get production, marketing and finance to agree on a definition within the same office, what are your chances of getting agreement between 50 countries? A TDWI survey some time ago showed how far companies are from fixing this, and that survey covered US companies rather than multinational ones.

This issue is at the heart of master data management, and is why MDM is a lot more than putting in a “customer hub”. Managing the inevitable diversity of business definitions, ensuring that they are synchronised between systems, dealing with changes to them and providing processes to improve the quality of master data is what an MDM project should address. A technical component like an MDM repository or series of hubs is part of the solution, but only a part. Significant organisational resources and processes need to be constructed and applied to this issue. Even when that is done, it is a journey rather than a destination: data quality will never be perfect, and there will always be business changes that throw up new challenges to maintaining high-quality, synchronised master data. However, the sooner this message gets through, the sooner organisations can really begin to improve their master data situation rather than just plugging in the latest technological silver bullet.

Trick or treat

October 31, 2006

I’m not sure who had the idea of holding a data quality conference on Halloween, but it was either a lucky coincidence or a truly inspired piece of scheduling.  DAMA ran today in London, and continues tomorrow.  This also fits with the seasonal festival, which originally was a Celtic festival over two days when the real world and that of ghosts overlapped temporarily.  Later the Christian church tried to claim it as their own by calling November 1st All Hallows Day, with 31st October being All Hallows Eve, which in the fullness of time became Halloween.  I will resist the temptation to point out the deterioration in data quality over time that this name change illustrates.  The conference is held at the modern Victoria Park Plaza hotel, which is that rare thing in London: a venue that seems vaguely aware of technology.  It is rumoured that there is even wireless access here, but perhaps that is just the ghosts whispering.

The usual otherworldly spirits were out on this day: the ghouls of the conference circuit were presenting (such as me), while scary topics like data architecture, metadata repositories and data models had their outings. The master data management monster seemed to be making a bid to take over the conference, with assorted data quality and other vendors who had never heard the phrase a year ago confidently asserting their MDM credentials. You’d have to be a zombie to fall for this, surely? In at least one pitch I heard a truly contorted segue from classic data quality issues into MDM, with a hastily added slide basically saying “and all this name and address matching stuff is really what MDM is about, anyway”. Come on guys, if you are going to try to pep up your data profiling tool with an MDM spin, at least try and do a little research. One vendor gave a convincing-looking slide about a new real-time data quality tool which I know for a fact has no live customers, but then such horrors are nothing new in the software industry.

The conference itself was quite well attended, with about 170 proper customers, plus the usual hangers-on. Several of the speaker sessions across the conference feature genuine experts in their field, so it seems the conference organisers have managed to minimise the witches’ brew of barely disguised sales pitches by software sales VPs masquerading as independent “experts” that all too often pack conference agendas these days.

Just as it seems that the allure of ghosts is undiminished even in our modern age, so the problems around the age-old issue of data quality seem as spritely (sorry, I couldn’t resist that one) as ever. New technologies appear, but data quality in large corporations seems largely impervious to technical solutions. It is a curious thing: given that data quality problems are very, very real, why can no one seem to make any real money in this market? Trillium is the market leader, and although it is no longer entirely clear what its revenues are, about USD 50M is what I see in my crystal ball. Other independent data quality vendors now swallowed by larger players had revenues in the sub-USD 10M range when they were bought (DataFlux, Similarity Systems, Vality). First Logic was bigger at around USD 50M but the company went for a song (the USD 65M price tag gives a revenue multiple no-one will be celebrating). Perhaps the newer generation of data quality vendors will have more luck. Certainly the business problem is as monstrous as ever.

I am posting this just on the stroke of midnight.  Happy Halloween!

Kalido repositions itself

October 19, 2006

Kalido has now announced revised positioning targeted at selling solutions to business problems (and will soon announce a new major product release). The key elements are as follows. The existing enterprise data warehouse and master data management product offerings remain, but have been packaged with some new elements into solutions which are effectively different pricing/functionality mechanisms on the same core code base.

The main positioning change is the introduction of pre-built business models on top of the core technology to provide “solutions” in the areas of profitability management, specifically “customer profitability” and “product profitability”. This move is, in many ways, long overdue, as Kalido was frequently deployed in such applications but previously made no attempt to provide a pre-configured data model. Given that Kalido is very strong at version management, it is about the one data warehouse technology that can plausibly offer this without falling into the “analytic app” trap whereby a pre-built data model, once tailored, quickly becomes out of synch with new releases (as Informatica can testify after their ignominious withdrawal from this market a few years ago). In Kalido’s case its version management allows for endless tinkering with the data model while still being able to recreate previous model versions.

Kalido also announced two new packaging offerings targeted at performance management/business intelligence, one for data mart consolidation and one for a repository for corporate performance management (the latter will be particularly aimed at Cognos customers, with whom Kalido recently announced a partnership). Interestingly, these two offerings are available on a subscription basis as an alternative to traditional licensing. This is a good idea, since the industry in general is moving towards such pricing models, as evidenced by salesforce.com in particular. In these days of carefully scrutinised procurement of large software purchases, having something customers can try out and rent rather than buy should ease sales cycles.

The recent positioning change doesn’t, however, ignore the IT audience – with solution sets geared toward “Enterprise Data Management” and “Master Data Management.” The enterprise data management category contains solutions that those familiar with Kalido will recognize as typical use cases – departmental solutions, enterprise data warehouse and networked data warehouse. The key product advance here is in scalability. Kalido was always able to handle large volumes of transaction data (one single customer instance had over a billion transactions) but there was an Achilles heel if there was a single very large master data dimension of many millions of records. In B2B situations this doesn’t happen (how many products do you sell, or how many stores do you have – tens or hundreds of thousands only) but in B2C situations, e.g. retail banking and telcos, it could be a problem given that you could well have 50 million customers. Kalido was comfortable up to about 10 million master data items or so in a single dimension, but struggled much beyond that, leaving a federated (now “networked”) approach as the only way forward. However, in the new release some major re-engineering under the covers allows very large master data dimensions in the 100 million range. This effectively removes the only real limitation on Kalido scalability; now you can just throw hardware at very large single instances, while Kalido’s unique ability to support a network of linked data warehouses continues to provide an effective way of deploying global data warehouses.

Technologically, Kalido’s master data management (MDM) product/solution is effectively unaffected by these announcements since it is a different code base, and a major release of this is due in January.

This new positioning targets Kalido more clearly as a business application, rather than a piece of infrastructure. This greater clarity is a result of its new CEO (Bill Hewitt), who has a strong marketing background, and should improve the market understanding of what Kalido is all about. Kalido always had differentiated technology and strong customer references (a 97% customer renewal rate testifies to that) but suffered from market positioning that switched too often and was fuzzy about the customer value proposition. This is an encouraging step in the right direction.

Wilde Abstraction

June 21, 2006

Eric Kavanagh makes some very astute points in an article on TDWI regarding abstraction. As he rightly points out, a computer system that models the real world will have to deal with business hierarchies such as general ledgers, asset hierarchies etc that are complex in several ways. To start with, there are multiple valid views. Different business people have a different perspective on “Product”, for example: a marketer will be interested in the brand, price and packaging, but from the point of view of someone in distribution the physical dimensions of the product are important: what size container it comes in, how it should be stacked, etc. Moreover, as Eric points out, many hierarchies are “ragged” in nature, something that not all systems are good at dealing with.
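Both points can be made concrete with a small sketch (my own illustration, not taken from the article): the same product record carries attributes that matter to quite different business views, and a reporting hierarchy can be ragged, with some branches deeper than others.

```python
# Illustrative sketch of multiple business views on one entity and a "ragged"
# hierarchy whose branches have different depths. Names and values are made up.

product = {
    "sku": "ABC-123",
    # attributes of interest to marketing
    "brand": "SparkleClean", "price": 3.49, "packaging": "bottle",
    # attributes of interest to distribution
    "height_cm": 25, "width_cm": 8, "case_size": 12,
}

marketing_view    = {k: product[k] for k in ("brand", "price", "packaging")}
distribution_view = {k: product[k] for k in ("height_cm", "width_cm", "case_size")}

# A ragged hierarchy: one branch rolls up through a regional level, while the
# direct sales branch reports straight to the top with no intermediate level.
hierarchy = {
    "Total Sales": ["EMEA", "Direct"],
    "EMEA": ["UK", "France"],
    "Direct": [],
    "UK": [], "France": [],
}

def depth(node, tree):
    """Depth of the sub-tree below a node (1 for a leaf)."""
    children = tree.get(node, [])
    return 1 + max((depth(c, tree) for c in children), default=0)

print(marketing_view, distribution_view)
print("Branch depths vary:",
      {child: depth(child, hierarchy) for child in hierarchy["Total Sales"]})
```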

The key point he makes, in my view, is that business people should be presented with a level of abstraction that can be put in their own terms. In other words, the business model should drive the computer system, not the other way around. Moreover, as the article notes, if you maintain this abstraction layer properly then historical comparison becomes possible, e.g. comparing values over time as the hierarchies change. Indeed the ability to reconstruct past hierarchies is something that I believe is increasingly important in these days of greater regulatory compliance, yet it is often neglected in many systems, both packaged and custom-built. The key points he makes on the value of an abstraction layer:

- the abstraction layer shields the application from business change
- business-model driven, with the ability to have multiple views on the same underlying data
- time variance built in
- the layer can be a platform for master data management

neatly sum up the key advantages of the Kalido technology, and indeed why I set up Kalido in the first place, since I felt that existing packages and approaches failed in these key areas. It is encouraging to me that these points are starting to gain wider acceptance as genuine issues that the industry needs to address better if it is to give its customers what they really need. To quote Oscar Wilde: “There is only one thing in the world worse than being talked about, and that is not being talked about.” I hope these key issues, which most designers of computer systems seem not to grasp, get talked about a lot more.