MDM Software Evaluation

The Information Difference recently launched a new offering called MDM Select. This builds on our existing detailed functional model of an ideal MDM product, and goes further by scoring all the leading MDM products in the market against it. An end user purchasing MDM Select merely has to weight the various functions according to their own priorities, which will vary by use case, and then press a button. The weighted scores of the leading MDM products will then be sent to them, removing the need for a lengthy evaluation process. At the very least this is a quick way to identify a shortlist suited to your needs. I will soon be talking about this at a webinar on Thursday September 12th at 08:00 PST = 11:00 EST = 16:00 GMT = 17:00 European time.

To register and for more details see this link.
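The weighting step described above can be illustrated with a short sketch. This is not the MDM Select implementation: the product names, function names, scores and weights below are all invented for illustration, and a simple weighted average is assumed as the scoring rule.

```python
def weighted_scores(product_scores, weights):
    """Return {product: weighted average score}, best product first.

    product_scores: {product: {function_name: score}}
    weights:        {function_name: importance to this buyer}
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one function must have a positive weight")
    ranked = {}
    for product, scores in product_scores.items():
        # A function a product does not cover at all scores zero.
        weighted_sum = sum(w * scores.get(func, 0.0) for func, w in weights.items())
        ranked[product] = weighted_sum / total_weight
    return dict(sorted(ranked.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical scores (0-10) for two made-up products.
product_scores = {
    "Product A": {"data_modelling": 8, "workflow": 4, "matching": 9},
    "Product B": {"data_modelling": 6, "workflow": 9, "matching": 5},
}
# This buyer cares three times as much about workflow as anything else.
weights = {"data_modelling": 1, "workflow": 3, "matching": 1}

shortlist = weighted_scores(product_scores, weights)
for product, score in shortlist.items():
    print(f"{product}: {score:.1f}")
```

Changing the weights to match a different use case can reorder the shortlist without anyone re-scoring the products themselves.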

Upcoming webinar – Choosing an MDM Vendor

Choosing an MDM vendor is (or should be) a big decision. It is not just the price of the software – you will, according to Information Difference research, spend four times as much on services as on software when you implement an MDM solution, and then you have to consider maintenance over many years. Yet in my experience some companies do not allow much time to choose their vendor. They perhaps go with an incumbent platform vendor, or ask a systems integrator, or maybe do a brief beauty parade of a few vendors. In a free upcoming webinar I talk about best practice in this area:

It is on September 19th – register now.

A Sideways Glance

Vertica is one of the plethora of vendors which have emerged in the analytics “fast database” space pioneered by Teradata and more recently opened up by Netezza. The various vendors take different approaches. Some (e.g. Netezza) have proprietary hardware, some (e.g. Kognitio, Dataupia) are software only, some (e.g. ParAccel) rely mainly on in-memory techniques, others simply use different designs from the traditional designs of the mainstream DBMS vendors (Oracle, DB2).

Vertica (whose CTO is Mike Stonebraker of Ingres and Postgres fame) is in the latter camp. Like Sybase IQ (and Sand) it uses a column-oriented design (i.e., it groups data together by column on disk) rather than the usual row-oriented storage used by Oracle and the like. This approach has a number of advantages for query performance. It reduces disk I/O by only having to read the columns referenced by the query and also by aggressively compressing data within columns. Through use of parallelism across clusters of shared-nothing computers, Vertica databases can scale easily and affordably by adding additional servers to the cluster. Normally the drawback to column-oriented approaches is their relatively slow data load times, but Vertica has some tricks up its sleeve (a mix of in-memory processing which trickle feeds disk updating) which it claims allow load times comparable to, and sometimes better than, row-oriented databases. Vertica comes with an automated design feature that allows DBAs to provide it with the logical schema, plus training data and queries, which it then uses to come up with a physical structure that organizes, compresses and partitions data across the cluster to best match the workload (though ever-wary DBAs can always override this if they think they are smarter). With a standard SQL interface Vertica can work with existing ETL and business intelligence tools such as Business Objects, and has significantly expanded the list of supported vendors in their upcoming 2.0 release.
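To make the column-oriented idea concrete, here is a minimal sketch in Python. It is not Vertica's storage format. The table, the column names and the choice of run-length encoding as the compression scheme are all assumptions made for illustration.

```python
from itertools import groupby

# A tiny call-detail-record table held row by row (hypothetical data).
COLUMNS = ("call_date", "region", "duration")
rows = [
    ("2008-01-01", "NY", 120),
    ("2008-01-01", "NY", 95),
    ("2008-01-01", "LA", 40),
    ("2008-01-02", "LA", 75),
]


def to_columns(rows):
    """Pivot row-oriented data into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def rle_encode(values):
    """Run-length encode a column: consecutive repeats become (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)

# A query such as SUM(duration) touches one column and never reads the others.
total_duration = sum(columns["duration"])

# Values within a column tend to repeat, so they compress well.
encoded_region = rle_encode(columns["region"])

print(total_duration)
print(encoded_region)
```

In a row store the same query would have to read every field of every row. Here only the `duration` list is scanned, and the `region` column shrinks from four values to two pairs.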

With so many competing vendors all claiming tens of times better performance than others, the measure that perhaps matters most is not a lab benchmark but customer take-up. Vertica now has 30 customers such as Comcast, BlueCrest Capital Management, NetworkIP, Sonian Networks and LogiXML, and with its upcoming 2.0 release out on 19/2/2008 is doing joint roadshows with some of these. It has done well in telcos, who have huge data volumes in their call detail records databases. Two deployed Vertica customers have databases approaching 40 TB in size. Another area is financial services, where hedge funds want to backtest their trading algorithms against historical market data. With one year's worth of US financial markets data taking up over 2 TB, this can quickly add up, and so Vertica has proved popular amongst this community, as well as with marketing companies with large volumes of consumer data to trawl through. Vertica runs on standard Linux servers, and it has a partnership with HP and Red Hat to provide a pre-bundled appliance, which is available from select HP resellers.

With solid VC backing, a glittering advisory board (Jerry Held, Ray Lane, Don Haderle,…) and genuine customer traction in an industry long on technology but short on deployed customers, Vertica should be on every vendor short-list for companies with heavy duty analytical requirements which currently stretch performance limits and budgets.

A Lively Data Warehouse Appliance

DATAllegro was one of the earlier companies to market (2003) in the recent stampede of what I call “fast databases”, which covers appliances and other approaches to speedy analytics (such as in-memory databases or column-oriented databases). Initially DATAllegro had its own hardware stack (like Netezza) but now uses a more open combination of storage from EMC and Dell servers (with Cisco InfiniBand interconnect). It runs on the well proven Ingres database, which has the advantage of being more “tuneable” than some other open databases like MySQL.

The database technology used means that plugging in business intelligence tools is easy, and the product is certified for the major BI tools such as Cognos and Business Objects, and recently Microstrategy. It can also work with Informatica and Ascential Datastage (now IBM) for ETL. Each fast database vendor has its own angle on why its technology is the best, but there are a couple of differentiators that DATAllegro has. One is that it does well in situations of mixed workloads, where as well as queries there are concurrent loads and even updates happening to the database. Another is its new “grid” technology, which allows customers to deal with the age-old compromise of centralised warehouse versus decentralised data marts. Centralised is simplest to maintain but creates a bottleneck and scale challenges. However, decentralised marts quickly become uncoordinated and can lead to lack of business confidence in the data. The DATAllegro grid utilises node-to-node hardware transfer to allow dependent copies of data marts to be maintained from a central data warehouse. With transfer speeds of up to 1 TB a minute (!) claimed, such a deployment allows companies to have their cake and eat it. This technology is in use at one early customer site, and is just being released.

DATAllegro has set its sights firmly at the very high end of data volumes, those encountered by retailers and telcos. One large customer apparently has a live 470 TB database implementation, though since the company is very coy about naming its customers I cannot validate this. Still, this is enough data to give most DBAs sleepless nights, so it is fair to say that this is at the rarefied end of the data volume spectrum. This is territory firmly occupied by Teradata and Netezza (and to a lesser extent Greenplum). The company is tight-lipped about numbers of customers (and I can find only one named customer on its website), revenues and profitability, making it hard to know what market momentum is being achieved. However its technology seems to me to be based on solid foundations, and the company has a large installed base of Teradata customers to attack. Interestingly, Oracle customers can be a harder sell, not because of the technology but because of the weight of stored procedures and triggers that customers have in Oracle’s proprietary extension to the SQL standard, making porting a major issue.

If only DATAllegro can encourage more customers to become public then it will be able to raise its profile further and avoid being painted as a niche vendor. Being secretive over customer and revenue numbers seems to me self-defeating, as it allows competitors to spread fear, uncertainty and doubt: sunlight is the best disinfectant, as Louis Brandeis so wisely said.

Peeking at Models

With the latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps PowerPoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, checking whether the relationships defined in the drawing make sense and remain valid as rules are added to the model.

Once the model is developed and validated, it can be directly applied to a Kalido warehouse, and the necessary physical schemas are built (for example a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and in one or many data marts). There is no intermediate stage of definition required any more. Crucially, this means that there is no necessity to keep the design diagrams in sync with the model; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from that. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases these involve hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool. Some screenshots can be seen here:

In addition to the business modeller Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has for a long time had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards also into the ETL and data modelling world.

The 8.4 release has a lot of core data warehouse enhancements; indeed this is the largest functional release of the core technology for years. There is now automatic staging area management. This simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido always had an ELT, rather than an ETL philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. As a warehouse is deployed then new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido’s great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator to simply roll the warehouse back to a prior date, rather as you would rewind a recording of a movie using your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don’t try this at home with your custom built warehouse or SAP BW.
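The difference between “as is” and “as was” reporting, and what a rewind does, can be sketched with a toy change log. This is only an illustration of the general idea, not how Kalido stores or rolls back data. The SKUs, categories, dates and function names are hypothetical.

```python
from datetime import date

# Each event records when a product was assigned to a category (made-up data).
events = [
    (date(2007, 1, 1), "SKU-1", "Snacks"),
    (date(2007, 6, 1), "SKU-2", "Drinks"),
    (date(2008, 1, 1), "SKU-1", "Confectionery"),  # SKU-1 is re-classified
]


def hierarchy_as_of(events, as_of):
    """Replay events up to a date to rebuild the hierarchy as it stood then."""
    state = {}
    for effective, product, category in sorted(events):
        if effective <= as_of:
            state[product] = category
    return state


def rewind(events, to_date):
    """Discard every change made after to_date, as a rollback would."""
    return [event for event in events if event[0] <= to_date]


as_was = hierarchy_as_of(events, date(2007, 12, 31))
as_is = hierarchy_as_of(events, date(2008, 6, 30))
rewound = rewind(events, date(2007, 12, 31))

print(as_was)
print(as_is)
print(len(rewound))
```

Because every change is kept with its effective date, reporting against an earlier structure is a matter of replaying fewer events. A rewind goes further and throws the later events away.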

This is a key technology release for Kalido, a company that has a track record of innovative technology that has in the past pleased its customers (I know; I used to do the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.

Orchestrating MDM Workflow

France is rarely associated with enterprise software innovation (test: name a French software company other than Business Objects) but in MDM there are two interesting vendors. I have already written about Amalto, but the more established French MDM player is Orchestra Networks. Founded in 2000, this company has been selling its wares in the French market since 2003, and has built up some solid customer references, mainly in the financial services arena but also with global names such as Sanofi Aventis and Kraft.

The great strength of their EBX technology is the elaborate support for complex business process workflow, an area neglected by most MDM vendors. For example a customer may have an international product code hierarchy, and distribute this to several regions. Each of the regional branches may make local amendments to this, so what happens when a new version of the international hierarchy is produced? EBX provides functionality to detect differences between versions or branches and to allow for merging of these versions, supporting both draft “project” master data and the production versions, keeping track of all changes and supporting the workflow rules to support the full life-cycle of master data creation and update.
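The regional-branch scenario above is essentially a three-way merge, which can be sketched as follows. This is not the EBX API. The hierarchy representation (a map from each code to its parent code, with "ROOT" marking the top level), the product codes and the conflict rule are all assumptions for illustration.

```python
def diff(base, changed):
    """Return {code: new_parent} for every code added, moved or removed.

    A removed code maps to None. Parents are always strings, with "ROOT"
    standing for the top of the hierarchy.
    """
    delta = {}
    for code in base.keys() | changed.keys():
        if base.get(code) != changed.get(code):
            delta[code] = changed.get(code)
    return delta


def merge(base, international, regional):
    """Apply regional amendments on top of a new international version.

    Returns (merged_hierarchy, conflicts). A conflict is a code that both
    sides changed in different ways; it is left at the international value
    and reported so that a person can decide.
    """
    intl_delta = diff(base, international)
    regional_delta = diff(base, regional)
    merged = dict(international)
    conflicts = {}
    for code, parent in regional_delta.items():
        if code in intl_delta and intl_delta[code] != parent:
            conflicts[code] = (intl_delta[code], parent)
        elif parent is None:
            merged.pop(code, None)  # the region deleted this code
        else:
            merged[code] = parent   # the region added or moved this code
    return merged, conflicts


# Version 1 of the international hierarchy, which the region branched from.
base = {"BEV": "ROOT", "SODA": "BEV", "JUICE": "BEV"}
# Version 2 adds STILL and moves JUICE beneath it.
international_v2 = {"BEV": "ROOT", "SODA": "BEV", "STILL": "BEV", "JUICE": "STILL"}
# The region added a local CIDER code and moved JUICE to the top level.
regional = {"BEV": "ROOT", "SODA": "BEV", "JUICE": "ROOT", "CIDER": "BEV"}

merged, conflicts = merge(base, international_v2, regional)
print(merged)
print(conflicts)
```

The local CIDER code survives the upgrade automatically, while the two competing moves of JUICE are flagged for a workflow step rather than silently overwritten.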

Typically such functionality is delivered only by PIM vendors (Kalido is an exception), yet EBX is fully multi-domain by design, so is not restricted to any one class of master data. This will give it an advantage in competitive situations with vendors who have historically designed their technology around one type of master data (customer or product) and are only now realising the need to support multiple domains.

So far Orchestra Networks has confined itself to France, but opens its first overseas office in London soon. The company has taken the time to build out its technology to a solid level of maturity, and has productive partnerships with Informatica (for data quality and ETL) and Software AG, who OEM EBX and sell it globally at the heart of their own MDM offering.

In my own experience of MDM projects, the handling of the business processes around creating and updating master data is a key issue, yet most hub vendors have virtually ignored it, assuming this is somehow “out of scope”. Hub vendors typically focus on system to system communication e.g. validating a new customer code by checking a repository, and perhaps suggesting possible matches if a similar name is found. This is technically demanding as it is near real-time. However human to system interaction is also important, especially outside the customer domain, where business processes can be much more complex. By providing sophisticated support for this workflow Orchestra Networks can venture into situations where CDI vendors cannot easily go, and as I have written previously there are plenty of real business problems in MDM beyond customer.

It will be interesting to see how Orchestra Networks fares as it ventures outside of France in 2008.

The Gaul of it

I came across an interesting new MDM vendor recently called Amalto, a start-up from Paris (though they already have a California office). They have only been selling their software for less than a year, but already have a good set of early customers, such as Rio Tinto, Total, SNCF and BNP Paribas. Their Xtentis product offers a generic MDM repository with data movement (EAI like) functionality, and they make heavy use of standards (Eclipse, Ajax etc). Unusually, they use an XML database rather than a relational database as their underlying storage mechanism. Given the relatively low data volumes typical in MDM applications, this approach seems interesting, since XML databases are strong at handling data with complex structures (e.g. variable depth hierarchies) that one often encounters in master data. In case you think XML databases are unproven, Berkeley DB is probably the most widely deployed DBMS in the world, being embedded in many mobile phones, for example, and most phone users don’t have deep DBA skills. On a parochial note, it is nice to see a European software company emerging for a change (another MDM vendor is Orchestra Networks, also French).

Though an early stage company, Amalto is making good progress in the French market and in 2008 will start to expand to the USA. If they can firm up their positioning (confusingly, they also have a product for B2B exchanges, a quite different market, resold by Ariba) and develop good systems integration partnerships in the US then they should be an interesting addition to the MDM space. Their technology is innovative and their early customer stories sound promising.

Microstrategy Joins the Party

To go with Business Objects’ excellent Q4 results, Microstrategy also reported good figures, suggesting that the BI industry is in generally good shape. Revenue was USD 92.6M, up 20% over last year. Just USD 36.6M of this was license revenue, but this was 17% up on last year.

There was a very healthy operating profit of USD 32M, which means an operating margin of 34%. Other measures were also healthy, e.g. days sales outstanding of just 54, and cash flow from operations of USD 26M. Admittedly this is down a little as expenses have risen by 22% year over year, but all the same this is a healthy business.
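The ratios quoted here can be checked with a few lines of arithmetic. The inputs are the rounded figures from this post, so the results are approximate. The prior-year revenue is derived from the stated 20% growth rather than taken from any report.

```python
# Figures quoted in the post, in USD millions (rounded).
revenue_musd = 92.6
license_revenue_musd = 36.6
operating_profit_musd = 32.0
revenue_growth = 0.20

# Operating margin is operating profit as a fraction of revenue.
operating_margin = operating_profit_musd / revenue_musd
# Share of total revenue that came from licences.
license_share = license_revenue_musd / revenue_musd
# Implied revenue a year earlier, given 20% growth.
prior_year_revenue_musd = revenue_musd / (1 + revenue_growth)

print(f"Operating margin: {operating_margin:.1%}")
print(f"Licence share of revenue: {license_share:.1%}")
print(f"Implied prior-year revenue: USD {prior_year_revenue_musd:.1f}M")
```

On these rounded inputs the margin comes out between 34% and 35%, consistent with the figure quoted, and licences account for roughly two fifths of revenue.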

These results are all the more significant because Microstrategy has been in rather a flat phase for some time, with licence growth almost flat since early 2005. This perky set of results will be all the more welcome for its staff and shareholders given this background.

Oracle buys Sunopsis

It has just been announced that Oracle has bought Sunopsis, one of the few remaining independent ETL vendors. Since Oracle’s existing ETL tool (the rather inaccurately named “Data Warehouse Builder”) is pretty weak, this makes a lot of sense for Oracle. I suspect that their statement about “integrating” the two tools will involve much use of the delete key for the Warehouse Builder code. Sunopsis has a good product; it is a French company that had been around for some time but had recently made more visible market progress in the US. No numbers are public, but my information is that Sunopsis revenues were about USD 10M and the purchase price was just over USD 50M, which at a price/sales ratio of over five makes a quite healthy price for the company. Sunopsis was 80% owned by the founder, who had spurned venture capital, so this is very good personal news for him also.

Sunopsis made a virtue of using the DBMS functions where possible rather than re-inventing transformation code, so is particularly compatible with Oracle (or other relational databases). This deal should also put paid to the loose marketing relationship Oracle had with Informatica. 

In my view this is a rare case where the deal is good for both companies.  Oracle finally gets a decent ETL capability and Sunopsis gets Oracle’s massive sales channel. 

Marketing blues

My prize for the most creative marketing jargon of the week goes to IBM, who announced that they now consider their offerings to be a “third generation” of business intelligence.  Come again?  In this view of the world, first generation BI was mainframe batch reporting, while the second generation was data warehousing and associated BI tools like Cognos, Business Objects etc.  So, as you wait with bated breath for the other shoe to drop, what is the “new generation”?  Well, it would seem that this should include three things:

(a) pre-packaged applications

(b) focus on the access and delivery of business information to end users, and support both information providers and information consumers

(c) support access to all sorts of information, not just that in a data warehouse.

Well (a) this is certainly a handy definition, since IBM just happens to provide a series of pre-built data models (e.g. their banking data model) and so (surprise) would satisfy the first of these criteria.  It is in fact by no means clear how useful such packages are outside of a few specific sectors that lend themselves to standardisation.  Once you take a pre-existing data model and modify it even a little (as you will need to) then you immediately create a major issue for how you support the next vendor upgrade.  This indeed is a major challenge that customers of the IBM banking model face.  Nothing in this paper talks about any new way of delivering these models e.g. any new semantic integration and versioning capability.

Criterion (b) is essentially meaningless since any self-respecting BI tool could reasonably claim to focus on information consumers. After all, the “universe” of Business Objects was a great example of putting user-defined terminology in front of the customer rather than just presenting tables and columns. Almost any existing data warehouse with a decent reporting tool could claim to satisfy this criterion.

On (c) there is perhaps a kernel of relevance here, since there is no denying that some information needs are not always kept in a typical data warehouse e.g. unstructured data.  Yet IBM itself does not appear to have any new technology here, but merely is claiming that DB2 Data Joiner allows links to non-DB2 sources. All well and good, but this is not new. They haven’t even done something like OEM an unstructured query product like Autonomy, which would make sense.

Indeed all that this “3rd generation” appears to be is a flashy marketing label for IBM’s catalog of existing BI-related products.  They have Visual Warehouse, which is a glorified data dictionary (now rather oddly split into two separate physical stores) and scheduling tool, just as they always have.  They talk about ETI Extract as an ETL tool partner, which is rather odd given their acquisition of Ascential, which was after all one of the two pre-eminent ETL tools, and given ETI’s near-disappearance in the market over recent years.  They have DB2, which is a good database with support for datatypes other than numbers (just like other databases).  They also have some other assorted tools like Vality for data quality.

All well and good, but this is no more and no less than they had before. Moreover it could well be argued that this list of tools actually misses several points that could be regarded as important in a “next generation” data warehouse architecture. The paper is oddly silent on the connection between this and master data management, which is peculiar given IBM’s buying spree in this area and its direct relevance to data warehousing and data quality. There is nothing about time-variance capabilities and versioning, which are increasingly important. What about the ability to handle a federation of data warehouses and synchronise these? What about truly business model-based data warehouse generation and maintenance? How about the ability to be embedded into transactional systems via SOA? What about “self discovery” data quality capabilities, which are starting to appear in some start-ups?

Indeed IBM’s marketing group would do well to examine Bill Inmon’s DW 2.0 material, which while not perfect at least has a decent go at setting out some of the capabilities which one might expect from a next generation business intelligence system.

There is no denying that IBM has a lot of technology related to business intelligence and data warehousing (indeed, its buying spree has meant that it has a very broad range indeed).  Yet there is not a single thing in this whitepaper that constitutes a true step forward in technology or design.  It is simply a self-serving definition of a “3rd generation” that has nothing to do with limitations in current technology or new features that might actually be useful.  Instead it just sets out a definition which conveniently fits the menagerie of tools that IBM has developed and acquired in this area. To put together a whitepaper that articulates how a series of acquired technologies fits together is valid, and in truth this is what this paper is.  To claim that it represents some sort of generational breakthrough in an industry is just hubris, and destroys credibility in the eyes of any objective observer.  This is by no means unique in the software industry, but is precisely why software marketing has a bad name amongst customers, who are constantly promised the moon but delivered something a lot more down to earth.

I suppose when presented with the choice of developing new capabilities and product features that people might find useful, or just relabelling what you have lying around already as “next generation”, the latter is a great deal easier.  It is not, however, of any use to anyone outside a software sales and marketing team.