Andy on Enterprise Software

A Lively Data Warehouse Appliance

February 15, 2008

DATAllegro was one of the earlier companies to market (2003) in the recent stampede of what I call “fast databases”, which covers appliances and other approaches to speedy analytics (such as in-memory databases or column-oriented databases). Initially DATAllegro had its own hardware stack (like Netezza) but now uses a more open combination of storage from EMC and Dell servers (with Cisco InfiniBand interconnect). It runs on the well-proven Ingres database, which has the advantage of being more “tuneable” than some other open databases like MySQL.

The database technology used means that plugging in business intelligence tools is easy, and the product is certified for the major BI tools such as Cognos and Business Objects, and recently MicroStrategy. It can also work with Informatica and Ascential Datastage (now IBM) for ETL. Each fast database vendor has its own angle on why its technology is the best, but there are a couple of differentiators that DATAllegro has. One is that it does well in situations of mixed workloads, where as well as queries there are concurrent loads and even updates happening to the database. Another is its new “grid” technology, which allows customers to deal with the age-old compromise of centralised warehouse versus decentralised data marts. Centralised is simplest to maintain but creates a bottleneck and scale challenges. However decentralised marts quickly become uncoordinated and can lead to lack of business confidence in the data. The DATAllegro grid utilises node-to-node hardware transfer to allow dependent copies of data marts to be maintained from a central data warehouse. With transfer speeds of up to 1 TB a minute (!) claimed, such a deployment allows companies to have their cake and eat it. This technology is in use at one early customer site, and is just being released.

DATAllegro has set its sights firmly at the very high end of data volumes, those encountered by retailers and telcos. One large customer apparently has a live 470 TB database implementation, though since the company is very coy about naming its customers I cannot validate this. Still, this is enough data to give most DBAs sleepless nights, so it is fair to say that this is at the rarefied end of the data volume spectrum. This is territory firmly occupied by Teradata and Netezza (and to a lesser extent Greenplum). The company is tight-lipped about numbers of customers (and I can find only one named customer on its website), revenues and profitability, making it hard to know what market momentum is being achieved. However its technology seems to me to be based on solid foundations, and the company has a large installed base of Teradata customers to attack. Interestingly, Oracle customers can be a harder sell, not because of the technology but because of the weight of stored procedures and triggers that customers have in Oracle’s proprietary extension to the SQL standard, making porting a major issue.

If only DATAllegro can encourage more customers to become public then it will be able to raise its profile further and avoid being painted as a niche vendor. Being secretive over customer and revenue numbers seems to me self-defeating, as it allows competitors to spread fear, uncertainty and doubt: sunlight is the best disinfectant, as Louis Brandeis so wisely said.

Peeking at Models

February 7, 2008

With its latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps PowerPoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, validating whether relationships defined in the drawing make sense and are valid or otherwise as rules are added to the model.

Once the model is developed and validated, it can be directly applied to a Kalido warehouse, and the necessary physical schemas are built (for example a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and in one or many data marts). There is no intermediate stage of definition required any more. Crucially, this means that there is no necessity to keep the design diagrams in sync with the model; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from that. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases these involve hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool. Some screenshots can be seen here: http://www.kalido.com/resources-multimedia-center.htm

In addition to the business modeller Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has for a long time had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards also into the ETL and data modelling world.

The 8.4 release has a lot of core data warehouse enhancements; indeed this is the largest functional release of the core technology for years. There is now automatic staging area management. This simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido always had an ELT, rather than an ETL philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. As a warehouse is deployed then new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido’s great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator to simply roll the warehouse back to a prior date, rather as you would rewind a recording of a movie using your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don’t try this at home with your custom built warehouse or SAP BW.
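To make the “as is” versus “as was” distinction concrete, here is a minimal sketch in Python. It is not how Kalido implements anything; the product codes, categories, dates and function names are all invented for illustration. The idea is simply that if every change to a hierarchy is kept with its effective date, the same sales can be reported under today's structure, under the structure in force at the time of each sale, or after rewinding everything to an earlier date.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history: each SKU's category assignments, sorted by effective date.
HISTORY = {
    "SKU-1": [(date(2007, 1, 1), "Snacks"), (date(2007, 7, 1), "Confectionery")],
    "SKU-2": [(date(2007, 1, 1), "Drinks")],
}

# Hypothetical sales facts: (sku, sale date, amount).
SALES = [
    ("SKU-1", date(2007, 3, 15), 100),
    ("SKU-1", date(2007, 9, 10), 50),
    ("SKU-2", date(2007, 9, 10), 70),
]


def category_on(history, sku, when):
    """Return the category a SKU belonged to on a given date."""
    versions = history[sku]
    idx = bisect_right([start for start, _ in versions], when) - 1
    if idx < 0:
        raise LookupError(f"{sku} has no category on {when}")
    return versions[idx][1]


def report(history, sales, as_was):
    """Total sales per category.

    as_was=True classifies each sale by the hierarchy in force on the sale
    date; as_was=False classifies every sale by the latest hierarchy.
    """
    totals = {}
    for sku, sold_on, amount in sales:
        if as_was:
            category = category_on(history, sku, sold_on)
        else:
            category = history[sku][-1][1]
        totals[category] = totals.get(category, 0) + amount
    return totals


def rewind(history, sales, cutoff):
    """Discard structural changes and loaded data dated after the cutoff."""
    old_history = {}
    for sku, versions in history.items():
        kept = [v for v in versions if v[0] <= cutoff]
        if kept:
            old_history[sku] = kept
    old_sales = [s for s in sales if s[1] <= cutoff]
    return old_history, old_sales
```

With this toy data, the “as is” report puts all SKU-1 sales under Confectionery, the “as was” report splits them between Snacks and Confectionery, and rewinding to the end of June leaves only the March sale, reported under Snacks.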

This is a key technology release for Kalido, a company that has a track record of innovative technology that has in the past pleased its customers (I know; I used to do the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.

Informatica prospers

February 6, 2008

Informatica shrugged off the US financial services troubles with a strong quarter. Revenue was USD 114M, up 24% year on year, and operating profit was USD 25M, up 47%. Licence revenue was up 28%, with ten deals over USD 1 million in size.

Informatica continues to prosper as the leading independent ETL tool. IBM’s acquisition of Ascential is rumoured not to have been one of its happier ones, and certainly it seems to have done Informatica no harm at all.

The MDM Blues

January 31, 2008

After living in denial for some time, IBM has got the “multi domain” message about MDM which I have been bleating on about at length for years. It has just announced a repackaging of its MDM offerings under the banner “IBM InfoSphere MDM Server”. This puts IBM firmly on the path of a server architecture that can deal with multiple types of MDM data in a consistent manner, not just customer and product but all the many other kinds of master data e.g. location, asset, contract, brand, financial profile etc. IBM has been sensibly enabling its MDM offerings in an SOA context, and MDM Server comes with 800 pre-packaged SOA services that can be invoked. IBM has bought high quality MDM technology and now at last has a strong vision of how to bring it all together.

However it is worth emphasising that this is a roadmap. For now there will remain the separate CDI hub technology (bought from DWL) and the PIM Hub technology (bought from Trigo). Over time these technologies will be integrated with common services, but this is a multi-release strategy. It is great news that IBM has finally realised that multi-domain is the right way to go, but prospects and customers need to reassure themselves about whether the roadmap meets their time horizons.

Broadening Information Access

January 15, 2008

I saw an interesting demo today from Endeca, which bills itself as an “information access” company. Of course every self-respecting BI company would describe itself in a similar way, but Endeca’s technology is quite different in approach from BI vendors. If you build a data warehouse and then add BI reporting to it, you quickly realise that “ad hoc” reporting by end-users is fine on the prototype with a few hundred records, but less amusing if there are a few hundred million records involved. Hence in real life aggregates are pre-calculated, predefined reports are carefully tuned and cubes (e.g. with Cognos PowerPlay or similar) are built on common subsets of data that the users are likely to want. There is always a careful trade-off between flexibility and performance. Moreover the unstructured world of documents and emails is pretty much a separate dimension, however much in reality the context of a business transaction may be described by those emails and documents rather than what is stored in the sales order system.

Endeca has a proprietary database engine which is designed to combine both structured and unstructured data in a flexible way. The MDEX engine does not just store metadata such as hierarchies and structures, but also master data such as lists of product codes. It also indexes documents and emails from corporate systems (there is a series of adaptors with the technology). The technology makes much use of in-memory searches and caching to optimise performance. Some of the implementations can be large and complex: one deployed pensions system has 800 million records, while an electronic parts application deployed has 20,000 distinct attributes.

An example of such a system that resonated with me was a “human capital” demo which was based on the idea of a consultancy practice manager. A screen was shown allowing filtering on a range of areas e.g. consultants’ billing rates, availability, location etc. So far this looked just like the kind of thing you could prepare with a BI tool e.g. you could select consultants available in the next two weeks, with a billing rate of such and such, etc, and the list of consultants would dynamically refresh. No big deal. However the next filter was “all consultants based within x miles of Detroit”; the consultant records had been tagged with geocodes and the engine calculated distances from this information. Next a query was made to find all those who also spoke French, this information not being a database index but something buried away in the consultants’ resumes i.e. in unstructured document form. Good luck writing SQL to handle these kinds of filters!
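For a feel of what such a combined filter involves, here is a rough sketch in Python. It says nothing about how the MDEX engine actually works: the consultant records, coordinates and rates are made up, the distance is computed with the standard haversine great-circle formula (my assumption, not something the demo stated), and the resume “search” is a naive substring match standing in for a proper text index.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8
DETROIT = (42.3314, -83.0458)  # latitude, longitude


@dataclass
class Consultant:
    name: str
    lat: float
    lon: float
    daily_rate: int
    resume: str  # free text, i.e. the unstructured part


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def find(consultants, centre, within_miles, max_rate, resume_term):
    """Apply a structured filter, a geographic filter and a text filter together."""
    return [
        c.name
        for c in consultants
        if c.daily_rate <= max_rate
        and miles_between(c.lat, c.lon, *centre) <= within_miles
        and resume_term.lower() in c.resume.lower()
    ]


# Hypothetical records for illustration only.
CONSULTANTS = [
    Consultant("Ann", 42.2808, -83.7430, 900, "Fluent in French and Spanish."),
    Consultant("Bob", 41.8781, -87.6298, 800, "Native French speaker."),
    Consultant("Carol", 42.3223, -83.1763, 850, "Fluent in German."),
    Consultant("Dave", 42.2808, -83.7430, 2000, "Speaks French."),
]

matches = find(CONSULTANTS, DETROIT, within_miles=50, max_rate=1000, resume_term="french")
```

Here only Ann survives all three filters: Bob is too far away, Carol's resume does not mention French, and Dave is over the rate limit.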

There are plenty of situations where this mix of structured and unstructured information is important, and Endeca has prospered as a company from this dawning realisation. The company has doubled its revenue for five years in a row, and in Q4 2007 did USD 30 million in revenue, two-thirds of this in software licences. It has a strong base of retail customers such as Tesco and Walmart; other verticals strongly represented include government, with customers such as the FBI, CIA and NASA, financial services e.g. ABN Amro, and manufacturing e.g. Boeing, Schlumberger. There are now 500 enterprise customers in all.

The recent acquisition of arch-competitor FAST by Microsoft demonstrates how this market is increasingly recognised as key by the industry giants. While there are plenty of competitors out there the only others in the current Gartner Leaders quadrant for this market are FAST, IBM (with Omnifind) and Autonomy, which is much more established in unstructured enterprise search. Endeca has set an impressive pace of growth, and it seems to me that there are plenty of situations in other verticals e.g. healthcare, that could suit its technology.

Finding reports, naturally

January 3, 2008

Another example of innovation in the seemingly mature world of BI can be found lurking within the unlikely setting of Progress Software (Progress acquired EasyAsk in May 2005). EasyAsk is a product which combines search capability with a natural language interface that can generate SQL to run against data warehouses. This unusual combination has led it to be used in many eCommerce sites, allowing for natural language inquiries to be translated into product offerings from web sites.

However the technology is a natural (excuse the pun) fit for a rather understated but very real problem in large organisations: actually finding existing reports or pieces of analysis. Most large companies have invested in licences of Cognos, Business Objects or other reporting and analysis software, but what happens after the initial project set-up has happened? The implementation consultants typically set up some pre-configured environments (e.g. a Business Objects universe) and perhaps a little training, and end user analysts then supposedly have at the data warehouse with glee. In reality most end users have no desire to learn a tool beyond Excel, so most rely on pre-built reports e.g. monthly sales figures, being set up for them by the IT department. A subset of end-users, typically people with “analyst” somewhere in their job title, are happy to do “ad hoc reporting”, though to be honest most of these characters could make do with a command line SQL interface rather than a fancy reporting tool if push came to shove.

The big issue is one of wasted effort due to lack of re-use. If one analyst spends a few hours coming up with a new take on sales profitability, surely this would be useful for others? Yet generally if a request comes down to produce a report, people start from scratch even if there are perfectly good reports already produced by someone else in the company. They just do not know they are there.

This is where tools with strong search capability can help. Certainly this is not new, and Autonomy, FAST, Endeca etc can be helpful in tracking down existing information. Yet such tools are really designed for unstructured data rather than structured data. EasyAsk has the advantage that it provides end-users with the ability to do natural language queries if they don’t quite find what they need. The leading BI players have begun to realise how much of an issue this is in recent years e.g. Business Objects’ purchase of Inxight. However there is plenty of room for a pure-play alternative, as this is a problem that is barely addressed in most large companies.

One complication that EasyAsk will encounter is a natural hostility in IT departments to natural language interfaces, since hoary DBA types (I started as a DBA, so can say this kind of thing) are never going to trust that a generated piece of SQL from a question like “find me the most profitable sales region” is going to get the right answer. EasyAsk addresses this concern somewhat by having subject dictionaries that are compiled with a domain expert (e.g. in HR this might treat the phrases “laid off”, “let go”, “fired” and “terminated” as equivalent) in order to give its technology a better chance of formulating the right answer, and of course you can always switch on a trace to inspect the generated SQL and get it looked over by an IT type. However if a DBA has to check the SQL generated every time before approving a new report then this rather defeats the object of the exercise in the first place.
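The subject dictionary idea is easy to picture with a toy example. The Python sketch below is purely illustrative and bears no relation to EasyAsk's actual engine: the synonym table, the `employees` table, its `status` column and the single hard-coded question pattern are all my own inventions. It shows how mapping several phrasings onto one canonical term lets differently worded questions produce the same SQL, which is returned to the caller so that it can be traced and inspected.

```python
import re

# Hypothetical HR subject dictionary: phrase -> canonical term.
HR_SYNONYMS = {
    "laid off": "terminated",
    "let go": "terminated",
    "fired": "terminated",
}


def normalise(question, synonyms):
    """Lower-case the question and replace each known phrase with its canonical term."""
    text = question.lower()
    # Longest phrases first, so a multi-word phrase wins over any shorter overlap.
    for phrase in sorted(synonyms, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b", synonyms[phrase], text)
    return text


def to_sql(question):
    """Return (sql, params) for the one question shape this toy understands."""
    text = normalise(question, HR_SYNONYMS)
    if "how many" in text and "terminated" in text:
        # Parameterised SQL against an assumed employees(status) table.
        return "SELECT COUNT(*) FROM employees WHERE status = ?", ("terminated",)
    raise ValueError(f"question not understood: {question!r}")


sql, params = to_sql("How many staff were let go?")
```

A real natural language interface obviously does far more than match one pattern, but the principle that the DBA's trace shows ordinary parameterised SQL, whichever synonym the user typed, is the same.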

For this reason EasyAsk probably needs to target end-users rather than IT departments, who will probably always be a tough crowd for them. If they can get to the right audience, then making better use of all those pre-existing canned reports is a very real problem to which a large dollar value can be attached. They seem to have made an impression with customers like GSK, Forbes and BASF, and their technology is already embedded within several other companies’ applications. I recall from my days at Shell that this is a widespread issue in large companies, so exploiting existing BI investment should be a happy hunting ground for companies with the right value proposition.

Orchestrating MDM Workflow

December 28, 2007

France is rarely associated with enterprise software innovation (test: name a French software company other than Business Objects) but in MDM there are two interesting vendors. I have already written about Amalto, but the more established French MDM player is Orchestra Networks. Founded in 2000, this company has been selling its wares in the French market since 2003, and has built up some solid customer references, mainly in the financial services arena but also with global names such as Sanofi Aventis and Kraft.

The great strength of their EBX technology is the elaborate support for complex business process workflow, an area neglected by most MDM vendors. For example a customer may have an international product code hierarchy, and distribute this to several regions. Each of the regional branches may make local amendments to this, so what happens when a new version of the international hierarchy is produced? EBX provides functionality to detect differences between versions or branches and to allow for merging of these versions, supporting both draft “project” master data and the production versions, keeping track of all changes and supporting the workflow rules to support the full life-cycle of master data creation and update.
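The regional amendment scenario is essentially a three-way merge, familiar from source code control. The Python sketch below is my own illustration of that general technique, not a description of how EBX does it: hierarchies are reduced to a simple child-to-parent mapping, the product codes are invented, and a real product would also need to handle attributes, deletions of whole subtrees and approval workflow.

```python
def diff(old, new):
    """Return {code: (old_parent, new_parent)} for each code that differs; None means absent."""
    return {
        code: (old.get(code), new.get(code))
        for code in old.keys() | new.keys()
        if old.get(code) != new.get(code)
    }


def merge(base, regional, international):
    """Three-way merge of child -> parent mappings.

    base is the version the region started from, regional carries the local
    amendments and international is the new central release. Returns the
    merged hierarchy plus any codes changed differently on both sides.
    """
    merged, conflicts = {}, {}
    for code in sorted(base.keys() | regional.keys() | international.keys()):
        b, r, i = base.get(code), regional.get(code), international.get(code)
        if r == i or i == b:
            chosen = r  # both sides agree, or only the region changed it
        elif r == b:
            chosen = i  # only the international release changed it
        else:
            conflicts[code] = (r, i)  # needs a human decision
            continue
        if chosen is not None:  # None means the code was removed
            merged[code] = chosen
    return merged, conflicts


# Hypothetical product hierarchy; "ROOT" marks a top-level node.
BASE = {"Beverages": "ROOT", "Snacks": "ROOT", "Cola": "Beverages", "Crisps": "Snacks"}
REGIONAL = {**BASE, "Local Tea": "Beverages", "Soft Drinks": "Beverages", "Cola": "Soft Drinks"}
INTERNATIONAL = {**BASE, "Juice": "Beverages", "Carbonates": "Beverages", "Cola": "Carbonates"}

merged, conflicts = merge(BASE, REGIONAL, INTERNATIONAL)
```

In this example the local addition (Local Tea) and the central addition (Juice) both flow through automatically, while Cola, which each side moved to a different parent, is flagged as a conflict for someone to resolve.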

Typically such functionality is delivered only by PIM vendors (Kalido is an exception), yet EBX is fully multi-domain by design, so is not restricted to any one class of master data. This will give it an advantage in competitive situations with vendors who have historically designed their technology around one type of master data (customer or product) and are only now realising the need to support multiple domains.

So far Orchestra Networks has confined itself to France, but opens its first overseas office in London soon. The company has taken the time to build out its technology to a solid level of maturity, and has productive partnerships with Informatica (for data quality and ETL) and Software AG, who OEM EBX and sell it globally at the heart of their own MDM offering.

In my own experience of MDM projects, the handling of the business processes around creating and updating master data is a key issue, yet most hub vendors have virtually ignored it, assuming this is somehow “out of scope”. Hub vendors typically focus on system to system communication e.g. validating a new customer code by checking a repository, and perhaps suggesting possible matches if a similar name is found. This is technically demanding as it is near real-time. However human to system interaction is also important, especially outside the customer domain, where business processes can be much more complex. By providing sophisticated support for this workflow Orchestra Networks can venture into situations where CDI vendors cannot easily go, and as I have written previously there are plenty of real business problems in MDM beyond customer.

It will be interesting to see how Orchestra Networks fares as it ventures outside of France in 2008.

The Gaul of it

December 18, 2007

I came across an interesting new MDM vendor recently called Amalto, a start-up from Paris (though they already have a California office). They have been selling their software for less than a year, but already have a good set of early customers, such as Rio Tinto, Total, SNCF and BNP Paribas. Their Xtentis product offers a generic MDM repository with data movement (EAI like) functionality, and they make heavy use of standards (Eclipse, Ajax etc). Unusually, they use an XML database rather than a relational database as their underlying storage mechanism. Given the relatively low data volumes typical in MDM applications, this approach seems interesting, since XML databases are strong at handling data with complex structures (e.g. variable depth hierarchies) that one often encounters in master data. In case you think XML databases are unproven, Berkeley DB is probably the most widely deployed DBMS in the world, being embedded in many mobile phones, for example, and most phone users don’t have deep DBA skills. On a parochial note, it is nice to see a European software company emerging for a change (another MDM vendor is Orchestra Networks, also French).
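To see why a tree-shaped document suits variable depth hierarchies, consider the small Python sketch below. It has nothing to do with Xtentis itself; the organisation names and the `org` element are invented, and it uses only the standard library XML parser. The point is that branches of different depths sit naturally in one document and can be walked with a few lines of recursion, whereas a relational design needs self-joins or a fixed number of level columns.

```python
import xml.etree.ElementTree as ET

# Hypothetical master data: an organisation hierarchy with branches of different depth.
DOC = """
<org name="Group">
  <org name="Europe">
    <org name="France">
      <org name="Paris"/>
    </org>
  </org>
  <org name="Asia"/>
</org>
"""


def paths(node, prefix=()):
    """Yield the full path of every node, however deep the branch goes."""
    here = prefix + (node.get("name"),)
    yield "/".join(here)
    for child in node.findall("org"):
        yield from paths(child, here)


root = ET.fromstring(DOC)
all_paths = list(paths(root))
```

The Europe branch here is four levels deep and the Asia branch only two, yet the same function handles both without any change to the structure definition.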

Though an early stage company, Amalto is making good progress in the French market and in 2008 will start to expand to the USA. If they can firm up their positioning (confusingly, they also have a product for B2B exchanges, a quite different market, resold by Ariba) and develop good systems integration partnerships in the US then they should be an interesting addition to the MDM space. Their technology is innovative and their early customer stories sound promising.

Posing questions

December 12, 2007

The recent spate of acquisitions in the BI world (Cognos by IBM, Business Objects by SAP) might cause you to assume that the area was becoming mature (for which read: nothing much new to do). However there is still innovation going on. A company called Tableau, formed mainly by some ex-Stanford University people (including one who was an early employee at Pixar and who has two Oscars to his name!) has neatly combined BI software with clever use of visualisation technology. I have written before how visualisation has struggled to break out of a small niche, though there are certainly some clever technologies out there (e.g. Fractal Edge). One thing that Tableau has done well is to make a very well thought out demo of their software. Product demos are often dull affairs, but this one is very engaging (if a little frenetic), with some real thought put into the underlying data in order to show off the tool to good effect.

I still firmly believe that only a limited proportion of end users actually need a sophisticated analysis tool of any kind. In my experience of BI projects, end users generally find the leading BI tools a lot less intuitive than the vendors would like to think they are, often resorting to Excel once they have found the data they need. The type of technology that Tableau is developing provides an interesting alternative to the established players and has the potential to engage a certain subset of users more. I will follow their progress with interest.

Appliances on demand

November 27, 2007

It is interesting to see Kognitio launching a data warehouse on demand service. Traditionally data warehouses are built in-house, partly because they are mostly “built” rather than bought even today, and partly because of the data-intensive nature of them, by definition involving links to multiple in-house systems. However there is no real reason why the service cannot be provided remotely. In my days at Shell my team used to provide a similar internal service to small business units who did not want to build up in-house capability. We implemented a warehouse, built the interfaces and then managed the operational service. Kognitio is well placed to provide such a service because they have good data migration experience, and they conveniently have a powerful warehouse appliance, which is much more mature than many others, even if it has been, until recently, not very successfully marketed. Hence this seems an astute move to me.

I would not expect this to be the last such offering. Given some clear advantages that software as a service brings to customers (a smaller installed software footprint, typically a smoother pricing model) it will be interesting to see whether these advantages outweigh the fear in customer minds about allowing their key data outside the firewall.