Andy on Enterprise Software

Kalido changes hands

October 17, 2013

Yesterday Kalido, the data warehouse and MDM company, changed hands. Rather than being acquired by a software company, the buyer was an investment firm called Silverback, a Texas company backed by the VC Austin Ventures. Silverback specialises in buying software companies in related groups and building the businesses into something greater than the original parts. It has recently done this with a series of project management-related acquisitions in the form of Upland Software. In this context, presumably Kalido will be combined with Noetix, an analytics company in their portfolio, perhaps with something else to follow. At first glance the synergy here looks limited, but we shall see. It would make sense if acquisitions in the areas of data quality and perhaps data integration followed, allowing a broader platform-based message around master data.

As someone with a personal interest in the company (I founded it, but left in 2006 when it moved its management to the USA) it is a little sad that Kalido has not achieved greater things in the market, at least up until now. It was perhaps a bit ahead of its time, and had technology features back in 1996 that are only now appearing in (some) current competitors: time variance and the management of federations of hub instances being key examples. The marketing messaging and sales execution never matched the technology, though the company has nevertheless built up an impressive portfolio of global customers, which remains a considerable asset. Hopefully the new backers will invigorate the company, though a key indicator will be whether they manage to lock in and motivate key technical staff. If that happens, and genuinely synergistic acquisitions follow, then perhaps the company’s technology will gain the wider audience that it deserves.

Teradata Goes Shopping

March 4, 2011

The recent flurry of acquisition activity in the data warehouse appliance space continued today as Teradata purchased Aster Data. HP’s purchase of Vertica, IBM’s of Netezza, EMC’s of Greenplum and (less recently) Microsoft’s of DATAllegro underscore the fact that demand for high-performance analytic databases is perceived to be strong by the industry giants. At first glance this may seem an odd buy for Teradata, itself the original appliance vendor, but Aster in fact occupied a very particular niche in the market.

Aster’s strengths (and its intellectual property, patent pending) were around its support for integrated MapReduce analytics. MapReduce is the distributed computing framework pioneered by Google, which inspired the open-source framework Hadoop. This framework is suited to highly compute-intensive analytics, particularly of high volumes of unstructured data. This includes use cases like fraud analysis, but has found a particular niche in social networking websites, which have to deal with vast and rapidly increasing volumes of data. Certain analytic queries such as social network graph analysis, signal analysis, network analysis and some time series analysis are awkward for conventional SQL, involving self-joins and potentially multiple passes through a database, which is a big deal if the database is hundreds of terabytes in size. The MapReduce approach can offer significant performance advantages for such use cases, though it typically requires specialist programming knowledge.
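
To illustrate the self-join point, here is a minimal sketch of a two-hop “friend of friend” query in plain SQL, using a hypothetical “follows” edge table of my own (the names are illustrative, not Aster’s). Each additional hop needs another self-join and another pass over the edge table, which is what becomes painful at hundreds of terabytes:

  -- Hypothetical "follows" edge table: (follower_id, followee_id).
  -- Each extra hop adds another self-join and another scan of a very large table.
  SELECT DISTINCT f1.follower_id, f2.followee_id AS friend_of_friend
  FROM follows f1
  JOIN follows f2 ON f2.follower_id = f1.followee_id
  WHERE f2.followee_id <> f1.follower_id;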

Aster’s customers included companies like LinkedIn and Full Tilt Poker, and its SQL-MR technology had a good reputation in such situations. Aster was a relatively small company, so this purchase is loose change for Teradata but buys it a jump-start into this fashionable area of analytic processing. Aster of course gains access to the channels and deep pockets of Teradata. Conservative buyers may have been unwilling to jump into these waters with a start-up, but will be reassured by the legitimisation of the technology by a big software brand. Hence it seems like a win-win for both companies.

This leaves very few stand-alone independent data warehouse vendors: ParAccel, Kognitio and more obscure players like Exasol and Calpont can continue to plough an independent path, but I suspect that this will not be the last acquisition we will see in this market.

HP has a Vertica strategy

February 14, 2011

Having recently abandoned its Neoview offering, HP today revealed its plans in the data warehouse market by purchasing Vertica. Vertica is one of a clutch of data warehouse vendors that have appeared in recent years, employing an MPP architecture and usually a specialist database structure in order to achieve fast analytic performance on large volumes of data. In Vertica’s case this is a columnar database (in the style pioneered by Sybase), combined with MPP. This combination works well for many analytic use cases, and a well-executed sales strategy based around it has meant that Vertica has achieved considerable market momentum compared to many of its competitors, building up a solid roster of customers such as Comcast, Verizon and Twitter.

In principle HP’s vast sales channel should be very beneficial in spreading the Vertica technology further. Nervous buyers need no longer be anxious about buying from a start-up, and HP clearly has enormous market reach. Yet success is far from guaranteed, as HP’s previous debacle with its Neoview data warehouse offering showed. At least now HP has a proven, modern data warehouse offering with traction in the market. It remains to be seen whether it can exploit this advantage.

Appliances and ETL

November 19, 2010

I attended some interesting customer sessions at the Netezza user group in London yesterday, following some other good customer case studies at the Teradata conference in the rather sunnier climes of San Diego. One common thread that came out of several sessions was the way that the use of appliances changes how companies treat ETL processing. Traditionally a lot of work has gone into taking feeds from the various source systems for the warehouse, defining rules as to how this data is to be converted into a common format, then using an ETL tool (like Informatica or Ab Initio) to carry out this pre-processing before presenting a neatly formatted file, in consistent form, to be loaded into the warehouse.

When you have many terabytes of data then this pre-processing in itself can become a bottleneck. Several of the customers I listened to at these conferences had found it more efficient to move from ETL to ELT. In other words they load essentially raw source data (possibly with some data quality checking only) into a staging area in the warehouse appliance, and then write SQL to carry out the transformations within the appliance before loading the data into production warehouse tables. This allows them to take advantage of the power of the MPP boxes they have purchased for the warehouse, which are typically more efficient and powerful than the regular servers their ETL tools run on. This does not usually eliminate the need for the ETL tool (though one customer did explain how they had switched off some ETL licences) but it means that much more processing is carried out in the data warehouse itself.
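
As a minimal sketch of the ELT pattern being described (the table and column names here are hypothetical, not from any of the customer presentations), the transformation is expressed as set-based SQL that runs inside the appliance rather than on an external ETL server:

  -- Raw, lightly checked source rows land in a staging schema first.
  -- The join, cleansing and derivation work then runs on the MPP engine itself.
  INSERT INTO dw.sales_fact (sale_date, store_key, product_key, net_amount)
  SELECT CAST(s.txn_date AS DATE),
         st.store_key,
         p.product_key,
         s.gross_amount - COALESCE(s.discount_amount, 0)
  FROM staging.raw_sales s
  JOIN dw.store_dim st ON st.store_code = s.store_code
  JOIN dw.product_dim p ON p.product_code = s.sku
  WHERE s.gross_amount IS NOT NULL;  -- simple data quality filter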

Back in my Kalido days we found it useful to take this ELT approach too, but for different reasons. It was cleaner to do the transformations based on business rules stored in the Kalido business model, rather than having the transformations buried away in ETL scripts, meaning more transparent rules and so lower support effort. However I had not appreciated that the sheer horsepower available in data warehouse appliances makes ELT attractive for pure performance reasons. Have others had the same experience on their projects? If so then post a comment here.

Low Hanging Fruit

July 8, 2010

EMC has entered the data warehouse appliance market via the purchase of Greenplum, which had made good progress in the space with its massively parallel database (based on PostgreSQL). Greenplum had impressive reference customers and some of the highest-end references out there in terms of sheer scale of data. It had also been one of the few vendors (Aster is another) to embrace MapReduce early, a framework designed for parallelism and suited to certain types of complex processing.

Data warehouse appliances can be a tough sell because conservative buyers are nervous of making major purchases from smaller companies, but the EMC brand will remove that concern. Also, EMC has a vast existing customer base and the sales channel to exploit it. Seems like a sensible move to me.

No Data Utopia

August 11, 2009

The data warehouse appliance market has become very crowded in the last couple of years, in the wake of the success of Netezza, which has drawn in plenty of venture money to new entrants. The awkwardly named Dataupia had been struggling for some time, with large-scale redundancies early in 2009, but now appears to have pretty much given up the ghost, with its assets being put up for sale by the investors.

If nothing else, this demonstrates that you need to have a clearly differentiated position in such a crowded market, and clearly in this case the sales and marketing execution could not match the promise of the technology. However it would be a mistake to think that all is doom and gloom for appliance vendors, as the continuing commercial success of Vertica demonstrates.

To me, something that vendors should focus on is how to simplify migration off an existing relational platform. If you have an under-performing or costly data warehouse, then an appliance (which implies “plug and play”) sounds appealing. However, although appliance vendors support standard SQL, it is quite another thing to migrate a real-life database application, which may have masses of proprietary application logic locked up in stored procedures, triggers and the like. This seems to me the thing most likely to hold back buyers, yet many vendors focus entirely on price/performance in their messaging. It does not actually matter if a new appliance has ten times better price/performance (let’s say, saving you half a million dollars a year) if it costs several times that to migrate the application. Of course there are always green-field applications, but if someone could devise a way of dramatically easing the migration effort from an existing relational platform then it seems to me that they would have cracked the code on how to sell to end-users in large numbers. Ironically, this was just the kind of claim that Dataupia made, which suggests that there was a gap between its claims and its ability to convince the market that it was really that easy, despite accumulating a number of named customer testimonials on its website.

Even having the founder of Netezza (Foster Hinshaw) at the helm did not translate into commercial viability, despite the company attracting plenty of venture capital money. The company had no shortage of marketing collateral; indeed a number of industry experts who authored glowing white papers on the Dataupia website may be feeling a little sheepish right now. Sales execution appears to have been a tougher nut to crack. I never saw the technology in action, but history tells us that plenty of good technology can fail in the market (proud owners of Betamax video recorders can testify to that).

If anyone knows more about the inside story here then feel free to contact me privately or post a comment.

Clearing a migration path

March 20, 2008

One of the issues often underestimated by new vendors attacking an entrenched competitor is the sheer cost of platform migration. For example, in the database world, if someone comes out with a new, shiny DBMS that is faster and cheaper than the incumbents, why would customers not just switch? After all, the new database is ANSI compliant and so is the old one, right? This view may look good in a glossy magazine article or the fevered fantasies of a software salesperson, but in reality enterprises have considerable switching costs for installed technology. In the case of databases, SQL is just a small part of the story. There are all the proprietary database extensions (stored procedures, triggers etc), all the data definition language scripts, systems tables with assorted business rules implicitly encoded, and the invested time and experience of the database administrators, a naturally conservative bunch. I know, as I was a DBA a long time ago; there is nothing like the prospect of being phoned up in the middle of the night to be told the payroll database is down and asked how many minutes it will take you to bring it back up, to give you a sceptical perspective on life. New and exciting technology is all well and good, but if it involves rewriting a large suite of production batch jobs that you have just spent months getting settled, you tend to just push that brochure back across the table to the software salesperson. Installed enterprise software is notoriously “sticky”.
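
As a small illustration of the kind of proprietary logic that gets in the way (the table, sequence and trigger names below are invented for the example), consider an Oracle-style trigger: the PL/SQL block, the :NEW row reference, the sequence call and SYSDATE are all Oracle-specific and would need rewriting by hand on a different platform, even though the table definition itself might port almost unchanged:

  -- Illustrative Oracle-flavoured trigger; all names are hypothetical.
  CREATE OR REPLACE TRIGGER trg_orders_before_insert
  BEFORE INSERT ON orders
  FOR EACH ROW
  BEGIN
    -- proprietary pieces: PL/SQL body, :NEW references, sequence, SYSDATE
    SELECT order_seq.NEXTVAL INTO :NEW.order_id FROM dual;
    :NEW.created_at := SYSDATE;
  END;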

Hence when attacking something that is already in production, you have to go further than just saying “ours is x times cheaper and faster”. An example of this is DATAllegro and its assault on the mountainous summit that is the Teradata installed base. It has just, very sensibly, brought out a suite of tools that will actually help convert an existing Teradata account, rather than just hoping someone is going to buy into the speed and cost story. This new suite of utilities will:

- convert BTEQ production jobs with the DATAllegro DASQL batch client
- convert DDL from Teradata to DATAllegro DDL (the sketch after this list illustrates the sort of translation involved)
- connect to the Teradata environment and extract table structures (schema) and data and import them into DATAllegro.
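
Purely as an illustration of what such DDL conversion has to handle (I have not seen the utilities themselves, and the target syntax below is generic ANSI-style DDL rather than actual DATAllegro DDL), a Teradata table definition carries physical clauses that have no direct equivalent elsewhere:

  -- Teradata source DDL: MULTISET and PRIMARY INDEX are Teradata-specific.
  CREATE MULTISET TABLE sales.daily_txn (
      txn_id     DECIMAL(18,0) NOT NULL,
      store_id   INTEGER,
      txn_amount DECIMAL(12,2)
  ) PRIMARY INDEX (txn_id);

  -- A converted, generic equivalent: the physical clauses must be dropped or
  -- mapped onto whatever distribution mechanism the target platform offers.
  CREATE TABLE sales.daily_txn (
      txn_id     DECIMAL(18,0) NOT NULL,
      store_id   INTEGER,
      txn_amount DECIMAL(12,2)
  );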

This is the right approach to take. What they need to do next is get some public customer stories from accounts that have actually been through this conversion, and get them to talk about the benefits, and also realistically about the effort involved. If they can do that then they will be in a credible position to start eating away at the Teradata crown jewels: the seriously high-end databases of 100 TB or more.

A Sideways Glance

February 19, 2008

Vertica is one of the plethora of vendors which have emerged in the analytics “fast database” space pioneered by Teradata and more recently opened up by Netezza. The various vendors take different approaches. Some (e.g. Netezza) have proprietary hardware, some (e.g. Kognitio, Dataupia) are software-only, some (e.g. ParAccel) rely mainly on in-memory techniques, and others simply use designs that differ from those of the mainstream DBMS vendors (Oracle, DB2).

Vertica (whose CTO is Mike Stonebraker of Ingres and Postgres fame) is in the latter camp. Like Sybase IQ (and Sand) it uses a column-oriented design (i.e., it groups data together by column on disk) rather than the usual row-oriented storage used by Oracle and the like. This approach has a number of advantages for query performance: it reduces disk I/O by reading only the columns referenced by the query, and by aggressively compressing data within columns. Through the use of parallelism across clusters of shared-nothing computers, Vertica databases can scale easily and affordably by adding additional servers to the cluster. Normally the drawback of column-oriented approaches is their relatively slow data load times, but Vertica has some tricks up its sleeve (a mix of in-memory processing and trickle-feed disk updates) which it claims allow load times comparable to, and sometimes better than, row-oriented databases. Vertica comes with an automated design feature that allows DBAs to provide it with the logical schema, plus training data and queries, which it then uses to come up with a physical structure that organises, compresses and partitions data across the cluster to best match the workload (though ever-wary DBAs can always override this if they think they are smarter). With a standard SQL interface Vertica can work with existing ETL and business intelligence tools such as Business Objects, and it has significantly expanded the list of supported vendors in its upcoming 2.0 release.
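
The I/O argument is easiest to see with a typical analytic query. In the sketch below (a generic call-detail example of my own, not an actual Vertica schema), the table might have fifty or more columns per record, but the query touches only two; a column store reads just those two columns, compressed, while a row store has to scan every complete row:

  -- Hypothetical wide call-detail-record table; only two columns are read.
  SELECT call_date,
         SUM(duration_seconds) AS total_seconds
  FROM call_detail_records
  GROUP BY call_date;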

With so many competing vendors all claiming tens of times better performance than the others, the measure that perhaps matters most is not a lab benchmark but customer take-up. Vertica now has 30 customers, such as Comcast, BlueCrest Capital Management, NetworkIP, Sonian Networks and LogiXML, and with its upcoming 2.0 release out on 19/2/2008 is doing joint roadshows with some of these. It has done well in telcos, which have huge data volumes in their call detail record databases. Two deployed Vertica customers have databases approaching 40 TB in size. Another area is financial services, where hedge funds want to back-test their trading algorithms against historical market data. With one year’s worth of US financial markets data taking up over 2 TB, this can quickly add up, and so Vertica has proved popular amongst this community, as well as with marketing companies that have large volumes of consumer data to trawl through. Vertica runs on standard Linux servers, and it has a partnership with HP and Red Hat to provide a pre-bundled appliance, which is available from select HP resellers.

With solid VC backing, a glittering advisory board (Jerry Held, Ray Lane, Don Haderle, …) and genuine customer traction in an industry long on technology but short on deployed customers, Vertica should be on the short-list of every company with heavy-duty analytical requirements that currently stretch performance limits and budgets.

A Lively Data Warehouse Appliance

February 15, 2008

DATAllegro was one of the earlier companies to market (2003) in the recent stampede of what I call “fast databases”, which covers appliances and other approaches to speedy analytics (such as in-memory databases or column-oriented databases). Initially DATAllegro had its own hardware stack (like Netezza) but it now uses a more open combination of EMC storage and Dell servers (with Cisco InfiniBand interconnect). It runs on the well-proven Ingres database, which has the advantage of being more “tuneable” than some other open databases like MySQL.

The database technology used means that plugging in business intelligence tools is easy, and the product is certified for the major BI tools such as Cognos and Business Objects, and recently MicroStrategy. It can also work with Informatica and Ascential DataStage (now IBM) for ETL. Each fast database vendor has its own angle on why its technology is the best, but DATAllegro has a couple of differentiators. One is that it does well with mixed workloads, where as well as queries there are concurrent loads and even updates happening to the database. Another is its new “grid” technology, which allows customers to deal with the age-old compromise of centralised warehouse versus decentralised data marts. Centralised is simplest to maintain but creates a bottleneck and scale challenges, while decentralised marts quickly become unco-ordinated and can lead to a lack of business confidence in the data. The DATAllegro grid utilises node-to-node hardware transfer to allow dependent copies of data marts to be maintained from a central data warehouse. With transfer speeds of up to 1 TB a minute (!) claimed, such a deployment allows companies to have their cake and eat it. This technology is in use at one early customer site, and is just being released.

DATAllegro has set its sights firmly on the very high end of data volumes, those encountered by retailers and telcos. One large customer apparently has a live 470 TB database implementation, though since the company is very coy about naming its customers I cannot validate this. Still, this is enough data to give most DBAs sleepless nights, so it is fair to say that this is at the rarefied end of the data volume spectrum. This is territory firmly occupied by Teradata and Netezza (and to a lesser extent Greenplum). The company is tight-lipped about numbers of customers (and I can find only one named customer on its website), revenues and profitability, making it hard to know what market momentum is being achieved. However its technology seems to me to be based on solid foundations and there is a large installed base of Teradata customers to attack. Interestingly, Oracle customers can be a harder sell, not because of the technology but because of the weight of stored procedures and triggers that customers have in Oracle’s proprietary extension to the SQL standard, making porting a major issue.

If DATAllegro can encourage more customers to go public then it will be able to raise its profile further and avoid being painted as a niche vendor. Being secretive over customer and revenue numbers seems to me self-defeating, as it allows competitors to spread fear, uncertainty and doubt: sunlight is the best disinfectant, as Louis Brandeis so wisely said.

Peeking at Models

February 7, 2008

With the latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps PowerPoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, validating whether relationships defined in the drawing make sense as rules are added to the model.

Once the model is developed and validated, it can be applied directly to a Kalido warehouse, and the necessary physical schemas are built (for example a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and one or many data marts). There is no longer an intermediate stage of definition required. Crucially, this means that there is no need to keep the design diagrams in sync with the implementation; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from it. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases these involve hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool. Some screenshots can be seen here: http://www.kalido.com/resources-multimedia-center.htm

In addition to the business modeller, Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has long had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards into the ETL and data modelling world as well.

The 8.4 release has a lot of core data warehouse enhancements; indeed this is the largest functional release of the core technology in years. There is now automatic staging area management, which simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido always had an ELT, rather than an ETL, philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. As a warehouse is deployed, new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido’s great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator to simply roll the warehouse back to a prior date, rather as you would rewind a recording of a movie using your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don’t try this at home with your custom-built warehouse or SAP BW.
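
For readers unfamiliar with the “as is” versus “as was” distinction, here is a generic sketch of the idea using a plain validity-dated dimension of my own invention (this is not Kalido’s actual mechanism, nor how the rewind feature is implemented):

  -- Each dimension row carries a validity period (valid_from, valid_to).
  -- "As is": report sales against the hierarchy as it stands today.
  SELECT d.region, SUM(f.amount) AS sales
  FROM sales_fact f
  JOIN customer_dim d ON d.customer_id = f.customer_id
  WHERE CURRENT_DATE BETWEEN d.valid_from AND d.valid_to
  GROUP BY d.region;

  -- "As was": the same report, but using the hierarchy as it stood on a past date.
  SELECT d.region, SUM(f.amount) AS sales
  FROM sales_fact f
  JOIN customer_dim d ON d.customer_id = f.customer_id
  WHERE DATE '2007-12-31' BETWEEN d.valid_from AND d.valid_to
  GROUP BY d.region;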

This is a key technology release for Kalido, a company with a track record of innovative technology that has in the past pleased its customers (I know; I used to do the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.