A Sideways Glance

Vertica is one of a plethora of vendors that have emerged in the analytics “fast database” space pioneered by Teradata and more recently opened up by Netezza. The various vendors take different approaches. Some (e.g. Netezza) have proprietary hardware, some (e.g. Kognitio, Dataupia) are software-only, some (e.g. ParAccel) rely mainly on in-memory techniques, and others simply depart from the traditional designs of the mainstream DBMS vendors (Oracle, DB2).

Vertica (whose CTO is Mike Stonebraker of Ingres and Postgres fame) is in the latter camp. Like Sybase IQ (and Sand) it uses a column-oriented design (i.e., it groups data together by column on disk) rather than the usual row-oriented storage used by Oracle and the like. This approach has a number of advantages for query performance: it reduces disk I/O by reading only the columns referenced by the query, and by aggressively compressing data within columns. Through parallelism across clusters of shared-nothing computers, Vertica databases can scale easily and affordably by adding servers to the cluster. Normally the drawback to column-oriented approaches is their relatively slow data load times, but Vertica has some tricks up its sleeve (a mix of in-memory processing which trickle-feeds disk updating) which it claims allow load times comparable to, and sometimes better than, row-oriented databases. Vertica comes with an automated design feature that allows DBAs to provide it with the logical schema, plus training data and queries, which it then uses to come up with a physical structure that organizes, compresses and partitions data across the cluster to best match the workload (though ever-wary DBAs can always override this if they think they are smarter). With a standard SQL interface Vertica can work with existing ETL and business intelligence tools such as Business Objects, and has significantly expanded the list of supported vendors in its upcoming 2.0 release.
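The basic economics of the column-oriented approach can be sketched in a few lines. This is an illustrative toy, not Vertica's actual engine; the table and column names are invented for the example:

```python
# Illustrative toy (not Vertica's engine): why a column store scans less
# data and compresses better for analytic queries.

# A 1,000-row sales table with three columns.
rows = [("2008-01-01", "widget", float(i)) for i in range(1000)]

# Row-oriented storage: SELECT SUM(amount) still touches every field of every row.
row_fields_touched = len(rows) * 3

# Column-oriented storage: each column is contiguous, so the query reads one column.
columns = {
    "date":    [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "amount":  [r[2] for r in rows],
}
col_fields_touched = len(columns["amount"])

# Low-cardinality columns also compress aggressively, e.g. with run-length encoding.
def rle(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

print(row_fields_touched, col_fields_touched, len(rle(columns["product"])))
# 3000 fields scanned vs 1000, and 1,000 repeated product values collapse to one run
```

Real column stores use far more sophisticated encodings, but the two effects shown here (reading fewer fields, and compressing within a column) are exactly the I/O savings described above.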

With so many competing vendors all claiming tens of times better performance than the others, the measure that perhaps matters most is not a lab benchmark but customer take-up. Vertica now has 30 customers, such as Comcast, BlueCrest Capital Management, NetworkIP, Sonian Networks and LogiXML, and with its 2.0 release due out on 19 February 2008 is doing joint roadshows with some of these. It has done well in telcos, which have huge data volumes in their call detail record databases. Two deployed Vertica customers have databases approaching 40 TB in size. Another area is financial services, where hedge funds want to backtest their trading algorithms against historical market data. With one year's worth of US financial markets data taking up over 2 TB, this can quickly add up, and so Vertica has proved popular amongst this community, as well as with marketing companies with large volumes of consumer data to trawl through. Vertica runs on standard Linux servers, and it has a partnership with HP and Red Hat to provide a pre-bundled appliance, which is available from select HP resellers.

With solid VC backing, a glittering advisory board (Jerry Held, Ray Lane, Don Haderle, …) and genuine customer traction in an industry long on technology but short on deployed customers, Vertica should be on every vendor short-list for companies with heavy-duty analytical requirements that currently stretch performance limits and budgets.

A Lively Data Warehouse Appliance

DATAllegro was one of the earlier companies to market (2003) in the recent stampede of what I call “fast databases”, which covers appliances and other approaches to speedy analytics (such as in-memory databases or column-oriented databases). Initially DATAllegro had its own hardware stack (like Netezza) but now uses a more open combination of storage from EMC and servers from Dell (with Cisco InfiniBand interconnect). It runs on the well-proven Ingres database, which has the advantage of being more “tuneable” than some other open databases like MySQL.

The database technology used means that plugging in business intelligence tools is easy, and the product is certified for the major BI tools such as Cognos and Business Objects, and recently MicroStrategy. It can also work with Informatica and Ascential DataStage (now IBM) for ETL. Each fast database vendor has its own angle on why its technology is the best, but there are a couple of differentiators that DATAllegro has. One is that it does well with mixed workloads, where as well as queries there are concurrent loads and even updates happening to the database. Another is its new “grid” technology, which allows customers to deal with the age-old compromise of centralised warehouse vs. decentralised data marts. Centralised is simplest to maintain but creates a bottleneck and scale challenges, while decentralised marts quickly become uncoordinated and can lead to a lack of business confidence in the data. The DATAllegro grid utilises node-to-node hardware transfer to allow dependent copies of data marts to be maintained from a central data warehouse. With transfer speeds of up to 1 TB a minute (!) claimed, such a deployment allows companies to have their cake and eat it. This technology is in use at one early customer site, and is just being released.

DATAllegro has set its sights firmly on the very high end of data volumes, those encountered by retailers and telcos. One large customer apparently has a live 470 TB database implementation, though since the company is very coy about naming its customers I cannot validate this. Still, this is enough data to give most DBAs sleepless nights, so it is fair to say that this is the rarefied end of the data volume spectrum. This is territory firmly occupied by Teradata and Netezza (and to a lesser extent Greenplum). The company is tight-lipped about its number of customers (I can find only one named customer on its website), revenues and profitability, making it hard to know what market momentum is being achieved. However its technology seems to me to be based on solid foundations, and it has a large installed base of Teradata customers to attack. Interestingly, Oracle customers can be a harder sell, not because of the technology but because of the weight of stored procedures and triggers that customers have written in Oracle's proprietary extension to the SQL standard, making porting a major issue.

If DATAllegro can encourage more of its customers to become public references, it will be able to raise its profile further and avoid being painted as a niche vendor. Being secretive over customer and revenue numbers seems to me self-defeating, as it allows competitors to spread fear, uncertainty and doubt: sunlight is the best disinfectant, as Louis Brandeis so wisely said.

Peeking at Models

With the latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc.) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps PowerPoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, validating whether the relationships defined in the drawing make sense as rules are added to the model.

Once the model is developed and validated, it can be directly applied to a Kalido warehouse, and the necessary physical schemas are built (for example a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and in one or many data marts). There is no intermediate stage of definition required any more. Crucially, this means that there is no necessity to keep the design diagrams in sync with the model; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from that. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases involving hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool. Some screenshots can be seen here: http://www.kalido.com/resources-multimedia-center.htm

In addition to the business modeller Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has for a long time had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards also into the ETL and data modelling world.

The 8.4 release has a lot of core data warehouse enhancements; indeed this is the largest functional release of the core technology for years. There is now automatic staging area management, which simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido has always had an ELT, rather than ETL, philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. As a warehouse is deployed, new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido's great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator to simply roll the warehouse back to a prior date, rather as you would rewind a recording of a movie on your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don't try this at home with your custom-built warehouse or SAP BW.
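Kalido has not published the mechanics of the rewind feature, but the general idea behind a warehouse that remembers every event, so that rolling back becomes a simple filter on effective dates, can be sketched as follows (a hypothetical illustration, not Kalido's implementation; the class and entity names are invented):

```python
# Hypothetical sketch of "as is" / "as was" reporting and rewind: if every
# load and structural change is stored with its effective date, the state of
# the warehouse at any prior date is just a filter over those events.
from datetime import date

class VersionedWarehouse:
    def __init__(self):
        self.events = []  # (effective_date, entity, value)

    def load(self, when, entity, value):
        self.events.append((when, entity, value))

    def as_of(self, when):
        """State of the warehouse as it stood on 'when'."""
        state = {}
        for d, entity, value in sorted(self.events):
            if d <= when:
                state[entity] = value  # later events overwrite earlier ones
        return state

wh = VersionedWarehouse()
wh.load(date(2008, 1, 1), "region:EMEA", "old hierarchy")
wh.load(date(2008, 2, 1), "region:EMEA", "new hierarchy")

print(wh.as_of(date(2008, 1, 15)))  # "as was" view: the old hierarchy
print(wh.as_of(date(2008, 2, 15)))  # "as is" view: the new hierarchy
```

Because no event is ever destroyed, a "rewind" to a prior date costs nothing more than an as-of query, which is presumably why a custom-built warehouse that updates records in place cannot offer the same trick.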

This is a key technology release for Kalido, a company that has a track record of innovative technology that has in the past pleased its customers (I know; I used to do the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.

Informatica prospers

Informatica shrugged off the US financial services troubles with a strong quarter. Revenue was USD 114M, up 24% year on year, and operating profit was USD 25M, up 47%. Licence revenue was up 28%, with ten deals over USD 1 million in size.

Informatica continues to prosper as the leading independent ETL tool. IBM’s acquisition of Ascential is rumoured not to have been one of its happier ones, and certainly it seems to have done Informatica no harm at all.

The Brits are coming

Not the Oscars this time, but a data warehouse appliance. Teradata carved out a successful high-end niche in database and hardware technology specifically aimed at analytic rather than transactional processing, succeeding where previous attempts (e.g. Red Brick, Britton Lee) had faltered. However, it is the rapid rise of Netezza that has caused a flurry of look-alike appliance vendors to sprout up in the last couple of years, such as DATAllegro, Dataupia, ParAccel etc. I believe that it will be much easier to convince conservative buyers about appliances if they do not come with proprietary hardware, and indeed this is the approach taken by Dataupia. However, the software-only appliance route was taken a couple of years earlier by Kognitio (a re-brand of Whitecross). Kognitio initially had a proprietary hardware link and had built up some impressive references in the UK such as BT (who have serious data volumes) but had not succeeded as broadly commercially as it might have done; in my view it was held back by the proprietary hardware issue (especially in a conservative UK market). This has been addressed, and a major re-engineering exercise has now allowed its WX2 V6 product to run on commodity x86 hardware such as blade servers.

WX2 is an RDBMS that relies on scanning rather than indexes, using hardware parallelism and smart use of memory in preference to disk access where possible to achieve its performance. The product reads in data from a flat file, loads it quickly (1 terabyte an hour) and can then achieve extremely fast read performance; in one test 23 billion rows were read in two seconds. This approach differs from column-oriented databases (e.g. Sybase, ParAccel), whose design can also achieve high performance for certain analytic queries but is inherently less flexible. A typical Kognitio implementation may involve 80 servers in groups of four. Resilience is obviously a key issue for such large data volumes, and the company claims that if you pull a server out of the rack and so artificially crash the system, it is able to restart in just a few minutes.

The technology does not compete with data quality tools, as it assumes that pre-validation of data has been completed prior to loading. It could be characterised in philosophy as ELT (rather than ETL), since with such fast performance at its disposal it may be more efficient to carry out transformations within the database engine than to pre-process data prior to loading. An ODBC interface allows the loaded data to be queried by any normal reporting tool. Against conventional databases such as Oracle, appliances can show dramatic results: in one recent proof of concept on a half-terabyte sample database, some queries were demonstrated to be 40 times faster than the existing warehouse.
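The ELT pattern described above can be illustrated in a few lines, with SQLite standing in for the appliance engine (the table and column names are invented for the example; Kognitio's actual SQL dialect is not shown):

```python
# Sketch of ELT: load raw data first, then transform inside the database
# engine, where a fast appliance can apply its full scan speed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (region TEXT, amount_pence INTEGER)")

# "Extract" and "Load": raw rows go straight into a staging table,
# with no pre-processing outside the engine.
raw = [("north", 1250), ("south", 990), ("north", 2010)]
conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw)

# "Transform": unit conversion and aggregation happen inside the engine.
conn.execute("""
    CREATE TABLE sales AS
    SELECT region, SUM(amount_pence) / 100.0 AS amount_pounds
    FROM staging_sales
    GROUP BY region
""")

print(conn.execute(
    "SELECT region, amount_pounds FROM sales ORDER BY region").fetchall())
# [('north', 32.6), ('south', 9.9)]
```

The contrast with ETL is simply where the transform runs: here nothing touches the rows before they land in the staging table, which is the philosophy attributed to Kognitio (and Kalido) above.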

Kognitio already has nearly half its customers on its software as a service model, which I wrote about previously. The more traditional licences result in orders typically in the GBP 300k – 1.2M range. The company has added more solid customer references such as Marks and Spencer and Scottish Power (it has a few dozen customers now), and has grown to 78 employees and around GBP 8 million in revenue, having been profitable for three years. This solid commercial performance has now given it the base to branch out into the massive US market, and it is about to open a head office in Chicago with sales offices in Boston and San Francisco.

Kognitio has the advantage of non-proprietary hardware ties (unlike Netezza) and a solid and lengthy track record of successful reference customers (unlike more recent appliance start-ups), which should be a potent combination if it can crack sales and marketing to the US market.

Appliances on demand

It is interesting to see Kognitio launching a data warehouse on demand service. Traditionally data warehouses are built in-house, partly because they are mostly “built” rather than bought even today, and partly because of their data-intensive nature, by definition involving links to multiple in-house systems. However, there is no real reason why the service cannot be provided remotely. In my days at Shell my team used to provide a similar internal service to small business units which did not want to build up in-house capability: we implemented a warehouse, built the interfaces and then managed the operational service. Kognitio is well placed to provide such a service because it has good data migration experience, and it conveniently has a powerful warehouse appliance, which is much more mature than many others, even if it has until recently not been very successfully marketed. Hence this seems an astute move to me.

I would not expect this to be the last such offering. Given some clear advantages that software as a service brings to customers (a smaller installed software footprint, typically a smoother pricing model), it will be interesting to see whether these advantages outweigh the fear in customers' minds about allowing their key data outside the firewall.

Self Raising Appliances

Dataupia has an odd name (presumably hinting at data utopia) but a very interesting idea. The technology was neatly summarised by Phil Howard, so I won't repeat the details. The key is that it promises something that sounds too good to be true: an appliance that works with existing databases (Oracle, SQL Server etc.), essentially removing the execution of queries and data management from the DBMS and running queries on a massively parallel processing architecture using commodity hardware. Coming from one of the founders of Netezza, it has inherent credibility, and I am looking forward to hearing some production customer case studies to validate whether it is really as good as it claims. If it does something close to what it claims then it could have a great market, since it removes the key barrier that limits the market of data warehouse appliances like Netezza (and indeed Teradata, the uber “appliance”), which is the proprietary nature of their software. This makes buyers nervous and at the very least means a significant conversion effort for an existing application. But if you can really just plug in the Dataupia appliance without modifying any SQL, and just watch the queries run faster, then it will appeal to a whole range of creaking data warehouse applications that Netezza et al have yet to convince. Given that most data warehouses are smaller than you might think, there is a large market out there that Dataupia can address which will never be appropriate for Netezza and the like. It also has partner potential due to its non-invasive nature, e.g. Kalido and Dataupia already have a relationship, and there are already early OEM deals on show.

The venture world obviously buys the story, as in a fund-raising environment where enterprise technology companies are as out of fashion as corduroy, Dataupia has secured a USD 16 million B round. This is no mean achievement in itself these days. To me this is definitely a company to watch.

The surprisingly fertile world of database innovation

I came across a thought-provoking article, an interview with Michael Stonebraker. As the inventor of Ingres, Stonebraker is someone who knows a thing or two about databases, and I thought that some interesting points were raised. He essentially argues that advances in hardware have meant that specialist databases can out-perform the traditional ones in a series of particular situations, and that these situations are in themselves substantial markets that start-up database companies could attack. He singles out text, where relational databases have never prospered; fast streaming data feeds of the type seen on Wall Street; data warehouses; and specialist OLTP. With StreamBase he clearly has some first-hand experience of streaming data, and OLTP is what he is working on right now.

I must admit that, with my background in enterprise architecture at Shell, I underestimated how much of a market there has been for specialist databases, assuming that the innate conservatism of corporate buyers would make it very hard for specialist database vendors. Initially I was proved right, with attempts like Red Brick flickering but quickly becoming subsumed, while object databases were clearly not going to take off. With such false starts it was easy to extrapolate and assume that the relational vendors would simply win out and leave no room for innovation. However, to take the area of data warehousing, this has clearly not been the case. Teradata blazed the trail of a proprietary database superior in data warehouse performance to Oracle etc., and now Netezza and a host of smaller start-ups are themselves snapping at Teradata's heels. The in-memory crowd are also doing well, with for example QlikTech now the fastest-growing BI vendor by a long way, thanks to its in-memory database approach. Certainly Stonebraker is right about text: companies like Fast and their competitors would not dream of using relational databases to build their text search applications, an area where Oracle et al never really got it right at all.

Overall there seems to be a surprising amount of innovation in what at first glance looks like an area which is essentially mature, dominated by three big vendors: Oracle, IBM, Microsoft. Teradata has shown that you can build a billion dollar revenue company in the teeth of such entrenched competition, and the recent developments mentioned above suggest that this area is far from being done and dusted from an innovation viewpoint.

Informatica looks perky

Informatica announced an excellent set of quarterly results, demonstrating continuing rude health. Revenue of $94M was a spanking 17% up on the same time last year. License revenue was up 15% at $41M, so the improvement was more than just good services revenue. There were eight deals over $1 million compared to nine last time, but deals over $300k were massively up, with 35 compared to just 9 a year ago. There was also a major OEM deal, with SAP now going to OEM Informatica, a rare exception to its usual “not invented here” attitude. This is a good move for both parties.

The results were broad-based, with Informatica's international operations doing particularly well. These results are a sign of continuing good conditions in the broader BI market. When ETL prospers, data warehouses and BI tools are not far behind.

Netezza heads to market

The forthcoming Netezza IPO will be closely watched by those interested in the health of the technology space, and the business intelligence market in particular. Netezza has been a great success story in the data warehouse market. From its founding in 2000 its revenues have risen dramatically. Its fiscal year ends in January: revenues have climbed from $13M in 2004 to around $30M in 2005, $54M in 2006 and $79.6M in the fiscal year ending January 2007. Its revenues in the quarter ending April 2007 were $25M. Hardly any BI vendors can claim this kind of growth rate (other than QlikTech), especially at this scale. Its customer base is nicely spread amongst industries and is not restricted to the obvious retail, telco and retail banking. So, is this the next great software (actually partly hardware in this case) success story?

Before you get too excited, there are some things to ponder. Note that in 2006 Netezza lost $8M despite that steepling revenue rise. In the latest quarter it still lost $1.6M. This is interesting, since conventional wisdom has it that you can only IPO these days with a few quarters of solid profits, yet Netezza has yet to make a dime. Certainly, it would be fair to assume that if it can keep growing at this rate, profit will surely come (at least its losses are shrinking), but the past has shown that profits can be elusive in fast-growing software companies. Also, the data warehouse market is certainly healthy, advancing at 9% or so according to IDC projections, but this is well below Netezza's growth rate. More particularly, Netezza only attacks one slice of the data warehouse market, the high data volume one. If you have a small data warehouse then you don't need Netezza, so only certain industries will really be happy hunting grounds for appliances like Netezza. This can be seen in the story of Teradata, which is Netezza's true competitor. Teradata has stalled at around $1 billion or so of revenue, growing just 6% last year (of course most of us wish we had this kind of problem). Certainly Netezza can attack Teradata's installed base, but enterprise buyers are notoriously conservative, and will have to be dragged kicking and screaming to shift platforms once operational. So this suggests to me that there is a ceiling to the appliance market. If true, this means that you cannot simply extrapolate Netezza's current superb revenue growth. I have not seen this written about elsewhere, so perhaps it is just a figment of my imagination, and Netezza will prove me wrong. However you can look to Teradata to see that even it has entirely failed to enter certain industries, typically business-to-business industries where data is complex rather than high in volume. For example, there is scarcely a Teradata installation in the oil industry, which fits this category of complex but mostly low-volume data (except for certain upstream data).

So, bearing this in mind, what would be a fair valuation? Well, solid companies like DataMirror are changing hands for 3x revenue or so, though these are companies with merely steady growth rather than the turbo-charged growth demonstrated by Netezza. So suppose we skip the pesky profitability question, accept that this is a premium company and go for five times revenues? That would lead to a valuation of $400M on trailing revenues, maybe $500M on this year's likely revenues. Yet the offer price of the shares implies a market cap of $621M, virtually eight times trailing revenues and six times likely forward revenues.
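The arithmetic behind those multiples is easy to check (all figures taken from the post; the forward number is a crude annualisation of the April 2007 quarter):

```python
# Checking the valuation multiples quoted above.
trailing_revenue = 79.6      # $M, fiscal year ending January 2007
quarterly_revenue = 25.0     # $M, quarter ending April 2007
market_cap = 621.0           # $M, implied by the offer price

forward_revenue = 4 * quarterly_revenue          # crude annualisation: $100M

print(round(5 * trailing_revenue))               # 5x trailing: ~$400M
print(round(5 * forward_revenue))                # 5x forward: $500M
print(round(market_cap / trailing_revenue, 1))   # "virtually eight times" trailing
print(round(market_cap / forward_revenue, 1))    # roughly six times forward
```

The implied multiples come out at about 7.8x trailing and 6.2x forward revenue, comfortably above the 5x that even a premium-company assumption would justify.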

This is scarcely a bargain then, though it is a multiple that will bring joy to the faces of other BI vendors, assuming that the IPO goes well. Of course such things are generally carefully judged, and no doubt the silver-tongued investment bankers have gauged that they can sell shares at this price. However, for me there remains a nagging doubt, based mainly on what I perceive to be an effective cap on the market size that appliances can tackle, and to a lesser extent on the lack of proven ability to generate profits. The markets will decide.

The performance of Netezza shares will be a very interesting indicator of the capital market’s view on BI vendors, and will show whether enterprise technology is coming in from the cold winter that started in 2001. Anyway, many congratulations to Netezza, who have succeeded in carving out a real success story in the furrow that for so long was owned by Teradata.

Postscript. On the first day of trading, no one seems troubled about any long term concerns.