Andy on Enterprise Software

A Sideways Glance

February 19, 2008

Vertica is one of a plethora of vendors that have emerged in the analytics “fast database” space pioneered by Teradata and more recently opened up by Netezza. The various vendors take different approaches. Some (e.g. Netezza) have proprietary hardware, some (e.g. Kognitio, Dataupia) are software only, some (e.g. ParAccel) rely mainly on in-memory techniques, while others simply use designs that differ from those of the mainstream DBMS vendors (Oracle, DB2).

Vertica (whose CTO is Mike Stonebraker of Ingres and Postgres fame) is in the latter camp. Like Sybase IQ (and Sand) it uses a column-oriented design (i.e., it groups data together by column on disk) rather than the usual row-oriented storage used by Oracle and the like. This approach has a number of advantages for query performance. It reduces disk I/O by reading only the columns referenced by the query, and by aggressively compressing data within columns. Through parallelism across clusters of shared-nothing computers, Vertica databases can scale easily and affordably by adding additional servers to the cluster. Normally the drawback to column-oriented approaches is their relatively slow data load times, but Vertica has some tricks up its sleeve (a hybrid in-memory store that trickle-feeds updates to disk) which it claims allow load times comparable to, and sometimes better than, row-oriented databases. Vertica comes with an automated design feature that allows DBAs to provide it with the logical schema, plus training data and queries, which it then uses to come up with a physical structure that organizes, compresses and partitions data across the cluster to best match the workload (though ever-wary DBAs can always override this if they think they are smarter). With a standard SQL interface Vertica can work with existing ETL and business intelligence tools such as Business Objects, and it has significantly expanded the list of supported vendors in its upcoming 2.0 release.
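To make the column-store point concrete, here is a minimal sketch in Python (toy data of my own invention, not Vertica's actual storage format) of why an analytic query that touches one column reads far less data when the table is laid out column-wise rather than row-wise:

    import sys

    # Toy table, stored two ways. In a row store every query reads whole rows;
    # in a column store each column sits in its own (compressed) region, so a
    # SUM(amount) scan touches only the "amount" column.
    rows = [
        {"order_id": 1, "customer": "Acme", "region": "EU", "amount": 120.0},
        {"order_id": 2, "customer": "Bolt", "region": "US", "amount": 75.5},
        {"order_id": 3, "customer": "Acme", "region": "EU", "amount": 210.0},
    ]

    columns = {
        "order_id": [1, 2, 3],
        "customer": ["Acme", "Bolt", "Acme"],
        "region":   ["EU", "US", "EU"],
        "amount":   [120.0, 75.5, 210.0],
    }

    # Both layouts give the same answer...
    assert sum(r["amount"] for r in rows) == sum(columns["amount"])

    # ...but the row scan has to pass over every field of every row, while the
    # column scan reads one list (and the repetitive "region" and "customer"
    # columns would compress very well if they were stored separately).
    bytes_row_scan = sum(sys.getsizeof(v) for r in rows for v in r.values())
    bytes_col_scan = sum(sys.getsizeof(v) for v in columns["amount"])
    print(bytes_row_scan, bytes_col_scan)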

With so many competing vendors all claiming tens of times better performance than the others, the measure that perhaps matters most is not a lab benchmark but customer take-up. Vertica now has 30 customers, such as Comcast, BlueCrest Capital Management, NetworkIP, Sonian Networks and LogiXML, and with its upcoming 2.0 release out on 19/2/2008 it is doing joint roadshows with some of these. It has done well in telcos, which have huge data volumes in their call detail record databases. Two deployed Vertica customers have databases approaching 40 TB in size. Another area is financial services, where hedge funds want to backtest their trading algorithms against historical market data. With one year's worth of US financial market data taking up over 2TB, this can quickly add up, and so Vertica has proved popular amongst this community, as well as with marketing companies with large volumes of consumer data to trawl through. Vertica runs on standard Linux servers, and it has a partnership with HP and Red Hat to provide a pre-bundled appliance, which is available from select HP resellers.

With solid VC backing, a glittering advisory board (Jerry Held, Ray Lane, Don Haderle,…) and genuine customer traction in an industry long on technology but short on deployed customers, Vertica should be on every vendor short-list for companies with heavy-duty analytical requirements which currently stretch performance limits and budgets.

The dark side of a marketing announcement

October 31, 2007

Business Objects announced a witches' brew of a tie-up with Big Blue, bundling DB2 and its data warehouse capabilities together with Business Objects as a business intelligence “offering”. Given that Business Objects already works happily enough with DB2, it is rather unclear whether this is anything more than a ghostly smoke-and-mirrors marketing tie-up rather than something deeper, but it certainly makes some sense for both companies. It does, however, hint at Business Objects moving away from pure database platform independence, which takes on a new significance given the takeover (sorry: “merger” – much background cackling) of Business Objects by SAP. Is this really a subtle move to try and irritate Oracle, the other main DBMS vendor, given the highly competitive situation between SAP and Oracle, who are locked in a nightmare struggle for world domination? In this case, is SAP just manipulating Business Objects like a possessed puppet, pulling the strings behind the scenes, or was this just a hangover from the pre-takeover days, with the Business Objects marketing machine rolling on like a zombie that stumbles on yet does not realise it already has no independent life, clinging to some deep-held memory of its old days? SAP has a more tense relationship with IBM itself these days. IBM sells cauldrons of consulting work around SAP implementations, but found a knife in its back when SAP started promoting its NetWeaver middleware in direct competition with IBM's WebSphere.

Announcements from Business Objects from now on all need to be looked at through the distorting mirror of the relationship with its new parent, as there may be meaning lurking that would not have existed a month ago. Everything needs to be parsed for implications about the titanic Oracle v SAP struggle, as Business Objects should strive as far as possible to appear utterly neutral to applications and databases in order not to spook its customers. Arch-rivals Cognos, MicroStrategy and SAS will take advantage of any hint that the giant behind Business Objects is just pulling its strings.

Happy Halloween everyone!

The surprisingly fertile world of database innovation

July 24, 2007

I came across a thought-provoking article, an interview with Michael Stonebraker. As the creator of Ingres he is someone who knows a thing or two about databases, and I thought that some interesting points were raised. He essentially argues that advances in hardware have meant that specialist databases can out-perform the traditional ones in a series of particular situations, and that these situations are in themselves substantial markets that start-up database companies could attack. He singles out text, where relational databases have never prospered, fast streaming data feeds of the type seen on Wall Street, data warehouses and specialist OLTP. With StreamBase he clearly has some first-hand experience of streaming data, and OLTP is what he is working on right now.

I must admit that with my background in enterprise architecture at Shell I underestimated how much of a market there has been for specialist databases, assuming that the innate conservatism of corporate buyers would make it very hard for specialist database vendors. Initially I was proved right, with attempts like Red Brick flickering but quickly becoming subsumed, while object databases were clearly not going to take off. With such false starts it was easy to extrapolate and assume that the relational vendors would simply win out and leave no room for innovation. However, to take the area of data warehousing, this has clearly not been the case. Teradata blazed the trail of a proprietary database superior in data warehouse performance to Oracle etc, and now Netezza and a host of smaller start-ups are themselves snapping at Teradata's heels. The in-memory crowd are also doing well, with for example Qliktech now being the fastest-growing BI vendor by a long way, thanks to its in-memory database approach. Certainly Stonebraker is right about text – companies like Fast and their competitors would not dream of using relational databases to build their text search applications, an area where Oracle et al never really got it right at all.

Overall there seems to be a surprising amount of innovation in what at first glance looks like an area which is essentially mature, dominated by three big vendors: Oracle, IBM, Microsoft. Teradata has shown that you can build a billion dollar revenue company in the teeth of such entrenched competition, and the recent developments mentioned above suggest that this area is far from being done and dusted from an innovation viewpoint.

Netezza heads to market

July 20, 2007

The forthcoming Netezza IPO will be closely watched by those interested in the health of the technology space, and the business intelligence market in particular. Netezza has been a great success story in the data warehouse market. From its founding in 2000 its revenues have risen dramatically. Its fiscal year ends in January. Revenues have climbed from $13M in 2004 to around $30M in 2005, to $54M in 2006, and to $79.6M in the fiscal year ending January 2007. Its revenues in the quarter ending April 2007 were $25M. Hardly any BI vendors can claim this kind of growth rate (other than Qliktech), especially at this scale. Its customer base is nicely spread amongst industries and is not restricted to the obvious retail, telco and retail banking. So, is this the next great software (actually partly hardware in this case) success story?

Before you get too excited, there are some things to ponder. Note that in 2006 Netezza lost $8M despite that steepling revenue rise. In the latest quarter it still lost $1.6M. This is interesting, since conventional wisdom has it that you can only IPO these days with a few quarters of solid profits, yet Netezza has yet to make a dime. Certainly, it would be fair to assume that if it can keep growing at this rate, profit will surely come (at least its losses are shrinking), but the past has shown that profits can be elusive in fast-growing software companies. Also, the data warehouse market is certainly healthy, advancing at 9% or so according to IDC projections, but this is well below Netezza's growth rate. More particularly, Netezza only attacks one slice of the data warehouse market, the high data volume one. If you have a small data warehouse then you don't need Netezza, so only certain industries will really be happy hunting grounds for appliances like Netezza. This can be seen in the story of Teradata, which is Netezza's true competitor. Teradata has stalled at around $1 billion or so of revenue, growing just 6% last year (of course most of us wish we had this kind of problem). Certainly Netezza can attack Teradata's installed base, but enterprise buyers are notoriously conservative, and will have to be dragged kicking and screaming to shift platforms once operational. So this to me suggests that there is a ceiling to the appliance market. If true, this means that you cannot just draw an extrapolation of Netezza's current superb revenue growth. I have not seen this written about elsewhere, so perhaps it is just a figment of my imagination, and Netezza will prove me wrong. However, you can look to Teradata to see that even it has entirely failed to enter certain industries, typically business-to-business industries where data is complex rather than high in volume. For example there is scarcely a Teradata installation in the oil industry, which fits this category of complex but mostly low-volume data (except for certain upstream data).

So, bearing this in mind, what would be a fair valuation? Well, solid companies like DataMirror are changing hands for 3x revenue or so, though these are companies with merely steady growth rather than the turbo-charged growth demonstrated by Netezza. So suppose we skip the pesky profitability question, accept that this is a premium company and go for five times revenues? This would lead to a valuation of $400M on trailing revenues, maybe $500M on this year's likely revenues. Yet the offer price of the shares implies a market cap of $621M, virtually eight times trailing revenues, and six times likely forward revenues.
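For the arithmetic behind those multiples, here is a quick back-of-the-envelope check in Python; note that the roughly $100M forward revenue figure is simply my own annualisation of the $25M April quarter, not a company forecast.

    # Rough check of the valuation multiples quoted above.
    trailing_revenue = 79.6   # $M, fiscal year ending January 2007
    forward_revenue = 100.0   # $M, assumed annualisation of the $25M quarter
    market_cap = 621.0        # $M, implied by the offer price

    print(f"5x trailing revenue: ${5 * trailing_revenue:.0f}M")               # ~$400M
    print(f"5x forward revenue:  ${5 * forward_revenue:.0f}M")                # ~$500M
    print(f"implied trailing multiple: {market_cap / trailing_revenue:.1f}x") # ~7.8x
    print(f"implied forward multiple:  {market_cap / forward_revenue:.1f}x")  # ~6.2x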

This is scarcely a bargain then, though it is a multiple that will bring joy to the faces of other BI vendors, assuming that the IPO goes well. Of course such things are generally carefully judged, and no doubt the silver-tongued investment bankers have gauged that they can sell shares at this price. However, for me there remains a nagging doubt, based mainly on what I perceive to be an effective cap on the market size that appliances can tackle, and to a lesser extent on the lack of proven ability to generate profits. The markets will decide.

The performance of Netezza shares will be a very interesting indicator of the capital market’s view on BI vendors, and will show whether enterprise technology is coming in from the cold winter that started in 2001. Anyway, many congratulations to Netezza, who have succeeded in carving out a real success story in the furrow that for so long was owned by Teradata.

Postscript. On the first day of trading, no one seems troubled about any long term concerns.

Teradata steps into the light

January 8, 2007

In a logical move that I would say was overdue, Teradata finally became its own boss. It has long been nestling under the wing of NCR, but there was little obvious synergy between ATMs and data warehouse database software, and so it seems to me eminently sensible for Teradata to stand on its own two feet. Running two quite different businesses within the same company is always a problem, as different business models lead to natural tensions as the company tries to accommodate different needs within the same corporate structure.

Teradata accounts for about USD 1.5 billion of revenue, around one third of NCR. The challenge for Teradata is growth. It has succeeded where others failed in the specialist database market, dominating the high-end data warehouse market despite competing with Oracle, IBM and (to a lesser extent) Microsoft. Yet revenues have been pretty flat in the last couple of years, and there is new competition in the form of start-up Netezza, which although tiny compared to Teradata is nonetheless making steady inroads, and causing pricing pressure. Teradata has generally loyal customers and notoriously opaque pricing, which has enabled it to achieve good margins (especially on support), though its finances were never entirely clear as they were wrapped up with NCR's. Splitting the company out will allow the market to value Teradata on its own merits.

Marketing blues

September 28, 2006

My prize for the most creative marketing jargon of the week goes to IBM, who announced that they now consider their offerings to be a “third generation” of business intelligence.  Come again?  In this view of the world, first generation BI was mainframe batch reporting, while the second generation was data warehousing and associated BI tools like Cognos, Business Objects etc.  So, as you wait with bated breath for the other shoe to drop, what is the “new generation”?  Well, it would seem that this should include three things:

(a) pre-packaged applications

(b) focus on the access and delivery of business information to end users, and support both information providers and information consumers

(c) support access to all sorts of information, not just that in a data warehouse.

On (a), this is certainly a handy definition, since IBM just happens to provide a series of pre-built data models (e.g. their banking data model) and so (surprise) would satisfy the first of these criteria.  It is in fact by no means clear how useful such packages are outside of a few specific sectors that lend themselves to standardisation.  Once you take a pre-existing data model and modify it even a little (as you will need to) then you immediately create a major issue for how you support the next vendor upgrade.  This indeed is a major challenge that customers of the IBM banking model face.  Nothing in this paper talks about any new way of delivering these models e.g. any new semantic integration and versioning capability.

Criterion (b) is essentially meaningless, since any self-respecting BI tool could reasonably claim to focus on information consumers.  After all, the “universe” of Business Objects was a great example of putting user-defined terminology in front of the customer rather than just presenting tables and columns.  Almost any existing data warehouse with a decent reporting tool could claim to satisfy this criterion.

On (c) there is perhaps a kernel of relevance here, since there is no denying that some of the information a business needs is not kept in a typical data warehouse, e.g. unstructured data.  Yet IBM itself does not appear to have any new technology here, but is merely claiming that DB2 Data Joiner allows links to non-DB2 sources. All well and good, but this is not new. They haven't even done something like OEM an unstructured query product like Autonomy, which would make sense.

Indeed all that this “3rd generation” appears to be is a flashy marketing label for IBM’s catalog of existing BI-related products.  They have Visual Warehouse, which is a glorified data dictionary (now rather oddly split into two separate physical stores) and scheduling tool, just as they always have.  They talk about ETI Extract as an ETL tool partner, which is rather odd given their acquisition of Ascential, which was after all one of the two pre-eminent ETL tools, and given ETI’s near-disappearance in the market over recent years.  They have DB2, which is a good database with support for datatypes other than numbers (just like other databases).  They also have some other assorted tools like Vality for data quality.

All well and good, but this is no more and no less than they had before. Moreover it could well be argued that this list of tools actually misses several things that could be regarded as important in a “next generation” data warehouse architecture.  The paper is oddly silent on the connection between this and master data management, which is peculiar given IBM's buying spree in this area and its direct relevance to data warehousing and data quality.  There is nothing about time-variance capabilities and versioning, which are increasingly important.  What about the ability to handle a federation of data warehouses and synchronise these?  What about truly business model-based data warehouse generation and maintenance?  How about the ability to be embedded into transactional systems via SOA?  What about “self discovery” data quality capabilities, which are starting to appear in some start-ups?

Indeed IBM’s marketing group would do well to examine Bill Inmon’s DW 2.0 material, which while not perfect at least has a decent go at setting out some of the capabilities which one might expect from a next generation business intelligence system.

There is no denying that IBM has a lot of technology related to business intelligence and data warehousing (indeed, its buying spree has meant that it has a very broad range indeed).  Yet there is not a single thing in this whitepaper that constitutes a true step forward in technology or design.  It is simply a self-serving definition of a “3rd generation” that has nothing to do with limitations in current technology or new features that might actually be useful.  Instead it just sets out a definition which conveniently fits the menagerie of tools that IBM has developed and acquired in this area. To put together a whitepaper that articulates how a series of acquired technologies fits together is valid, and in truth this is what this paper is.  To claim that it represents some sort of generational breakthrough in an industry is just hubris, and destroys credibility in the eyes of any objective observer.  This is by no means unique in the software industry, but is precisely why software marketing has a bad name amongst customers, who are constantly promised the moon but delivered something a lot more down to earth.

I suppose when presented with the choice of developing new capabilities and product features that people might find useful, or just relabelling what you have lying around already as “next generation”, the latter is a great deal easier.  It is not, however, of any use to anyone outside a software sales and marketing team.


Open source appliances

August 1, 2006

The early success of Netezza has not only prompted other “me too” database appliance players like DATAllegro but also now an open source variant.  Greenplum has combined its open source database with Sun hardware to come up with an appliance of its own.  This is an interesting move that makes sense for Greenplum, since Sun obviously has a far larger sales channel than it does and so is a potentially powerful partner.  The article in Information Week is rather too breathless in its description of a DBMS plus some hardware as an “instant data warehouse”, however.  This is nonsense: a DBMS bundled with hardware is no more an “instant data warehouse” than a copy of SQL Server and a PC is (and at least SQL Server has some functionality directly useful to data warehousing in it). If all you had to do was supply a DBMS to have a data warehouse then there would be a lot of unemployed systems integrators and consultants.

However, leaving aside the inability of the journalist to see beyond the words of the company press release, this is an astute move by Greenplum, which can use Sun's credibility to reassure nervous enterprise customers who may otherwise be twitchy about entrusting their data warehouse data to an open source platform.  However, any prospect should be careful to check the performance characteristics of their application to see how well the Bizgres DBMS stacks up, since just throwing hardware at a problem can be an expensive thing to do.  But from a customer perspective it is good to have another choice to compare with Teradata and Netezza if you have an ultra-high transaction volume problem.


So many databases…

July 24, 2006

In his ZDNet blog discussing systems management, David Berlind raises a useful point: is it possible for a company to truly standardise on a DBMS?  While every CIO would like to be able to say “We only have xxxx” (substitute Oracle/DB2/SQL Server etc as desired), in reality it is virtually impossible to standardise on a single DBMS.  For a start, the sheer weight of existing systems means that conversion of a system using one DBMS to another is a far from routine activity.  The DBMS systems might all support one of the SQL standard variants, but we all know that in reality much of the code that accesses a DBMS uses the various proprietary extensions of that database.  Even if you could somehow wave a magic wand and achieve single-DBMS nirvana for your in-house systems, many packages only support particular databases and so you may be forced into a separate DBMS by a package vendor – few CIO departments have the clout to prevent the business choosing a package that is non-standard if the business wants it enough.  Moreover any attempt at standardisation will be overtaken by acquisitions made by the company, since you certainly can't assume that every company that you acquire has a single DBMS or that, even if they did, their choice is the same as yours.
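To illustrate the proprietary-extension problem, here is a small sketch (the table and column names are hypothetical, invented purely for illustration) of the same outer-join query written for two different engines; neither string runs unchanged against the other vendor's database, which is exactly why “just port it” is never routine.

    # The "same" query in two vendors' SQL dialects, held as Python strings.

    # Oracle style: old (+) outer-join notation and || string concatenation.
    oracle_sql = """
    SELECT c.name || ', ' || c.city, o.total
    FROM   customers c, orders o
    WHERE  c.id = o.customer_id (+)
    """

    # SQL Server style: ANSI outer-join syntax and + string concatenation.
    sqlserver_sql = """
    SELECT c.name + ', ' + c.city, o.total
    FROM   customers c
    LEFT OUTER JOIN orders o ON c.id = o.customer_id
    """

    # And that is before stored procedures, sequences versus identity columns,
    # date arithmetic and optimiser hints, which diverge even more.

Multiply that by every report, batch job and stored procedure in a portfolio of applications and the scale of a "simple" database migration becomes clear.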

The difficulty in actually porting an application from one DBMS to another gives a powerful lock-in to DBMS vendors.  If you doubt this, check out the margins of Oracle, Microsoft and IBM's software business (if you can find it) – net profit margins of 30% or more, which is what Oracle and Microsoft have, are achieved by very few companies in the world.  The trouble companies have is that, while they may be unhappy with a DBMS vendor, they can huff and they can puff at contract negotiation time, but in reality there is very little they can do: the DBMS vendor knows that the cost of migrating off their platform would dwarf any savings in license costs that could be obtained.

In reality it is the same with applications and the data within them.  Migrating from a major application package is a massive task, so in reality you will have to learn to live with a heterogeneous world of databases, applications and data, like it or not.  Yet companies spend huge amounts of money trying to “standardise”, which is essentially only a little more attainable as a goal than the holy grail.  Better to try and manage within that reality and concentrate on strategies that will allow you to make the most of the situation.  Certainly, you don't want to allow unnecessary diversity, but you need to look carefully at the true cost/benefit case of trying to eliminate it.

Size still isn’t everything

June 7, 2006

Madan Sheina, who is one of the smarter analysts out there, has written an excellent piece in Computer Business Review on an old hobby horse of mine: data warehouses that are unnecessarily large. I won't rehash the arguments that are made in the article here (in which Madan is kind enough to quote me) as you can read it for yourself, but you can be sure that bigger is not necessarily better when it comes to making sense of your business performance: indeed the opposite is usually true.

Giant data warehouses certainly benefit storage vendors, hardware vendors, the consultants who build and tune them, and DBAs, who love to discuss their largest database as if it were a proxy for their, er, masculinity (apologies to those female DBAs out there, but you know what I mean; it is good for your resume to have worked on very large databases). The trouble is that high volumes of data make it harder to quickly analyse data in a meaningful way, and in most cases this sort of data warehouse elephantitis can be avoided by careful consideration of the use cases, probably saving a lot of money to boot. Of course that would involve IT people actually talking to the business users, so I won't be holding my breath for this more thoughtful approach to take off as a trend. Well done Madan for another thoughtful article.

Size isn’t everything

November 15, 2005

An October 2005 survey by IT Toolbox shows that, even amongst large companies, the size of the corporate data warehouse is, er, not that big. Out of 156 responses (40% US), only 12% had enterprise data warehouses larger than 4TB, 18% had ones between 1TB and 4TB, and the rest had data warehouses of less than 1TB. Indeed 25% had data warehouses of less than half a terabyte. Admittedly only 20% of customers had just one data warehouse, with 26% having over five warehouses, but these figures may seem odd when you hear about gigantic data warehouses in the trade press. Winter Group publishes a carefully checked list of the 10 largest data warehouses in the world, and its 2005 survey shows the winner at Yahoo weighing in at 100TB. The tenth largest, however (at Nielsen), is 17TB, which shows that such mammoths are still a rarity.

Why are IT folks obsessed about this? I can recall speaking at a data warehouse conference a few years ago where speaker after speaker eagerly quoted the size of his data warehouse as some sort of badge of courage: “Well, you should see how big mine is…”. Of course companies that sell hardware and disk storage love such things, but why is there such a big discrepancy between the behemoths in the Winter Group survey and the less-than-a-terabyte brigade? The answer is quite simple: business-to-business companies don't have large transaction volumes. If you are a large retailer or a high-street bank, then you may have thousands of branches, each one contributing thousands of individual transactions a day. These add up, and constitute the vast majority of the volume in a data warehouse (perhaps 99% of the volume). The rest of the data is the pesky master data (or reference data, or dimension data – choose your jargon) such as “product”, “customer”, “location”, “brand”, “person”, “time” etc that provides the context of these business transactions. You may have millions of transactions a day as a retailer, but how many different products do you stock? 80,000 for a convenience store chain? 300,000 for a department store? Certainly not tens of millions. Similarly McDonald's has 27,000 retail outlets, not millions. The same goes for organizational units, employees etc. One exception that can be very large is “customer”, but again this is true only for business-to-consumer enterprises, e.g. retailers or telcos. Companies like Unilever are very large indeed, but primarily sell to other businesses, so the number of direct customers they deal with is measured in the many thousands, but not millions.

So B2B enterprises usually have quite small data warehouses in volume, even though they may have extremely sophisticated and complex master data e.g. elaborate customer segmentation or product or asset classification. One way to measure such complexity is by adding up the types of business entity in the data model e.g. each level of a product hierarchy might count as one “class of business entity” (CBE), “customer” as another. Some very large data warehouses in volume terms often have very simple business models to support, perhaps with 50 CBEs. On the other hand a marketing system for a company like BP may have 400 or more CBEs. This dimension of complexity is actually just as important as raw transaction size when looking at likely data warehouse performance. A data warehouse with 1TB of data but 50 CBEs may be a lot less demanding than one with 200GB of data but 350 CBEs (just think of all those database joins). Oddly, this complexity measure never seems to feature in league tables of data warehouse size, perhaps because it doesn’t sell much disk storage. I feel a new league table coming on. Anyone out there got a model with more than 500 CBEs?
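To make the "all those joins" point a little more concrete, here is a minimal sketch (all table and column names are hypothetical, invented for illustration) contrasting a query against a simple model with one against a richer model; the second touches far more classes of business entity, and hence far more joins, even if its fact table holds a fraction of the data.

    # Hypothetical star-schema queries illustrating model complexity versus raw volume.

    # Simple model: a big fact table and a couple of dimensions (few CBEs).
    simple_model_query = """
    SELECT d.calendar_month, SUM(f.sales_value)
    FROM   sales_fact f
    JOIN   date_dim d    ON f.date_key = d.date_key
    JOIN   product_dim p ON f.product_key = p.product_key
    GROUP  BY d.calendar_month
    """

    # Complex model: a smaller fact table but many more classes of business
    # entity, each one another join the optimiser has to get right.
    complex_model_query = """
    SELECT seg.segment_name, ph.brand, geo.region, ch.channel_name, SUM(f.sales_value)
    FROM   sales_fact f
    JOIN   customer_dim c       ON f.customer_key  = c.customer_key
    JOIN   customer_segment seg ON c.segment_key   = seg.segment_key
    JOIN   product_dim p        ON f.product_key   = p.product_key
    JOIN   product_hierarchy ph ON p.hierarchy_key = ph.hierarchy_key
    JOIN   geography_dim geo    ON f.geo_key       = geo.geo_key
    JOIN   channel_dim ch       ON f.channel_key   = ch.channel_key
    GROUP  BY seg.segment_name, ph.brand, geo.region, ch.channel_name
    """

Now imagine the second style of query against a model with a few hundred CBEs rather than a handful, and it becomes clear why complexity deserves a place alongside raw volume in any league table.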