Andy on Enterprise Software

The Brits are coming

January 11, 2008

Not the Oscars this time, but a data warehouse appliance. Teradata carved out a successful high end niche in database and hardware technology specifically aimed at analytic rather than transactional processing, succeeding where previous attempts (e.g. Red Brick, Britton Lee) had faltered. However it is the rapid rise of Netezza that caused a flurry of look-alike appliance vendors to sprout up in the last couple of years, such as DatAllegro, Dataupia, ParAccel etc. I believe that it will be much easier to convince conservative buyers about appliances if they do not come with proprietary hardware, and indeed this is the approach taken by Dataupia. However the software-only appliance route was taken a couple of years earlier by Kognitio (a re-brand of Whitecross). Kognitio initially had a proprietary hardware link and had built up some impressive references in the UK such as BT (who have serious data volumes) but had not succeeded as broadly commercially as they might have done; in my view they were held back by the proprietary hardware issue (especially in a conservative UK market). This has been addressed, and a major re-engineering exercise has now allowed their WX2 V6 product to run on commodity x86 hardware such as data blades.

WX2 uses scanning technology, no indexes, and is an RDBMS using hardware parallelism and smart use of memory in preference to disk access where possible to achieve its performance. The product reads in data from a flat file, loads it quickly (1 terabyte an hour) and can then achieve extremely fast read performance. In one test 23 billion rows were read in two seconds. This approach differs from column-oriented databases (e.g. Sybase, ParAccel), whose design can also achieve high performance for certain analytic queries but is inherently less flexible. A typical Kognitio implementation may involve 80 servers in groups of four. Resilience is obviously a key issue for such large data volumes, and the company claims that if you pull a server out of the rack and so artificially crash the system, it is able to restart in just a few minutes.
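To make the "scan everything in parallel, no indexes" idea concrete, here is a minimal sketch. It is not Kognitio's implementation; the function names, the round-robin partitioning and the sample data are all assumptions made for illustration. Each partition stands in for one server's in-memory slice of a table, every partition is fully scanned, and the partial results are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Spread rows round-robin across n_nodes in-memory partitions (hypothetical layout)."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def scan_partition(part, predicate):
    """Full scan of one partition: no index, every row is examined."""
    count, total = 0, 0.0
    for row in part:
        if predicate(row):
            count += 1
            total += row["amount"]
    return count, total


def parallel_scan(partitions, predicate):
    """Run the same scan on every partition and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: scan_partition(p, predicate), partitions))
    return sum(c for c, _ in partials), sum(t for _, t in partials)


# Made-up sample data: every fifth row belongs to the UK region.
rows = [{"region": "UK" if i % 5 == 0 else "US", "amount": float(i)} for i in range(1000)]
partitions = partition(rows, n_nodes=4)
uk_count, uk_total = parallel_scan(partitions, lambda r: r["region"] == "UK")
print(uk_count, uk_total)
```

Python threads will not give real CPU parallelism here, so this only shows the shape of the approach: the work per node shrinks as nodes are added, and any predicate can be answered without an index having been designed for it in advance.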

The technology does not compete with data quality tools, as it assumes that pre-validation of data has been completed prior to loading. It could be characterised in philosophy as ELT (rather than ETL) since with such fast performance at its disposal it may be more efficient to carry out transformations within the database engine than pre-processing prior to loading. An ODBC interface allows the loaded data to be queried by any normal reporting tool. Against conventional databases such as Oracle, appliances can show dramatic results. In one recent proof of concept on a half terabyte sample database, some queries were demonstrated to be 40 times faster than the existing warehouse.
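The ELT idea can be sketched in a few lines. This is an illustration only: SQLite stands in for the database engine (it is not WX2), and the table names, column names and sample rows are invented. The point is the ordering: the raw data is landed unchanged first, and the transformation then runs as SQL inside the engine.

```python
import sqlite3

# Hypothetical pre-validated flat-file rows: (customer, amount in pence as text).
raw_rows = [("alice", "1250"), ("bob", "300"), ("alice", "450")]

conn = sqlite3.connect(":memory:")

# Load: land the data unchanged in a staging table.
conn.execute("CREATE TABLE staging_sales (customer TEXT, amount_pence TEXT)")
conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)

# Transform: done afterwards, inside the database engine, in SQL.
conn.execute(
    """
    CREATE TABLE sales_by_customer AS
    SELECT customer, SUM(CAST(amount_pence AS INTEGER)) / 100.0 AS total_gbp
    FROM staging_sales
    GROUP BY customer
    """
)

result = conn.execute(
    "SELECT customer, total_gbp FROM sales_by_customer ORDER BY customer"
).fetchall()
conn.close()
print(result)
```

In a classic ETL flow the type conversion and aggregation would happen in a separate tool before the load; here the engine does that work, which only makes sense when the engine is fast enough to absorb it.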

Kognitio already has nearly half its customers on its software as a service model, which I wrote about previously. The more traditional licences result in orders typically in the GBP 300k – 1.2M range. The company has added more solid customer references such as Marks and Spencer and Scottish Power (it has a few dozen customers now), and has grown to 78 employees and around GBP 8 million in revenue, having been profitable for three years. This solid commercial performance has now given it the base to branch out into the massive US market, and it is about to open a head office in Chicago with sales offices in Boston and San Francisco.

Kognitio has the advantage of non-proprietary hardware ties (unlike Netezza) and a solid and lengthy track record of successful reference customers (unlike more recent appliance start-ups), which should be a potent combination if it can crack sales and marketing to the US market.

Appliances on demand

November 27, 2007

It is interesting to see Kognitio launching a data warehouse on demand service. Traditionally data warehouses are built in-house, partly because they are mostly “built” rather than bought even today, and partly because of their data-intensive nature, by definition involving links to multiple in-house systems. However there is no real reason why the service cannot be provided remotely. In my days at Shell my team used to provide a similar internal service to small business units who did not want to build up in-house capability. We implemented a warehouse, built the interfaces and then managed the operational service. Kognitio is well placed to provide such a service because they have good data migration experience, and they conveniently have a powerful warehouse appliance, which is much more mature than many others, even if it has been, until recently, not very successfully marketed. Hence this seems an astute move to me.

I would not expect this to be the last such offering. Given some clear advantages that software as a service brings to customers (less installed software footprint, typically a smoother pricing model) it will be interesting to see whether these advantages outweigh the fear in customer minds about allowing their key data outside the firewall.

Self Raising Appliances

November 21, 2007

Dataupia has an odd name (presumably hinting at data utopia) but a very interesting idea. The technology was neatly summarised by Phil Howard so I won’t repeat the details. The key is that it promises something that sounds too good to be true: an appliance that runs on existing databases (Oracle, SQL Server etc) essentially removing the execution of queries and data management from the DBMS, and running queries on a massively parallel processing architecture using commodity hardware. Coming from one of the founders of Netezza, it has inherent credibility, and I am looking forward to hearing some production customer case studies to validate whether it is really as good as it claims. If it does something close to what it claims to do then it could have a great market, since it removes the key barrier that limits the market of data warehouse appliances like Netezza (and indeed Teradata, the uber “appliance”), which is the proprietary nature of their software. This makes buyers nervous and at the very least means a significant conversion effort for an existing application. But if you can really just plug in the Dataupia appliance without modifying any SQL, and just watch the queries run faster, then it will appeal to a whole range of creaking data warehouse applications that Netezza et al have yet to convince. Given that most data warehouses are smaller than you might think, there is a large market out there Dataupia can address which will never be appropriate for Netezza and the like. It also has partner potential due to its non-invasive nature e.g. Kalido and Dataupia already have a relationship, and there are already early OEM deals on show.

The venture world obviously buys the story, as in a fund-raising environment where enterprise technology companies are as out of fashion as corduroy, Dataupia has secured a USD 16 million B round. This is no mean achievement in itself these days. To me this is definitely a company to watch.

The surprisingly fertile world of database innovation

July 24, 2007

I came across a thought-provoking article, an interview with Michael Stonebraker. As the inventor of Ingres this is someone who knows a thing or two about databases, and I thought that some interesting points were raised. He essentially argues that advances in hardware have meant that specialist databases can out-perform the traditional ones in a series of particular situations, and that these situations are in themselves substantial markets that start-up database companies could attack. He singles out text, where relational databases have never prospered, fast streaming data feeds of the type seen on Wall Street, data warehouses and specialist OLTP. With Streambase he clearly has some first-hand experience of streaming data, and OLTP is what he is working on right now.

I must admit that with my background in enterprise architecture at Shell I underestimated how much of a market there has been for specialist databases, assuming that the innate conservatism of corporate buyers would make it very hard for specialist database vendors. Initially I was proved right, with attempts like Red Brick flickering but quickly becoming subsumed, while object databases were clearly not going to take off. With such false starts it was easy to extrapolate and assume that the relational vendors would simply win out and leave no room for innovation. However to take the area of data warehousing, this has clearly not been the case. Teradata blazed the trail of a proprietary database superior in data warehouse performance to Oracle etc, and now Netezza and a host of smaller start-ups are themselves snapping at Teradata’s heels. The in-memory crowd are also doing well, with for example Qliktech now being the fastest growing BI vendor by a long way, thanks to its in-memory database approach. Certainly Stonebraker is right about text – companies like Fast and their competitors would not dream of using relational databases to build their text search applications, an area where Oracle et al never really got it right at all.

Overall there seems to be a surprising amount of innovation in what at first glance looks like an area which is essentially mature, dominated by three big vendors: Oracle, IBM, Microsoft. Teradata has shown that you can build a billion dollar revenue company in the teeth of such entrenched competition, and the recent developments mentioned above suggest that this area is far from being done and dusted from an innovation viewpoint.

Informatica looks perky

July 23, 2007

Informatica announced an excellent set of quarterly results, demonstrating continuing rude health. Revenue of $94M was a spanking 17% up on the same time last year. License revenue was up 15% at $41M, so the improvement was more than just good services revenue. Eight deals over $1 million compared to nine last time, but deals over $300k were massively up with 35 compared to just 9 a year ago. There was also a major OEM deal, with SAP now going to OEM Informatica, a rare exception to their usual not invented here attitude. This is a good move for both parties.

The results were broad-based, with Informatica’s international operations doing particularly well. These results are a sign of continuing good conditions in the broader BI market. When ETL prospers, data warehouses and BI tools are not far behind.

Netezza heads to market

July 20, 2007

The forthcoming Netezza IPO will be closely watched by those interested in the health of the technology space, and the business intelligence market in particular. Netezza has been a great success story in the data warehouse market. From being founded in 2000 its revenues have risen dramatically. Its fiscal year ends in January. Revenues have climbed from $13M in 2004 to around $30M in 2005 to $54M in 2006, to $79.6M in fiscal year ending January 2007. Its revenues in the quarter ending April 2007 were $25M. Hardly any BI vendors can claim this kind of growth rate (other than Qliktech), especially at this scale. Its customer base is nicely spread amongst industries and is not restricted to the obvious retail, telco and retail banking. So, is this the next great software (actually partly hardware in this case) success story?
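For anyone who wants to see the growth rate rather than take it on trust, a quick sketch of the year-on-year arithmetic follows. It uses only the revenue figures quoted above, and assumes they are all in US dollars and that the year labels line up as one fiscal year after another; the function and variable names are mine.

```python
# Revenue in $M as quoted in the post; all figures assumed to be US dollars.
revenue = [
    ("2004", 13.0),
    ("2005", 30.0),
    ("2006", 54.0),
    ("FY ending Jan 2007", 79.6),
]


def yoy_growth(series):
    """Return (label, percentage growth over the previous period) pairs."""
    return [
        (label, (current / previous - 1) * 100)
        for (_, previous), (label, current) in zip(series, series[1:])
    ]


growth = yoy_growth(revenue)
for label, pct in growth:
    print(f"{label}: {pct:.1f}% up on the previous year")
```

On these figures the percentage growth is still very high but falls each year, as you would expect when the base gets larger.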

Before you get too excited, there are some things to ponder. Note that in 2006 Netezza lost $8M despite that steepling revenue rise. In the latest quarter it still lost $1.6M. This is interesting, since conventional wisdom has it that you can only IPO these days with a few quarters of solid profits, yet Netezza has yet to make a dime. Certainly, it would be fair to assume that if it can keep growing at this rate, profit will surely come (at least its losses are shrinking) but the past has shown that profits can be elusive in fast growing software companies. Also, the data warehouse market is certainly healthy, advancing at 9% or so according to IDC projections, but this is well below Netezza’s growth rate. More particularly, Netezza only attacks one slice of the data warehouse market, the high data volume one. If you have a small data warehouse then you don’t need Netezza, so only certain industries will really be happy hunting grounds for appliances like Netezza. This can be seen in the story of Teradata, which is Netezza’s true competitor. Teradata has stalled at around $1 billion or so of revenue, growing just 6% last year (of course most of us wish we had this kind of problem). Certainly Netezza can attack Teradata’s installed base, but enterprise buyers are notoriously conservative, and will have to be dragged kicking and screaming to shift platforms once operational. So this to me suggests that there is a ceiling to the appliance market. If true, this means that you cannot just draw an extrapolation of Netezza’s current superb revenue growth. I have not seen this written about elsewhere, so perhaps it is just a figment of my imagination, and Netezza will prove me wrong. However you can look to Teradata to see that even it has entirely failed to enter certain industries, typically business to business industries where data is complex rather than high in volume. For example there is scarcely a Teradata installation in the oil industry, which fits this category of complex but mostly low volume data (except for certain upstream data).

So, bearing this in mind, what would be a fair valuation? Well, solid companies like DataMirror are changing hands for 3x revenue or so, though these are companies with merely steady growth rather than the turbo-charged growth demonstrated by Netezza. So suppose we skip the pesky profitability question, accept this is a premium company and go for five times revenues? This would lead to a valuation of $400M on trailing revenues, maybe $500M on this year’s likely revenues. Yet the offer price of the shares implies a market cap of $621M, virtually eight times trailing revenues, and six times likely forward revenues.
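The back-of-envelope sums can be checked in a few lines. This is only a sketch: the trailing revenue and market cap are the figures quoted in this post, while the forward revenue of $100M is my assumption, obtained by annualising the latest $25M quarter (which is also what a $500M valuation at five times revenue implies).

```python
trailing_revenue = 79.6      # $M, fiscal year ending January 2007
forward_revenue = 4 * 25.0   # $M, assumption: latest $25M quarter annualised
market_cap = 621.0           # $M, implied by the offer price


def valuation(revenue, multiple):
    """Value a company as a simple multiple of revenue."""
    return revenue * multiple


def implied_multiple(cap, revenue):
    """Work backwards from a market cap to the revenue multiple it implies."""
    return cap / revenue


print(f"5x trailing revenue: ${valuation(trailing_revenue, 5):.0f}M")
print(f"5x forward revenue:  ${valuation(forward_revenue, 5):.0f}M")
print(f"Offer price vs trailing: {implied_multiple(market_cap, trailing_revenue):.1f}x")
print(f"Offer price vs forward:  {implied_multiple(market_cap, forward_revenue):.1f}x")
```

The multiples come out at roughly 7.8 times trailing and 6.2 times forward revenue, which is where the "virtually eight" and "six" come from.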

This is scarcely a bargain then, though it is a multiple that will bring joy to the faces of other BI vendors, assuming that the IPO goes well. Of course such things are generally carefully judged, and no doubt the silver tongued investment bankers have gauged that they can sell shares at this price. However for me there remains a nagging doubt, based mainly on what I perceive to be an effective cap on the market size that appliances can tackle, and to a lesser extent the lack of proven ability to generate profits. The markets will decide.

The performance of Netezza shares will be a very interesting indicator of the capital market’s view on BI vendors, and will show whether enterprise technology is coming in from the cold winter that started in 2001. Anyway, many congratulations to Netezza, who have succeeded in carving out a real success story in the furrow that for so long was owned by Teradata.

Postscript. On the first day of trading, no one seems troubled about any long term concerns.

Gazing Behind the Data Mirror

July 18, 2007

I have been digging a little deeper into the DataMirror purchase by IBM that I wrote about yesterday.

It’s a good deal for IBM, and not only because the price was quite fair. With its Ascential acquisition IBM positioned itself directly against Informatica, yet Ascential’s technology did not have the serious real-time replication that is important for the future of ETL, and this is what DataMirror does have. DataMirror gives IBM a working product with heterogeneous data source support in real time, giving IBM an important piece in the puzzle to achieve their vision for real-time operational BI and event-awareness.

A bigger question is whether IBM fully understands what it has bought and whether it will properly exploit it. DataMirror’s strengths were modest pricing, low-impact installation, neutrality towards the sources it supports, and performance (via its log-scraping abilities and speed of applying changes). IBM must keep their eye on the development ball to ensure these aspects of the DataMirror technology are continued if it is to really exploit its purchase. For example, on neutrality, the partnerships DataMirror has with Teradata, Netezza and Oracle should be continued, despite the obvious temptation to snub rivals Oracle and Teradata.

Any acquisition creates uncertainty amongst staff, and IBM needs to move beyond motherhood reassurance to show staff that it understands the DataMirror technology and business and wants to see it thrive and grow. It needs to explain how the DataMirror technology fits within a broader vision for real-time integration in combination with traditional batch oriented ETL, business intelligence and enterprise service bus (not just MQSeries) integration or else the critical technical and visionary people will dust off their resumes and start looking elsewhere.

I gather that IBM has already announced an internal town hall meeting next week, at which it needs to convince key technical staff that they have a bright future within the IBM family. I also hear that no hiring freeze has been imposed, which implies they have decided to grow the business, which should reassure people. IBM is an experienced company which will recognise that the true IP of a company is not in libraries of code but in the heads of a limited number of individuals, and no doubt will recognise the need to retain and motivate critical staff. It used to be poor at this (think about the brilliant technology it acquired when it bought Metaphor many years ago, but bungled the follow-up) but has got smarter in recent years e.g. I hear from DWL people that they have been treated well.

Hopefully IBM’s more recent and happier acquisition experiences will be repeated here.

Mirror, mirror on the wall, who is most blue of them all?

July 17, 2007

On Monday IBM announced it would buy DataMirror, a Canadian software company. DataMirror made its living by selling software that detects change in data sources and then manages replication. It differed from other ETL technology in being designed from the ground up to work in real-time rather than batch, which made it well suited to some customer situations, and the software was modestly priced. The technology was also used by some customers for backup and business continuity reasons. It had a large customer base (well over 2,000).

For IBM the acquisition adds some solid technology to its data warehouse offering and its “on demand” strategy, in this case replacing PowerPoint promises with something that actually works. DataMirror was publicly traded on the Toronto stock exchange. It did $46.5 million in revenue last year and was hoping for $55 million in fiscal year 2008, so this was a company that was delivering solid though unspectacular growth, though its share price had doubled in the last twelve months. IBM’s price of $162 million is over three times trailing revenues and so is a healthy valuation for the company, and a small premium to its stock market valuation of last week.
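The deal arithmetic is simple enough to check directly. The sketch below uses only the three figures quoted above and assumes they are all stated in the same currency; the variable names are mine.

```python
trailing_revenue = 46.5   # $M, last year's revenue
target_revenue = 55.0     # $M, hoped-for fiscal 2008 revenue
purchase_price = 162.0    # $M, IBM's price

# Price paid as a multiple of trailing revenue.
price_to_sales = purchase_price / trailing_revenue

# Growth the company was hoping for, as a percentage of last year's revenue.
hoped_growth_pct = (target_revenue / trailing_revenue - 1) * 100

print(f"Price to trailing revenue: {price_to_sales:.2f}x")
print(f"Hoped-for revenue growth:  {hoped_growth_pct:.1f}%")
```

That works out at about 3.5 times trailing revenue, against hoped-for growth of around 18%.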

If all you have is a hammer…

June 29, 2007

Claudia Imhoff raises an important issue in her blog regarding the cleansing of data. When dealing with a data warehouse it is important for data to be validated before being loaded into the warehouse in order to remove any data quality problems (of course, ideally you would have a process to go back and fix the problems at source also). However, as she points out, in some cases e.g. for audit purposes, it is important to know what the original data actually was, not just a cleansed version. This gets at the heart of a vital issue surrounding master data, and neatly illustrates the difference between a master data repository and a data warehouse.

In MDM it is accepted (at least by those who have experience of real MDM projects) that master data will go through different versions before producing a “golden copy”, which would be suitable for putting into a data warehouse. A new marketing product hierarchy may have to go through several drafts and levels of sign-off before a new version is authorised and published, and the same is true of things like drafts of budget plans, which go through various iterations before a final version is agreed. This is quite apart from actual errors in data, which are all too common in operational systems. An MDM application should be able to manage the workflow of such processes, and have a repository that is capable of going back in time and tracking the various versions, not just the finished golden copy. A good MDM repository should allow you to track back through master data as it is “improved” over time, not just look at the golden copy. Only the golden copy should be exported to the data warehouse, where data integrity is vital.
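A minimal sketch of what such a versioned repository might look like is below. It is not modelled on any particular MDM product; the class names, the status values ("draft", "golden", "superseded") and the sample product hierarchy are all invented for illustration. Every draft is retained, sign-off promotes one version to golden, and only the golden copy is exported.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    number: int
    payload: dict
    status: str = "draft"  # hypothetical states: "draft", "golden", "superseded"


@dataclass
class MasterRecord:
    key: str
    versions: list = field(default_factory=list)

    def add_draft(self, payload: dict) -> Version:
        """Store a new draft; earlier versions are never overwritten."""
        version = Version(number=len(self.versions) + 1, payload=dict(payload))
        self.versions.append(version)
        return version

    def approve(self, number: int) -> None:
        """Sign-off: promote one version to golden, retiring any earlier golden copy."""
        for version in self.versions:
            if version.status == "golden":
                version.status = "superseded"
        self.versions[number - 1].status = "golden"

    def golden(self) -> Optional[Version]:
        """The only version that should be exported to the warehouse."""
        return next((v for v in self.versions if v.status == "golden"), None)

    def history(self) -> list:
        """Full audit trail, drafts included."""
        return list(self.versions)


hierarchy = MasterRecord("product_hierarchy")
hierarchy.add_draft({"Snacks": ["Crisps"]})
hierarchy.add_draft({"Snacks": ["Crisps", "Nuts"]})
hierarchy.approve(2)

export_to_warehouse = hierarchy.golden().payload
audit_trail = [(v.number, v.status) for v in hierarchy.history()]
print(export_to_warehouse, audit_trail)
```

The design choice that matters is that approval changes a status rather than deleting anything, so the audit question "what did this look like before it was cleaned up?" can still be answered from the repository even though the warehouse only ever sees the golden copy.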

People working on data warehouse projects may not be aware of such compliance issues, as they usually care only about the finished state warehouse data. MDM projects should always be considering this issue, and your technology selection should reflect the need for your MDM technology to track versions of master data over time.

I see a tall dark stranger in your future….

June 17, 2007

There is an interesting article in CIO Insight by Peter Fader, a professor of marketing at the top-rated Wharton Business School. In this he discusses the limitations of data mining, and it is an article that anyone contemplating investing in this technology should read carefully. I set up a small data mining practice when I was running a consulting division at Shell, and found it a thankless job. Although I had an articulate and smart data mining expert and we invested in what at the time was a high quality data mining tool, we found time and again that it was very hard to find real-world problems where the benefits of data mining could be shown. Either the data was such a mess that little sense could be made of it, or the insights shown by the data mining technology were, as Homer Simpson might say, of the “well, duh” variety.

Professor Fader argues that in most cases the best you can hope for is to develop simple probabilistic models of aggregate behaviour, and you simply cannot get down to the level of predicting individual behaviour using the level of data that we typically have, however alluring the sales demonstrations may be. Moreover, such models can mostly be built in Excel and don’t need large investments in sophisticated data mining tools.

While I am sure there are some very real examples where data mining can work well e.g. why some groups of people are better credit risks than others, the main point he makes is that the vision of 1-1 marketing via a data mining tool is a fantasy, and that the tools have been seriously oversold. Well, that is something that we in the software industry really do understand. We all want technology to provide magical insights into a messy and complex world that is hard to predict. Unfortunately the technology at present is generally as useful as a crystal ball when it comes to predicting individual behaviour. Yet there is still that urge to go into the tent and peer into the mists of the crystal ball in search of patterns.