Andy on Enterprise Software

A new twist to appliances

May 18, 2007

I wondered what Foster Hinshaw would get up to after he left Netezza, and now we know. He has set up the rather awkwardly named Dataupia, a data warehouse appliance with a difference. It is an important difference, as his appliance runs on Oracle rather than on a proprietary database like Netezza. It will also run on DB2 or SQL Server, for that matter. You just plug in MPP-capable hardware to take advantage of the appliance. This is important, since having a proprietary database brings with it not only a certain amount of cost and new skills required, but also makes conservative corporate buyers nervous. If you are a Telco with really vast amounts of transaction data then this trade off may be worthwhile, as indeed can be seen in Netezza’s considerable success, but if you could get much of the benefit (and this is unclear since at this stage there are no comparative performance figures) while still running on your existing mainstream database, this would soothe the nerves of corporate CIO types who might otherwise try and block the introduction of a new database. Just as importantly, it allows existing data warehouse applications to claim appliance-like performance boosts. While the vast bulk of data warehouses today are custom built, this ought to be of interest to true data warehouse applications such as Kalido, which could presumably easily run on top of Dataupia’s appliance.

I think this is a very interesting development, assuming that the new product delivers on its promise. The market for an appliance capable of running on a mainstream database platform ought to be much broader than the set of applications that are currently addressed by hardware appliances (or even software-based ones with their own database like Kognitio).

EII – dead and now buried

April 27, 2007

The most widely publicised piece that I wrote was “EII Dead on Arrival” back in July 2004. Metamatrix was the company that launched the term on the back of heavy funding from top end VCs, and I wrote previously about what seemed to me to be its almost inevitable struggles. There was some controversy over my article, which differed from the usual breathless press coverage which was associated with EII at the time (our industry does love a new trend and acronym, whatever the reality may be). I could never see how it could work outside a very limited set of reporting needs. Well, as they say on Red Dwarf: “smug mode”.

Gravity finally caught up with marketing hype this week, and Metamatrix will be bought by Red Hat and made into open source. It would have been interesting to know what the purchase price was, but Red Hat were keeping quiet about that. It is a fair bet that it was not a large sum of money. Kleiner Perkins won’t be chalking this up as one of their smarter bets.

Philosophy and data warehouses

March 23, 2007

Database expert Colin White wrote a provocative article the other day:

http://www.b-eye-network.com/blogs/business_integration/archives/2007/03/what_has_busine.php

in which he ponders whether a data warehouse is really needed for business intelligence. This is an interesting question; after all, why did we end up with data warehouses in the first place rather than just querying the data at source? (which is surely a simpler idea). There seem to me to be a few reasons:

(a) Technical performance. Early relational databases did not like dealing with mixed workloads of transaction update and queries, as locking strategies caused performance headaches.

(b) Storage issues. Data warehouses typically need several years of data, whereas transaction systems do not, so archiving transaction after a few months has a performance benefit and may allow use of cheaper storage media.

(c) Inconsistent master data between transaction systems (who owns “product” and “customer” and “asset” and the like) means that it is semantically difficult to query systems across departments or subsidiaries. Pulling the data together into a data warehouse and somehow mashing it into a consistent structure fixes this.

(d) You may want to store certain BI related data e.g. aggregates or “what-if” information that is useful purely for analysis and is not relevant to transaction systems. A data warehouse may be a good place to do this.

(e) People have trouble finding existing pre-built reports, so having a single place where these live makes re-use easier.

(f) Data quality problems in operational systems mean that a data warehouse is needed to allow data cleansing before analysis.

I think that you can make a case that technology is making some strides to address certain of these areas. In the case of (e), the application of Google and similar search mechanisms (e.g. FAST) to the world of BI may reduce or eliminate problem (e) altogether. Databases have become a lot more tolerant of mixed workloads, addressing problem (a), and storage gets cheaper, attacking problem (b). It doesn’t seem to me that you necessarily have to store what-if type data in a data warehouse, so maybe (d) can be tackled in other ways. Even problem (f), while a long way from being fixed, at least has some potential now that some data quality tools are allowing SOA-style embedding within operational systems, thus holding out the possibility of fixing many data quality issues at source.

If we then take all the master data out of the data warehouse and put it into a master data repository, would this not also fix (c)? Well, it might, but regrettably this discipline is still in its infancy, and it seems to me that plucking data out of transaction systems into specific hubs like a “customer hub” or a “product hub” may not be improving the situation at all, as indeed Colin acknowledges.

Where I differ from Colin is on his view that a series of data marts combined with a master data store may be the answer. Since data marts are subject specific by definition, they may address a certain subset of needs very well, but cannot address enterprise-wide analysis. This type of analysis can only be done by something with access to potentially all the data in an enterprise, and capable of resolving master data issues across the source systems. Here a data warehouse in conjunction with a master data store makes more sense to me than a series of marts plus a master data store – why perpetuate the issues? I have no problem if the data marts are dependent i.e. generated from the warehouse e.g. for convenience/performance. But as soon as they are maintained outside a controlled environment you come back to problem (c) again.
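
To make problem (c) concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the two source systems, the customer codes, the amounts and the mapping table are assumptions, not taken from any real system. It simply shows that totals by customer only make sense once local codes are resolved against shared master data.

```python
# Hypothetical extracts from two source systems; all codes and figures are invented.
sales_emea = [("CUST-001", "Acme Ltd", 120.0), ("CUST-002", "Globex", 80.0)]
sales_us = [("A17", "ACME Limited", 200.0), ("B42", "Initech", 50.0)]

# Assumed master data: (system, local customer code) -> enterprise-wide customer id.
customer_master = {
    ("emea", "CUST-001"): "ACME",
    ("emea", "CUST-002"): "GLOBEX",
    ("us", "A17"): "ACME",
    ("us", "B42"): "INITECH",
}


def revenue_by_customer(extracts, master=None):
    """Sum revenue per customer, optionally resolving local codes via master data."""
    totals = {}
    for system, rows in extracts.items():
        for code, _name, amount in rows:
            # Without master data each local code is treated as a separate customer.
            key = master[(system, code)] if master else code
            totals[key] = totals.get(key, 0.0) + amount
    return totals


extracts = {"emea": sales_emea, "us": sales_us}
naive = revenue_by_customer(extracts)                      # Acme appears twice
resolved = revenue_by_customer(extracts, customer_master)  # Acme appears once
print(naive)
print(resolved)
```

Independent marts that each keep their own local codes behave like the naive case; a warehouse fed from a shared master data store behaves like the resolved one.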

Sadly, though some of the recent technical improvements point the way to the solution of problems (a) through (f), the reality on the ground is a long way off allowing this. For example the data quality tools could be embedded via SOA into operational systems and linked up to a master data store to fix master data quality issues, but how many companies have done this at all, let alone across more than a pilot system? Master data repositories are typically still stuck in a “hub mentality” that means they are at best, as Colin puts it, “papering over the cracks of master data problems”. Moreover most data warehouses are still poorly designed to deal with historical data and cope with business change.

Hence I can’t see data warehouses going away any time soon. Still, it is a useful exercise to remind ourselves why we built them in the first place. Questioning the meaning of existence is called ontology, which ironically has now been adopted as a term by computer science to mean a data model that represents concepts within a domain and the relationship between those concepts. We seem to have come full circle, a suitable state for the end of the week.
Have a good weekend.

Just Singing The Blues

March 21, 2007

There was a curious piece of “analysis” that appeared a few days ago in response to IBM’s latest data warehouse announcements:

http://www.intelligententerprise.com/showArticle.jhtml;jsessionid=ZOETRZHF0SWBMQSNDLOSKH0CJUNN2JVN?articleID=198000675

In this gushing piece, Current Analysis analyst James Kobielus says: “IBM with these announcements becomes the premiere data warehousing appliance vendor, in terms of the range of targeted solutions they provide”. So, what were all these new innovative products that appeared?

Well, IBM renamed the hideous “Balanced Configuration Units” (you what now?) to “Balanced Warehouse”, a better name for sure. Also in the renaming frame was “Enterprise Class” being renamed to “E Class” (hope they didn’t spend too many dollars on that one). In fact the only supposedly “new” software at all that is apparent is the OmniFind Analytics Edition. The analysis credits this as a new piece of software, which will come as a surprise to many of us with memories longer than a mayfly e.g. the following announcement of OmniFind 8.3 is on the IBM website dated December 2005:

http://www-306.ibm.com/common/ssi/fcgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS205-342

In fact the whole release seems to be around repackaging and repricing, which is all well and good but hardly transports IBM to some new level it wasn’t at, say, a week ago.

Let’s not forget about “new services” such as “implementation of an IBM data warehouse” – well, that certainly was something that never crossed IBM’s Global Services mind before last week. Now, I’m not a betting man, but I would be prepared to wager a dollar that IBM have a contract with Current Analysis – any takers against?

The excellent blog “The Cranky PM” does a fine job of poking fun at the supposedly objective views of analyst firms that are actually taking thick wads of dollars from the vendors that they are analysing e.g.

http://www.crankypm.com/analysts_whores/index.html

I wonder what she would make of this particular piece of insight?

Deja vu all over again

March 16, 2007

There is some good old fashioned common sense in an article by John Ladley in DM Review:

http://www.dmreview.com/article_sub.cfm?articleId=1078962

where he rightly points out that although most companies are now on their second or third attempt at data warehouses, they seem not to have learnt from the past and hence seem doomed to repeat prior mistakes. Certainly a common thread in my experience is the desire of IT departments to second guess what their customers need and end up making life unnecessarily hard for themselves. If you ask a customer how long he needs access to his detailed data he will say “forever” and if you ask how real time it needs to be then of course he would love it to be instantaneous on a global basis. What is often not presented is the trade off: “well, you can have all the data kept forever, but the project costs will go up 30% and your reporting performance will be significantly worse than if you can make do with one year of detailed data and summaries prior to this”. In such a case the user might well change his view on how critical the “forever” requirement was.

This disconnect between corporate IT departments and the business continues to be a wide one. I recently did some work for a global company where a strategy session was to decide the IT architecture to support a major MDM initiative. None of the business people had even bothered to invite the internal IT department, such was the low regard in which it was held. Without good engagement between IT and business, data warehouse projects will struggle, however up to date the technology used is.

Mr Ladley is also spot on regarding data quality – it is always much worse than people imagine. “Ah, but the source is our new SAP system so the data is clean” is the kind of naive comment that many of us will recognise. At one project at Shell a few years ago it was discovered that 80% of the pack/product combinations being sold in Lubricants were duplicated somewhere else in the Group. At least that could be partially excused by a decentralised organisation. Yet it also turned out that of a commercial customer database of 20,000 records, only 5,000 were truly unique, and this was in one operating company. Imagine the fun with deliveries and invoice payment that could ensue.
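
As a rough illustration of how such duplicates hide in a customer file, here is a minimal sketch in Python. The records and the matching rule are invented for the example; real data quality tools use far more sophisticated matching, so treat this as a toy and not as the method used on the project mentioned above.

```python
import re

# Words ignored when building a matching key (an assumed, deliberately crude list).
NOISE_WORDS = {"ltd", "limited", "plc", "inc", "and"}


def normalise(name):
    """Build a crude matching key: lower-case, strip punctuation, drop noise words."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in NOISE_WORDS]
    return " ".join(words)


def group_duplicates(records):
    """Group customer names that share the same matching key."""
    groups = {}
    for rec in records:
        groups.setdefault(normalise(rec), []).append(rec)
    return groups


# Hypothetical customer names, as they might appear in one operating company.
records = [
    "Smith & Sons Ltd",
    "SMITH AND SONS LIMITED",
    "Smith & Sons",
    "Jones Haulage plc",
]

groups = group_duplicates(records)
print(f"{len(records)} records, {len(groups)} unique customers")
```

Even a rule this simple collapses four records into two customers, which gives a feel for how a nominal customer count can shrink once the data is examined properly.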

Certainly data warehouse projects these days have the advantage of more reliable technology and fuller function products than ten years ago, meaning less custom coding is required than used to be the case. However the basic project management lessons never change.

Appliances are proving popular

February 27, 2007

There is a useful overview of the growing appliance market in Computer Business Review:

http://www.cbronline.com/article_cbr.asp?guid=9104551D-56C1-4EE7-BDF9-BD219E8685BF

The appliance market is nothing if not growing, with no fewer than ten appliance vendors now identified by analyst Madan Sheina (who by the way, is one of the smarter analysts out there). Of course apart from Teradata many of these are small or very new. Teradata accounts for about USD 1 billion in total revenue (the accounts will become much clearer once they separate from NCR) though this includes services and support, not just licences. The next largest vendor is Netezza, who does not publish their revenue (though I would estimate over USD 50M). Kognitio used to be around USD 5M in revenue, though they seemed perky when I last spoke to them so may be a little bigger now. DataAllegro will certainly be smaller than Netezza, as will be the other new players. It is too early to say how well HP’s Neoview appliance will do, though clearly HP has plenty of brand and channel clout, especially now that it has acquired Knightsbridge.

Still, so many entrants to a market certainly tell you that plenty of people feel that money can be made. So far Teradata and Netezza have had the field pretty much to themselves, but the entrance of HP and the various newer vendors will create greater competition, which ultimately can only be of benefit to customers.

When is an appliance not an appliance?

February 19, 2007

What we call things is important. The recent rise of data warehouse “appliances”, pioneered by Netezza (and arguably Teradata before that) is an interesting case in point. For years the relational database vendors spent their energy in making sure that transaction systems ran quickly and reliably. Business intelligence applications were not a major focus, and this led to a number of approaches to dealing with very large data warehouse applications. Certain types of index scheme would work very well for read-only BI queries, for example, and Red Brick was an early example of a database optimised as such. Later Teradata did a superb job of carving out a high end niche by using parallel processing hardware and specialist database software to take advantage of this properly. They did such a good job that after a while Teradata almost became synonymous with large data warehouses, of the types typically encountered in retail banks, supermarket chains, telcos etc. Oracle and the others made some half-hearted attempts to fight back with features like star joins, but by then it was too late: the specialist data warehouse device, in the form of Teradata, had become established. Of course such projects were still large and complex. Most data warehouse project costs are associated with people, not hardware or software, and this does not change whether you are using SQL Server or Teradata as your database.

However, marketing can at times (not often, but sometimes) be a clever and subtle thing. When Netezza brought out essentially a device like Teradata, but quicker and cheaper, the label “appliance” was used, and a very clever one it is. In normal English usage an appliance is something that we just plug in, like a toaster or a coffee maker. Without making any such overt claims, the “appliance” label has a comforting implication that your data warehouse project will have that toaster installation-like quality previously lacking with pesky traditional databases. Given that a DW appliance is just some clever hardware and an optimised database, your project issues are in fact identical to those of any other DW project. Analysis, user requirements, data quality, sourcing, design and reporting all have to be done, although the appliance may certainly be able to handle large volumes of data at a much better price point than a traditional hardware/database combination. Since the hardware and software on a project may typically account for less than 20% of the project costs, this is an undeniably useful thing, but hardly takes us into toaster territory.
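
The arithmetic behind that last point can be sketched in a couple of lines of Python. The 20% share is the upper bound quoted above; the 50% saving on hardware and software is purely an assumed figure for illustration.

```python
def total_saving(platform_share, platform_discount):
    """Fraction of the whole project budget saved when only the
    hardware/software share of the budget gets cheaper."""
    return platform_share * platform_discount


# Assumed: hardware and software are 20% of project cost, and the appliance halves that part.
saving = total_saving(0.20, 0.50)
print(f"{saving:.0%} of total project cost")
```

On those assumptions the whole project gets about 10% cheaper: welcome, but the other 80% or more of the budget is untouched by the choice of platform.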

Yet the label matters. In a rather breathless blog yesterday:

http://www.itbusinessedge.com/blogs/mia/index.php/2006/09/05/flaming-web-20/

Mike Stevens, who I don’t know personally but who appears to have a background in PR rather than hands-on data warehouse project implementation, claims that appliances spell “trouble for traditional data warehouse vendors” since an appliance may cost just USD 150k whereas “conventional solutions cost millions”. He falls into the language trap of the appliance. Your data warehouse still has to deal with all those people-intensive things (data sourcing, reporting, testing) whether you use a conventional SQL database and a regular server, or a specialist DW appliance. The issues are all identical, except with an appliance you have some additional cost since less familiar skills will need to be brought to bear (there are more Oracle skills out there than Netezza ones). The savings on hardware by using an appliance may be very significant and comfortably justified on a large data warehouse, but such a project is not going to cost USD 150k and a quick plug in the wall socket.

If this kind of misconception is so easily repeated by journalists (or at least bloggers) then I wonder how widespread this view is amongst IT managers, and how much this has helped data warehouse “appliances” catch on? Would Netezza have done quite so well if they had been labelled something less reassuring, like a “data warehouse turbo toolkit”? It was said that HP was so bad at marketing it would, if it sold sushi, describe it as “cold dead fish”. The “appliance” vendors show that smart marketing can still be done within hi-tech.

Impartial Advice?

January 17, 2007

HP continues with its plans for the business intelligence space with an announcement of in-house data warehouse technology:

http://www.computerworld.com:80/action/article.do?command=viewArticleBasic&articleId=9008218&intsrc=news_ts_head

with a new business unit. The offering will be based around HP’s attempt at a “data warehouse appliance”, called Neoview. This is a competitor to Teradata and Netezza, but at this stage it is hard to tell how functional this is, since it is unclear that there are any deployed customers other than HP itself.

The timing of this announcement is curious given HP’s acquisition of data warehouse consultancy Knightsbridge. Certainly data warehousing is a big market and Teradata is a tempting target – after all, most of the really big data warehouse deployments in retail, telco and retail banking use Teradata. There are lots and lots of juicy services to be provided in implementing an “appliance”, which in fact is no such thing. An appliance implies something that you just plug in, whereas data warehouse appliances are just a fast piece of hardware and a proprietary database, still requiring all the usual integration efforts, but with the added twist of non-standard database technology. Certainly plenty of business for consultants there.

However HP’s home-grown offering will not sit well with its newly acquired Knightsbridge consulting services, who made their reputation through a quite fiercely vendor-independent culture which always prided itself in choosing the best solution for the customer. People trust independent consultants to give them objective advice, since they are not (or at least they hope they are not) tied to particular vendor offerings. Presumably HP’s consultants will be pushing HP’s data warehouse solution in preference to alternatives, and so can hardly be trusted as impartial observers of the market. An analogy would be with IBM consultants, who while they may work with non-IBM software are clearly going to push IBM’s offerings given half a chance.

If you were a truly independent consultant how would you react to a brand new data warehouse appliance with a track record only of one deployment, and that in the vendor itself? Would you immediately be pushing that as your preferred solution, or would you be counseling caution, urging customers to wait and see how the new tool settles down in the market and how early customers get on with it? If you are a Knightsbridge consultant now working for HP, what would your advice be? Would it be any different to the advice you’d have offered in December 2006 before you became part of HP?

This kind of conflict of interest is what makes things difficult for customers when choosing consultants. It is hard to find ones who are truly independent. Of course consultants always have their own agenda, but usually this is about maximising billable hours. If they are tied to a particular solution then that is fine if you are already committed to that solution, but you will need to look elsewhere for objective advice about it.

Teradata steps into the light

January 8, 2007

In a logical move that I would say was overdue, Teradata finally became its own boss. It has long been nestling under the wing of NCR, but there was little obvious synergy between ATM machines and data warehouse database software, and so it seems to me eminently sensible for Teradata to stand on its own two feet. Running two quite different businesses within the same company is always a problem, as different business models lead to natural tensions as the company tries to accommodate different needs within the same corporate structure.

Teradata accounts for about USD 1.5 billion of revenue, around one third of NCR. The challenge for Teradata is growth. It has succeeded when others failed in the specialist database market, dominating the high end data warehouse market despite competing with Oracle, IBM and (to a lesser extent) Microsoft. Yet revenues have been pretty flat in the last couple of years, and there is new competition in the form of start-up Netezza, which although tiny compared to Teradata is nonetheless making steady inroads, and causing pricing pressure. Teradata has generally loyal customers though notoriously opaque pricing, which has enabled it to achieve good margins (especially on support), though its finances were never entirely clear as they were wrapped up with NCR. Splitting the company out will allow the market to value Teradata on its own merits.

A long journey

January 5, 2007

An Accenture study:

http://www.informationweek.com/research/showArticle.jhtml?articleID=196800921

quantifies how much time middle managers in enterprises waste seeking out information, and comes out at two hours a day i.e. a quarter of an average working day. When they find it, half of the information turns out to be of no use. This sounds about right to me, and illustrates just how far BI really has yet to go in being genuinely useful, and also shows just how bad the true state of information is in large companies.

The issue is not only that technologies are insufficiently intuitive. In my experience there are a number of factors that come into play:

- no culture of sharing information
- inconsistent data definitions
- poor data quality
- inability to locate appropriate data sources
- insufficient understanding of how to use BI tools effectively.

If you set out to produce a useful new report in some area and succeed in doing so, what incentive is there for you to make this easily shared around the company, and to help others find it? In most companies this would be pure altruism, and so people just keep the information on their hard disk, and indeed may gain kudos from the “information is power” syndrome. Overcoming such cultural barriers is hard, and few companies succeed. I should say that Accenture themselves do as good a job as anyone I have seen, where their consultants are actively tasked with documenting project lessons and storing these, with appropriate keywords, in an internal knowledge management system. However I have not seen this in other consultancies to anything like the same extent.

The other problems are all too familiar to people working in BI. Inconsistent data definitions and poor data quality are the heart of what MDM is all about, and we know how immature that is. Yet without fixing this, accurate and easy to obtain information is still elusive. A further problem which some technologies are starting to address is the sheer job of finding an existing report. Ironically there is an excellent chance that if you want some particular report, then someone else did too and has already built it. The trouble is that it may be in an Excel spreadsheet on a hard drive, or sitting on a shared server but you simply have no easy way of finding that it is there. It is ironic that Google allows us to search the whole internet in moments, yet finding a report within our own company is a much tougher proposition. Enterprise search vendors like Fast and Apptus, as well as Google itself, are beginning to apply smart technology to the problem, but here it is still early days.

Finally, most end users either don’t have access to create a new report easily, or are not trained in making best use of BI tools, or simply don’t have time to learn. This is why Excel is so popular; it is familiar and ubiquitous, and so people would rather get data into Excel and play with it there than learn a new BI tool.

I believe that these are mostly quite intractable problems, only some of which lend themselves to new and better technology. So anyone with a magic bullet e.g. “the answer is SOA” is talking nonsense. It is only by addressing the organisational, cultural and data ownership issues in combination with enterprise search and better tool training that a company can improve that two hours a day per person. It will be a long, hard slog, and buying the latest trendy tool is not enough, whatever the salesman tells you.