SAS Update

At a conference in Lausanne in June 2014 SAS shared their current business performance and strategy. The privately held company (with just two individual shareholders) had revenues of just over $3 billion, with 5% growth. Their subscription-only license model has meant that SAS has been profitable and growing for 38 years in a row. 47% of revenue comes from the Americas, 41% from Europe and 12% from Asia Pacific. They sell to a broad range of industries, but the largest in terms of revenue are banking at 25% and government at 14%. SAS is an unusually software-oriented company, with just 15% of revenue coming from services. Last year SAS was voted the second best company globally to work for (behind Google), and attrition is an unusually low 3.5%.

In terms of growth, fraud and security intelligence was the fastest growing area, followed by supply chain, business intelligence/visualisation and cloud-based software. Data management software revenue grew at just 7%, one of the lowest rates of growth in the product portfolio. Cloud deployment is still relatively small compared to on-premise but growing rapidly, and is expected to exceed $100 million in revenue this year.

SAS has a large number of products (over 250), but gave some general update information on broad product direction. Its LASR product, introduced last year, provides in-memory analytics. They do not use an in-memory database, as they do not want to be bound to SQL. One customer example given was a retailer with 2,500 stores and 100,000 SKUs that needed to decide what merchandise to stock its stores with, and how to price locally. It used to analyse this in an eight-hour window at an aggregate level, but can now do the analysis in one hour at an individual store level, allowing more targeted store planning. The source data can be from traditional sources or from Hadoop. SAS have been working with a university to improve the user interface, starting from the UI and designing the product to fit it, rather than producing a software product and then adding a user interface as an afterthought.
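The difference between aggregate-level and store-level analysis in the retail example can be sketched in a few lines of Python (hypothetical data; this has nothing to do with SAS's actual LASR implementation):

```python
# Illustrative sketch of aggregate vs store-level demand analysis,
# using hypothetical sales records.
from collections import defaultdict

sales = [
    # (store_id, sku, units_sold, price)
    ("store_1", "sku_a", 120, 9.99),
    ("store_1", "sku_b", 30, 4.49),
    ("store_2", "sku_a", 15, 9.99),
    ("store_2", "sku_b", 200, 4.49),
]

# Aggregate view: one demand figure per SKU across the whole chain.
aggregate = defaultdict(int)
for store, sku, units, price in sales:
    aggregate[sku] += units

# Store-level view: demand per (store, SKU), enabling local stocking
# and pricing decisions for each individual store.
per_store = defaultdict(int)
for store, sku, units, price in sales:
    per_store[(store, sku)] += units

print(aggregate["sku_a"])               # 135 units chain-wide
print(per_store[("store_2", "sku_a")])  # but only 15 of those at store_2
```

The aggregate view would suggest stocking sku_a everywhere; the store-level view shows store_2 barely sells it, which is exactly the kind of distinction the faster analysis window makes actionable.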

In Hadoop, there are multiple initiatives to apply assorted versions of SQL to Hadoop from both major and minor suppliers. This is driven by the mass of SQL skills in the market compared to the relatively tiny number of people who can fluently program using MapReduce. Workload management remains a major challenge in the Hadoop environment, so a lot of activity has been going on to integrate the SAS environment with Hadoop. Connection is possible via HiveQL. Moreover, SAS processing is being pushed down to Hadoop via MapReduce rather than extracting data; a SAS engine is placed on each cluster to achieve this. This includes data quality routines such as address validation, applied directly to Hadoop data with no need to export it. A demo was shown using the SAS Studio product to take some JSON files, do some cleansing, and then use Visual Analytics and In-Memory Statistics to analyze a block of 60,000 Yelp recommendations, blending this with another recommendation data set.
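The cleansing step in a demo like this boils down to parsing, type coercion and record filtering. A minimal sketch in Python, with hypothetical records (this is not the SAS Studio workflow itself):

```python
import json

# Hypothetical raw JSON feed: some records are malformed or incomplete.
raw_lines = [
    '{"business": "Cafe One", "stars": 5, "text": "Great coffee"}',
    '{"business": "Cafe Two", "stars": "4", "text": "Decent"}',  # stars as string
    'not valid json',                                            # unparseable
    '{"business": null, "stars": 4, "text": "ok"}',              # no business name
]

def cleanse(lines):
    """Parse JSON lines, coerce types, and drop unusable records."""
    clean = []
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop records that do not parse at all
        if not rec.get("business"):
            continue  # drop records with no business name
        rec["stars"] = int(rec["stars"])  # coerce "4" -> 4
        clean.append(rec)
    return clean

records = cleanse(raw_lines)
print(len(records))  # 2 usable records survive
```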

Peeking at Models

With its latest release of its data warehouse technology, Kalido has introduced an interesting new twist on business modelling. Previously in a Kalido implementation, as with a custom-built warehouse, the design of the warehouse (the hierarchies, fact tables, relationships etc.) was done with business users in a whiteboard-style setting. Usually the business model was captured in Visio diagrams (or perhaps PowerPoint) and then the implementation consultant would take the model and implement it in Kalido using the Kalido GUI configuration environment. There is now a new product, a visual modelling tool that is much more than a drawing tool. The new business modeller allows you to draw out relationships, but like a CASE tool (remember those?) it has rules and intelligence built into the diagrams, validating whether relationships defined in the drawing make sense and are valid or otherwise as rules are added to the model.

Once the model is developed and validated, it can be directly applied to a Kalido warehouse, and the necessary physical schemas are built (for example a single entity “Product SKU” will be implemented in staging tables, conformed dimensions and in one or many data marts). There is no intermediate stage of definition required any more. Crucially, this means that there is no necessity to keep the design diagrams in sync with the model; the model is the warehouse, essentially. For existing Kalido customers (at least those on the latest release), the business modeller works in reverse as well: it can read an existing Kalido warehouse and generate a visual model from that. This has been tested on nine of the scariest, most complex use cases deployed at Kalido customers (in some cases these involve hundreds of business entities and extremely complex hierarchical structures), and seems to work according to early customers of the tool.
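The kind of rule checking a CASE-style modeller performs can be illustrated with a small sketch (hypothetical entity names and rules; nothing to do with Kalido's internals): every relationship must reference declared entities, and hierarchy links must not form a cycle.

```python
# Minimal sketch of CASE-style model validation over a drawn diagram.
entities = {"Product SKU", "Brand", "Category"}
hierarchy = [("Product SKU", "Brand"), ("Brand", "Category")]  # child -> parent

def validate(entities, hierarchy):
    """Return a list of rule violations; empty means the model is valid."""
    errors = []
    for child, parent in hierarchy:
        if child not in entities or parent not in entities:
            errors.append(f"unknown entity in link {child} -> {parent}")
    # Detect cycles by walking parent links upward from each entity.
    parents = dict(hierarchy)
    for start in entities:
        seen, node = set(), start
        while node in parents:
            if node in seen:
                errors.append(f"cycle involving {node}")
                break
            seen.add(node)
            node = parents[node]
    return errors

# The valid model passes; adding a back-link creates a cycle and fails.
assert validate(entities, hierarchy) == []
assert validate(entities, hierarchy + [("Category", "Product SKU")]) != []
```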

In addition to the business modeller Kalido has a tool that better automates its linkage to Business Objects and other BI tools. Kalido has for a long time had the ability to generate a Business Objects universe, a useful feature for those who deploy this BI tool, and more recently extended this to Cognos. In the new release it revamps these bridges using technology from Meta Integration. Given the underlying technology, it will now be a simple matter to extend the generation of BI metadata beyond Business Objects and Cognos to other BI tools as needed, and in principle backwards also into the ETL and data modelling world.

The 8.4 release has a lot of core data warehouse enhancements; indeed this is the largest functional release of the core technology for years. There is now automatic staging area management. This simplifies the process of source extract set-up and further minimises the need for ETL technology in Kalido deployments (Kalido always had an ELT, rather than an ETL philosophy). One neat new feature is the ability to do a “rewind” on a deployed warehouse. As a warehouse is deployed, new data is added and changes may occur to its structure (perhaps new hierarchies). Kalido’s great strength was always its memory of these events, allowing “as is” and “as was” reporting. Version 8.4 goes one step further and allows an administrator to simply roll the warehouse back to a prior date, rather as you would rewind a recording of a movie using your personal video recorder. This includes fully automated rollback of loaded data, structural changes and BI model generation. Don’t try this at home with your custom-built warehouse or SAP BW.
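The “rewind” idea rests on keeping every change with its effective date, so the state at any prior moment can be recovered. A minimal sketch of such time-variant bookkeeping, using hypothetical data (not Kalido's actual storage model):

```python
import datetime as dt

# Each structural or data change is stored with its effective date.
history = [
    (dt.date(2008, 1, 1), {"region": "EMEA", "hierarchy": "v1"}),
    (dt.date(2008, 3, 1), {"region": "EMEA", "hierarchy": "v2"}),
    (dt.date(2008, 6, 1), {"region": "Europe", "hierarchy": "v2"}),
]

def state_as_of(history, when):
    """'As was' reporting: the latest state effective on or before `when`."""
    current = None
    for effective, state in sorted(history, key=lambda h: h[0]):
        if effective <= when:
            current = state
    return current

def rewind(history, to):
    """'Rewind': discard every change recorded after the chosen date."""
    return [(d, s) for d, s in history if d <= to]

# What did the warehouse look like in February? Hierarchy v1.
assert state_as_of(history, dt.date(2008, 2, 15))["hierarchy"] == "v1"

# Roll the warehouse back to April: the June reorganisation disappears.
assert len(rewind(history, dt.date(2008, 4, 1))) == 2
```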

This is a key technology release for Kalido, a company that has a track record of innovative technology that has in the past pleased its customers (I know; I used to do the customer satisfaction survey personally when I worked there) but has been let down by shifting marketing messages and patchy sales execution. An expanded US sales team now has a terrific set of technology arrows in its quiver; hopefully it will find the target better in 2008 than it has in the past.

Broadening Information Access

I saw an interesting demo today from Endeca, which bills itself as an “information access” company. Of course every self-respecting BI company would describe itself in a similar way, but Endeca’s technology is quite different in approach from BI vendors. If you build a data warehouse and then add BI reporting to it, you quickly realise that “ad hoc” reporting by end-users is fine on the prototype with a few hundred records, but less amusing if there are a few hundred million records involved. Hence in real life aggregates are pre-calculated, predefined reports are carefully tuned and cubes (e.g. with Cognos Powerplay or similar) are built on common subsets of data that the users are likely to want. There is always a careful trade-off between flexibility and performance. Moreover the unstructured world of documents and emails is pretty much a separate dimension, however much in reality the context of a business transaction may be described by those emails and documents rather than by what is stored in the sales order system.

Endeca has a proprietary database engine which is designed to combine both structured and unstructured data in a flexible way. The MDEX engine does not just store metadata such as hierarchies and structures, but also master data such as lists of product codes. It also indexes documents and emails from corporate systems (a series of adaptors is supplied with the technology). The technology makes much use of in-memory searches and caching to optimise performance. Some of the implementations can be large and complex: one deployed pensions system has 800 million records, while one deployed electronic parts application has 20,000 distinct attributes.

An example of such a system that resonated with me was a “human capital” demo which was based on the idea of a consultancy practice manager. A screen was shown allowing filtering on a range of areas, e.g. consultants’ billing rates, availability, location etc. So far this looked just like the kind of thing you could prepare with a BI tool, e.g. you could select consultants available in the next two weeks, with a billing rate of such and such, etc., and the list of consultants would dynamically refresh. No big deal. However the next filter was “all consultants based within x miles of Detroit”; the consultant records had been tagged with geocodes and the engine calculated distances from this information. Next a query was made to find all those who also spoke French, this information not being a database index but something buried away in the consultants’ resumes, i.e. in unstructured document form. Good luck writing SQL to handle these kinds of filters!
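The mix of filters in that demo can be sketched in plain Python: a structured filter over geocodes plus an unstructured filter over free-text resumes. The data is hypothetical and the distance calculation a simple haversine; this is an illustration of the idea, not Endeca's MDEX engine.

```python
import math

DETROIT = (42.33, -83.05)  # (latitude, longitude)

consultants = [
    {"name": "Ana", "rate": 150, "geo": (42.28, -83.74),  # Ann Arbor
     "resume": "Ten years in retail analytics. Fluent in French."},
    {"name": "Bob", "rate": 120, "geo": (41.88, -87.63),  # Chicago
     "resume": "Data warehousing and ETL. Speaks Spanish."},
]

def miles_between(a, b):
    """Approximate great-circle distance in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959 * 2 * math.asin(math.sqrt(h))

matches = [c["name"] for c in consultants
           if miles_between(c["geo"], DETROIT) < 50    # structured: geocode
           and "french" in c["resume"].lower()]        # unstructured: resume text
print(matches)
```

Ann Arbor is roughly 35 miles from Detroit, so only the French-speaking consultant within range survives both filters; expressing the second filter in SQL over a relational schema would indeed be no fun at all.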

There are plenty of situations where this mix of structured and unstructured information is important, and Endeca has prospered as a company from this dawning realisation. The company has doubled its revenue for five years in a row, and in Q4 2007 did USD 30 million in revenue, two-thirds of this in software licences. With a strong base of retail customers such as Tesco and Walmart, other verticals strongly represented include government, with customers such as the FBI, CIA and NASA, financial services e.g. ABN Amro, and manufacturing e.g. Boeing and Schlumberger. There are now 500 enterprise customers in all.

The recent acquisition of arch-competitor FAST by Microsoft demonstrates how this market is increasingly recognised as key by the industry giants. While there are plenty of competitors out there the only others in the current Gartner Leaders quadrant for this market are FAST, IBM (with Omnifind) and Autonomy, which is much more established in unstructured enterprise search. Endeca has set an impressive pace of growth, and it seems to me that there are plenty of situations in other verticals e.g. healthcare, that could suit its technology.

Finding reports, naturally

Another example of innovation in the seemingly mature world of BI can be found lurking within the unlikely setting of Progress Software (Progress acquired EasyAsk in May 2005). EasyAsk is a product which combines search capability with a natural language interface that can generate SQL to run against data warehouses. This unusual combination has led it to be used in many eCommerce sites, allowing natural language inquiries to be translated into product offerings from web sites.

However the technology is a natural (excuse the pun) fit for a rather understated but very real problem in large organisations: actually finding existing reports or pieces of analysis. Most large companies have invested in licences of Cognos, Business Objects or other reporting and analysis software, but what happens after the initial project set-up? The implementation consultants typically set up some pre-configured environments (e.g. a Business Objects universe) and perhaps deliver a little training, and end user analysts then supposedly have at the data warehouse with glee. In reality most end users have no desire to learn a tool beyond Excel, so most rely on pre-built reports, e.g. monthly sales figures, being set up for them by the IT department. A subset of end-users, typically people with “analyst” somewhere in their job title, are happy to do “ad hoc reporting”, though to be honest most of these characters could make do with a command line SQL interface rather than a fancy reporting tool if push came to shove.

The big issue is one of wasted effort due to lack of re-use. If one analyst spends a few hours coming up with a new take on sales profitability, surely this would be useful for others? Yet generally if a request comes down to produce a report, people start from scratch even if perfectly good reports have already been produced by someone else in the company. They just do not know they are there.

This is where tools with strong search capability can help. Certainly this is not new, and Autonomy, FAST, Endeca etc. can be helpful in tracking down existing information. Yet such tools are really designed for unstructured data rather than structured data. EasyAsk has the advantage that it provides end-users with the ability to do natural language queries if they don’t quite find what they need. The leading BI players have begun to realise how much of an issue this is in recent years, e.g. Business Objects’ purchase of Inxight. However there is plenty of room for a pure-play alternative, as this is a problem that is barely addressed in most large companies.

One complication that EasyAsk will encounter is a natural hostility in IT departments to natural language interfaces, since hoary DBA types (I started as a DBA, so can say this kind of thing) are never going to trust that a generated piece of SQL from a question like “find me the most profitable sales region” is going to get the right answer. EasyAsk addresses this concern somewhat by having subject dictionaries that are compiled with a domain expert (e.g. in HR this might equate the phrases “laid off”, “let go”, “fired” and “terminated”) in order to give its technology a better chance of formulating the right answer, and of course you can always switch on a trace to see the SQL generated and get it looked over by an IT type. However if a DBA has to check the SQL generated every time before approving a new report then this rather defeats the object of the exercise in the first place.
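The subject-dictionary idea amounts to mapping domain synonyms onto one canonical term before SQL is generated. A sketch of that step in Python; all names, tables and the toy SQL template here are hypothetical, not EasyAsk's actual implementation:

```python
# Hypothetical HR subject dictionary: many phrasings, one canonical term.
HR_SYNONYMS = {
    "laid off": "terminated",
    "let go": "terminated",
    "fired": "terminated",
}

def normalise(question, synonyms):
    """Rewrite synonym phrases in a question to their canonical term."""
    q = question.lower()
    for phrase, canonical in synonyms.items():
        q = q.replace(phrase, canonical)
    return q

def to_sql(question):
    """Toy generator: one canonical predicate covers every synonym phrasing."""
    q = normalise(question, HR_SYNONYMS)
    if "terminated" in q:
        return "SELECT * FROM employees WHERE status = 'terminated'"
    raise ValueError("question not understood")

# All three phrasings reach the same SQL, which is the point of the dictionary.
assert to_sql("Who was laid off last year?") == to_sql("Who was fired last year?")
```

The trace a DBA would inspect is then always the same canonical query, however the user happened to phrase the question.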

For this reason EasyAsk probably needs to target end-users rather than IT departments, who will probably always be a tough crowd for them. If they can get to the right audience, then addressing the problem of making better use of all those pre-existing canned reports is a very real problem to which a large dollar value can be attached. They seem to have made an impression with customers like GSK, Forbes and BASF, and their technology is already embedded within several other companies’ applications. I recall from my days at Shell that this is a widespread issue in large companies, so exploiting existing BI investment should be a happy hunting ground for companies with the right value proposition.

Posing questions

The recent spate of acquisitions in the BI world (Cognos by IBM, Business Objects by SAP) might cause you to assume that the area was becoming mature (for which read: nothing much new to do). However there is still innovation going on. A company called Tableau, formed mainly by some ex-Stanford University people (including one who was an early employee at Pixar and who has two Oscars to his name!) has neatly combined BI software with clever use of visualisation technology. I have written before how visualisation has struggled to break out of a small niche, though there are certainly some clever technologies out there (e.g. Fractal Edge). One thing that Tableau has done well is to make a very well thought out demo of their software. Product demos are often dull affairs, but this one is very engaging (if a little frenetic), with some real thought put into the underlying data in order to show off the tool to good effect.

I still firmly believe that only a limited proportion of end users actually need a sophisticated analysis tool of any kind. In my experience of BI projects, end users generally find the leading BI tools a lot less intuitive than the vendors would like to think they are, often resorting to Excel once they have found the data they need. The type of technology that Tableau is developing provides an interesting alternative to the established players and has the potential to engage a certain subset of users more. I will follow their progress with interest.

The Other Shoe Drops

The ink is barely dry on the agreement selling Business Objects to SAP, but today a long-rumoured takeover was announced: IBM snapping up Cognos for USD 5 billion, a modest premium to its stock market valuation, at 3.5 times revenues (8 times maintenance revenues). As I wrote well over a year ago, this acquisition makes better sense than most. In particular, IBM has no proprietary application stack to defend (unlike Oracle or SAP) and so in buying Cognos it does not make things difficult for its sales force by casting doubt on application independence, in the way that the Business Objects purchase by SAP does.
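The multiples quoted imply some rough absolute figures, worth a quick back-of-envelope check:

```python
# Back-of-envelope check on the multiples quoted for the Cognos deal.
price = 5_000_000_000        # USD 5 billion purchase price
revenue = price / 3.5        # at 3.5 times revenues
maintenance = price / 8      # at 8 times maintenance revenues

print(round(revenue / 1e6))      # ~1429, i.e. roughly USD 1.4 billion revenue
print(round(maintenance / 1e6))  # 625, i.e. USD 625 million maintenance
```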

I suspect there was a defensive element here too. Oracle purchased Hyperion and hence Brio, but given their acquisitive nature in recent years it was by no means clear that another big BI purchase was out of the question. Hence IBM may have swooped quickly partly to keep Cognos out of Oracle’s hands. IBM has a superb sales channel, and so the deal is likely to be a good one for Cognos sales (and hence Cognos customers). Cognos and IBM have worked together for years, so there are no obvious technical concerns, and the main concern will be whether Cognos staff will fit into IBM’s notorious bureaucracy.

This leaves few independent major BI vendors. SAS is privately held (most of the shares are held by one man) and so until Jim Goodnight says good night to his career, ownership of SAS is going nowhere. The same is probably true of Microstrategy, who although notionally public have a peculiar share structure making a takeover difficult. Actuate is perhaps the largest one left. However there is plenty of room out there, as shown by the vibrant performance of Qliktech.

Mr Blue Sky

The round of recent quarterly results continued with Microstrategy. We have observed a strong performance by Informatica yet a weak one from Business Objects, whose execs had cited difficult market conditions, e.g. the global credit crunch, as the reason for poor demand (as distinct from any conceivable errors on their part). Informatica is in a related but distinct sector to Business Objects, so perhaps the Business Objects results were an indicator of something amiss with the business intelligence sector rather than this just being special pleading on the part of management. Microstrategy is a direct competitor to Business Objects (along with Cognos, SAS and others such as Actuate and SPSS), so its results should cast some light on the issue.

Microstrategy’s numbers were in fact what we Brits might describe as “stonking”. Licence revenue, the key health indicator of a software company, was up 23% from a year ago. Overall revenue of USD 95.8 million was also up 23%, reflecting a broad-based increase in services and maintenance revenue as well as new licences. Operating margins were a tasty 29.8%, meaning that the chunky increase in revenues did not come at the cost of disproportionately increased marketing expense. This strong performance shows up the Business Objects quarterly results for what they were, a serious stumble in a sector that appears otherwise buoyant. Perhaps the Microstrategy CEO deserves his new plane after all.

The dark side of a marketing announcement

Business Objects announced a witches’ brew tie-up with Big Blue, bundling DB2 and its data warehouse capabilities together with Business Objects as a business intelligence “offering”. Given that Business Objects already works happily enough with DB2, it is rather unclear whether this is any more than a ghostly smoke-and-mirrors marketing tie-up, but it certainly makes some sense for both companies. It does, however, hint at Business Objects moving away from pure database platform independence, which takes on a new significance given the takeover (sorry: “merger” – much background cackling) of Business Objects by SAP. Is this really a subtle move to try and irritate Oracle, the other main DBMS vendor, given the highly competitive situation between SAP and Oracle, who are locked in a nightmare struggle for world domination? In that case, is SAP just manipulating Business Objects like a possessed puppet, pulling the strings behind the scenes? Or was this just a hangover from the pre-takeover days, with the Business Objects marketing machine rolling on like a zombie that stumbles on without realising it no longer has an independent life, clinging to some deep-held memory of its old days? SAP has a more tense relationship with IBM itself these days. IBM sells cauldrons of consulting work around SAP implementations, but found a knife in its back when SAP started promoting the NetWeaver middleware in direct competition with IBM’s WebSphere.

Announcements from Business Objects from now on all need to be looked at through the distorting mirror of the relationship with its new parent, as there may be meaning lurking that would not have existed a month ago. Everything needs to be parsed for implications about the titanic Oracle v SAP struggle, as Business Objects should strive as far as possible to appear utterly neutral to applications and databases in order to not spook its customers. Arch rivals Cognos, Microstrategy and SAS will take advantage of any hint that the giant behind Business Objects is just pulling its strings.

Happy Halloween everyone!

All in the timing

Business Objects Q3 results were rather soft, showing license revenue (USD 139M) just 2% up year over year. There were eight deals over USD 1 million, broadly similar to recent quarters. The business line called “information discovery and delivery” i.e. the classic reporting tools, did least well, while enterprise performance management was somewhat healthier.

However overall this is rather feeble growth (by contrast Informatica had a fine quarter, so the excuses offered by management about weak markets seem pretty lame). Perhaps there have been too many acquisitions to digest, and of course now the swallower has itself been gulped up by the much bigger fish of SAP. The price tag SAP paid looks like a fairly high premium for the underlying business of Business Objects, as reflected in SAP’s share price dip on announcement, but these results suggest that Business Objects shareholders at least can be very satisfied indeed with the price they got.

The Price of Failure

I enjoyed an article by Madan Sheina on the failure of BI projects. 87% of BI projects fail to meet expectations, according to a survey by the UK National Computing Centre. I wish I could say this was a surprise, but it is not. Any IT project involves people and processes as well as technology, yet many projects focus almost entirely on the technology: tool choices, database performance etc. Yet in practice the issues which confound a BI project are rarely about the technology itself. Projects fail to address actual customer needs, and frequently don’t acknowledge that there are several user constituencies out there with quite different requirements. Frequently a new technology is stuffed down the customer’s throat, and projects often neglect data quality at their peril.

From my experience, here are a few things that cause projects to go wrong.

1. Not addressing the true customer need. How much time does the project team spend with the people who are actually going to use the results of the BI project? Usually there is a subset of users who want flexible analytical tools, and others who just need a basic set of numbers once a month. A failure to realise this can alienate both main classes of user. Taking an iterative approach to project development rather than a waterfall approach is vital to a BI project.

2. Data is only useful if it is trusted, making data quality a key issue. Most data is in a shocking state in large companies, and the problems often come to light only when data is brought together and summarised. The BI project cannot just gloss over this, as the customers will quickly avoid using the shiny new system if they find they cannot trust the data within it. For this reason the project team needs to encourage the setting up of data governance processes to ensure that data quality improves. Such initiatives are often outside the project scope, are unfunded and require business buy-in, which is hard. The business people themselves often regard poor data quality as an IT problem when in fact it is an ownership and business process problem.

3. “Just one more new user interface” is not what the customer wants to hear. “Most are familiar with Excel and are not willing to change their business experience” was one quote from a customer in the article. Spot on! Why should a customer whose main job is, after all, not IT but something in the business, have to learn a different tool just to get access to data that he or she needs? Some tool vendors have done a good job of integrating with Excel, yet others are in denial about this since they view their proprietary interface as a key competitive weapon against other vendors. Customers don’t care about this; they just want to get at the data they need to do their job in an easy and timely way. Hence a BI project should, if at all possible, look at ways of allowing users to get data into their familiar Excel rather than foisting new interfaces on them. A few analyst types will be prepared to learn a new tool, but this is only a small subset of the audience for a BI project, likely 10% or less.

4. Data being out of date, and the underlying warehouse being unable to respond to business change, is a regular problem. Traditional data warehouse designs are poor at dealing with change caused by reorganisations, acquisitions etc., and delays in responding to business change cause user distrust. Unchecked, this causes users to hire a contractor to get some “quick and dirty” answers into a spreadsheet and bypass the new system, causing the proliferation of data sources to continue. Using packaged data warehouses that are good at dealing with business change is a good way of minimising this issue, yet even today most data warehouses are hand-built.

5. Training on a new application is frequently neglected in IT projects. Spend the time to sit down with busy users and explain to them how they are to access the data in the new system, and make sure that they fully understand how to use it. It is worth going to some trouble to train users one to one if you have to, since it only takes a few grumbling voices to sow the seeds of discontent about a new system. Training the end users is never seen as a priority for a project budget, yet it can make the world of difference to the likelihood of a project succeeding.

6. Running smaller projects sounds obvious but can really help. Project management theory shows that the size of a project is the single biggest predictor of success: put simply, small projects fail far less often than big ones, and yet you still see USD 100 million “big bang” BI projects. Split the thing into phases, roll out by department and country, do just about anything to bring the project down to a manageable size. If your BI project has 50 people or more on it then you are already in trouble.

7. Developing a proper business case for a project and then going back later and doing a post implementation review happens surprisingly rarely, yet can help shield the project from political ill winds.
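The data quality problems in point 2 typically only surface when the data is actually profiled. The profiling itself can be as simple as a few rule checks; a sketch over hypothetical customer records (the rules and fields here are illustrative, not any particular tool):

```python
# Minimal data profiling sketch: simple rule checks over hypothetical
# customer records, run before the data reaches the warehouse.
records = [
    {"id": 1, "country": "UK", "postcode": "SW1A 1AA"},
    {"id": 2, "country": "UK", "postcode": ""},          # missing postcode
    {"id": 2, "country": "uk", "postcode": "EC1A 1BB"},  # duplicate id, bad casing
]

def profile(records):
    """Return (record id, issue) pairs for every rule violation found."""
    issues = []
    seen_ids = set()
    for r in records:
        if r["id"] in seen_ids:
            issues.append((r["id"], "duplicate id"))
        seen_ids.add(r["id"])
        if not r["postcode"]:
            issues.append((r["id"], "missing postcode"))
        if r["country"] != r["country"].upper():
            issues.append((r["id"], "country not upper case"))
    return issues

issues = profile(records)
print(len(issues))  # three issues across these three records
```

Even a report this crude gives the business owners something concrete to fix, which is usually the first step towards the governance processes the project needs.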

You will notice that not one of the above issues involves a choice of technology, technical performance or the mention of the word “appliance”. Yes, it is certainly important to pick the right tool for the job, to choose a sufficiently powerful database and server and to ensure adequate systems performance (which these days appliances can help with in the case of very large data volumes). The problem is that BI projects tend to gloss over the “soft” issues above and concentrate on the “hard” technical issues that we people who work in IT feel comfortable with. Unfortunately there is no point in having a shiny new server and data warehouse if no one is using it.