The Teradata Universe

The Teradata Universe conference in Amsterdam in April 2015 was particularly popular, drawing a record 1,200 attendees. Teradata always scores unusually high in our customer satisfaction surveys, and a recurring theme is its ease of maintenance compared to other databases. At this conference the main announcement continued this theme with the expansion of its QueryGrid, allowing a common administrative platform across a range of technologies. QueryGrid can now manage all three major Hadoop distributions, MapR, Cloudera and Hortonworks, as well as Teradata’s own Aster and Teradata platforms. In addition the company announced a new appliance, the high-end 2800, as well as a new feature they call the software-defined warehouse. This allows multiple Teradata data warehouses to be managed as one logical warehouse, including allowing security management across multiple instances.

The conference had its usual heavy line-up of customer project implementation stories, such as an interesting one by Volvo, who are doing some innovative work with software in their cars, at least at the prototype stage. For example, in one case the car sends a proximity alert to any cyclist wearing a suitably equipped helmet. In another example the car can seek out spare parking spaces in a suitably equipped car park. A Volvo now has 150 computers in it, generating a lot of data that has to be managed as well as creating new opportunities. Tesla is perhaps the most extreme example so far of cars becoming software-driven, in their case literally allowing remote software upgrades in the same way as happens with desktop computers (though hopefully car manufacturers will do a tad more testing than Microsoft in this regard). The most entertaining speech that I saw was by a Swedish academic, Hans Rosling, who advises UNICEF and the WHO and who gave a brilliant talk about the world’s population trends using extremely advanced visualisation aids, an excellent example of how to display big data in a meaningful way.

SAS Update

At a conference in Lausanne in June 2014 SAS shared their current business performance and strategy. The privately held company (with just two individual shareholders) had revenues of just over $3 billion, with 5% growth. Their subscription-only licence model has meant that SAS has been profitable and growing for 38 years in a row. 47% of revenue comes from the Americas, 41% from Europe and 12% from Asia Pacific. They sell to a broad range of industries, but the largest in terms of revenue are banking at 25% and government at 14%. SAS is an unusually software-oriented company, with just 15% of revenue coming from services. Last year SAS was voted the second best company globally to work for (behind Google), and attrition is an unusually low 3.5%.

In terms of growth, fraud and security intelligence was the fastest growing area, followed by supply chain, business intelligence/visualisation and cloud-based software. Data management software revenue grew at just 7%, one of the lowest rates of growth in the product portfolio. Cloud deployment is still relatively small compared to on-premise but is growing rapidly, expected to exceed $100 million in revenue this year.

SAS has a large number of products (over 250), but gave some general information on broad product direction. Its LASR product, introduced last year, provides in-memory analytics. They do not use an in-memory database, as they do not want to be bound to SQL. One customer example given was a retailer with 2,500 stores and 100,000 SKUs that needed to decide what merchandise to stock in its stores, and how to price locally. They used to analyse this in an eight-hour window at an aggregate level, but can now do the analysis in one hour at an individual store level, allowing more targeted store planning. The source data can be from traditional sources or from Hadoop. SAS have been working with a university to improve the user interface, starting from the UI design and building the software around it, rather than producing a software product and then adding a user interface as an afterthought.

In Hadoop, there are multiple initiatives from both major and minor suppliers to apply assorted versions of SQL to Hadoop. This is driven by the mass of SQL skills in the market compared to the relatively tiny number of people who can program fluently in MapReduce. Workload management remains a major challenge in the Hadoop environment, so a lot of activity has been going on to integrate the SAS environment with Hadoop. Connection is possible via HiveQL. Moreover, SAS processing is being pushed to Hadoop with MapReduce rather than extracting the data; a SAS engine is placed on each cluster to achieve this. This includes data quality routines such as address validation, applied directly to Hadoop data with no need to export it. A demo was shown using the SAS Studio product to take some JSON files, do some cleansing, and then use Visual Analytics and In-Memory Statistics to analyse a block of 60,000 Yelp recommendations, blending this with another recommendation data set.
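As a rough illustration of the kind of pushdown involved, the sketch below runs a HiveQL aggregation against a Hadoop cluster from a generic Python client using the PyHive library; this is not SAS’s own integration, and the host, table and column names are my assumptions.

```python
# Illustrative only: a generic HiveQL connection from Python via PyHive,
# not SAS's Hadoop integration. Host, port, table and column names are hypothetical.
from pyhive import hive

conn = hive.Connection(host="hadoop-edge.example.com", port=10000, username="analyst")
cursor = conn.cursor()

# Push a simple aggregation down to Hadoop rather than extracting the raw rows.
cursor.execute("""
    SELECT business_id, COUNT(*) AS review_count
    FROM yelp_reviews
    GROUP BY business_id
""")
for business_id, review_count in cursor.fetchall():
    print(business_id, review_count)
```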

Teradata Announcements

I recently attended a Teradata conference in Prague. In our regular Landscape research, The Information Difference consistently finds that Teradata has some of the happiest customers of any data warehouse vendor. For the last four years in a row their customers have taken one of the top two spots in our survey for overall satisfaction. Moreover, this is based on a large sample of customers. This hard survey data is backed up by anecdotal discussion at their events.

At the recent conference Teradata made three significant announcements. At present their architecture encompasses three technical platforms: the traditional relational database, the analytical database they acquired via Aster, and Hadoop, where they have partnered with Hortonworks. Their approach is to layer their software around these platforms, allowing customers to deploy on whichever combination is most appropriate. The Teradata QueryGrid allows a single SQL query to be orchestrated across these systems without moving the data. Certainly as a concept this will be appealing to many customers.

It also announced the Active Enterprise Data Warehouse 6750 platform, aimed at the highest-end use cases and claimed to be able to handle up to 61 petabytes of data. Certainly Teradata has dozens of customers in its “petabyte club”, so its ongoing investment here will be welcome to those with ultra-high volumes of data. The core database itself received an upgrade in the form of Teradata Database 15, which allows users to run analytical queries across multiple systems as well as run non-SQL languages within the database, and adds support for JSON data (a lower-overhead alternative to XML). This last is aimed at the increasingly important area of sensor and embedded processor data.
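To give a feel for why JSON suits sensor feeds, here is a small, made-up example comparing the same reading serialised as JSON and as XML; it uses only the Python standard library and is not tied to Teradata’s implementation.

```python
# Illustrative only: a hypothetical sensor reading as JSON (via the standard library)
# alongside a hand-written XML equivalent, to show the difference in overhead.
import json

reading = {"sensor_id": "engine-temp-07", "timestamp": "2014-04-10T09:15:00Z", "celsius": 88.4}

as_json = json.dumps(reading)
as_xml = ("<reading><sensor_id>engine-temp-07</sensor_id>"
          "<timestamp>2014-04-10T09:15:00Z</timestamp>"
          "<celsius>88.4</celsius></reading>")

print(len(as_json), "bytes as JSON")   # noticeably shorter...
print(len(as_xml), "bytes as XML")     # ...than the tag-heavy XML form
```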

Overall, Teradata continues to be a major player at the high end of the data warehouse market. It has actively embraced newer technologies, e.g. the multi-processing columnar approach of Aster and, more recently, Hadoop, going well beyond paying lip service to the newer analytic approaches. Customers with especially demanding workloads should certainly consider its capabilities.

Kalido changes hands

Yesterday Kalido, the data warehouse and MDM company, changed owners. Rather than being acquired by a software company, it was bought by an investment firm called Silverback, a Texas company backed by a VC called Austin Ventures. Silverback specialises in purchasing related groups of software companies and building the businesses into something greater than the original parts. It has recently done this with a series of project management-related acquisitions in the form of Upland Software. In this context, presumably Kalido will be combined with Noetix, an analytics company in their portfolio, perhaps with something else to follow. At first glance the synergy here looks limited, but we shall see. It would make sense if acquisitions in the areas of data quality and perhaps data integration followed, allowing a broader platform-based message around master data.

As someone with a personal interest in the company (I founded it, but left in 2006 when it moved its management to the USA) it is a little sad to see Kalido not achieve greater things than it has in the market, at least up until now. It was perhaps a bit ahead of its time, and had technology features back in 1996 that are only now appearing in (some) current competitors: time variance and the management of federations of hub instances being key examples. The marketing messaging and sales execution never matched the technology, though the company has nevertheless built up an impressive portfolio of global customers, which remain a considerable asset. Hopefully the new backers will invigorate the company, though to do this a key indicator will be whether they manage to lock in and motivate key technical staff. If this happens, and genuinely synergistic acquisitions follow, then perhaps the company’s technology will gain the wider audience that it deserves.

Teradata Goes Shopping

The recent flurry of acquisition activity in the data warehouse appliance space continued today as Teradata purchased Aster Data. HP’s purchase of Vertica, IBM’s of Netezza, EMC’s of Greenplum and (less recently) Microsoft’s of DATAllegro underscore the fact that the industry giants perceive demand for high-performance analytic databases to be strong. At first glance this may seem an odd buy for Teradata, itself the original appliance vendor, but Aster in fact occupied a very particular niche in the market.

Aster’s strengths (and its intellectual property, patent pending) were around its support for integrated MapReduce analytics. MapReduce is the distributed computing framework pioneered by Google, which inspired the open-source framework Hadoop. This framework is suited to highly compute-intensive analytics, particularly over high volumes of unstructured data. This includes use cases like fraud analysis, but has found a particular niche with social networking websites, which have to deal with vast and rapidly increasing volumes of data. Certain analytic queries, such as social network graph analysis, signal analysis, network analysis and some time series analysis, are awkward for conventional SQL, involving self-joins and potentially multiple passes through a database, which is a big deal if the database is hundreds of terabytes in size. The MapReduce approach can offer significant performance advantages for such use cases, though it typically requires specialist programming knowledge.
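For readers unfamiliar with the model, the toy Python sketch below mimics the map, shuffle and reduce phases in memory to count each person’s connections in a tiny, made-up social graph; a real Hadoop job would distribute the same steps across a cluster.

```python
# A minimal, in-memory sketch of the MapReduce idea (assumed example, not Hadoop itself):
# counting the degree of each node in a social graph, the kind of query that is
# awkward to express as SQL self-joins at scale.
from collections import defaultdict

edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("carol", "dave")]

# Map: emit a (node, 1) pair for each endpoint of each edge.
def map_edge(edge):
    a, b = edge
    return [(a, 1), (b, 1)]

# Shuffle: group intermediate pairs by key (a real framework does this across the cluster).
grouped = defaultdict(list)
for edge in edges:
    for node, count in map_edge(edge):
        grouped[node].append(count)

# Reduce: sum the counts for each node to get its degree.
degrees = {node: sum(counts) for node, counts in grouped.items()}
print(degrees)  # {'alice': 2, 'bob': 2, 'carol': 3, 'dave': 1}
```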

Aster’s customers included companies like LinkedIn and FullTilt Poker, and its SQL-MR technology had a good reputation in such situations. Aster was a relatively small company, so this purchase is loose change for Teradata but buys it a jump-start into this fashionable area of analytic processing. Aster of course gains access to the channels and deep pockets of Teradata. Conservative buyers may have been unwilling to jump into these waters with a start-up, but will be reassured by the legitimisation of the technology by a big software brand. Hence it seems like a win-win for both companies.

This leaves very few stand-alone independent data warehouse vendors: ParAccel, Kognitio and more obscure players like Exasol and Calpont can continue to plough an independent path, but I suspect that this will not be the last acquisition we will see in this market.

HP has a Vertica strategy

Having recently abandoned its Neoview offering, HP today revealed its plans in the data warehouse market by purchasing Vertica. Vertica is one of a clutch of data warehouse vendors that has appeared in recent years, employing an MPP architecture and usually a specialist database structure in order to achieve fast analytic performance on large volumes of data. Vertica uses a columnar database (of the style pioneered by Sybase), in this case combined with MPP. This combination works well for many analytic use cases, and a well-executed sales strategy based around it has meant that Vertica has achieved considerable market momentum compared to many of its competitors, building up a solid roster of customers such as Comcast, Verizon and Twitter.
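As a toy illustration (my own example, not Vertica’s internals), the sketch below shows why a column layout suits analytic scans: an aggregate over one column only touches that column’s values rather than every field of every row.

```python
# A toy sketch (assumed example) of the column-store advantage for analytics.
rows = [
    {"customer": "c1", "region": "EU", "revenue": 120.0},
    {"customer": "c2", "region": "US", "revenue": 80.0},
    {"customer": "c3", "region": "EU", "revenue": 200.0},
]

# Row store: every row is read even though only 'revenue' is needed.
total_row_store = sum(r["revenue"] for r in rows)

# Column store: the same data held as one array per column; the scan touches
# just the 'revenue' array, which also compresses well because its values are alike.
columns = {
    "customer": ["c1", "c2", "c3"],
    "region": ["EU", "US", "EU"],
    "revenue": [120.0, 80.0, 200.0],
}
total_column_store = sum(columns["revenue"])

assert total_row_store == total_column_store == 400.0
```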

In principle HP’s vast sales channel should be very beneficial in spreading the Vertica technology further, and nervous buyers need no longer be anxious about buying from a start-up. Yet success is far from guaranteed, as HP’s previous debacle with its Neoview data warehouse offering showed. Now at least HP has a proven, modern data warehouse offering with traction in the market. It remains to be seen whether it can exploit this advantage.

Appliances and ETL

ELT may be better than ETL for appliances

I attended some interesting customer sessions at the Netezza user group in London yesterday, following some other good customer case studies at the Teradata conference in the rather sunnier climes of San Diego. One common thread that came out of several sessions was the way that the use of appliances changes how companies treat ETL processing. Traditionally a lot of work has gone into taking the various source systems for the warehouse, defining rules as to how this data is to be converted into a common format, and then using an ETL tool (such as Informatica or Ab Initio) to carry out this pre-processing before presenting a neatly formatted file in consistent form to be loaded into the warehouse.

When you have many terabytes of data then this pre-processing can itself become a bottleneck. Several of the customers I listened to at these conferences had found it more efficient to move from ETL to ELT. In other words, they load essentially raw source data (possibly with some data quality checking only) into a staging area in the warehouse appliance, and then write SQL to carry out the transformations within the appliance before loading into production warehouse tables. This allows them to take advantage of the power of the MPP boxes they have purchased for the warehouse, which are typically more efficient and powerful than the regular servers that their ETL tools run on. This does not usually eliminate the need for the ETL tool (though one customer did explain how they had switched off some ETL licences), but it means that much more processing is carried out in the data warehouse itself.
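The pattern looks something like the minimal sketch below (my own illustration, with sqlite3 standing in for the appliance and made-up table names): raw records land in a staging table, and set-based SQL inside the database does the transformation.

```python
# A minimal ELT sketch (illustrative; sqlite3 stands in for the warehouse appliance).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales_raw (store_id TEXT, sku TEXT, amount TEXT)")
conn.execute("CREATE TABLE fact_sales (store_id INTEGER, sku TEXT, amount REAL)")

# "Load": raw, lightly checked source records land in staging as-is.
raw_rows = [("101", "SKU-1", "12.50"), ("102", "SKU-2", "7.00"), ("101", "SKU-3", "bad")]
conn.executemany("INSERT INTO stg_sales_raw VALUES (?, ?, ?)", raw_rows)

# "Transform": set-based SQL inside the database does the conversion using the
# engine's own horsepower; malformed rows are filtered out here.
conn.execute("""
    INSERT INTO fact_sales (store_id, sku, amount)
    SELECT CAST(store_id AS INTEGER), sku, CAST(amount AS REAL)
    FROM stg_sales_raw
    WHERE amount GLOB '[0-9]*.[0-9]*' OR amount GLOB '[0-9]*'
""")

print(conn.execute("SELECT * FROM fact_sales").fetchall())
# [(101, 'SKU-1', 12.5), (102, 'SKU-2', 7.0)]
```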

Back in my Kalido days we found it useful to take this ELT approach too, but for different reasons. It was cleaner to do the transformations based on business rules stored in the Kalido business model, rather than having the transformations buried away in ETL scripts, meaning more transparent rules and so lower support effort. However I had not appreciated that the sheer horsepower available in data warehouse appliances suits ELT for pure performance reasons. Have others found the same experience on their projects? If so then post a comment here.

Low Hanging Fruit

EMC has entered the data warehouse appliance market via the purchase of Greenplum, which made good progress in the space with its massively parallel database (based on PostgreSQL). Greenplum had impressive reference customers and some of the highest-end references out there in terms of sheer scale of data. It had also been one of the few vendors (Aster is another) that went early into embracing MapReduce, a framework designed for parallelism and suited to certain types of complex processing.

Data warehouse appliances can be a tough sell because conservative buyers are nervous of making major purchases from smaller companies, but the EMC brand will remove that concern. Also, EMC has a vast existing customer base and a sales channel that can exploit this. Seems like a sensible move to me.

No Data Utopia

The data warehouse appliance market has become very crowded in the last couple of years in the wake of the success of Netezza, which has drawn plenty of venture money into new entrants. The awkwardly named Dataupia had been struggling for some time, with large-scale redundancies early in 2009, but now appears to have pretty much given up the ghost, with its assets being put up for sale by the investors.

If nothing else, this demonstrates that you need to have a clearly differentiated position in such a crowded market, and clearly in this case the sales and marketing execution could not match the promise of the technology. However it would be a mistake to think that all is doom and gloom for appliance vendors, as the continuing commercial success of Vertica demonstrates.

To me, something that vendors should focus on is how to simplify migration off an existing relational platform. If you have an under-performing or costly data warehouse, then an appliance (which implies “plug and play”) sounds appealing. However, although appliance vendors support standard SQL, it is quite another thing to migrate a real-life database application, which may have masses of proprietary application logic locked up in stored procedures, triggers and the like. This seems to me the thing most likely to hold back buyers, yet many vendors focus entirely on price/performance in their messaging. It does not actually matter if a new appliance has ten times better price/performance (let’s say, saving you half a million dollars a year) if it costs several times that to migrate the application. Of course there are always green-field applications, but if someone could devise a way of dramatically easing the migration effort from an existing relational platform then it seems to me that they would have cracked the code on how to sell to end-users in large numbers. Ironically, this was just the kind of claim that Dataupia made, which suggests that there was a gap between its claims and its ability to convince the market that it was really that easy, despite accumulating a number of named customer testimonials on its website.
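To make that arithmetic concrete, here is a back-of-the-envelope sketch with entirely made-up numbers: the saving from better price/performance only pays off once the one-off migration cost has been recovered.

```python
# A back-of-the-envelope sketch of the point above, with hypothetical numbers.
annual_saving = 500_000      # assumed saving per year from the new appliance
migration_cost = 1_500_000   # assumed one-off cost to port logic, DDL and batch jobs

break_even_years = migration_cost / annual_saving
print(f"Break-even after {break_even_years:.1f} years")  # 3.0 years before any net benefit
```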

Even being founded by Netezza’s founder (Foster Hinshaw) did not translate into commercial viability, despite the company attracting plenty of venture capital money. The company had no shortage of marketing collateral; indeed a number of industry experts who authored glowing white papers on the Dataupia website may be feeling a little sheepish right now. Sales execution appears to have been a tougher nut to crack. I never saw the technology in action, but history tells us that plenty of good technology can fail in the market (proud owners of Betamax video recorders can testify to that).

If anyone knows more about the inside story here then feel free to contact me privately or post a comment.

Clearing a migration path

One of the issues often underestimated by new vendors attacking an entrenched competitor is the sheer cost of platform migration. For example, in the database world, if someone comes out with a new, shiny DBMS that is faster and cheaper than the incumbents, why would customers not just switch? After all, the new database is ANSI compliant and so is the old one, right? Of course this view may look good in a glossy magazine article or the fevered fantasies of a software sales person, but in reality enterprises face considerable switching costs for installed technology. In the case of databases, SQL is just a small part of the story. There are all the proprietary database extensions (stored procedures, triggers etc), all the data definition language scripts, systems tables with assorted business rules implicitly encoded, and the invested time and experience of the database administrators, a naturally conservative bunch. I know, as I was a DBA a long time ago; there is nothing like the prospect of being phoned up in the middle of the night, told that the payroll database is down and asked how many minutes it will take you to bring it back up, to give you a sceptical perspective on life. New and exciting technology is all well and good, but if it involves rewriting a large suite of production batch jobs that you have just spent months getting settled, you tend to push that brochure straight back across the table to the software sales person. Installed enterprise software is notoriously “sticky”.

Hence when attacking something that is already in production, you have to go further than just saying “ours is x times cheaper and faster”. An example of this is DATAllegro and its assault on the mountainous summit that is the Teradata installed base. They have just, very sensibly, brought out a suite of tools that will actually help convert an existing Teradata account, rather than just hoping someone will buy into the speed and cost story. This new suite of utilities will:

– convert BTEQ production jobs with the DATAllegro DASQL batch client
– convert DDL from Teradata to DATAllegro DDL
– connect to the Teradata environment, extract table structures (schema) and data, and import them into DATAllegro (a rough sketch of what this step might look like follows below).
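Purely as an illustration of the schema-extraction idea, the hypothetical Python sketch below reads column metadata from Teradata’s DBC dictionary views over ODBC and emits generic DDL; the DSN, database and table names, type-code mapping and target syntax are my assumptions, not DATAllegro’s actual utilities.

```python
# A rough, hypothetical sketch of schema extraction from Teradata over ODBC.
# The DSN, the type-code map and the emitted DDL style are illustrative assumptions.
import pyodbc

TYPE_MAP = {"I": "INTEGER", "CV": "VARCHAR", "DA": "DATE", "D": "DECIMAL"}  # assumed codes

conn = pyodbc.connect("DSN=teradata_prod")  # hypothetical ODBC data source
cursor = conn.cursor()
cursor.execute("""
    SELECT ColumnName, ColumnType, ColumnLength
    FROM DBC.ColumnsV
    WHERE DatabaseName = 'sales' AND TableName = 'orders'
    ORDER BY ColumnId
""")

# Map each Teradata column to a generic target type and print a CREATE TABLE statement.
columns = [
    f"{name.strip()} {TYPE_MAP.get(ctype.strip(), 'VARCHAR')}"
    for name, ctype, _length in cursor.fetchall()
]
print("CREATE TABLE orders (\n  " + ",\n  ".join(columns) + "\n);")
```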

This is the right approach to take. What they need to do next is get some public customer stories from accounts that have actually been through this conversion, and get them to talk about the benefits, and also realistically about the effort involved. If they can do that then they will be in a credible position to start eating away at the Teradata crown jewels: the seriously high-end databases of 100 TB or more.