Psst – Wanna buy a Data Quality Vendor?

Founded in 1993, Trillium Software has been the largest independent data quality vendor for some years, nestled since the late 1990s as a subsidiary of US marketing services company Harte Hanks. The latter was once a newspaper company dating back to 1928, but switched to direct marketing in the late 1990s; it had overall revenues of $495 million in 2015. There was clearly a link between data quality and direct marketing, since name and address validation is an important feature of marketing campaigns. However, the business model of a software company is different from that of a marketing firm, so there was always going to be a certain awkwardness in Trillium living under the Harte Hanks umbrella.

On June 7th 2016 the parent company announced that it had hired an advisor to look at “strategic alternatives” for Trillium, including the possibility of selling the company, though the announcement made clear that a sale was not a certainty. Trillium has around 200 employees and a large existing customer base, so it will have a steady income stream from maintenance revenues. The data quality industry is not the fastest growing sector of enterprise software, but it is well established and quite fragmented. As well as offerings from Informatica, IBM, SAP and Oracle (all of which were based on acquisitions) there are dozens of smaller data quality vendors, many of which grew up around name and address matching, a problem well suited to at least a partially automated solution. While some vendors like Experian have traditionally focused on this problem, others such as Trillium have developed much broader data quality offerings, with functions such as data profiling, cleansing, merge/matching, enrichment and even data governance.

There is a close relationship between data quality and the somewhat faster growing sector of master data management (MDM), so MDM vendors might seem in principle to be natural acquirers of data quality vendors. However, MDM itself has consolidated somewhat in recent years, and the big players in it such as Informatica, Oracle and IBM all market platforms that combine data integration, MDM and data quality (though in practice the degree of true integration is distinctly more variable than it appears on PowerPoint). Trillium might be too big a company to be swallowed up by the relatively small independents that remain in the MDM space. It will be interesting to see what emerges from this exercise. Certainly it makes sense for Trillium to stand on its own two feet rather than living within a marketing company, but on the other hand Harte Hanks may have missed the boat. A few years ago large vendors were clamouring to acquire MDM and related technologies, but now most companies that need a data quality offering have either built or bought one. The financial adviser in charge of the review may have to be somewhat creative about whom it approaches as a possible acquirer.

Informatica V10 emerges

Informatica has just announced their Big Data Management solution V10, the latest update to their flagship technology suite. The key objective is to enable customers to design data architectures that can accommodate both traditional database sources and newer Big Data “lakes” without needing to swim too deeply in the world of MapReduce or Spark.

In particular, the Live Data Map offering is interesting: a tool that builds a metadata catalog as automatically as it can. Crucially, the catalog is updated continuously rather than as a one-off batch exercise, the bane of previous metadata efforts, which quickly get out of date. It analyses not just database system tables but also semantics and usage, so it promises to chart a path through the complexity of today’s data management landscape without the need for whiteboards and data model diagrams.
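To make the continuous-catalog idea concrete, here is a minimal, purely hypothetical sketch in Python. It is not Informatica’s implementation or API, just an illustration of a catalog entry that is re-profiled from the source each time it is refreshed, so the metadata cannot drift far from reality.

```python
# Hypothetical sketch of a continuously refreshed metadata catalog entry.
# Not Informatica's Live Data Map: just the general idea of re-profiling
# sources on a schedule instead of documenting them once and letting the
# documentation rot.
import sqlite3
import time
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    table: str
    columns: list                  # column names read from the system catalog
    row_count: int                 # a simple profiling statistic
    last_profiled: float           # when this entry was last refreshed
    tags: list = field(default_factory=list)  # semantic tags, e.g. "customer"

def refresh_entry(conn: sqlite3.Connection, table: str) -> CatalogEntry:
    """Re-profile one table and return an up-to-date catalog entry."""
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return CatalogEntry(table=table, columns=cols, row_count=count,
                        last_profiled=time.time())

# Example: catalog a tiny in-memory database. In a real deployment this
# refresh would run continuously against live sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, postcode TEXT)")
catalog = {t: refresh_entry(conn, t) for t in ["customer"]}
print(catalog["customer"])
```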

V10 extends the company’s already fairly comprehensive ability to plug into a wide range of data sources, with over 100 pre-built transformations and over 200 connectors. By providing an interface layer above the systems management level, Informatica gives customers a degree of insulation from the rapidly changing world of Big Data, with its bewildering menagerie of technologies, some of which fall out of fashion almost as soon as you have figured out where they fit. Presenting a common interface across traditional and new data sources enables organisations to minimise wasted skills investment.

As well as quite new features such as Live Data Map, there is an array of incremental updates to the established technology elements of the Informatica suite, such as improved collaboration capability within the data quality suite and the ability of the data integration hub to span both cloud and on-premise data flows. A major emphasis of the latest release is performance improvement, with much faster data import and data cleansing.

With Informatica having recently gone private, it will be comforting for its customers that the company is investing as much as ever in its core technology, as well as adding potentially very useful new elements. The data management landscape is increasingly fragmented and complex these days, so hard-pressed data architects need all the help that they can get.

Leaving Las Vegas

The Informatica World 2015 event in Las Vegas was held as the company was in the process of being taken off the stock market and into private ownership by private equity firm Permira and a Canadian pension fund. The company was still in its quiet period, so it was unable to offer any real detail about this. However, my perception is that one key reason for the change may be that the company executives can see a growing industry momentum towards cloud computing. This is a challenge to all major vendors with large installed bases, because the subscription pricing model associated with the cloud presents a considerable challenge as to how vendors will actually make money compared to their current on-premise business model. A quick look at the finances of publicly held cloud-only companies suggests that even these specialists have yet to really figure it out, with a sea of red ink in the accounts of most. If Informatica is to embrace this change then it is likely that its profitability will suffer, and private investors may offer a more patient perspective than Wall Street, which is notoriously focused on short-term earnings. It seems to me that there is unlikely to be any real change of emphasis around MDM at Informatica, given that it appears to be their fastest growing business line.

On the specifics of the conference, there were announcements from the company around its major products, including its recent foray into data security. The most intriguing was the prospect of a yet-to-be-delivered product called “live data map”. The idea is to allow semantic discovery within corporate data, and to let end-users vote on how reliable particular corporate data elements are, rather as consumers rate movies on IMDb or rate others on eBay. This approach may be particularly useful as companies have to deal with “data lakes”, where data will have little or none of the validation that would (in theory) be applied in current corporate systems. The idea is tantalising, but this was a statement of direction rather than a product that was ready for market.

The thing that I found most useful was the array of customer presentations, over a hundred in all. BP gave an interesting talk about data quality in the upstream oil industry, which has typically not been a big focus for data quality vendors (there is no name and address validation in the upstream). Data governance was a common theme in several presentations, clearly key to the success of both master data and data quality projects. There was a particularly impressive presentation by GE Aviation about their master data project, which had to deal with very complex aeroplane engine data.

Overall, Informatica’s going private should not have any negative impact on customers, at least not unless its executives end up taking their eye off the ball amid the inevitable distractions associated with new ownership.

The Teradata Universe

The Teradata Universe conference in Amsterdam in April 2015 was particularly popular, with a record 1,200 attendees this year. Teradata always scores unusually highly in our customer satisfaction surveys, and a recurring theme is its ease of maintenance compared to other databases. At this conference the main announcement continued that theme with the expansion of QueryGrid, allowing a common administrative platform across a range of technologies. QueryGrid can now manage all three major Hadoop distributions (MapR, Cloudera and Hortonworks) as well as Teradata’s own Aster and Teradata platforms. In addition the company announced a new high-end appliance, the 2800, as well as a new feature it calls the software-defined warehouse. This allows multiple Teradata data warehouses to be managed as one logical warehouse, including security management across multiple instances.

The conference had its usual heavy line-up of customer project implementation stories, such as an interesting one by Volvo, who are doing some innovative work with software in their cars, at least at the prototype stage. In one case the car sends a proximity alert to any cyclist wearing a suitably equipped helmet; in another, the car can seek out spare parking spaces in a suitably equipped car park. A Volvo now has 150 computers in it, generating a lot of data that has to be managed as well as creating new opportunities. Tesla is perhaps the most extreme example so far of cars becoming software-driven, in their case literally allowing remote software upgrades in the same way as occurs with desktop computers (though hopefully car manufacturers will do a tad more testing than Microsoft in this regard). The most entertaining speech that I saw was by a Swedish academic, Hans Rosling, who advises UNICEF and the WHO, and who gave a brilliant talk about the world’s population trends using extremely advanced visualisation aids, an excellent example of how to display big data in a meaningful way.

SAS Update

At a conference in Lausanne in June 2014 SAS shared their current business performance and strategy. The privately held company (with just two individual shareholders) had revenues of just over $3 billion, with 5% growth. Their subscription-only license model has meant that SAS has been profitable and growing for 38 years in a row. 47% of revenue comes from the Americas, 41% from Europe and 12% from Asia Pacific. They sell to a broad range of industries, but the largest in terms of revenue are banking at 25% and government at 14%. SAS is an unusually software-oriented company, with just 15% of revenue coming from services. Last year SAS was voted the second best company globally to work for (behind Google), and attrition is an unusually low 3.5%.

In terms of growth, fraud and security intelligence was the fastest growing area, followed by supply chain, business intelligence/visualisation and cloud-based software. Data management software revenue grew at just 7%, one of the lowest rates of growth in the product portfolio. Cloud deployment is still relatively small compared to on-premise but is growing rapidly, and is expected to exceed $100 million in revenue this year.

SAS has a large number of products (over 250), but gave a general update on broad product direction. Its LASR product, introduced last year, provides in-memory analytics. They do not use an in-memory database, as they do not want to be bound to SQL. One customer example given was a retailer with 2,500 stores and 100,000 SKUs that needed to decide what merchandise to stock its stores with, and how to price locally. The retailer used to analyse this in an eight-hour window at an aggregate level, but can now do the analysis in one hour at an individual store level, allowing more targeted store planning. The source data can come from traditional sources or from Hadoop. SAS has been working with a university to improve the user interface, starting from the UI and designing towards it, rather than producing a software product and then adding a user interface as an afterthought.
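As a rough illustration of the shift from aggregate-level to store-level analysis that in-memory processing makes practical, here is a generic Python/pandas sketch with made-up data; it is not SAS LASR itself, and the real case involved 2,500 stores and 100,000 SKUs rather than this toy example.

```python
# Generic sketch of aggregate vs per-store analysis; the data here is
# invented and tiny, purely to show the change in granularity.
import pandas as pd

sales = pd.DataFrame({
    "store": ["S1", "S1", "S2", "S2"],
    "sku":   ["A", "B", "A", "B"],
    "units": [120, 40, 15, 300],
    "price": [9.99, 24.50, 9.99, 24.50],
})
sales["revenue"] = sales["units"] * sales["price"]

chain_level = sales.groupby("sku")["revenue"].sum()             # old: whole-chain view
store_level = sales.groupby(["store", "sku"])["revenue"].sum()  # new: per-store view
print(chain_level, store_level, sep="\n\n")
```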

In Hadoop, there are multiple initiatives from both major and minor suppliers to apply assorted versions of SQL to Hadoop. This is driven by the mass of SQL skills in the market compared to the relatively tiny number of people who can fluently program using MapReduce. Workload management remains a major challenge in the Hadoop environment, so a lot of activity has been going on to integrate the SAS environment with Hadoop. Connection is possible via HiveQL. Moreover, SAS processing is being pushed down to Hadoop via MapReduce rather than extracting the data; a SAS engine is placed on each cluster to achieve this. This includes data quality routines such as address validation, applied directly to the data in place with no need to export it from Hadoop. A demo was shown using the SAS Studio product to take some JSON files, do some cleansing, and then use Visual Analytics and In-Memory Statistics to analyse a block of 60,000 Yelp recommendations, blending this with another recommendation data set.
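To illustrate the push-down idea in general terms, here is a hedged sketch using a Hadoop Streaming-style mapper in Python; it is not SAS’s embedded engine, and the cleansing rules are trivial stand-ins for real address validation.

```python
#!/usr/bin/env python3
# Sketch of cleansing data where it lives: a Hadoop Streaming-style mapper
# that standardises addresses as records flow past, rather than exporting
# the data from Hadoop first. The rules below are deliberately simplistic.
import sys

ABBREVIATIONS = {"rd": "Road", "st": "Street", "ave": "Avenue"}  # illustrative only

def standardise_address(raw: str) -> str:
    """Tidy capitalisation and expand a few common abbreviations."""
    words = raw.strip().split()
    cleaned = [ABBREVIATIONS.get(w.lower().rstrip("."), w.title()) for w in words]
    return " ".join(cleaned)

# Hadoop Streaming feeds tab-separated records on stdin and collects output
# from stdout, so this code runs on each data node alongside the data.
for line in sys.stdin:
    record_id, _, address = line.rstrip("\n").partition("\t")
    print(f"{record_id}\t{standardise_address(address)}")
```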

Kalido changes hands

Yesterday Kalido, the data warehouse and MDM company, changed hands. Rather than being acquired by a software company, it was bought by an investment company called Silverback, a Texas firm backed by a VC called Austin Ventures. Silverback specialises in purchasing software companies in related groups, building the businesses into something greater than the original parts. It has recently done this with a series of project management-related acquisitions in the form of Upland Software. In this context, presumably Kalido will be combined with Noetix, an analytics company in its portfolio, perhaps with something else to follow. At first glance the synergy here looks limited, but we shall see. It would make sense if acquisitions in the areas of data quality and perhaps data integration followed, allowing a broader platform-based message around master data.

As someone with a personal interest in the company (I founded it, but left in 2006 when it moved its management to the USA), I find it a little sad to see Kalido not achieve greater things in the market, at least up until now. It was perhaps a bit ahead of its time, and had technology features back in 1996 that are only now appearing in (some) current competitors: time variance and the management of federations of hub instances being key examples. The marketing messaging and sales execution never matched the technology, though the company has nevertheless built up an impressive portfolio of global customers, which remains a considerable asset. Hopefully the new backers will invigorate the company; a key indicator will be whether they manage to retain and motivate key technical staff. If this happens, and genuinely synergistic acquisitions follow, then perhaps the company’s technology will gain the wider audience that it deserves.

Informatica’s MDM strategy

I recently spent a couple of days with the management of Informatica at the Rosewood Hotel in Palo Alto. The company sees a lot of potential in the notionally rather mature area of data integration, with hand-coding still the norm in many companies, especially in less developed markets such as China, Russia and Mexico. From an MDM viewpoint, in 2012 one third of revenue came as part of a broader deal, with the company claiming a doubling of customer logos. Informatica’s MDM offering is based on two acquisitions, Siperian and now Heiler. Siperian was noted for its good scalability for customer data, and a recent customer win at HP illustrates that, with the application dealing with 1.5 billion customer records and handling 37,000 users.

The Heiler acquisition is still technically not complete (German securities rules in such matters move slowly) but it was evident that the Heiler staff were already working in concert with Informatica. Heiler itself grew 29% in 2012, showing a growth spurt in Q4 after the acquisition was announced. Informatica had for some time claimed that their MDM offering was multi-domain, but in reality most customer examples were based on customer data, and heavily skewed towards North America. The purchase of European PIM vendor Heiler gives more balance to this picture, and in time one would expect to see the separate MDM hubs sharing metadata and the like. Informatica actually has quite a good story around managing multiple MDM hubs, but it is one that it has been quiet about, perhaps not perceiving much demand; yet its capabilities, e.g. in data masking, are useful in such contexts and should enable it to do a better job than many in a federated environment. For multi-national companies, managing a federation of MDM hubs will be the reality, but the MDM market has been in denial about this. To me there is an opportunity here for any vendor that can clearly articulate a federated vision.

Informatica has clearly embraced MDM as a core technology, and indeed this makes sense given the higher growth rates in the MDM market than in its traditional integration market.

Informatica goes shopping

Informatica buys Heiler

Informatica has made an offer to buy German PIM vendor Heiler – the deal has not gone through yet and the German securities laws are complex, but it appears to be a “friendly” takeover. There are a few interesting aspects to this. Firstly, it sets a useful valuation benchmark. Heiler did 17.4 million Euros in revenue in its last financial year, and the offer is 80.8 million, so this is a price-to-sales ratio of about 4.6, a healthy though not extreme valuation (Heiler also has 15.8 million Euros of cash and is modestly profitable, with profits in the last financial year of 1.4 million Euros). It has been around in the MDM market for 12 years, and so is quite a mature product/company, as shown in the split of its revenue, with nearly half coming from services and a fifth from maintenance, and with several hundred customers.

The deal makes sense for Heiler, as Informatica has a far more powerful sales channel. From Informatica’s perspective they gain a solid piece of technology with a proven footprint in the product data domain, whereas their own MDM, for all its multi-domain marketing, has primarily been used to manage customer data. They also gain a slice of the European MDM market, reducing their heavy US revenue preponderance. Moreover, assuming the deal goes ahead, Informatica now has several hundred new customers to which it can up-sell its other software, e.g. its integration and data quality offerings.

The deal also shows that the M&A market is still active for MDM software, which is positive news for the shareholders of other independent MDM vendors out there.

SOA and how to run a conference

I am currently at the SAP TechEd conference in Berlin. I will write in a separate publication about the forthcoming version 7.1 of SAP MDM, but have a couple of quite separate observations to mention here. The first is a confirmation of what I have long believed: that moving towards an SOA world is going to be very hard work. One customer here, Volkswagen Financial Services, described an ambitious project in which they took a part of their business, which deals with fleet car hire, and moved it wholesale to an SOA-based infrastructure. This project has been live a few months and is already showing some genuine benefits compared to the rather manually intensive system they had before, in terms of faster processing time for certain common business processes (which used to involve agents dealing with multiple applications) and in terms of improved data quality. However, it is interesting that no formal cost/benefit analysis appears to have been done. Moreover this project, which involved 100 IT staff and 50 business people, took over five years to complete. I do not think this has much to do with the technology, but rather the sheer complexity of taking a cross-functional view: different business lines have to agree on common terminology and data definitions, and on the way in which the many new web services behave. There has also been a lot of change management needed to get front-line business users to accept the new system, which automates many tasks that they used to have direct control of.

I suspect that few companies have been quite so aggressive in their move to SOA as VW. A more typical conversation was with a gentleman at a German utility and resources company, which has been looking actively into SOA since 2006. They are only just dipping their toe in the water now, with a very limited project involving just a handful of web services, across a single process, in just one small subsidiary of their organisation. Even this limited pilot has not been entirely without its issues. One problem that has reared its head is how much more difficult it is to debug a web services application that touches a whole series of different applications in its wake. If something goes wrong, they have found it is a lot more fiddly to trace where exactly the fault lies, given the cross-application nature of the project. Again, this is a project driven by the IT department as an exercise in proving technology, rather than one with a quantified business case. I do not pretend that a few random conversations at a conference are a remotely scientific sample, but it seems clear that SOA is far from mainstream in many companies thus far, and that there are new issues to address compared to traditional applications. Not least of these is the need to sort out common master data definitions across the multiple applications affected.

On a separate note, those who read my blog regularly will know that a bugbear of mine is conferences that do not run on time or are disorganised – yes ETRE, that means you. By contrast, this conference is a testament to stereotypical Teutonic efficiency. Sessions start on time to the minute, and finish on time, to the minute. There are plenty of staff around to guide people around the large congress centre, and the pre-conference administration was exemplary. When I arrived I was handed not just a conference schedule, but a suggested set of lectures and meetings that were likely to be of interest specifically to me based on my MDM interests. If only all conferences could be run by Germans.

A speedy investment

In recent years venture capital firms have generally shunned enterprise software companies, so it was interesting to see start-up expressor (no, this is not a typo) complete a USD 10 million funding round this week. The company has genuinely interesting data integration technology, and in a future release plans to add significant data quality functionality. Its use of parallelism enables it, in principle, to compete at the high end of the ETL market.

It is good to see venture firms dipping their toes back into the water of innovative enterprise software companies. A couple of years ago I came across what I thought was an interesting data quality company called Zoomix. I introduced them to a prestigious venture firm, who were entirely uninterested, at the time chasing after ever more trendy social networking websites (Zoomix was in fact bought by Microsoft a few months ago, which would have netted a pretty decent return for the investors). Although the enterprise software sector is not exactly booming, there is still room for astute investments in differentiated technologies.