Appliances and ETL

ELT may be better than ETL for appliances

I attended some interesting customer sessions at the Netezza user group in London yesterday, following some other good customer case studies at the Teradata conference in the rather sunnier climes of San Diego. One common thread that came out of several sessions was the way that the use of appliances changes how companies treat ETL processing. Traditionally a lot of work has gone into taking the various source systems for the warehouse, defining rules as to how this data is to be converted into a common format, then using an ETL tool (such as Informatica or Ab Initio) to carry out this pre-processing before presenting a neatly formatted file in a consistent form to be loaded into the warehouse.

When you have many terabytes of data, this pre-processing can itself become a bottleneck. Several of the customers I listened to at these conferences had found it more efficient to move from ETL to ELT. In other words, they load essentially raw source data (possibly with some data quality checking only) into a staging area in the warehouse appliance, and then write SQL to carry out the transformations within the appliance before loading the results into production warehouse tables. This allows them to take advantage of the power of the MPP boxes they have purchased for the warehouse, which are typically more efficient and powerful than the regular servers that their ETL tools run on. This does not usually eliminate the need for the ETL tool (though one customer did explain how they had switched off some ETL licences), but it means that much more of the processing is carried out in the data warehouse itself.
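To make the pattern concrete, here is a minimal sketch of the ELT flow described above: raw rows go into a staging table untouched, and the clean-up happens in SQL inside the database. It is illustrative only. An in-memory SQLite database stands in for the warehouse appliance, and the table names (stg_sales, fx_rates, dw_sales), the columns and the exchange rates are all made up for the example rather than taken from any customer's system.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows as-is into staging, then transform them with SQL in the database."""
    # Assumption: in-memory SQLite is only a stand-in for an MPP warehouse appliance.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stg_sales (source_system TEXT, customer TEXT, amount TEXT, currency TEXT);
        CREATE TABLE fx_rates  (currency TEXT PRIMARY KEY, rate_to_usd REAL);
        CREATE TABLE dw_sales  (customer TEXT, amount_usd REAL);
        """
    )
    # Hypothetical reference data; the rates are invented for illustration.
    conn.executemany("INSERT INTO fx_rates VALUES (?, ?)", [("USD", 1.0), ("GBP", 2.0)])

    # Extract + Load: no reformatting before the data reaches the staging table.
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?)", raw_rows)

    # Transform: standardise names, convert currency and aggregate, all in SQL.
    conn.execute(
        """
        INSERT INTO dw_sales (customer, amount_usd)
        SELECT UPPER(TRIM(s.customer)),
               SUM(CAST(s.amount AS REAL) * f.rate_to_usd)
        FROM stg_sales AS s
        JOIN fx_rates AS f ON f.currency = UPPER(TRIM(s.currency))
        GROUP BY UPPER(TRIM(s.customer))
        """
    )
    conn.commit()
    result = conn.execute(
        "SELECT customer, amount_usd FROM dw_sales ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    rows = [
        ("crm", " acme ", "10.5", "usd"),
        ("erp", "ACME", "2", "GBP"),
        ("erp", "Globex", "3", "USD"),
    ]
    for customer, amount_usd in run_elt(rows):
        print(customer, amount_usd)
```

The point of the sketch is where the work happens: the Python side only moves rows, while the trimming, case-folding, currency conversion and aggregation run as a single set-based SQL statement in the database engine. On a real appliance that statement is what would be spread across the MPP nodes.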

Back in my Kalido days we found it useful to take this ELT approach too, but for different reasons. It was cleaner to do the transformations based on business rules stored in the Kalido business model, rather than having the transformations buried away in ETL scripts, which meant more transparent rules and so lower support effort. However, I had not appreciated that the sheer horsepower available in data warehouse appliances suits ELT for pure performance reasons. Have others had the same experience on their projects? If so, please post a comment here.

2 thoughts on “Appliances and ETL”

  1. Interesting subject. We are currently in the middle of both an appliance evaluation as well as a Kalido evaluation. I know Andy started Kalido, but can it really be used as the sole tool for the enterprise wide DW or does it work best in smaller contained/limited warehousing scenarios?

  2. After working a lot with Ab Initio, I agree that acquisition of a tool isn’t a prerequisite to solve performance-oriented data issues, but it simply lengthens. I had the same experience while resolving an issue, and I have posted about it on my blog as well.
