ETL - Extract, Transform and Load

ETL is short for Extract, Transform and Load. The term describes a multi-step procedure that gathers data from different data sources, alters the collected data, and finally loads it into a data warehouse. The idea behind ETL is to extract data from various sources in various formats, modify the data to comply with business requirements, and then put it in a single location where it can be mined.
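As a rough illustration of this idea, the following Python sketch pulls rows from two differently formatted sources, brings them into one format and stores them in a single place. Everything in it is an assumption made for the example: the in-memory CSV text, the `orders` and `sales` tables, the function names, and the use of SQLite as a stand-in for both the source database and the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a plain text (CSV) file, simulated here in memory.
CSV_SOURCE = "customer,amount\nalice,10.50\nbob,4.00\n"


def extract_from_file(text):
    """Extract: read rows from a CSV-formatted text source."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_from_rdbms(conn):
    """Extract: read rows from a relational source table."""
    cursor = conn.execute("SELECT customer, amount FROM orders")
    return [{"customer": c, "amount": a} for c, a in cursor]


def transform(rows):
    """Transform: give every row the same shape (capitalised name, numeric amount)."""
    return [(r["customer"].strip().title(), float(r["amount"])) for r in rows]


def load(warehouse, rows):
    """Load: write the transformed rows into a single warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    warehouse.commit()


def run_pipeline():
    # Hypothetical source 2: a relational database, simulated with in-memory SQLite.
    source_db = sqlite3.connect(":memory:")
    source_db.execute("CREATE TABLE orders (customer TEXT, amount TEXT)")
    source_db.execute("INSERT INTO orders VALUES ('carol', '7.25')")

    # A second in-memory database stands in for the data warehouse.
    warehouse = sqlite3.connect(":memory:")

    rows = extract_from_file(CSV_SOURCE) + extract_from_rdbms(source_db)
    load(warehouse, transform(rows))
    return warehouse
```

Calling `run_pipeline()` returns a connection whose `sales` table holds the rows from both sources in one format, ready to be queried from a single location.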

The ETL process consists of three main parts - Extract, Transform and Load. The first part is collecting the data from a variety of sources. Common data source formats include files (plain text files, email files, word-processing documents, etc.) and data residing in an RDBMS. After the data has been extracted, it can be transformed in line with the business logic requirements. The transformation step can be very complex and can incorporate one or more of the following data transformations - data sorting, data aggregation, data filtering, data cleaning and more. Once the transformation has been completed, the data is loaded into the data warehouse.
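The transformations named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the sample records, the field names `region` and `amount`, and the business rule "keep only positive amounts" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted records; field names and values are made up.
RAW_ROWS = [
    {"region": " north ", "amount": "120.0"},
    {"region": "South", "amount": "80.5"},
    {"region": "north", "amount": "n/a"},   # amount cannot be parsed
    {"region": "SOUTH", "amount": "19.5"},
    {"region": "east", "amount": "-5"},     # fails the assumed business rule
]


def clean(rows):
    """Data cleaning: normalise region names, drop rows with unparseable amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        cleaned.append({"region": row["region"].strip().lower(), "amount": amount})
    return cleaned


def filter_rows(rows):
    """Data filtering: keep only rows with a positive amount (assumed rule)."""
    return [row for row in rows if row["amount"] > 0]


def aggregate(rows):
    """Data aggregation: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def sort_totals(totals):
    """Data sorting: regions ordered by total amount, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def transform(rows):
    """Chain the four transformations into one step."""
    return sort_totals(aggregate(filter_rows(clean(rows))))
```

With the sample records, `transform(RAW_ROWS)` yields `[("north", 120.0), ("south", 100.0)]`: the unparseable row and the negative row are dropped, the two spellings of "south" are merged, and the regions are ordered by total. Keeping each transformation in its own function makes it easy to reorder or replace steps as the business requirements change.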

ETL tools are software products designed to facilitate data extraction, transformation and loading. There are many ETL tools, both free and commercial, from vendors including Oracle, Microsoft, IBM, SAP, and Informatica Corporation.