Kettle

Introduction

Kettle is a popular open source ETL (Extract, Transform, Load) tool, widely considered one of the best ETL tools in the BI marketplace.

With its carefully designed architecture and support for a large number of popular databases, Kettle is a favorite choice for our data warehousing projects.


Kettle is part of the Pentaho BI application suite. It began as an independent open source ETL project initiated by Matt Casters and was acquired by Pentaho in 2006.

Since then, Kettle has also been known as Pentaho Data Integration (PDI). Matt continues to lead development of the Kettle project at Pentaho.

Kettle comprises four applications:
  • Spoon, a graphical designer for building jobs and transformations. It is based on SWT.
  • Pan, a script that executes a transformation, either from a .ktr XML file or from a repository.
  • Kitchen, a script that executes a job, either from a .kjb XML file or from a repository.
  • Carte, a temporary web server used to execute jobs and transformations in a cluster or in parallel.
Each application is launched from its own batch (Windows) or shell (Linux) script.
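
As an illustration of how Pan and Kitchen divide the work, the sketch below picks the launcher script from the file extension and builds the command line for it. It is a minimal example under stated assumptions: the install directory /opt/kettle and the helper name build_command are hypothetical, and only the -file and -level options of the Linux shell scripts are used.

```python
from pathlib import PurePosixPath

# Hypothetical install location; adjust to wherever Kettle is unpacked.
KETTLE_HOME = PurePosixPath("/opt/kettle")

# Transformations (.ktr) are run by Pan, jobs (.kjb) by Kitchen.
LAUNCHERS = {".ktr": "pan.sh", ".kjb": "kitchen.sh"}


def build_command(path, level="Basic"):
    """Return the argument list that would run a .ktr or .kjb file."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in LAUNCHERS:
        raise ValueError(f"not a Kettle transformation or job file: {path}")
    launcher = KETTLE_HOME / LAUNCHERS[suffix]
    # -file names the XML file to execute, -level sets the logging level.
    return [str(launcher), f"-file={path}", f"-level={level}"]
```

The resulting list can be passed to subprocess.run(build_command("/data/load.ktr"), check=True). On Windows the corresponding batch scripts would be used instead of the shell scripts.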

Table of Contents

  1. Windows Installation
  2. Linux Fedora 9 Installation
  3. Getting Started with Spoon
  4. Kettle Tips
    1. Processing Tomcat Log with Regex
    2. Time Difference Calculation Based on Text Columns
  5. Screencast