This simple sample introduces Kettle's Regex step by using it to process
a Tomcat log file. We will parse the log and separate its rows based
on a regular expression pattern.
You can download the sample files at the end of the article.
- We read the log file's content in the [Tomcat Logs] (Text Input) step.
- We validate each row against the pattern in the [Validasi Regex] (Regex Evaluation) step. Only rows that contain date information followed by a message on the same line match. The boolean outcome of the check is saved in a field named result.
- Based on the result field, the [Filter Rows] (Filter Rows) step decides where each row goes next. Rows that satisfy the condition 'result = Y' continue to the [Tanggal] step, and non-matching rows go to the [Others] step.
- You can see the results by previewing the [Others] and [Tanggal] dummy steps.
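The evaluate-then-filter logic of the steps above can be sketched outside Kettle as follows. This is only an illustration in Python, not the transformation itself: the pattern, the sample log lines, and the function names evaluate and route are assumptions made for this sketch, and the real pattern configured in [Validasi Regex] may differ. The pattern assumes a date such as "Mar 05, 2010 9:12:45 AM" followed by a message on the same line.

```python
import re

# Hypothetical pattern (an assumption, not taken from the transformation):
# group 1 = date such as "Mar 05, 2010 9:12:45 AM", group 2 = the message.
DATE_MESSAGE = re.compile(
    r"^([A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (.+)$"
)


def evaluate(line):
    """Mimic the regex check: attach a 'result' flag of 'Y' or 'N' to the row."""
    match = DATE_MESSAGE.match(line)
    return {
        "line": line,
        "result": "Y" if match else "N",
        "date": match.group(1) if match else None,
        "message": match.group(2) if match else None,
    }


def route(lines):
    """Mimic the filter: rows with result = Y go to one list, the rest to another."""
    tanggal, others = [], []
    for line in lines:
        row = evaluate(line)
        (tanggal if row["result"] == "Y" else others).append(row)
    return tanggal, others


# Made-up sample lines in the style of a Tomcat log.
sample = [
    "Mar 05, 2010 9:12:45 AM org.apache.coyote.http11.Http11Protocol start",
    "INFO: Starting Coyote HTTP/1.1 on http-8080",
]
tanggal, others = route(sample)
```

Here the first sample line ends up in tanggal with its date and message separated, while the second line carries no date and ends up in others.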
Original Log File and Data Preview on [Hasil Akhir] step executed in Spoon
By using the Modified Java Script Value step, we risk a processing bottleneck at this point, because scripting can be slower than implementing the same logic in a plugin with specific functionality.
Nevertheless, this sample shows the flexibility of Kettle if you know how to manipulate any data type using the Java language.