Introduction
This simple sample introduces the use of Kettle's Regex Evaluation step to process
a Tomcat log file. We will parse the log and separate its rows based
on a regular expression pattern.
You can download the sample files at the end of the article.
Sample Explanation
- We read the log file's content in the [Tomcat Logs] step (Text Input).
- We validate each row against the pattern in the [Validasi Regex] step (Regex Evaluation). The pattern only matches rows that carry date information and a message on the same line. The boolean outcome of the check is saved in a field named result.
- Based on the result field, the [Filter Rows] step (Filter Rows) decides where each row goes next: rows that satisfy the condition 'result = Y' go to the [Tanggal] step, and non-matching rows go to the [Others] step.
- You can inspect both groups by previewing the [Others] and [Tanggal] dummy steps.
- Finally, the [Hasil Akhir] step (Modified Java Script Value) uses JavaScript to split each matching row into three fields: tanggal (date), jam (time) and pesan (message).
- Done
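The flow above can be sketched outside Kettle in plain Python to show what each step does to a row. This is a minimal illustration under stated assumptions, not the transformation itself: the article does not show the regex used in [Validasi Regex], so LOG_PATTERN and the sample lines are hypothetical, assuming a line that starts with a date such as "Jun 29, 2008", then a time such as "10:15:30 AM", then the message. The helper names evaluate, route and split_fields are also hypothetical. For the last step the sketch reuses the regex capture groups instead of the JavaScript splitting that [Hasil Akhir] performs.

```python
import re

# Hypothetical pattern (the article does not show the real one): a date such as
# "Jun 29, 2008", a time such as "10:15:30 AM", then the rest of the line as message.
LOG_PATTERN = re.compile(
    r"(?P<tanggal>[A-Z][a-z]{2} \d{1,2}, \d{4}) "
    r"(?P<jam>\d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<pesan>.+)"
)


def evaluate(line: str) -> bool:
    """Like [Validasi Regex]: True when the whole line fits the pattern.

    In Kettle this boolean would be stored in the result field (shown as Y/N).
    """
    return LOG_PATTERN.fullmatch(line) is not None


def route(lines: list[str]) -> tuple[list[str], list[str]]:
    """Like [Filter Rows]: 'result = Y' rows go to Tanggal, all others to Others."""
    tanggal, others = [], []
    for line in lines:
        (tanggal if evaluate(line) else others).append(line)
    return tanggal, others


def split_fields(line: str) -> tuple[str, str, str]:
    """Stand-in for [Hasil Akhir]: split a matching line into tanggal, jam, pesan."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match LOG_PATTERN: {line!r}")
    return match.group("tanggal"), match.group("jam"), match.group("pesan")


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from localhost.2008-06-29.log.
    sample = [
        "Jun 29, 2008 10:15:30 AM org.apache.catalina.core.StandardContext start",
        "SEVERE: Error listenerStart",
    ]
    matched, others = route(sample)
    for row in matched:
        print(split_fields(row))
    print("Others:", others)
```

With these two hypothetical lines, the first one lands in the Tanggal list and is split into its three fields, while the "SEVERE:" line has no leading date and goes to Others.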
Result
Original log file and data preview of the [Hasil Akhir] step, executed in Spoon
Drawback
Using the Modified Java Script Value step risks turning it into a processing bottleneck, because scripting can be slower than the same logic implemented in a dedicated plugin.
Nevertheless, this sample shows how flexible Kettle can be if you know how to manipulate data of any type using the Java language.
Reference
localhost.2008-06-29.log (95k) Feris Thia, Aug 30, 2008, 4:44 AM