Logstash Grok Filters for Mule Logs in PatternLayout - Part I

As a best practice we always recommend to use JSON for the format of our Mule logs, especially if we are planning to ship them to external systems like Splunk, ELK, Datadog. The main reason is that it would be much easier for these systems to parse the logs and, therefore, to better use the info in our logs for further analysis with queries or filters.

The format of our logs is defined by the Log4j Layout that we define in our log4j2.xml file. JSONLayout or JSONTemplateLayout are the ones that allow us to log in JSON format
However, in some cases, we can’t use these layouts and we need to work with the logs in the PatternLayout.

In this series of posts, we will see how we can use Grok Filters in Logstash to parse and process our PatternLayout-ed logs to transform them into a more useful JSON logs. In Part I, we’ll see how to create a Grok filter for the default PatternLayout of the Mule apps. In part II, we’ll see how to create our own custom format and filter.

What is GROK? How does it work?

The grok filter in Logstash is a powerful tool for parsing and extracting specific fields from unstructured log data. It uses predefined patterns (or custom ones) to match text in our logs and transform them into structured fields for further processing or analysis.

It uses regular expressions and predefined patterns to identify and extract data. Grok, then, assigns matched text to specific fields, creating structured output.

The basic syntax of GROK in a Logstash pipeline is the following:

grok {
match => { "field_name" => "pattern" }
}

Where:

field_name - The name of the field to apply the grok filter on (e.g., message).
pattern - The grok pattern to match (e.g., %{TIMESTAMP_ISO8601:timestamp}).

The grok patterns syntax is simple. Each field is started from “%” symbol and enclosed in braces. Fields could be assigned one by one or be splitted by a special delimiter. Delimiter could be a whitespace, semicolon, colon, whatever. Each field has 2 parts splitted by colon. 

The first part is a field type. Logstash comes with a library of predefined patterns. These patterns simplify matching common log formats. Here are a few examples of the patterns that would be useful for out Mule logs:
  • %{TIMESTAMP_ISO8601} - Matches ISO8601 timestamps (the format used by default in Mule logs). For example, 2024-11-16 12:34:56,789
  • %{LOGLEVEL} - Matches log levels. For example, INFO, ERROR, DEBUG
  • %{GREEDYDATA} - Matches everything until the end of the line. Useful when there’s no other clear pattern. 
  • %{DATA} - Matches any non-greedy string. For example, my_application
  • %{JAVACLASS} - Matches Java class names. For example, org.mule.runtime.core.internal.processor.LoggerMessageProcessor

For a full list of predefined patterns, you can refer to the Logstash 
grok pattern library.
The second part is the field name. In JSON key-value format it’s going to be a key.


The Default PatternLayout of the Mule Logs

When we create a new Mule Project, Anypoint Studio will provide us with a minimum configuration in the log4j2.xml. This configuration will include the PatternLayout for the default appender of the app (RollingFile).

The Pattern defines a format for our log messages using a conversion pattern string. This string can include various placeholders (conversion specifiers) that represent different parts of the log event, such as the date, log level, thread name, and message. Check out our post Breaking down the Log4j PatternLayout if you want to understand how the conversion patterns and modifiers work.

This is the layout that we’d see in our mule app logs if we don’t change anything in the log4j.


Pattern:

 %-5p %d [%t] [processor: %X{processorPath}; event: %X{correlationId}] %c: %m%n

Where:

  • %-5p: Log level, padded to 5 characters.
  • %d: Date and time of the log event.
  • [%t]: Name of the thread
  • %X{processorPath}
  • %X{correlationId} - correlation Id of the mule message
  • %c: Logger name, that is, the class name that generated the log
  • %m - The log message
  • %n - New line character

Here’s an example of a log event formatted with this layout:

INFO 2024-09-06 18:30:07,203 [[MuleRuntime].uber.08: [logging-tests].myFlow.CPU_LITE @350ae338] [processor: myFlow/processors/0; event: a8cfaa50-6c75-11ef-be6f-38f9d3cde3d6] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: Hello Grok!


Grok filter for the default PatternLayout

Now that we know what the Grok filter is and what’s the default PatternLayout in Mule, let’s see how we can define our Grok filter to transform the default logs into JSON structured logs.

Here’s the Grok pattern we could use for each conversion specifier of the patternlayout:

PatternLayout Conversion SpecifierGrok Pattern
%-5p%{LOGLEVEL:log_level}
%d%{TIMESTAMP_ISO8601:time_stamp}
[%t]\[%{DATA:thread_name}\]
[processor: %X{processorPath};\[%{WORD}: %{DATA:processor_name}\;
event: %X{correlationId}]%{WORD}: %{DATA:correlation_id}\]
%c:%{JAVACLASS:logger_name}:
%m%{GREEDYDATA:log_message}

Notice we added the escaped characters [] and ; by preceding them with the backslash \

Putting all together:

%{LOGLEVEL:log_level}%{SPACE}%{TIMESTAMP_ISO8601:time_stamp}%{SPACE}\[%{DATA:thread_name}\]%{SPACE}\[%{WORD}:%{SPACE}%{DATA:processor_name}\;%{SPACE}%{WORD}:%{SPACE}%{DATA:correlation_id}\]%{SPACE}%{JAVACLASS:logger_name}:%{SPACE}%{GREEDYDATA:log_message}

We’ve added the grok pattern 
%{SPACE} between the conversion specifiers separated by a blank space. This is a safer option, as it will not take into account the number of blank spaces. 

With that, the outcome of the whole filter for our previous example would be like this:

{
"time_stamp": "2024-12-02 09:33:31,087",
"thread_name": "[MuleRuntime].uber.06: [hello-grok].hello-grok.CPU_LITE @1b36296",
"log_level": "INFO",
"processor_name": "hello-grok/processors/0",
"correlation_id": "7e25f250-b090-11ef-9ec4-bcd074eda2f9",
"log_message": "Hello Grok!",
"logger_name": "org.mule.runtime.core.internal.processor.LoggerMessageProcessor"
}

For a quick test we can use the Grok Debugger included in the Dev Tools of Kibana:

Previous Post Next Post