How to process JSON log files with Logstash


MuleSoft logs are often in plain text or JSON format. Logstash filters (like 
grok or json) can parse these logs to extract meaningful fields (e.g., timestamp, error code, application name).

In this post, we'll create a simple Logstash pipeline that parses a file containing JSON objects, extracts each object, and sends it as an independent message to an Elasticsearch index.


Prerequisites

  • An Elasticsearch instance up and running and, optionally, a Kibana instance connected to it.
  • A Logstash instance installed.
  • Make sure all the ELK components (Elasticsearch, Logstash, Kibana) are on the same version to avoid compatibility issues (see the quick checks after this list).
  • In this tutorial, we'll use two servers: one with the Elasticsearch and Kibana instances and another with Logstash.
  • Both servers run Ubuntu Server 24.04 LTS.
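
If you're not sure which versions you're running, here's a quick way to check. The paths and URLs below assume a default package install and use placeholders, so adapt them to your setup:

/usr/share/logstash/bin/logstash --version
curl -u elastic "http://[YOUR_ELASTICSEARCH_SERVER]:9200"          # the response includes version.number
curl -u elastic "http://[YOUR_KIBANA_SERVER]:5601/api/status"      # the response includes the Kibana version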

Previous posts on this blog can help you get your ELK stack up and running.


Generate some logs

First, let's get some logs in JSON format to use as sample data for our Logstash pipeline. Since this is a MuleSoft blog, we'll generate those logs from a real Mule app so we can see how useful this setup can be for our Mule apps.

For that, head over to Anypoint Studio, create a New Mule Project and drag and drop the following elements into our flow:
  • An HTTP listener - A simple GET /hello
  • A Logger processor to show how the app writes to the log. Write any text in the message that will help you identify that the log is coming from this app when we see the logs in ELK.
  • A Set Payload processor to create a response for our test endpoint. Enter any text that confirms the app is running well.
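
If you prefer the XML view, the resulting flow should look roughly like the sketch below. The flow name, the HTTP_Listener_config reference and the messages are placeholders from this example (and the listener configuration element itself is omitted), so expect yours to differ:

<flow name="hello-logstash-flow">
    <!-- GET /hello endpoint -->
    <http:listener config-ref="HTTP_Listener_config" path="/hello" allowedMethods="GET" />
    <!-- Any message that identifies this app in the logs -->
    <logger level="INFO" message="Hello from the hello-logstash app" />
    <!-- Any response that confirms the app is running -->
    <set-payload value="hello-logstash is up and running" />
</flow>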


Modify the Log4j

By default, Mule apps don't write their logs in JSON; they write plain text using the Log4j PatternLayout. We'll change that and set the appender to use the JSONLayout. To do so, open the log4j2.xml file located at src/main/resources.

The appender that writes to the log file is the RollingFile appender. Modify the appenders section to look like this:

<Appenders>
    <RollingFile name="file"
        fileName="${sys:mule.home}${sys:file.separator}logs${sys:file.separator}hello-logstash.log"
        filePattern="${sys:mule.home}${sys:file.separator}logs${sys:file.separator}hello-logstash-%i.log">
        <JSONLayout compact="true" eventEol="true"
            properties="true" stacktraceAsString="true">
            <KeyValuePair key="host" value="Anypoint Studio" />
            <KeyValuePair key="appName" value="hello-logstash" />
            <KeyValuePair key="apiType" value="system" />
            <KeyValuePair key="ddsource" value="mule" />
            <KeyValuePair key="correlationId" value="$${ctx:correlationId:-}" />
        </JSONLayout>
        <SizeBasedTriggeringPolicy size="10 MB" />
        <DefaultRolloverStrategy max="10" />
    </RollingFile>
</Appenders>

Notice that we've removed the PatternLayout element and replaced it with the JSONLayout. In this layout, we're using KeyValuePair elements to add some context properties, like the name of the app, the host generating the logs, or the correlationId. These will enrich our logs.


Save the file and run the application. Once it's deployed, send some requests to the endpoint of our flow to generate some logs.
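
Assuming the HTTP listener is configured on the default port 8081 (adjust the port if you changed it), a simple curl call is enough; run it a few times so there are several entries in the log file:

curl http://localhost:8081/hello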


Collect the Logs

When we run an application in Anypoint Studio, an embedded instance of the Mule runtime is started, just like when we start a Mule runtime on premises. This runtime generates logs and stores them in the $MULE_HOME/logs folder of our system. To find out where this $MULE_HOME is located, we need to look at the logs in the console. Scroll up in the Studio console to the start of the logs; the first two lines written when the Mule runtime starts tell us where in our system the runtime is located.



Copy the path and navigate to that folder in your file system. You'll see that we've got the same file structure as in a standalone deployment.


Go to the logs folder and get the log file corresponding to your application. The file name should match your app's name. Open the file and verify that the logs in it are in the JSON format we've defined. Copy the file (or its content) to a file on the Logstash server; we'll name it sample-json.log.
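
Because of the compact and eventEol options, each log event is written as a single-line JSON object. An entry should look roughly like the following; the exact field names depend on your Log4j version, and the values here are only illustrative:

{"instant":{"epochSecond":1718000000,"nanoOfSecond":123000000},"thread":"[MuleRuntime].uber.04","level":"INFO","loggerName":"org.mule.runtime.core.internal.processor.LoggerMessageProcessor","message":"Hello from the hello-logstash app","endOfBatch":false,"threadId":57,"host":"Anypoint Studio","appName":"hello-logstash","apiType":"system","ddsource":"mule","correlationId":"8a3c9e10-..."}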


Set up the pipeline

The way Logstash processes logs is by defining a pipeline. A Logstash pipeline is a series of processing stages that Logstash uses to ingest, transform, and route data from one or more input sources to one or more output destinations. Each pipeline consists of three main components: input, filter, and output. Together, these stages define how data flows through Logstash.


To define a pipeline in Logstash, we need to create a conf file. For our example, let's create the following logstash.conf file on your Logstash server.

input {
    file {
        start_position => "beginning"
        path => "/path/to/your/logs/sample-json.log"
        sincedb_path => "/dev/null"
    }
}

filter {
    json {
        source => "message"
    }
}

output {
    elasticsearch {
        hosts => ["[YOUR_ELASTICSEARCH_SERVER]:9200"]
        user => "[YOUR_LOGSTASH_USER]"
        password => "[YOUR_PASSWORD]"
        index => "index-sample-mule-logs"
    }
    stdout {
        codec => rubydebug
    }
}

Where:

  • The input section sets the path to the log file that contains all the JSON logs to be parsed.
  • In the filter section we've included the json filter. With this simple configuration, Logstash will go through the file and parse each line as a JSON object, so every JSON object is extracted into a new event that is passed on to the output.
  • As outputs, we've included two:
    • Our Elasticsearch instance. Here we need to provide the connection details of our instance, along with the index into which Logstash will ingest the messages.
    • The rubydebug codec on stdout, so that we can see the output in the console. This is just for testing/debugging purposes.


Run Logstash

Next, let’s run Logstash with the following command:

sudo /usr/share/logstash/bin/logstash -f /path/to/your/conf/file

Where the -f flag indicates the path to the conf file we've created with our pipeline definition.

After you run the command, you'll see that Logstash starts processing the file and prints the different events it is creating (that's the rubydebug output).


See the Logs in Kibana

Go to Stack Management > Data > Index Management. You will see that the index we defined in our pipeline has been created and that it contains a few documents. That means our filter has worked and we've managed to extract each individual JSON object and ingest it into the index.
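
You can also verify the document count from the command line with the Elasticsearch count API. The host and credentials below are the same placeholders used in the pipeline, so adjust them to your setup:

curl -u [YOUR_LOGSTASH_USER]:[YOUR_PASSWORD] "http://[YOUR_ELASTICSEARCH_SERVER]:9200/index-sample-mule-logs/_count?pretty"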


Lastly, let's try to see the logs. Go to Analytics > Discover and create a Data view for our new index.


Now, as you can see, we've got a document for each log entry. Click on one of them to see its content.
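
Since the json filter promotes the fields of each log object to top-level fields of the document, you can also filter in Discover by the properties we added in the JSONLayout, for example with a KQL query like the following (assuming those fields were indexed as-is):

appName : "hello-logstash" and level : "INFO"

And that's it: each JSON log entry from our Mule app is now an individual, searchable document in Elasticsearch.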
