Performance testing measures how well our Mule application performs under specific conditions. It’s a way to ensure our app can handle real-world demands, from a handful of users to a bustling swarm of requests. This testing helps find bottlenecks, predict capacity, and confirm stability.
We, as MuleSoft Architects and Developers, should do performance testing to ensure our Mule applications meet functional and non-functional requirements under varying load conditions. However, performance testing is often thought of as a single activity, when it's actually a collection of different techniques, each designed to evaluate a specific aspect of an application's behavior. In other words, there isn't one unique performance test but several different types of performance tests that we should run for our Mule apps. The type of performance testing we choose depends on what we want to learn about our app and the conditions it will face in the real world.
Is our app prepared for a sudden traffic surge? Can it run for hours without slowing down? Will it scale as demand grows? Each of these questions points to a different kind of performance test. Understanding these distinctions allows us to tailor our testing approach and uncover valuable insights that a generic performance test might overlook.
In this post, we will dive into the 4 main types of performance testing we should perform on our Mule applications, and look at when to apply each one so that we can make our Mule apps robust, efficient, and ready for any challenge. There are more types of performance testing, but we'll focus on these 4 because, in my opinion, they are the minimum set of tests we should always run for our Mule apps.
1. Load Testing
The goal of Load Testing is to evaluate how our application performs under expected, steady-state load conditions. It simulates expected traffic to see how the app handles typical loads. With a Load Test we can:
- Validate application SLAs, such as throughput, response time, and error rates, under typical user loads.
- Identify bottlenecks in normal operational scenarios.
Example:
- Simulate 100 concurrent users or 1000 requests/minute for a specific duration
Before we run a Load Test, it's important to have some requirements: we need numbers for the (normal) expected throughput and response time of our app so that we can simulate that traffic. Without requirements we can't do proper load testing. Well, we could guess our own metrics as Mule developers/architects, but then how do we really know that the throughput and response time we use for our tests are acceptable?
How do we perform a Load Test?
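In its simplest form, a load test is: send a fixed volume of requests, record the response times, and compare the results against the SLA. Here's a minimal Python sketch of that pass/fail check. The `send_request` function is a hypothetical stub standing in for a real call to the Mule app; it just returns a simulated latency.

```python
import random
import statistics

# Hypothetical stub standing in for a real HTTP call to the Mule app.
# Here it just simulates a response time in milliseconds.
def send_request():
    return random.uniform(20, 80)

def run_load_test(total_requests, sla_p95_ms):
    """Fire a fixed number of requests and check the p95 latency against the SLA."""
    latencies = sorted(send_request() for _ in range(total_requests))
    p95 = latencies[int(0.95 * len(latencies)) - 1]
    return {
        "p95_ms": round(p95, 1),
        "avg_ms": round(statistics.mean(latencies), 1),
        "passed": p95 <= sla_p95_ms,
    }

result = run_load_test(total_requests=1000, sla_p95_ms=100)
print(result["passed"])
```

A real load test would also keep the request rate steady over a defined duration and track the error rate; tools like JMeter and k6 handle that scheduling for us.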
The best way to do Load Testing is to use tools like JMeter, k6, or BlazeMeter to generate traffic and monitor the response of our app. With these tools we can simulate the expected throughput and monitor, among other values, the response time and error rate of our app. If those metrics are within our requirements, our app has passed the Load Test.

2. Stress Testing
The purpose of Stress Testing is to push our application beyond its limits to see how it reacts under extreme pressure. This type of test is useful to identify breaking points, and also to verify the app's behavior under resource exhaustion or failures and how gracefully it handles them. So, with a proper Stress Test for our app, we are after two things:
- Determine the application's maximum capacity (throughput and concurrent requests).
- Verify the application's behaviour under extreme pressure. This means checking that the reliability mechanisms we put in place for our app are working: reconnection strategies, circuit breakers, dead letter queues. It also covers our error handling strategy: we force the app to fail and then check that it does what it's supposed to do when it fails.
How do we do a Stress Test?
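The core idea is a step-up loop: increase the load, measure, and stop when the app exceeds acceptable limits. The Python sketch below illustrates that logic. The `error_rate_at` function is a purely hypothetical capacity model (a real test would measure error rates by driving actual load with one of the tools above); the point is the increase-until-failure iteration.

```python
# Hypothetical capacity model: the error rate stays low until the app is
# pushed past ~400 req/s, then climbs sharply. A real stress test would
# measure this by generating actual load with JMeter or k6.
def error_rate_at(load_rps):
    return 0.01 if load_rps <= 400 else 0.01 + (load_rps - 400) / 1000

def find_breaking_point(start_rps, step_rps, max_error_rate):
    """Increase the load step by step until errors exceed the acceptable limit."""
    load = start_rps
    while error_rate_at(load) <= max_error_rate:
        load += step_rps
    return load  # first load level at which the app "breaks"

print(find_breaking_point(start_rps=100, step_rps=50, max_error_rate=0.05))
```

Each iteration of that loop corresponds to one test run at a given load, which is why stress testing usually means running the same scenario several times.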
Using a performance test tool like the ones mentioned above, we can gradually increase the load until the application fails or its response time and error rates exceed acceptable limits. This type of testing normally requires multiple iterations, running the same tests under different loads.

3. Spike Testing
In a Spike Test, we are trying to understand the application's behaviour when sudden, dramatic increases in load happen. This is slightly different from a Stress Test: here, we're not really pushing the app to its limit, we're testing how it responds when the traffic increases very quickly over a very short period of time (a spike). This test helps us simulate traffic patterns that happen in some common scenarios, for example when we know that most of our consumers will try to access our API at the same time as a result of a previous notification, or during a peak hour.
The goal here is to test how the application handles sudden, large increases in traffic and to verify recovery behavior after the spike subsides.
As a best practice, try to understand the traffic patterns of your app as well as you can. Is your app likely to have spikes or moments during the day when the load can increase dramatically? And if so, how large would that increase be? Answering these questions makes our tests more realistic.
For best results, when simulating the peak we should aim for a realistic one. Simulating a peak under unrealistic conditions won't help us understand whether our app can handle the real pattern of traffic.
Example:
- Simulate a traffic surge from 100 to 1000 requests/second within a short time.
How do we perform a Spike test?
First, define the conditions of the spike we need to generate. For example:
- Baseline Load: 10 requests per second
- Spike Load: 500 requests per second
- Spike Duration: 30 seconds
- Recovery Period: 60 seconds
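Conditions like these translate naturally into a load schedule for a test script to follow: a steady baseline, a sudden jump, and a recovery window. A small Python sketch (the parameter names are mine, not from any specific tool):

```python
def spike_schedule(baseline_rps, spike_rps, spike_start_s, spike_duration_s, total_s):
    """Target request rate for each second of the test: baseline, spike, recovery."""
    schedule = []
    for second in range(total_s):
        in_spike = spike_start_s <= second < spike_start_s + spike_duration_s
        schedule.append(spike_rps if in_spike else baseline_rps)
    return schedule

# Baseline of 10 req/s, a 30 s spike to 500 req/s, then a recovery window.
schedule = spike_schedule(baseline_rps=10, spike_rps=500,
                          spike_start_s=30, spike_duration_s=30, total_s=120)
print(schedule[0], max(schedule), schedule[-1])  # 10 500 10
```

During the recovery period, we watch whether response times and error rates return to their baseline values, which is the real pass/fail criterion of a Spike Test.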
4. Endurance (Soak) Testing
In Endurance Testing, the purpose is to understand the application's behaviour over an extended period under sustained load. It's similar to a Load Test, in the sense that the test is performed under typical load. The difference from a standard Load Test is that here we try to detect issues that may emerge only during prolonged operation, like memory leaks, resource depletion, garbage collector issues, or any other sort of performance degradation.
This test is fundamental to validate application stability and reliability over time. For Mule applications in particular, this type of test is very important, especially for those deployment models in which the runtime plane provides any kind of bursting.
For example, in CH1.0, bursting was controlled by the AWS credits system, which allows an application to use extra resources (CPU) for as long as it has credits. However, when the credits are consumed, the CPU assigned to the app can drop dramatically and affect the app's performance. This is why it's very important, with this type of test, to verify that the normal operation of the app over an extended period of time does not consume all the AWS credits, and that the app is not impacted when that happens.
Example:
- Run 500 requests/minute for 24 hours to check for resource exhaustion or performance degradation.
How do we perform an Endurance Test?
- Monitor system performance while running sustained tests for hours or days.
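One simple check we can automate on top of that monitoring data is a trend test on resource usage. The sketch below uses hypothetical heap samples to flag a possible memory leak; real samples would come from our monitoring tooling (e.g. Anypoint Monitoring or JMX), and a production check would use more samples and a more robust trend estimate.

```python
# Hypothetical heap samples (MB) taken periodically during a long soak run.
# A series that climbs steadily over the whole run suggests a memory leak.
def shows_leak(heap_samples_mb, max_growth_mb):
    """Flag a leak if heap usage grows more than max_growth_mb over the run."""
    return heap_samples_mb[-1] - heap_samples_mb[0] > max_growth_mb

healthy = [512, 530, 518, 525, 520, 515]   # oscillates around a stable baseline
leaking = [512, 560, 610, 655, 700, 748]   # climbs steadily: likely a leak

print(shows_leak(healthy, max_growth_mb=50))  # False
print(shows_leak(leaking, max_growth_mb=50))  # True
```

The same idea applies to other slowly degrading metrics: file handles, thread counts, connection pool usage, or average response time over the duration of the soak.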
Other types of Performance Testing
These 4 types of performance testing are not the only ones. Actually, there are many more that we can do, depending on the different scenarios and conditions we want to test for our apps. Here's a quick list:
- Scalability Testing - Assess how the application scales when additional resources (workers/replicas, cores, memory) are added.
- Failover and Resilience Testing - Test the application's behavior during failures and recovery scenarios.
- Backend Dependency Testing - Assess the performance impact of downstream systems (databases, external APIs) on the Mule application.
- Data Volume Testing - Test the application's ability to handle large payloads or high volumes of data.
- Latency Testing - Measure the impact of network latency on application performance.
- Security Testing - Test the impact of security measures on application performance.