Step Load Testing to Define API Performance Baselines for your Mule Apps

Imagine building a new highway without knowing how many cars it can handle before traffic jams form. At first, with only a few drivers, everything runs smoothly. But as more vehicles enter, the road slows, accidents happen, and the system collapses.

Our APIs face the same challenge. They can only process so many requests before slowing down or failing. An API performance baseline defines how far an API can go before it reaches its limits. Instead of relying on someone else to set requirements or targets, the baseline shows us what the API itself can reliably handle under real conditions.

In an ideal world, we would always have clear requirements from the business or from API consumers. We could design performance tests for every scenario—load, stress, spike, and endurance—and cover every edge case. These tests are valuable and we should use them. But in daily practice, those requirements are often missing or incomplete.
That’s why we begin with a baseline. From the very first version of our apps, we measure what the API can support. This gives us a foundation to set fair usage tiers, apply MuleSoft rate-limiting policies, and prepare for growth with confidence.

In this tutorial, we will learn how to establish a performance baseline for a Mule API using Apache JMeter. We will build tests, measure throughput, identify bottlenecks, and document stable operating points for our APIs.


Prerequisites

Before we begin, make sure you have:
  • A Mule API deployed in a production-sized environment (CloudHub, Standalone or Runtime Fabric)
  • Apache JMeter installed. Avoid installing it on your own computer, as it won’t provide reliable results. Install it on a dedicated server, as close as possible to where the API consumers of your app would be, to simulate a realistic scenario as much as we can.
  • Anypoint Runtime Manager dashboards (or another monitoring tool) to track CPU, memory, and error rates
  • Access to the application logs, to analyze when and why the application’s performance starts to degrade.


Defining a Performance Baseline

A performance baseline tells us the highest volume of requests an API can handle while still responding within an acceptable time. It marks the point where the system stays steady before errors and delays appear.
For example, imagine an API deployed on a single production-like node. If it can answer requests in about 250ms and sustain 300 requests per minute, then its baseline is 300 rpm (5 rps) at 250ms response time.
This means:
  • At loads under 300 rpm, responses stay fast and consistent.
  • At 300 rpm or above, the API will either slow down beyond 250ms or begin returning errors.

To find the baseline for our apps, we need to discover the maximum throughput the API can maintain without breaking down. In practice, this means measuring how many requests per minute the system can serve while keeping response times stable and avoiding errors or timeouts.

For that we’ll do a step load test (sometimes referred to as ramp-up testing): we start small, then gradually increase concurrent users or request rates until the API reaches its limit. Here’s how to do it step by step:


Step 1: Establish the Initial Response Time

We start by finding the Reference Average Response Time (RART).
  1. In JMeter, create a simple test plan with a single thread.
  2. Run it against the endpoint of your API for 10–15 minutes.
  3. Record the average response time.
In our example this value came out to 114 ms; it becomes the anchor for all the following calculations.
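If we save the results of this single-thread run to a CSV file (a JTL, for example via a Simple Data Writer), we can compute the average ourselves instead of reading it from a listener. The sketch below assumes JMeter’s default CSV columns (elapsed, success) and a hypothetical file name baseline.jtl:

```python
import csv

# Minimal sketch: compute the Reference Average Response Time (RART) from a
# JMeter results file saved in CSV format. The file name "baseline.jtl" and the
# "elapsed"/"success" columns are assumptions based on JMeter's default CSV output.

def average_response_time(jtl_path: str) -> float:
    elapsed_values = []
    with open(jtl_path, newline="") as f:
        for row in csv.DictReader(f):
            # Count only successful samples so errors don't skew the baseline
            if row.get("success", "true").lower() == "true":
                elapsed_values.append(int(row["elapsed"]))  # elapsed is in ms
    return sum(elapsed_values) / len(elapsed_values)

if __name__ == "__main__":
    rart = average_response_time("baseline.jtl")
    print(f"RART = {rart:.0f} ms")  # e.g. 114 ms in our example
```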


Step 2: Calculate the Interval Between Requests

APIs don’t handle random floods well, so we introduce a rhythm using JMeter’s Gaussian Random Timer. It spreads requests apart with a mix of fixed and random delays. The Gaussian Random Timer has two parameters:
  • Constant Delay Offset (CDO): Represents the average time between consecutive requests per user. For example, if a user normally performs an action every 500ms to 1s, set CDO ≈ 500–1000ms.
  • Deviation: Introduces variability around the CDO. It should reflect the natural differences in user behavior. For example, if most users act within ±200ms of the average, set deviation = 200ms.
In our case, we’ll use the RART value calculated in the previous step to derive the delay between requests and control the throughput of the API. We’ll create a second Thread Group with 10 threads, and to keep the same throughput as the single-thread run, we’ll calculate the Gaussian Random Timer parameters this way:
  • Constant Delay Offset (CDO) = (RART × Number of Threads) – 10%
  • Deviation = (RART × Number of Threads × 10%) × 2
In our example: RART = 114 ms and we test with 10 threads:
  • CDO = (114 × 10) - (114 × 10 × 0.10) = 1,140 - 114 = 1,026 ms
  • Deviation = (114 × 10 × 0.10) × 2 = 114 × 2 = 228 ms
This keeps throughput stable while letting us increase user load.
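We can capture these two formulas in a small helper so the timer settings can be recomputed for any RART or thread count. This is just a sketch of the arithmetic above:

```python
# Sketch of the Gaussian Random Timer formulas above.
# rart_ms is in milliseconds; returns (CDO, Deviation) in milliseconds.

def gaussian_timer_params(rart_ms: float, threads: int) -> tuple[float, float]:
    base = rart_ms * threads
    cdo = base - base * 0.10        # CDO = (RART x threads) - 10%
    deviation = (base * 0.10) * 2   # Deviation = (RART x threads x 10%) x 2
    return cdo, deviation

cdo, dev = gaussian_timer_params(114, 10)
print(f"CDO = {cdo:.0f} ms, Deviation = {dev:.0f} ms")  # CDO = 1026 ms, Deviation = 228 ms
```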


Step 3: Simulate Higher Loads

Create new Thread Groups in JMeter with 50, 100, 200, or more users. For our app, we’ll create thread groups in increments of 50 threads per step and run each step for 5 minutes. We’ll apply the same CDO and Deviation, without recalculating them, so the throughput naturally rises as we add more users.
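Before running anything, it can help to sketch the step plan and the throughput we would expect at each step if the API kept up. The estimate below is a rough approximation that ignores the random deviation and any server-side queuing; the 50-thread increment and the 300-thread ceiling are just illustrative:

```python
# Rough sketch of the step-load plan: same CDO for every step, more threads each step.
# Assumes each thread's cycle is roughly response time + CDO; real throughput will
# flatten out once the API starts to saturate.

RART_MS = 114          # from Step 1
CDO_MS = 1026          # from Step 2 (kept constant for every step)
STEP_THREADS = 50      # increment per step
STEP_DURATION_MIN = 5  # how long each step runs

for step, threads in enumerate(range(STEP_THREADS, 301, STEP_THREADS), start=1):
    cycle_ms = RART_MS + CDO_MS              # time one thread needs per request
    est_rpm = threads * (60_000 / cycle_ms)  # requests per minute across all threads
    print(f"Step {step}: {threads} threads for {STEP_DURATION_MIN} min "
          f"-> ~{est_rpm:.0f} rpm expected")
```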



Step 4: Execute and Monitor Resources

Run the test plans. While they run, monitor your API in Runtime Manager. Watch for:
  • CPU or memory pinned at 100%
  • Rising response times
  • Increasing error rates
  • Downstream systems failing under load
  • Thread contention - the number of threads increases until it reaches a limit and remains flat
These are early signs of hitting the ceiling.
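For load generation it’s best to run JMeter in non-GUI mode. One way to drive each step is a small script around the jmeter command line (-n for non-GUI, -t for the test plan, -l for the results file); the plan and results file names below are placeholders, not part of this tutorial’s project:

```python
import subprocess

# Minimal sketch: run each step's test plan in JMeter non-GUI mode and keep a
# separate results file per step. File names are placeholders; adjust them to
# however you organized your thread groups and test plans.
steps = [50, 100, 150, 200]

for threads in steps:
    plan = f"step_{threads}_users.jmx"
    results = f"results_{threads}_users.jtl"
    # -n = non-GUI, -t = test plan, -l = results file
    subprocess.run(["jmeter", "-n", "-t", plan, "-l", results], check=True)
    print(f"Finished step with {threads} threads, results in {results}")
```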


Step 5: Narrow the Range

Once you see failures, zoom in. For example, if errors appear between 30 and 40 users, test smaller steps (32, 34, 36). This helps you pinpoint the exact maximum throughput.
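A quick way to pick those intermediate steps is to generate smaller increments between the last stable and the first failing thread counts, for example:

```python
# Sketch: generate finer-grained thread counts between the last stable step
# and the first step that showed errors (30 and 40 users in the example above).
last_stable, first_failing = 30, 40
step = 2  # smaller increment for the narrowed range
candidates = list(range(last_stable + step, first_failing, step))
print(candidates)  # [32, 34, 36, 38]
```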


Step 6: Capture the Baseline

Take the last stable run: the point where throughput was maximized, response time was within your acceptable standards, and errors were minimal. That is your performance baseline.
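To make “last stable run” concrete, we can summarize each step’s results file with its throughput, average response time, and error rate, and flag which runs met our acceptance criteria. The thresholds, column names, and file names below are illustrative assumptions:

```python
import csv

# Sketch: summarize each step's JTL (CSV format) and flag which runs stayed
# within an acceptable response time and error rate. Thresholds and file names
# are assumptions for illustration; use your own acceptance criteria.
MAX_AVG_MS = 250        # acceptable average response time
MAX_ERROR_RATE = 0.01   # tolerate up to 1% errors

def summarise(jtl_path: str) -> dict:
    with open(jtl_path, newline="") as f:
        rows = list(csv.DictReader(f))
    elapsed = [int(r["elapsed"]) for r in rows]
    errors = sum(1 for r in rows if r.get("success", "true").lower() != "true")
    duration_min = (int(rows[-1]["timeStamp"]) - int(rows[0]["timeStamp"])) / 60_000
    return {
        "rpm": len(rows) / duration_min if duration_min else 0,
        "avg_ms": sum(elapsed) / len(elapsed),
        "error_rate": errors / len(rows),
    }

for threads in [50, 100, 150, 200]:
    s = summarise(f"results_{threads}_users.jtl")
    stable = s["avg_ms"] <= MAX_AVG_MS and s["error_rate"] <= MAX_ERROR_RATE
    print(f"{threads} threads: {s['rpm']:.0f} rpm, {s['avg_ms']:.0f} ms avg, "
          f"{s['error_rate']:.1%} errors -> {'stable' if stable else 'degraded'}")
```

The last run flagged as stable, together with its throughput and response time, is what we record as the baseline.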


Step 7: Repeat for Other Operations

Remember: one API operation may handle load differently from another. Test each important method + resource combination. Group similar operations (for example, Commands vs Queries) to create meaningful SLA tiers.

Summary

Establishing a performance baseline is not optional. It is the foundation for stability, fairness, and trust in our APIs. By using JMeter with MuleSoft monitoring, we can measure limits, avoid outages, and design policies that scale safely with demand.
We should make this a habit: baseline every critical API, document the results, and revisit them as our systems evolve. In doing so, we build resilient platforms that our teams and consumers can rely on.
