Imagine building a new highway without knowing how many cars it can handle before traffic jams form. At first, with only a few drivers, everything runs smoothly. But as more vehicles enter, the road slows, accidents happen, and the system collapses.
Our APIs face the same challenge. They can only process so many requests before slowing down or failing. An API performance baseline defines how far an API can go before it reaches its limits. Instead of relying on someone else to set requirements or targets, the baseline shows us what the API itself can reliably handle under real conditions.

In an ideal world, we would always have clear requirements from the business or from API consumers. We could design performance tests for every scenario—load, stress, spike, and endurance—and cover every edge case. These tests are valuable and we should use them. But in daily practice, those requirements are often missing or incomplete.
That’s why we begin with a baseline. From the very first version of our apps, we measure what the API can support. This gives us a foundation to set fair usage tiers, apply MuleSoft rate-limiting policies, and prepare for growth with confidence.
In this tutorial, we will learn how to establish a performance baseline for a Mule API using Apache JMeter. We will build tests, measure throughput, identify bottlenecks, and document stable operating points for our APIs.
Prerequisites
Before we begin, make sure you have:
- A Mule API deployed in a production-sized environment (CloudHub, Standalone, or Runtime Fabric)
- Apache JMeter installed. Avoid running it on your own computer, as it won’t provide reliable results there. Install it on a dedicated server as close as possible to where your API consumers would be, so the test simulates a realistic scenario as much as possible.
- Anypoint Runtime Manager dashboards (or another monitoring tool) to track CPU, memory, and error rates
- Access to the application logs, to analyze when the application starts to degrade in performance and why.
Defining a Performance Baseline
A performance baseline tells us the highest volume of requests an API can handle while still responding within an acceptable time. It marks the point where the system stays steady before errors and delays appear.

For example, imagine an API deployed on a single production-like node. If it can answer requests in about 250ms and sustain 300 requests per minute, then its baseline is 300 rpm (5 rps) at 250ms response time.
This means:
- At loads under 300 rpm, responses stay fast and consistent.
- At 300 rpm or above, the API will either slow down beyond 250ms or begin returning errors.
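To make the definition concrete, here is a minimal sketch in Python of how such a baseline could be recorded, using the example numbers above. The Baseline class and its field names are our own illustration, not a standard format.

```python
# A baseline is a documented stable operating point for one API operation.
# The class and field names here are illustrative, not a standard format.
from dataclasses import dataclass

@dataclass
class Baseline:
    operation: str
    max_rpm: int          # maximum sustained requests per minute
    avg_response_ms: int  # average response time at that load

    @property
    def max_rps(self) -> float:
        return self.max_rpm / 60

b = Baseline("GET /orders", max_rpm=300, avg_response_ms=250)
print(f"{b.operation}: {b.max_rpm} rpm ({b.max_rps:.0f} rps) at {b.avg_response_ms} ms")
```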
To find the baseline for our apps, we need to discover the maximum throughput the API can maintain without breaking down. In practice, this means measuring how many requests per minute the system can serve while keeping response times stable and avoiding errors or timeouts.
For that we’ll do step load testing (sometimes referred to as ramp-up testing): we start the test small, then gradually increase concurrent users or request rates until the API reaches its limit. Here’s how to do it step by step:

Step 1: Establish the Initial Response Time
We start by finding the Reference Average Response Time (RART).
- In JMeter, create a simple test plan with a single thread.
- Run it against the endpoint of your API for 10–15 minutes.
- Record the average response time. This is your RART (a script-based version of this probe is sketched after this list).
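If you want to sanity-check the number JMeter reports, the same probe can be expressed as a short script: one request at a time against the endpoint, averaging the elapsed times. This is a minimal Python sketch; the endpoint URL is a hypothetical placeholder.

```python
# Minimal single-threaded probe to cross-check the RART JMeter reports.
import time
import urllib.request

TARGET = "https://my-mule-api.example.com/api/orders"  # hypothetical endpoint
DURATION_S = 10 * 60  # run for 10 minutes, matching the step above

samples = []
deadline = time.monotonic() + DURATION_S
while time.monotonic() < deadline:
    start = time.perf_counter()
    with urllib.request.urlopen(TARGET, timeout=30) as resp:
        resp.read()
    samples.append((time.perf_counter() - start) * 1000)  # elapsed time in ms

print(f"RART over {len(samples)} requests: {sum(samples) / len(samples):.0f} ms")
```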
Step 2: Calculate the Interval Between Requests
APIs don’t handle random floods well, so we introduce a rhythm using JMeter’s Gaussian Random Timer. It spreads requests apart with a mix of fixed and random delays. The Gaussian Random Timer has two parameters:
- Constant Delay Offset (CDO): Represents the average time between consecutive requests per user. For example, if a user normally performs an action every 500ms to 1s, set CDO ≈ 500–1000ms.
- Deviation: Introduces variability around the CDO. It should reflect the natural differences in user behavior. For example, if most users act within ±200ms of the average, set Deviation = 200ms.
For this tutorial, we derive both values from the RART measured in Step 1 (a small helper that encodes these formulas is sketched after the worked example below):
- Constant Delay Offset (CDO) = (RART × Number of Threads) – 10%
- Deviation = (RART × Number of Threads × 10%) × 2
For example, with a measured RART of 114 ms and 10 threads:
- CDO = (114 × 10) – (114 × 10 × 0.10) = 1,140 – 114 = 1,026 ms
- Deviation = (114 × 10 × 0.10) × 2 = 114 × 2 = 228 ms
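These formulas are easy to get wrong by hand, so here is a small helper that encodes them. The function name is ours; the example call reproduces the numbers above.

```python
def gaussian_timer_params(rart_ms: float, threads: int) -> tuple[float, float]:
    """Compute Gaussian Random Timer settings from the RART.

    CDO       = (RART × threads) – 10%
    Deviation = (RART × threads × 10%) × 2
    """
    base = rart_ms * threads
    cdo = base - base * 0.10
    deviation = (base * 0.10) * 2
    return cdo, deviation

cdo, dev = gaussian_timer_params(rart_ms=114, threads=10)
print(f"CDO = {cdo:.0f} ms, Deviation = {dev:.0f} ms")  # CDO = 1026 ms, Deviation = 228 ms
```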
Step 3: Simulate Higher Loads
Create new Thread Groups in JMeter with 50, 100, 200 or more users. For our app, we’ll create thread groups in increments of 50 threads per step and run each step for 5 minutes. We’ll apply the same CDO and Deviation. Do not recalculate them—this way the throughput naturally rises as we add more users. One way to automate these runs is sketched below.
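If you prefer running these steps from the command line instead of the JMeter GUI, one possible driver loop looks like this. It assumes your test plan reads the thread count from a JMeter property via ${__P(threads)}; the plan and result file names are placeholders.

```python
# Sketch: one non-GUI JMeter run per load step, adding 50 threads each time.
# Assumes "baseline-plan.jmx" reads its thread count via ${__P(threads)}.
import subprocess

for step, threads in enumerate(range(50, 401, 50), start=1):
    subprocess.run(
        [
            "jmeter", "-n",                   # -n: non-GUI mode
            "-t", "baseline-plan.jmx",        # -t: test plan (placeholder name)
            f"-Jthreads={threads}",           # -J: override the 'threads' property
            "-l", f"results-step{step}.jtl",  # -l: one results file per step
        ],
        check=True,
    )
    print(f"Step {step} ({threads} threads) finished")
```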
Step 4: Execute and Monitor Resources
Run the test plans. While they run, monitor your API in Runtime Manager. Watch for the signals below (a small script that summarizes each step’s results file follows this list):
- CPU or memory pinned at 100%
- Rising response times
- Increasing error rates
- Downstream systems failing under load
- Thread contention: the number of threads increases until it reaches a limit and then stays flat
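Alongside the Runtime Manager dashboards, the .jtl files JMeter writes tell you how each step behaved from the client side. Here is a minimal summary script, assuming JMeter’s default CSV output with the header row enabled:

```python
# Sketch: summarize one step's results file (JMeter default CSV .jtl output).
import csv

def summarize(jtl_path: str) -> None:
    elapsed, stamps, errors = [], [], 0
    with open(jtl_path, newline="") as f:
        for row in csv.DictReader(f):
            elapsed.append(int(row["elapsed"]))
            stamps.append(int(row["timeStamp"]))
            if row["success"] != "true":
                errors += 1
    minutes = max((max(stamps) - min(stamps)) / 60_000, 1 / 60)  # avoid zero
    print(f"{jtl_path}: avg {sum(elapsed) / len(elapsed):.0f} ms, "
          f"{len(elapsed) / minutes:.0f} rpm, "
          f"{100 * errors / len(elapsed):.1f}% errors")

summarize("results-step1.jtl")  # file name from the driver sketch above
```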
Step 5: Narrow the Range
Once you see failures, zoom in. For example, if errors appear between 30 and 40 users, test smaller steps (32, 34, 36). This helps you pinpoint the exact maximum throughput; a binary-search version of this narrowing is sketched below.
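If you want to automate the narrowing, the same idea can be expressed as a binary search over thread counts. Here, run_step is a hypothetical hook: it runs one load step at the given thread count and returns True if response times and error rates stayed within your budget.

```python
# Sketch: binary-search the unstable range for the last stable thread count.
# `run_step` is a hypothetical callback supplied by you.
from typing import Callable

def find_max_stable(lo: int, hi: int, run_step: Callable[[int], bool]) -> int:
    """lo is known-stable, hi is known-failing; returns the max stable count."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_step(mid):
            lo = mid  # still stable: search higher
        else:
            hi = mid  # failing: search lower
    return lo

# Example: errors first appeared somewhere between 30 and 40 users.
# max_stable = find_max_stable(30, 40, run_step)
```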
Step 6: Capture the Baseline
Take the last stable run—the point where throughput was maximized, response time was within your acceptable standards, and errors were minimal. That is your performance baseline.
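Document the captured baseline somewhere your team can find it. One lightweight option is a JSON record per operation, mirroring the fields from the earlier sketch; every value below is a placeholder.

```python
# Sketch: persist the captured baseline next to the test artifacts.
import json

baseline = {
    "operation": "GET /orders",                     # placeholder operation
    "max_rpm": 300,                                 # last stable throughput
    "avg_response_ms": 250,                         # response time at that load
    "measured_on": "2024-01-15",                    # placeholder date
    "environment": "production-sized single node",  # where it was measured
}

with open("baseline-get-orders.json", "w") as f:
    json.dump(baseline, f, indent=2)
```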
Step 7: Repeat for Other Operations
Remember: one API operation may handle load differently from another. Test each important method + resource combination. Group similar operations (for example, Commands vs Queries) to create meaningful SLA tiers.
Summary
Establishing a performance baseline is not optional. It is the foundation for stability, fairness, and trust in our APIs. By using JMeter with MuleSoft monitoring, we can measure limits, avoid outages, and design policies that scale safely with demand.

We should make this a habit: baseline every critical API, document the results, and revisit them as our systems evolve. In doing so, we build resilient platforms that our teams and consumers can rely on.