PromQL Operators — Part 2 (Aggregation)


In our previous post, we learned how Prometheus matches two instant vectors when an operator sits between them. We built a CPU activity percentage query step by step and previewed sum without() to complete it.

In this post, we will learn aggregation operators in full. Aggregation collapses many series into fewer series by computing a single value across a group. We will use sum(), avg(), min(), max(), and count() — and we will control exactly how they group with the by and without clauses.

Our setup is a Prometheus server scraping a Linux host via Node Exporter. The target has these labels:
  • instance="172.31.33.131:9100"
  • job="node_exporter"
All examples will run in the Prometheus UI at http://<our-server>:9090.


Why We Need Aggregation

Node Exporter reports node_cpu_seconds_total with one series per CPU core per mode. On a four-core host, a query for the user mode returns four series — one per core. On a sixteen-core host, sixteen series.


Most of the time, we do not want that level of detail. We want one number: total CPU time in user mode across all cores. Aggregation is how we collapse those four or sixteen series into one.
Without aggregation we read individual series. With aggregation we read the group.
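To make the contrast concrete, the raw query below returns one series per core, while a sum collapses them into one (the cpu label values shown are typical for a four-core host, not guaranteed):

node_cpu_seconds_total{instance="172.31.33.131:9100", mode="user"}

returns one series per core (cpu="0", cpu="1", and so on), while

sum without(cpu) (node_cpu_seconds_total{instance="172.31.33.131:9100", mode="user"})

returns a single series with the cpu label removed.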


The by and without Clauses

Every aggregation operator accepts either a by clause or a without clause. These control which labels survive the aggregation.

by(l1, l2) keeps only the listed labels in the result. All other labels are dropped. Series that share the same values for the listed labels are collapsed into one.

without(l1, l2) drops the listed labels and keeps everything else. Series that become identical after removing those labels are collapsed into one.

We will use one or the other in every aggregation we write. Omitting both collapses all series into a single result — which is sometimes what we want, but usually not.
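As a quick illustration of that last point, an aggregation with neither clause strips every label:

sum (node_cpu_seconds_total{instance="172.31.33.131:9100"})

This adds user, system, idle, iowait and every other mode together across all cores into one unlabeled value, which is rarely a number we can act on.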

sum() — Add values across series

sum() adds the values of all series in a group. It is the operator we will reach for most often.
Let's start with the problem from the previous post. We have node_cpu_seconds_total with one series per core per mode. We want total CPU time in user mode across all cores:
sum by(instance, job) (
  node_cpu_seconds_total{instance="172.31.33.131:9100", mode="user"}
)

We'll run this. Instead of one series per core, we get one series total — the sum of user-mode CPU time across all cores. The result carries only the instance and job labels we listed in by.


We can write the same thing using without to drop the cpu and mode labels instead:
sum without(cpu, mode) (
  node_cpu_seconds_total{instance="172.31.33.131:9100", mode="user"}
)

Both expressions produce the same result. The by version is more explicit about what we keep. The without version is more explicit about what we discard. On a metric with many labels, without is often shorter to write.


Now let's build a real dashboard query. We want the available memory on the host. Node Exporter exposes several memory metrics — MemAvailable, MemFree, Buffers, Cached. Of these, node_memory_MemAvailable_bytes is the kernel's own estimate of how much memory is usable without swapping, so that is the single figure we aggregate:
sum by(instance) (
  node_memory_MemAvailable_bytes{instance="172.31.33.131:9100"}
)

This collapses to one number per host — exactly what a memory gauge panel on a Grafana dashboard needs.
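If we drop the instance filter, the same query scales to a whole fleet with one series per host. This is a sketch that assumes additional node_exporter targets are being scraped under the same job:

sum by(instance) (
  node_memory_MemAvailable_bytes{job="node_exporter"}
)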


avg() — Average values across series

avg() computes the arithmetic mean across all series in a group. We will use it when we want a representative value rather than a total.

Let's compute the average CPU idle percentage across all cores. First we calculate the total CPU seconds per core, as a sum of all modes:

sum without(mode) (node_cpu_seconds_total{instance="172.31.33.131:9100"})

Next we divide the idle-mode series by that per-core total to get the idle ratio per core. The idle series still carries a mode label and the total does not, so we match them with ignoring(mode). Finally we average the per-core ratios across all cores:

avg by(instance) (
  node_cpu_seconds_total{instance="172.31.33.131:9100", mode="idle"}
  / ignoring(mode)
  sum without(mode) (node_cpu_seconds_total{instance="172.31.33.131:9100"})
) * 100

We'll run this. We get one number — the average idle percentage across all CPU cores on our host. If core 0 is at 95% idle and core 1 is at 85% idle, this returns 90%.


We can also average load over time. Let's compute the average 1-minute load average across the last hour. We will use avg_over_time() for this — a function we will cover in detail in the functions post:

avg_over_time(
  node_load1{instance="172.31.33.131:9100"}[1h]
)

We'll run this. We get the mean value of node_load1 sampled over the last hour — a smoothed view of load that filters out brief spikes.
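Since avg_over_time() leaves one smoothed series per host, we can combine it with an aggregation operator to smooth across a fleet as well. A sketch, assuming several hosts are scraped under the node_exporter job:

avg by(job) (
  avg_over_time(node_load1{job="node_exporter"}[1h])
)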


min() and max() — Extremes across series

min() returns the lowest value and max() returns the highest value across all series in a group.
Let's find the filesystem with the least free space on our host — the one most at risk of filling up.

min by(instance) (
  node_filesystem_avail_bytes{
    instance="172.31.33.131:9100",
    fstype!~"tmpfs|squashfs|devtmpfs"
  } / (1024 * 1024 * 1024)
)

We'll run this. We get the single smallest free space value across all real filesystems, expressed in GB. This is useful for a summary panel that shows the worst-case disk situation at a glance.


This tells us the worst value, but not which filesystem it belongs to: min() discards the mountpoint label along with every other label not listed in by. We will recover the full label set with topk() and bottomk() later in this post.

Let's find the most loaded CPU core. Because the idle series carries a mode label and the per-core total does not, we match them with ignoring(mode):

max by(instance) (
  1 - (
    node_cpu_seconds_total{instance="172.31.33.131:9100", mode="idle"}
    / ignoring(mode)
    sum without(mode) (node_cpu_seconds_total{instance="172.31.33.131:9100"})
  )
) * 100

We'll run this. We get the busiest CPU core's activity percentage. On a MuleSoft host under load, this tells us the worst-case core utilization — a better signal for CPU pressure than the average.
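A query like this slots directly into an alert rule. A PromQL comparison keeps only the series above the threshold, so the expression below returns a result only while the busiest core is over the limit (the 90 here is an example threshold, not a recommendation):

max by(instance) (
  1 - (
    node_cpu_seconds_total{instance="172.31.33.131:9100", mode="idle"}
    / ignoring(mode)
    sum without(mode) (node_cpu_seconds_total{instance="172.31.33.131:9100"})
  )
) * 100 > 90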


count() — Count series in a group

count() returns the number of series in a group rather than computing on their values. We will use it to count resources, active alerts, or running processes.

Let's count how many real filesystems our host has:
count by(instance) (
  node_filesystem_avail_bytes{
    instance="172.31.33.131:9100",
    fstype!~"tmpfs|squashfs|devtmpfs"
  }
)

We'll run this. We get the number of real filesystems Prometheus is monitoring on our host. The value of each metric does not matter — only the number of series.
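count() also pairs naturally with the up metric, which Prometheus records automatically for every scrape target. A sketch, assuming several targets under the node_exporter job, counting only the ones whose last scrape succeeded:

count by(job) (up{job="node_exporter"} == 1)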


Let's count how many CPU cores the host has:
count by(instance) (
  node_cpu_seconds_total{instance="172.31.33.131:9100", mode="idle"}
)

We'll run this. Each core produces one idle series, so the count equals the number of cores. On our host we will see 2 for a dual-core VM.
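Knowing the core count lets us normalize other metrics. For example, dividing the 1-minute load average by the core count gives load per core; the count result carries only the instance label, so we match on it explicitly:

node_load1{instance="172.31.33.131:9100"}
/ on(instance)
count by(instance) (
  node_cpu_seconds_total{instance="172.31.33.131:9100", mode="idle"}
)

A value near 1 means the host is fully loaded; above 1 means work is queuing for CPU.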


topk() and bottomk() — Top or bottom N series

topk(n, expr) returns the N series with the highest values. bottomk(n, expr) returns the N series with the lowest values. These are the operators we use when we want to identify the worst offenders rather than just the extreme value.

Let's find the two filesystems with the least free space:

bottomk(2,
  node_filesystem_avail_bytes{
    instance="172.31.33.131:9100",
    fstype!~"tmpfs|squashfs|devtmpfs"
  } / (1024 * 1024 * 1024)
)

We'll run this. We get the two filesystems with the lowest free space in GB — their full label sets intact, including mountpoint. This is more useful than min() alone because we can see exactly which mounts are at risk.
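This also answers the question we deferred in the min() section: bottomk(1, ...) returns the single worst filesystem with its mountpoint label still attached:

bottomk(1,
  node_filesystem_avail_bytes{
    instance="172.31.33.131:9100",
    fstype!~"tmpfs|squashfs|devtmpfs"
  } / (1024 * 1024 * 1024)
)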


Let's find the CPU modes consuming the most time across all cores:
topk(3,
  sum by(mode) (
    node_cpu_seconds_total{instance="172.31.33.131:9100"}
  )
)

We'll run this. We get the three CPU modes with the highest total accumulated time — typically idle, user, and system in that order on a healthy host. If iowait appears near the top, our host has a disk bottleneck.


Putting It All Together

Let's build a query we will actually use on a MuleSoft host dashboard. We want the active CPU usage percentage — not idle, not iowait — averaged across all cores, expressed as a percentage.

We will build it step by step.

Step 1 — Get active CPU seconds per core, excluding idle and iowait:
node_cpu_seconds_total{
  instance="172.31.33.131:9100",
  mode!~"idle|iowait"
}


Step 2 — Sum active seconds across modes, keeping the cpu label:
sum by(instance, cpu) (
  node_cpu_seconds_total{
    instance="172.31.33.131:9100",
    mode!~"idle|iowait"
  }
)


Step 3 — Sum total seconds across all modes per core:
sum by(instance, cpu) (
  node_cpu_seconds_total{instance="172.31.33.131:9100"}
)


Step 4 — Divide active by total to get the ratio per core:
sum by(instance, cpu) (
  node_cpu_seconds_total{instance="172.31.33.131:9100", mode!~"idle|iowait"}
)
/
sum by(instance, cpu) (
  node_cpu_seconds_total{instance="172.31.33.131:9100"}
)


Step 5 — Average the per-core ratios across all cores and convert to percentage:
avg by(instance) (
  sum by(instance, cpu) (
    node_cpu_seconds_total{instance="172.31.33.131:9100", mode!~"idle|iowait"}
  )
  /
  sum by(instance, cpu) (
    node_cpu_seconds_total{instance="172.31.33.131:9100"}
  )
) * 100

We'll run this final query. We get one number — the average active CPU usage across all cores as a percentage. This is the query we will use as the CPU utilization panel on our MuleSoft host dashboard.


Aggregation Operator Quick Reference

Operator           What it computes
sum()              Total of all values in the group
avg()              Arithmetic mean of all values in the group
min()              Lowest value in the group
max()              Highest value in the group
count()            Number of series in the group
topk(n, expr)      N series with the highest values
bottomk(n, expr)   N series with the lowest values

Clause             What it does
by(l1, l2)         Keep only the listed labels — collapse everything else
without(l1, l2)    Drop the listed labels — keep everything else


Summary

Aggregation operators collapse many series into fewer series. sum() totals values across a group. avg() averages them. min() and max() find the extremes. count() counts the number of series. topk() and bottomk() return the N highest or lowest series with their labels intact. The by clause controls which labels survive by keeping only the ones we list. The without clause controls them by dropping only the ones we list. Every aggregation we write in a production dashboard or alert rule will use one of these two clauses explicitly.

In the next post, we will learn about PromQL functions: rate(), irate(), increase(), histogram_quantile(), and more. These are the tools we use to work with counters, histograms, and time-windowed computations.
