In our previous post, we installed node_exporter on an external Linux server and configured Prometheus to scrape it. Prometheus now collects data from that server every 15 seconds. But what exactly is Prometheus collecting? Before we write queries or build dashboards, we need to understand how Prometheus structures that data.
This article explains how metrics work in Prometheus, detailing their components, types, and the importance of labels and timestamps.
What Is a Prometheus Metric?
A Prometheus metric is a single measurement with context. It has three parts:

| Component | Purpose | Example |
|---|---|---|
| Name | Identifies what we measure | node_cpu_seconds_total |
| Labels | Add context to the measurement | {cpu="0", mode="idle"} |
| Value | The measured number | 45231.42 |
Together, a name and a set of labels form a time series. Prometheus stores each unique combination separately. That separation is what makes queries powerful.
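To make that concrete, here is a minimal Python sketch (an illustration of the concept, not how Prometheus's TSDB actually works): the identity of a series is the metric name plus its sorted label pairs.

```python
from collections import defaultdict

def series_key(name, labels):
    """Identity of a time series: metric name plus sorted label pairs."""
    return (name, tuple(sorted(labels.items())))

storage = defaultdict(list)  # series identity -> list of (timestamp, value)

# Same metric name, different labels: two separate series.
storage[series_key("node_cpu_seconds_total", {"cpu": "0", "mode": "idle"})].append((1714000000, 45231.42))
storage[series_key("node_cpu_seconds_total", {"cpu": "1", "mode": "idle"})].append((1714000000, 44901.88))

print(len(storage))  # 2
```

Because each label combination maps to its own key, queries can later select or aggregate any subset of these series.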
Metric Names
A metric name tells us what we are measuring. We follow a naming convention with three parts:

<namespace>_<subsystem>_<unit>

For example, node_exporter uses this convention throughout. We can see it in the metrics it exposes:

node_cpu_seconds_total
node_memory_MemAvailable_bytes
node_filesystem_avail_bytes
node_network_receive_bytes_total

| Part | Purpose | Example |
|---|---|---|
| Namespace | The system or application | node |
| Subsystem | The component | cpu, memory, filesystem |
| Unit | What we measure, with unit | seconds_total, bytes |
We always include the unit in the name. node_memory_MemAvailable_bytes is clear. node_memory_MemAvailable is not. We never abbreviate units. We write seconds, not sec. We write bytes, not b.

Labels

Labels add dimensions to a metric. They let us filter and group measurements without creating separate metric names.

A label set is a list of key-value pairs. Here is a real metric from node_exporter:
node_cpu_seconds_total{cpu="0", mode="idle"}

This metric has two labels: cpu and mode. Our server has multiple CPU cores. Each core runs in multiple modes: idle, user, system, iowait. Without labels, we would need a separate metric name for every core and every mode combination. Labels handle that with one metric name.

Why Labels Matter
Without labels, we would need metric names like these:

node_cpu0_idle_seconds_total
node_cpu0_user_seconds_total
node_cpu1_idle_seconds_total
node_cpu1_user_seconds_total

That does not scale. A server with 32 cores and 8 CPU modes would need 256 metric names. With labels, we use one name and filter in queries:

# Idle time on CPU core 0
node_cpu_seconds_total{cpu="0", mode="idle"}

# Total user-space CPU time across all cores
sum(rate(node_cpu_seconds_total{mode="user"}[5m]))

Label Cardinality Warning
Each unique label value combination creates a new time series. High-cardinality labels break Prometheus. We never use a label for values that change constantly and have no bounded set.

| Safe Label Values | Unsafe Label Values |
|---|---|
| cpu="0" | process_id="48291" |
| mode="idle" | request_id="req-00291" |
| mountpoint="/var" | timestamp="1714000000" |
Safe labels have a small, fixed set of possible values. Unsafe labels create millions of time series and consume all available memory.
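A back-of-the-envelope Python sketch of why cardinality explodes: the series count for one metric is the product of the number of distinct values each label can take.

```python
from math import prod

def series_count(distinct_values_per_label):
    """Series created by one metric = product of distinct values per label."""
    return prod(distinct_values_per_label.values())

# Bounded labels: 32 cores x 8 CPU modes = 256 series. Manageable.
print(series_count({"cpu": 32, "mode": 8}))  # 256

# An unbounded label multiplies everything it touches: one series per
# request id would swamp the server's memory.
print(series_count({"mountpoint": 20, "request_id": 1_000_000}))  # 20000000
```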
The Four Metric Types
Prometheus defines four metric types. Each type fits a different kind of measurement.

1. Counter

A counter measures a value that only goes up. It never decreases. It resets to zero only when the process restarts.

We use counters for totals: total CPU seconds consumed, total bytes received, total disk reads completed.
For example, Node Exporter exposes many counters. Here is the CPU seconds counter:
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0", mode="idle"} 45231.42
node_cpu_seconds_total{cpu="0", mode="user"} 3821.17
node_cpu_seconds_total{cpu="0", mode="system"} 812.05
node_cpu_seconds_total{cpu="1", mode="idle"} 44901.88
node_cpu_seconds_total{cpu="1", mode="user"} 4012.33

The raw value grows forever. We query counters with rate() to find how fast the value grows over time:

# Percentage of CPU time spent in user space over the last 5 minutes
100 * rate(node_cpu_seconds_total{mode="user"}[5m])

Rule: Counter names always end in _total.
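Conceptually, rate() behaves like the following Python sketch (simplified: the real PromQL function also extrapolates toward the window boundaries): sum the increases between consecutive samples, treat any drop as a counter reset, and divide by the window length in seconds.

```python
def simple_rate(samples):
    """samples: chronological (timestamp, value) pairs for one counter series."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop can only mean a process restart (counters never decrease),
        # so the whole new value counts as fresh increase.
        increase += cur if cur < prev else cur - prev
    return increase / (samples[-1][0] - samples[0][0])

# Four scrapes, 15 s apart; the counter gains 1.5 s of CPU per scrape.
print(simple_rate([(0, 100.0), (15, 101.5), (30, 103.0), (45, 104.5)]))  # 0.1
```

The reset handling is why we never graph a raw counter: after a restart the raw value plunges to zero, while rate() keeps showing the true per-second growth.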
2. Gauge

A gauge measures a value that can go up or down. It represents a current state. We use gauges for snapshots: available memory, free disk space, current system load, open file descriptors.

For example, Node Exporter exposes memory metrics as gauges:
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2147483648
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8589934592

We query gauges directly. No rate calculation needed. The value itself tells us the current state:

# Percentage of memory currently in use
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))

3. Histogram

A histogram measures the distribution of values. It counts observations and groups them into buckets.

For example, Node Exporter uses histograms for measurements that vary across a range, such as disk read times:
# HELP node_disk_read_time_seconds Total time spent reading in seconds
# TYPE node_disk_read_time_seconds histogram
node_disk_read_time_seconds_bucket{device="sda", le="0.001"} 9423
node_disk_read_time_seconds_bucket{device="sda", le="0.01"} 14210
node_disk_read_time_seconds_bucket{device="sda", le="0.1"} 14788
node_disk_read_time_seconds_bucket{device="sda", le="+Inf"} 14823
node_disk_read_time_seconds_sum{device="sda"} 87.42
node_disk_read_time_seconds_count{device="sda"} 14823

The le label stands for "less than or equal." Each bucket counts how many observations fell at or below that threshold.

A histogram always creates these three series:
| Series | Meaning |
|---|---|
| <name>_bucket{le="<threshold>"} | Count of observations at or below the threshold |
| <name>_sum | Sum of all observed values |
| <name>_count | Total number of observations |
We calculate percentiles from histograms:
# 99th percentile disk read time over the last 5 minutes
histogram_quantile(0.99, rate(node_disk_read_time_seconds_bucket{device="sda"}[5m]))
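The interpolation behind histogram_quantile() can be sketched in Python (simplified: the real function operates on rate()-adjusted buckets and has extra edge-case handling): find the bucket that contains the target rank, then interpolate linearly inside it.

```python
def naive_histogram_quantile(q, buckets):
    """buckets: ascending (upper_bound, cumulative_count) pairs, ending at +Inf."""
    total = buckets[-1][1]
    rank = q * total
    lower, below = 0.0, 0
    for upper, cumulative in buckets:
        if cumulative >= rank:
            # Interpolate linearly inside the bucket that holds the rank.
            return lower + (upper - lower) * (rank - below) / (cumulative - below)
        lower, below = upper, cumulative

# The sda buckets from the exposition above.
buckets = [(0.001, 9423), (0.01, 14210), (0.1, 14788), (float("inf"), 14823)]
print(naive_histogram_quantile(0.99, buckets))
```

The answer is an estimate: the histogram only knows that the observation fell between 0.01 and 0.1 seconds, so bucket boundaries determine the precision of any percentile.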
4. Summary

A summary also measures distributions. It calculates percentiles on the client side, inside the exporter. Prometheus stores the pre-calculated result.

A summary always creates these series:
| Series | Meaning |
|---|---|
| <name>{quantile="0.5"} | Median (50th percentile) |
| <name>{quantile="0.99"} | 99th percentile |
| <name>_sum | Sum of all observed values |
| <name>_count | Total number of observations |
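Conceptually the client does something like this Python sketch (real client libraries use streaming quantile estimators rather than keeping every observation in memory, and request_duration_seconds is a hypothetical metric name used only for illustration):

```python
def quantile(observations, q):
    """Nearest-rank quantile over raw observations (simplified)."""
    ordered = sorted(observations)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

durations = [0.002, 0.003, 0.004, 0.007, 0.008, 0.009, 0.011, 0.012, 0.050, 0.100]
exported = {
    'request_duration_seconds{quantile="0.5"}': quantile(durations, 0.5),
    'request_duration_seconds{quantile="0.99"}': quantile(durations, 0.99),
    "request_duration_seconds_sum": sum(durations),
    "request_duration_seconds_count": len(durations),
}
print(exported['request_duration_seconds{quantile="0.5"}'])  # 0.009
```

Prometheus then scrapes these finished numbers; it never sees the individual observations.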
Histogram vs. Summary
| Feature | Histogram | Summary |
|---|---|---|
| Buckets defined by | Us, at configuration time | N/A |
| Percentiles calculated by | Prometheus, at query time | The exporter, at collection time |
| Aggregatable across instances | Yes | No |
| Flexible quantiles at query time | Yes | No |
We prefer histograms when we scrape multiple servers. We aggregate histogram data across all hosts in a single query. Summary quantiles are pre-calculated per instance. We cannot combine them after the fact.
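A small Python sketch of that difference (host_a and host_b are hypothetical instances exposing the same bucket layout): bucket counts merge by simple addition, while pre-computed percentiles do not merge at all.

```python
# Cumulative bucket counts (le -> count) from two hypothetical instances.
host_a = {0.001: 900, 0.01: 990, float("inf"): 1000}
host_b = {0.001: 100, 0.01: 500, float("inf"): 1000}

# Valid: add histograms bucket-by-bucket, then compute quantiles on the merge.
merged = {le: host_a[le] + host_b[le] for le in host_a}
print(merged)  # {0.001: 1000, 0.01: 1490, inf: 2000}

# Invalid: host A's p99 and host B's p99 describe different distributions;
# averaging the two numbers does not give the fleet-wide p99.
```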
Timestamps
Every metric sample has a timestamp. The timestamp records when Prometheus collected the value.

Prometheus assigns the timestamp at scrape time. We do not set timestamps manually in a standard scraping workflow.
Prometheus stores all samples as time series data. Each data point is a pair:
(timestamp, value)

Here is what the stored samples look like for available memory on our server:

node_memory_MemAvailable_bytes 2147483648 @ 1714000000
node_memory_MemAvailable_bytes 2013265920 @ 1714000015
node_memory_MemAvailable_bytes 1879048192 @ 1714000030

The default scrape interval is 15 seconds. We set this in prometheus.yml under the scrape_interval directive.
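Sketched in Python, the three stored samples above are just (timestamp, value) pairs spaced one scrape interval apart:

```python
# The samples above as (timestamp, value) pairs; consecutive timestamps
# differ by the 15-second scrape interval.
samples = [
    (1714000000, 2147483648),
    (1714000015, 2013265920),
    (1714000030, 1879048192),
]
gaps = [b[0] - a[0] for a, b in zip(samples, samples[1:])]
print(gaps)  # [15, 15]
```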
Putting It All Together

We now have Node Exporter running on our external server. Prometheus scrapes it every 15 seconds. Here are useful queries we can run against that data today:

# Overall CPU idle percentage across all cores, last 5 minutes
100 * avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))
# Memory used as a percentage
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
# Disk space used on the root filesystem
100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}))
# Network bytes received per second on eth0, last 5 minutes
rate(node_network_receive_bytes_total{device="eth0"}[5m])

Each query uses the metric types we covered. CPU and network use counters with rate(). Memory and disk use gauges directly. We build dashboards and alerts from exactly these patterns.

Summary
We covered the building blocks of Prometheus metrics. Here is the short version:

- A metric has three parts: a name, labels, and a value.
- Names follow the pattern namespace_subsystem_unit. Always include the unit.
- Labels add dimensions. We keep label cardinality low. We never label with unbounded values.
- Prometheus defines four metric types: counter, gauge, histogram, and summary.
- Counters track totals that only go up. Always use rate() to query them.
- Gauges track current state. Query them directly.
- Histograms track distributions. We prefer them over summaries because they aggregate across multiple servers.
- Timestamps record when Prometheus collected the data.