Master the Prometheus Configuration File

After installing the Prometheus server and setting up one or more nodes with Node Exporter, the next step is to configure Prometheus to actively pull metrics from these targets. Prometheus utilizes a pull-based model, meaning you must explicitly define the targets to scrape in the configuration.

All configuration lives in the prometheus.yml file, typically located in the /etc/prometheus folder.

In this article, we’ll learn how to configure Prometheus to scrape metrics from multiple nodes effectively.


What Is Prometheus?

Prometheus works on a pull model. Instead of services pushing data to a central server, Prometheus reaches out and scrapes metrics from each target on a schedule.

Each target exposes an HTTP /metrics endpoint (although the path can be changed, as we’ll see later). Prometheus calls that endpoint, reads the data, and stores it. Each collection cycle is called a scrape.
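For context, the response body uses the Prometheus text exposition format. An illustrative payload (the metric names and values here are examples, not taken from a real server) looks like this:

```
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.47
# HELP jvm_memory_used_bytes Used bytes of a given JVM memory area.
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 4.2e+08
```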


The Configuration File: prometheus.yml

The Prometheus configuration file is a YAML file. By default, Prometheus looks for it at:

/etc/prometheus/prometheus.yml

We pass a custom path at startup with:

prometheus --config.file=/path/to/prometheus.yml

The file has six top-level sections:

| Section | Purpose |
| --- | --- |
| global | Default settings applied across the entire config |
| alerting | Connection to Alertmanager for alert routing |
| rule_files | Paths to recording and alerting rule files |
| scrape_configs | Defines what Prometheus scrapes and how |
| remote_write | Sends metrics to an external storage backend |
| remote_read | Reads metrics from an external storage backend |

We’ll cover every section in full detail below.


Section 1: global

The global section sets default values for the entire configuration. Any scrape_configs block can override these values locally.

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    environment: production
    region: us-east-1
    team: integration
  query_log_file: /var/log/prometheus/query.log

scrape_interval

scrape_interval: 15s

This sets how often Prometheus scrapes each target. Prometheus sends an HTTP GET to the /metrics endpoint every 15 seconds.

Valid time units: ms, s, m, h, d, w, y

Common values:

| Value | Use Case |
| --- | --- |
| 5s | High-frequency monitoring. Higher CPU and storage cost. |
| 15s | Standard default. Good balance for most services. |
| 30s | Lower-cost option for stable or low-priority services. |
| 60s | Coarse monitoring for infrastructure-level metrics. |

For a MuleSoft Mule Runtime running Standalone, 15s gives us enough resolution to detect thread pool saturation, heap spikes, and flow error rates without overwhelming storage.

scrape_timeout

scrape_timeout: 10s

This sets how long Prometheus waits for a target to respond before marking the scrape as failed. The timeout must always be less than or equal to scrape_interval. If the target does not respond within 10 seconds, Prometheus records a scrape failure.

A failed scrape appears in the up metric with a value of 0. We can alert on up == 0 to detect dead targets.
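As a sketch of that idea (rule files are covered in the rule_files section below), a minimal alerting rule on up == 0 might look like this; the group name, duration, and labels are placeholders:

```yaml
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
```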


evaluation_interval

evaluation_interval: 15s

This sets how often Prometheus evaluates our alerting and recording rules. Every 15 seconds, Prometheus runs every rule in our rule_files and checks whether any alert conditions are true.

Keep evaluation_interval equal to or less than scrape_interval. Evaluating rules more frequently than we collect data produces no benefit.


external_labels

external_labels:
  environment: production
  region: us-east-1
  team: integration

External labels attach to every time series and alert that leaves this Prometheus instance. They add context when we ship data to remote storage or when Alertmanager receives alerts.

We use external labels to identify which Prometheus instance produced the data. This is critical in multi-region or multi-environment setups.

For MuleSoft teams, useful external labels might include:

external_labels:
  environment: production
  mule_runtime_version: "4.6"
  business_unit: payments
  datacenter: aws-us-east-1


query_log_file

query_log_file: /var/log/prometheus/query.log

This tells Prometheus to log every PromQL query to a file. Each entry records the query text, duration, and timestamps.

We use this to debug slow queries and audit who is querying what. This option is optional. We usually omit it in development.


Section 2: alerting

The alerting section connects Prometheus to one or more Alertmanager instances. Alertmanager handles deduplication, grouping, silencing, and routing of alerts to notification channels like PagerDuty, Slack, or email.

alerting:
  alert_relabel_configs:
    - source_labels: [environment]
      target_label: env
      replacement: "$1"
  alertmanagers:
    - scheme: http
      timeout: 10s
      api_version: v2
      path_prefix: /
      static_configs:
        - targets:
            - alertmanager-01:9093
            - alertmanager-02:9093
      tls_config:
        ca_file: /etc/prometheus/certs/ca.crt
        cert_file: /etc/prometheus/certs/client.crt
        key_file: /etc/prometheus/certs/client.key
        insecure_skip_verify: false
      # Use basic_auth OR authorization, never both in the same config:
      basic_auth:
        username: prometheus
        password: secret
      authorization:
        type: Bearer
        credentials: my-token


alert_relabel_configs

alert_relabel_configs:
  - source_labels: [environment]
    target_label: env
    replacement: "$1"

This rewrites labels on alerts before Prometheus sends them to Alertmanager. We use it to normalize label names, drop sensitive labels, or rename inconsistent labels from different targets.

This uses the same relabeling syntax as metric_relabel_configs, which we cover in depth in the scrape_configs section.
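For example, a sketch that strips a hypothetical internal_ip label from every outgoing alert before Alertmanager sees it:

```yaml
alert_relabel_configs:
  - regex: internal_ip
    action: labeldrop
```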


alertmanagers

This block defines the Alertmanager instances Prometheus sends alerts to.


scheme

scheme: http

The protocol used to connect to Alertmanager. Use http or https. Always use https in production.


timeout

timeout: 10s

How long Prometheus waits for Alertmanager to accept an alert before giving up.


api_version

api_version: v2

The Alertmanager API version. Use v2 for all modern Alertmanager deployments (version 0.16+). The v1 API is deprecated.


path_prefix

path_prefix: /

A URL path prefix prepended to all Alertmanager API paths. Use this when Alertmanager sits behind a reverse proxy that adds a base path, such as /alertmanager/.


static_configs

static_configs:
  - targets:
      - alertmanager-01:9093
      - alertmanager-02:9093

Lists the Alertmanager instances by hostname and port. We list multiple instances for high availability. Prometheus sends alerts to all of them.


tls_config

tls_config:
  ca_file: /etc/prometheus/certs/ca.crt
  cert_file: /etc/prometheus/certs/client.crt
  key_file: /etc/prometheus/certs/client.key
  insecure_skip_verify: false

Configures TLS for the connection to Alertmanager.

| Directive | Purpose |
| --- | --- |
| ca_file | Path to the CA certificate that signed the Alertmanager server cert |
| cert_file | Path to the client certificate for mutual TLS |
| key_file | Path to the private key for the client certificate |
| insecure_skip_verify | Set to true to skip certificate validation. Never use in production. |


basic_auth and authorization

basic_auth:
  username: prometheus
  password: secret

authorization:
  type: Bearer
  credentials: my-token

Use one of these to authenticate to Alertmanager. Use basic_auth for username/password. Use authorization for bearer token authentication. Do not use both at the same time.

For production, store credentials in a file and reference it:

basic_auth:
  username: prometheus
  password_file: /etc/prometheus/alertmanager-password.txt


Section 3: rule_files

The rule_files section lists paths to files that contain recording rules and alerting rules.

rule_files:
  - /etc/prometheus/rules/alerts.yml
  - /etc/prometheus/rules/recording_rules.yml
  - /etc/prometheus/rules/mulesoft_*.yml

Prometheus accepts glob patterns. Every file matching the pattern loads at startup and reloads when we send SIGHUP or call the /-/reload endpoint.

Recording rules pre-compute expensive PromQL expressions and store the result as a new metric. We use them to reduce query time in dashboards.

Alerting rules define conditions that trigger alerts. When the condition is true for a defined duration, Prometheus fires the alert to Alertmanager.

A rule file looks like this:

groups:
  - name: mulesoft_alerts
    interval: 15s
    rules:
      - alert: MuleRuntimeDown
        expr: up{job="mule_runtime"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Mule Runtime is down"
          description: "Instance {{ $labels.instance }} has been down for more than 1 minute."
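A recording-rule group follows the same structure. As an illustrative sketch, this one pre-computes a 5-minute error rate into a new metric; the source metric name mule_flow_errors_total is hypothetical:

```yaml
groups:
  - name: mulesoft_recording
    interval: 15s
    rules:
      - record: job:mule_flow_errors:rate5m
        expr: rate(mule_flow_errors_total[5m])
```

Dashboards can then query job:mule_flow_errors:rate5m directly instead of re-evaluating the rate() expression on every refresh.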

We’ll discuss rule files in depth in a future post. For now, we focus on referencing them correctly in prometheus.yml.


Section 4: scrape_configs

The scrape_configs section is the heart of Prometheus. It defines every target Prometheus monitors.
Each entry in the list is a scrape job. A job groups related targets together. Prometheus assigns the job label to every metric it collects from that group.

scrape_configs:
  - job_name: mule_runtime
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    honor_labels: false
    honor_timestamps: true
    params:
      format: [prometheus]
    basic_auth:
      username: monitor
      password_file: /etc/prometheus/mule-password.txt
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      insecure_skip_verify: false
    static_configs:
      - targets:
          - mule-server-01:8081
          - mule-server-02:8081
        labels:
          environment: production
          datacenter: us-east
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: jvm_.*
        action: keep

We now break down every directive in a scrape job.


job_name

job_name: mule_runtime

A unique name for this scrape job. Prometheus attaches a job label with this value to every metric from this job. We use it to filter metrics in PromQL:

up{job="mule_runtime"}

Use descriptive, lowercase names with underscores. Good examples: mule_runtime, mule_healthcheck, postgres, nginx.


scrape_interval and scrape_timeout (per-job)

scrape_interval: 15s
scrape_timeout: 10s

These override the global values for this job only. We increase the interval for low-priority jobs. We decrease it for high-priority jobs where we need faster detection.


metrics_path

metrics_path: /metrics

The HTTP path Prometheus appends to the target address to collect metrics. The default is /metrics. Change it if our service exposes metrics on a different path.
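For example, a Spring Boot service instrumented with Micrometer typically serves metrics at /actuator/prometheus. A sketch of such a job (the job name and target address are placeholders):

```yaml
- job_name: spring_boot_app
  metrics_path: /actuator/prometheus
  static_configs:
    - targets:
        - app-server-01:8080
```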


scheme

scheme: http

The protocol used for scraping. Use http or https. Use https in production when the target supports it.


honor_labels

honor_labels: false

This controls what happens when a scraped metric already contains a label that Prometheus would add (like job or instance).

| Value | Behavior |
| --- | --- |
| false (default) | Prometheus overwrites labels from the target with its own labels. Safer. |
| true | Prometheus keeps the labels from the target as-is. Use with federation or push gateways. |

We always leave this as false for direct scraping of services.


honor_timestamps

honor_timestamps: true

When true, Prometheus respects timestamps embedded in the scraped metrics. When false, Prometheus ignores target timestamps and uses its own scrape time.

Use true only when our metrics source provides precise timestamps (like the Pushgateway). For most live services, use false or omit the directive.


params

params:
  format: [prometheus]

Appends URL query parameters to every scrape request. We use this when a target requires query parameters to return Prometheus-formatted output.

For example, some JMX exporters require:

params:
  target: [jmx-service:9999]
  module: [jmx_mule]


basic_auth

basic_auth:
  username: monitor
  password_file: /etc/prometheus/mule-password.txt

Sets HTTP Basic Authentication credentials for scraping a protected endpoint. Always use password_file instead of password in production. This prevents credentials from appearing in the config file.


authorization

authorization:
  type: Bearer
  credentials_file: /etc/prometheus/token.txt

Sets an HTTP Authorization header on every scrape request. Use this for token-based auth. The header value becomes Bearer <token>.


tls_config (per-job)

tls_config:
  ca_file: /etc/prometheus/certs/ca.crt
  cert_file: /etc/prometheus/certs/client.crt
  key_file: /etc/prometheus/certs/client.key
  server_name: mule-server-01.internal
  insecure_skip_verify: false

| Directive | Purpose |
| --- | --- |
| ca_file | CA certificate to verify the server's TLS certificate |
| cert_file | Client certificate for mutual TLS |
| key_file | Private key for the client certificate |
| server_name | Override the server name used for TLS verification. Useful when the hostname in the cert differs from the target address. |
| insecure_skip_verify | Skip TLS verification. Never use in production. |


static_configs

static_configs:
  - targets:
      - mule-server-01:8081
      - mule-server-02:8081
    labels:
      environment: production
      datacenter: us-east

Lists static scrape targets. Each target is a host:port string. Prometheus scrapes the metrics_path on each target.

We attach custom labels here. These labels appear on every metric scraped from these targets. They are useful for grouping targets by environment, datacenter, or team.


Service Discovery Alternatives to static_configs

static_configs works for small, stable environments. For dynamic infrastructure, Prometheus supports many service discovery mechanisms that automatically find targets.

| Discovery Type | Use Case |
| --- | --- |
| file_sd_configs | Read targets from a JSON or YAML file. We update the file; Prometheus auto-reloads. |
| consul_sd_configs | Discover services registered in HashiCorp Consul. |
| ec2_sd_configs | Discover AWS EC2 instances automatically. |
| kubernetes_sd_configs | Discover Kubernetes pods, nodes, and services. |
| dns_sd_configs | Discover targets through DNS SRV or A record lookups. |
| http_sd_configs | Fetch a target list from an HTTP endpoint. |
For example, for a MuleSoft deployment on AWS, ec2_sd_configs automatically finds all Mule Runtime EC2 instances tagged with a specific key:

- job_name: mule_runtime
  ec2_sd_configs:
    - region: us-east-1
      port: 8081
      filters:
        - name: tag:Role
          values:
            - mule-runtime
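file_sd_configs is often the simplest step up from static targets. As a sketch under assumed paths, the job watches a targets file that we (or an automation tool) can rewrite at any time, and Prometheus picks up changes without a reload:

```yaml
- job_name: mule_runtime
  file_sd_configs:
    - files:
        - /etc/prometheus/targets/mule_*.yml
      refresh_interval: 1m

# Contents of a matching targets file (hypothetical path
# /etc/prometheus/targets/mule_prod.yml):
#
# - targets:
#     - mule-server-01:8081
#     - mule-server-02:8081
#   labels:
#     environment: production
```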


relabel_configs

Relabeling transforms labels on scraped targets before the scrape happens. It runs before we connect to the target.
We use relabel_configs to:
  • Set the instance label to a human-readable value
  • Filter which targets to scrape
  • Extract information from labels and store it in a new label
relabel_configs:
  - source_labels: [__address__]
    regex: "(.+):\\d+"
    target_label: host
    replacement: "$1"
  - source_labels: [__meta_ec2_tag_Name]
    target_label: instance
  - source_labels: [__meta_ec2_tag_Environment]
    regex: production
    action: keep

Relabeling uses six core directives:

| Directive | Purpose |
| --- | --- |
| source_labels | One or more label names to read. Multiple values join with a separator. |
| separator | Character that joins multiple source_labels values. Default: ; |
| target_label | The label to write the result into. |
| regex | A regular expression applied to the joined source value. Default: (.*) |
| replacement | The value written to target_label. Use $1 to reference capture groups. Default: $1 |
| action | What to do with the match. Default: replace |


action Values

| Action | Behavior |
| --- | --- |
| replace | Replace target_label with replacement. Default. |
| keep | Keep the target only if regex matches. Drop all others. |
| drop | Drop the target if regex matches. Keep all others. |
| labelkeep | Keep only labels whose names match regex. Drop all others. |
| labeldrop | Drop labels whose names match regex. Keep all others. |
| labelmap | Copy labels whose names match regex to new labels. Use replacement to define the new name. |
| hashmod | Hash the source label and assign a modulo value. Used for horizontal sharding. |
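As an illustrative sketch of hashmod, two Prometheus servers can split one target list between them; each server keeps only its own shard (shard 0 shown here, with modulus 2):

```yaml
relabel_configs:
  # Hash the target address into a temporary __tmp label (stripped before storage)
  - source_labels: [__address__]
    modulus: 2
    target_label: __tmp_shard
    action: hashmod
  # Keep only targets in this server's shard; the second server uses regex: "1"
  - source_labels: [__tmp_shard]
    regex: "0"
    action: keep
```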


Special Labels in Relabeling

Prometheus exposes internal metadata as labels prefixed with __. We use these in relabel_configs:

| Label | Value |
| --- | --- |
| __address__ | The target address (host:port) |
| __scheme__ | The scrape scheme (http or https) |
| __metrics_path__ | The metrics path |
| __param_<name> | URL parameters for the scrape request |
| __meta_* | Metadata labels from service discovery (e.g., __meta_ec2_tag_Name) |

All __ labels are stripped before metrics are stored. Only labels we explicitly copy to non-__ labels survive.
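A common pattern is therefore to promote discovery metadata to permanent labels with labelmap before it is stripped. For example, this copies every EC2 tag label to a plain label, so __meta_ec2_tag_Role survives as ec2_tag_Role:

```yaml
relabel_configs:
  - regex: __meta_ec2_tag_(.+)
    replacement: ec2_tag_$1
    action: labelmap
```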

metric_relabel_configs

metric_relabel_configs runs after the scrape. It transforms or filters individual metric series before Prometheus stores them.

We use it to:

  • Drop metrics we do not need (saves storage)
  • Rename metric labels
  • Remove high-cardinality labels that cause storage bloat

metric_relabel_configs:
  - source_labels: [__name__]
    regex: "go_.*"
    action: drop
  - source_labels: [__name__]
    regex: "(jvm_.*|mule_.*|http_.*)"
    action: keep
  - regex: error_code
    action: labeldrop

Note that labeldrop (and labelkeep) take only a regex, which matches against label names; source_labels and target_label are not valid with these actions.


Section 5: remote_write

The remote_write section sends metrics to an external long-term storage system. Prometheus stores data locally by default for a limited retention period (15 days by default). For long-term storage, we ship data to systems like Thanos, Cortex, VictoriaMetrics, or Grafana Mimir.

remote_write:
  - url: https://metrics-store.internal/api/v1/push
    name: long_term_storage
    remote_timeout: 30s
    send_exemplars: true
    send_native_histograms: false

    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 5s
      retry_on_http_429: true

    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 500

    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt

    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-write-password.txt

    write_relabel_configs:
      - source_labels: [environment]
        regex: production
        action: keep


url

url: https://metrics-store.internal/api/v1/push

The remote write endpoint. All Prometheus-compatible remote storage systems expose a standard endpoint for receiving data.


name

name: long_term_storage

A label applied to internal metrics for this remote write queue. Helps us distinguish between multiple remote write targets in monitoring dashboards.


remote_timeout

remote_timeout: 30s

How long Prometheus waits for the remote storage to accept data before marking the request as failed and retrying.


send_exemplars

send_exemplars: true

Exemplars are sample data points that link a metric to a specific trace ID. They are useful for connecting high latency metrics to distributed traces in tools like Grafana Tempo. Enable this only when our remote storage supports exemplars.


queue_config

queue_config:
  capacity: 10000
  max_shards: 200
  min_shards: 1
  max_samples_per_send: 500
  batch_send_deadline: 5s
  min_backoff: 30ms
  max_backoff: 5s
  retry_on_http_429: true

Prometheus buffers metrics in a queue before sending to remote storage. This section controls that queue.

| Directive | Purpose |
| --- | --- |
| capacity | Number of samples to buffer per shard. Increase to absorb traffic spikes. |
| max_shards | Maximum number of parallel write goroutines. Increase for high throughput. |
| min_shards | Minimum number of parallel write goroutines. |
| max_samples_per_send | Maximum number of samples in one HTTP request. |
| batch_send_deadline | Maximum time to wait before sending a partial batch. |
| min_backoff | Initial wait time before retrying a failed send. |
| max_backoff | Maximum wait time between retries. |
| retry_on_http_429 | Retry when the remote endpoint returns HTTP 429 (Too Many Requests). |

Monitor the prometheus_remote_storage_queue_highest_sent_timestamp_seconds metric. If the queue falls behind, increase max_shards or capacity.
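A rough PromQL sketch for that check, expressing the write lag in seconds by comparing that metric against Prometheus's wall clock (alert when the result stays high):

```
time() - prometheus_remote_storage_queue_highest_sent_timestamp_seconds
```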


write_relabel_configs

write_relabel_configs:
  - source_labels: [environment]
    regex: production
    action: keep

Filters which metrics we send to remote storage. Uses the same relabeling syntax as metric_relabel_configs. We use this to send only production metrics to long-term storage while keeping development metrics local.


Section 6: remote_read

The remote_read section tells Prometheus to query an external storage system when a PromQL query requests data outside the local retention window.

remote_read:
  - url: https://metrics-store.internal/api/v1/read
    name: long_term_read
    remote_timeout: 1m
    read_recent: false
    required_matchers:
      environment: production
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-read-password.txt
    filter_external_labels: true


read_recent

read_recent: false

When false, Prometheus only queries remote storage for time ranges outside the local retention period. When true, Prometheus always includes remote storage in every query, even for recent data. Leave this as false to avoid unnecessary remote read latency.


required_matchers

required_matchers:
  environment: production

Prometheus only sends queries to remote storage when they include these label matchers. This prevents full-scan queries from hitting remote storage when they only target local data.


filter_external_labels

filter_external_labels: true

When true, Prometheus automatically adds the external_labels as matchers when querying remote storage. This ensures we only read data that belongs to this Prometheus instance.


Reloading the Configuration

Every time we modify the prometheus.yml file, Prometheus needs to load the new configuration. We can do that without a restart using two methods.


Method 1: HTTP endpoint

curl -X POST http://localhost:9090/-/reload

We must start Prometheus with --web.enable-lifecycle for this to work.


Method 2: SIGHUP signal

kill -HUP $(pgrep prometheus)

Both methods reload prometheus.yml and all rule_files without dropping metrics or interrupting scrapes.


Validating the Configuration

Before we reload or restart, it’s always a good idea to validate our config file to prevent issues. For that, we can use promtool:

promtool check config /etc/prometheus/prometheus.yml

We can also validate rule files with:

promtool check rules /etc/prometheus/rules/mulesoft_alerts.yml

promtool ships with every Prometheus release. We should always run it before applying changes in production.


Summary

The prometheus.yml file controls everything. Here is the full map:

| Section | What We Define |
| --- | --- |
| global | Default scrape interval, timeout, evaluation interval, and external labels |
| alerting | Alertmanager endpoints, TLS, and authentication |
| rule_files | Paths to alerting and recording rule files |
| scrape_configs | Jobs, targets, paths, auth, TLS, relabeling, and metric filtering |
| remote_write | External long-term storage destination and queue settings |
| remote_read | External long-term storage query source |