After installing the Prometheus server and setting up one or more nodes with Node Exporter, the next step is to configure Prometheus to actively pull metrics from these targets. Prometheus utilizes a pull-based model, meaning you must explicitly define the targets to scrape in the configuration.
All configuration lives in the prometheus.yml file, typically located in the /etc/prometheus folder. In this article, we’ll learn how to configure Prometheus to scrape metrics from multiple nodes effectively.
What Is Prometheus?
Prometheus works on a pull model. Instead of services pushing data to a central server, Prometheus reaches out and scrapes metrics from each target on a schedule. Each target exposes a /metrics HTTP endpoint (although the path can be changed, as we’ll see later). Prometheus calls that endpoint, reads the data, and stores it. We call one collection cycle a scrape.

The Configuration File: prometheus.yml
The Prometheus configuration file is a YAML file. By default, Prometheus looks for it at /etc/prometheus/prometheus.yml, and we can point to a different location with:

```shell
prometheus --config.file=/path/to/prometheus.yml
```

| Section | Purpose |
|---|---|
| global | Default settings applied across the entire config |
| alerting | Connection to Alertmanager for alert routing |
| rule_files | Paths to recording and alerting rule files |
| scrape_configs | Defines what Prometheus scrapes and how |
| remote_write | Sends metrics to an external storage backend |
| remote_read | Reads metrics from an external storage backend |
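Putting these sections together, a minimal prometheus.yml skeleton might look like this (a sketch with placeholder targets, not a drop-in config; each section is covered in detail below):

```yaml
global:
  scrape_interval: 15s

rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: [alertmanager:9093]

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: [localhost:9090]
```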
Section 1: global
The global section sets default values for the entire configuration. Any scrape_configs block can override these values locally.

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    environment: production
    region: us-east-1
    team: integration
  query_log_file: /var/log/prometheus/query.log
```

scrape_interval

```yaml
scrape_interval: 15s
```

This tells Prometheus to scrape each target's /metrics endpoint every 15 seconds.

Valid time units: ms, s, m, h, d, w, y

Common values:
| Value | Use Case |
|---|---|
| 5s | High-frequency monitoring. Higher CPU and storage cost. |
| 15s | Standard default. Good balance for most services. |
| 30s | Lower-cost option for stable or low-priority services. |
| 60s | Coarse monitoring for infrastructure-level metrics. |
For a MuleSoft Mule Runtime running standalone, 15s gives us enough resolution to detect thread pool saturation, heap spikes, and flow error rates without overwhelming storage.

scrape_timeout

```yaml
scrape_timeout: 10s
```

This sets how long Prometheus waits for a target to respond before marking the scrape as failed. The timeout must always be less than or equal to scrape_interval. If the target does not respond within 10 seconds, Prometheus records a scrape failure.

A failed scrape appears in the up metric with a value of 0. We can alert on up == 0 to detect dead targets.

evaluation_interval

```yaml
evaluation_interval: 15s
```

This sets how often Prometheus evaluates the recording and alerting rules loaded from rule_files and checks whether any alert conditions are true.

Keep evaluation_interval equal to or less than scrape_interval. Evaluating rules more frequently than we collect data produces no benefit.

external_labels
```yaml
external_labels:
  environment: production
  region: us-east-1
  team: integration
```

We use external labels to identify which Prometheus instance produced the data. This is critical in multi-region or multi-environment setups.
For MuleSoft teams, useful external labels might include:
```yaml
external_labels:
  environment: production
  mule_runtime_version: "4.6"
  business_unit: payments
  datacenter: aws-us-east-1
```

query_log_file

```yaml
query_log_file: /var/log/prometheus/query.log
```

We use this to debug slow queries and audit who is querying what. It is optional; we usually omit it in development.
Section 2: alerting
The alerting section connects Prometheus to one or more Alertmanager instances. Alertmanager handles deduplication, grouping, silencing, and routing of alerts to notification channels like PagerDuty, Slack, or email.

```yaml
alerting:
  alert_relabel_configs:
    - source_labels: [environment]
      target_label: env
      replacement: "$1"
  alertmanagers:
    - scheme: http
      timeout: 10s
      api_version: v2
      path_prefix: /
      static_configs:
        - targets:
            - alertmanager-01:9093
            - alertmanager-02:9093
      tls_config:
        ca_file: /etc/prometheus/certs/ca.crt
        cert_file: /etc/prometheus/certs/client.crt
        key_file: /etc/prometheus/certs/client.key
        insecure_skip_verify: false
      basic_auth:
        username: prometheus
        password: secret
      authorization:
        type: Bearer
        credentials: my-token
```

alert_relabel_configs
```yaml
alert_relabel_configs:
  - source_labels: [environment]
    target_label: env
    replacement: "$1"
```

Alert relabeling rewrites alert labels before the alerts are sent to Alertmanager — here, copying the environment label into env. This uses the same relabeling syntax as metric_relabel_configs, which we cover in depth in the scrape_configs section.

alertmanagers
This block defines the Alertmanager instances Prometheus sends alerts to.

scheme
```yaml
scheme: http
```

The protocol used to connect to Alertmanager. Use http or https. Always use https in production.

timeout
```yaml
timeout: 10s
```

How long Prometheus waits for Alertmanager to accept an alert before giving up.

api_version
```yaml
api_version: v2
```

Use v2 for all modern Alertmanager deployments (version 0.16 and later). The v1 API is deprecated.

path_prefix
```yaml
path_prefix: /
```

A URL path prefix prepended to all Alertmanager API paths. Use this when Alertmanager sits behind a reverse proxy that adds a base path, such as /alertmanager/.

static_configs
```yaml
static_configs:
  - targets:
      - alertmanager-01:9093
      - alertmanager-02:9093
```

Lists the Alertmanager instances by hostname and port. We list multiple instances for high availability. Prometheus sends alerts to all of them.
tls_config
```yaml
tls_config:
  ca_file: /etc/prometheus/certs/ca.crt
  cert_file: /etc/prometheus/certs/client.crt
  key_file: /etc/prometheus/certs/client.key
  insecure_skip_verify: false
```

Configures TLS for the connection to Alertmanager.

| Directive | Purpose |
|---|---|
| ca_file | Path to the CA certificate that signed the Alertmanager server cert |
| cert_file | Path to the client certificate for mutual TLS |
| key_file | Path to the private key for the client certificate |
| insecure_skip_verify | Set to true to skip certificate validation. Never use in production. |
basic_auth and authorization
```yaml
basic_auth:
  username: prometheus
  password: secret

authorization:
  type: Bearer
  credentials: my-token
```

Use one of these to authenticate to Alertmanager: basic_auth for username/password, authorization for bearer token authentication. Do not use both at the same time.

For production, store credentials in a file and reference it:

```yaml
basic_auth:
  username: prometheus
  password_file: /etc/prometheus/alertmanager-password.txt
```

Section 3: rule_files

The rule_files section lists paths to files that contain recording rules and alerting rules.

```yaml
rule_files:
  - /etc/prometheus/rules/alerts.yml
  - /etc/prometheus/rules/recording_rules.yml
  - /etc/prometheus/rules/mulesoft_*.yml
```

Prometheus accepts glob patterns. Every file matching the pattern loads at startup and reloads when we send SIGHUP or call the /-/reload endpoint.

Recording rules pre-compute expensive PromQL expressions and store the result as a new metric. We use them to reduce query time in dashboards.
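As a sketch, a recording rule that pre-computes a per-job error rate might look like this (the metric name mule_flow_errors_total and the rule name are hypothetical):

```yaml
groups:
  - name: mulesoft_recording
    interval: 15s
    rules:
      # store the 5-minute error rate under a new, cheap-to-query metric name
      - record: job:mule_flow_errors:rate5m
        expr: sum(rate(mule_flow_errors_total[5m])) by (job)
```

Dashboards then query job:mule_flow_errors:rate5m instead of re-evaluating the rate() expression on every refresh.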
Alerting rules define conditions that trigger alerts. When the condition is true for a defined duration, Prometheus fires the alert to Alertmanager.
A rule file looks like this:
```yaml
groups:
  - name: mulesoft_alerts
    interval: 15s
    rules:
      - alert: MuleRuntimeDown
        expr: up{job="mule_runtime"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Mule Runtime is down"
          description: "Instance {{ $labels.instance }} has been down for more than 1 minute."
```

We’ll discuss rule files in depth in a future post. For now, we focus on referencing them correctly in prometheus.yml.

Section 4: scrape_configs
The scrape_configs section is the heart of Prometheus. It defines every target Prometheus monitors. Each entry in the list is a scrape job. A job groups related targets together. Prometheus assigns the job label to every metric it collects from that group.

```yaml
scrape_configs:
  - job_name: mule_runtime
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    honor_labels: false
    honor_timestamps: true
    params:
      format: [prometheus]
    basic_auth:
      username: monitor
      password_file: /etc/prometheus/mule-password.txt
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      insecure_skip_verify: false
    static_configs:
      - targets:
          - mule-server-01:8081
          - mule-server-02:8081
        labels:
          environment: production
          datacenter: us-east
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: jvm_.*
        action: keep
```

We now break down every directive in a scrape job.

job_name
```yaml
job_name: mule_runtime
```

A unique name for this scrape job. Prometheus attaches a job label with this value to every metric from this job. We use it to filter metrics in PromQL:

```
up{job="mule_runtime"}
```

Use descriptive, lowercase names with underscores. Good examples: mule_runtime, mule_healthcheck, postgres, nginx.

scrape_interval and scrape_timeout (per-job)
```yaml
scrape_interval: 15s
scrape_timeout: 10s
```

These override the global values for this job only. We increase the interval for low-priority jobs and decrease it for high-priority jobs where we need faster detection.

metrics_path
```yaml
metrics_path: /metrics
```

The HTTP path Prometheus appends to the target address to collect metrics. The default is /metrics. Change it if our service exposes metrics on a different path.

scheme
```yaml
scheme: http
```

The protocol used for scraping. Use http or https. Use https in production when the target supports it.

honor_labels
```yaml
honor_labels: false
```

This controls what happens when a scraped metric already contains a label that Prometheus would add (like job or instance).

| Value | Behavior |
|---|---|
| false (default) | Prometheus overwrites labels from the target with its own labels. Safer. |
| true | Prometheus keeps the labels from the target as-is. Use with federation or push gateways. |
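For instance, when scraping a Pushgateway, metrics already carry the job and instance labels of the process that pushed them, so the convention is to keep them (a sketch; the target address is an assumption):

```yaml
- job_name: pushgateway
  honor_labels: true
  static_configs:
    - targets: [pushgateway:9091]
```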
We always leave this as false for direct scraping of services.

honor_timestamps
```yaml
honor_timestamps: true
```

When true, Prometheus respects timestamps embedded in the scraped metrics. When false, Prometheus ignores target timestamps and uses its own scrape time.

Use true only when our metrics source provides precise timestamps (like the Pushgateway). For most live services, use false or omit the directive.

params
```yaml
params:
  format: [prometheus]
```

Appends URL query parameters to every scrape request. We use this when a target requires query parameters to return Prometheus-formatted output.

For example, some JMX exporters require:

```yaml
params:
  target: [jmx-service:9999]
  module: [jmx_mule]
```

basic_auth
```yaml
basic_auth:
  username: monitor
  password_file: /etc/prometheus/mule-password.txt
```

Sets HTTP Basic Authentication credentials for scraping a protected endpoint. Always use password_file instead of password in production. This prevents credentials from appearing in the config file.

authorization
```yaml
authorization:
  type: Bearer
  credentials_file: /etc/prometheus/token.txt
```

Sets an HTTP Authorization header on every scrape request. Use this for token-based auth. The header value becomes Bearer <token>.

tls_config (per-job)
```yaml
tls_config:
  ca_file: /etc/prometheus/certs/ca.crt
  cert_file: /etc/prometheus/certs/client.crt
  key_file: /etc/prometheus/certs/client.key
  server_name: mule-server-01.internal
  insecure_skip_verify: false
```

| Directive | Purpose |
|---|---|
| ca_file | CA certificate to verify the server's TLS certificate |
| cert_file | Client certificate for mutual TLS |
| key_file | Private key for the client certificate |
| server_name | Override the server name used for TLS verification. Useful when the hostname in the cert differs from the target address. |
| insecure_skip_verify | Skip TLS verification. Never use in production. |
static_configs
```yaml
static_configs:
  - targets:
      - mule-server-01:8081
      - mule-server-02:8081
    labels:
      environment: production
      datacenter: us-east
```

Lists static scrape targets. Each target is a host:port string. Prometheus scrapes the metrics_path on each target.

We attach custom labels here. These labels appear on every metric scraped from these targets. They are useful for grouping targets by environment, datacenter, or team.
Service Discovery Alternatives to static_configs
static_configs works for small, stable environments. For dynamic infrastructure, Prometheus supports many service discovery mechanisms that find targets automatically.

| Discovery Type | Use Case |
|---|---|
| file_sd_configs | Read targets from a JSON or YAML file. We update the file; Prometheus auto-reloads. |
| consul_sd_configs | Discover services registered in HashiCorp Consul. |
| ec2_sd_configs | Discover AWS EC2 instances automatically. |
| kubernetes_sd_configs | Discover Kubernetes pods, nodes, and services. |
| dns_sd_configs | Discover targets through DNS SRV or A record lookups. |
| http_sd_configs | Fetch a target list from an HTTP endpoint. |
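For example, file_sd_configs can read targets from files we manage outside Prometheus (a sketch; the paths and job name are illustrative):

```yaml
- job_name: mule_runtime_file_sd
  file_sd_configs:
    - files:
        - /etc/prometheus/targets/mule-*.yml
      refresh_interval: 1m
```

Each target file lists targets and labels in the same shape as static_configs, e.g. /etc/prometheus/targets/mule-prod.yml:

```yaml
- targets:
    - mule-server-01:8081
    - mule-server-02:8081
  labels:
    environment: production
```

Prometheus picks up changes to the files without a reload, so we can add or remove nodes without touching prometheus.yml.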
For example, ec2_sd_configs automatically finds all Mule Runtime EC2 instances tagged with a specific key:

```yaml
- job_name: mule_runtime
  ec2_sd_configs:
    - region: us-east-1
      port: 8081
      filters:
        - name: tag:Role
          values:
            - mule-runtime
```

relabel_configs
Relabeling transforms labels on scraped targets before the scrape happens. It runs before Prometheus connects to the target.

We use relabel_configs to:

- Set the instance label to a human-readable value
- Filter which targets to scrape
- Extract information from labels and store it in a new label
```yaml
relabel_configs:
  - source_labels: [__address__]
    regex: "(.+):\\d+"
    target_label: host
    replacement: "$1"
  - source_labels: [__meta_ec2_tag_Name]
    target_label: instance
  - source_labels: [__meta_ec2_tag_Environment]
    regex: production
    action: keep
```

| Directive | Purpose |
|---|---|
| source_labels | One or more label names to read. Multiple values join with a separator. |
| separator | Character that joins multiple source_labels values. Default: ; |
| target_label | The label to write the result into. |
| regex | A regular expression applied to the joined source value. Default: (.*) |
| replacement | The value written to target_label. Use $1 to reference capture groups. Default: $1 |
| action | What to do with the match. Default: replace |
action Values
| Action | Behavior |
|---|---|
| replace | Replace target_label with replacement. Default. |
| keep | Keep the target only if regex matches. Drop all others. |
| drop | Drop the target if regex matches. Keep all others. |
| labelkeep | Keep only labels whose names match regex. Drop all others. |
| labeldrop | Drop labels whose names match regex. Keep all others. |
| labelmap | Copy labels whose names match regex to new labels. Use replacement to define the new name. |
| hashmod | Hash the source label and assign a modulo value. Used for horizontal sharding. |
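The two least obvious actions deserve a sketch (the label names follow the EC2 example above; the shard assignment is illustrative):

```yaml
relabel_configs:
  # labelmap: copy every EC2 tag into an ec2_-prefixed label
  # (e.g. __meta_ec2_tag_Role -> ec2_Role)
  - regex: __meta_ec2_tag_(.+)
    action: labelmap
    replacement: ec2_$1
  # hashmod: hash the address into one of 2 shards...
  - source_labels: [__address__]
    modulus: 2
    target_label: __tmp_shard
    action: hashmod
  # ...and keep only the targets belonging to this server's shard
  - source_labels: [__tmp_shard]
    regex: "0"
    action: keep
```

A second Prometheus server would keep regex "1", splitting the targets evenly between the two.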
Special Labels in Relabeling
Prometheus exposes internal metadata as labels prefixed with __. We use these in relabel_configs:

| Label | Value |
|---|---|
| __address__ | The target address (host:port) |
| __scheme__ | The scrape scheme (http or https) |
| __metrics_path__ | The metrics path |
| __param_<name> | URL parameters for the scrape request |
| __meta_* | Metadata labels from service discovery (e.g., __meta_ec2_tag_Name) |
All __ labels are stripped before metrics are stored. Only labels we explicitly copy to non-__ labels survive.

metric_relabel_configs

metric_relabel_configs runs after the scrape. It transforms or filters individual metric series before Prometheus stores them. We use it to:
- Drop metrics we do not need (saves storage)
- Rename metric labels
- Remove high-cardinality labels that cause storage bloat
```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "go_.*"
    action: drop
  - source_labels: [__name__]
    regex: "(jvm_.*|mule_.*|http_.*)"
    action: keep
  - regex: error_code
    action: labeldrop
```

Note the last rule: labeldrop matches regex against label names, so it takes only regex and no source_labels or target_label.

Section 5: remote_write
The remote_write section sends metrics to an external long-term storage system. Prometheus stores data locally for a limited retention period (15 days by default). For long-term storage, we ship data to systems like Thanos, Cortex, VictoriaMetrics, or Grafana Mimir.

```yaml
remote_write:
  - url: https://metrics-store.internal/api/v1/push
    name: long_term_storage
    remote_timeout: 30s
    send_exemplars: true
    send_native_histograms: false
    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 5s
      retry_on_http_429: true
    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 500
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-write-password.txt
    write_relabel_configs:
      - source_labels: [environment]
        regex: production
        action: keep
```

url
```yaml
url: https://metrics-store.internal/api/v1/push
```

The remote write endpoint. All Prometheus-compatible remote storage systems expose a standard endpoint for receiving data.

name
```yaml
name: long_term_storage
```

A label applied to internal metrics for this remote write queue. Helps us distinguish between multiple remote write targets in monitoring dashboards.

remote_timeout
```yaml
remote_timeout: 30s
```

How long Prometheus waits for the remote storage to accept data before marking the request as failed and retrying.

send_exemplars
```yaml
send_exemplars: true
```

Exemplars are sample data points that link a metric to a specific trace ID. They are useful for connecting high-latency metrics to distributed traces in tools like Grafana Tempo. Enable this only when our remote storage supports exemplars.

queue_config
```yaml
queue_config:
  capacity: 10000
  max_shards: 200
  min_shards: 1
  max_samples_per_send: 500
  batch_send_deadline: 5s
  min_backoff: 30ms
  max_backoff: 5s
  retry_on_http_429: true
```

Prometheus buffers metrics in a queue before sending them to remote storage. This section controls that queue.

| Directive | Purpose |
|---|---|
| capacity | Number of samples to buffer per shard. Increase to absorb traffic spikes. |
| max_shards | Maximum number of parallel write goroutines. Increase for high throughput. |
| min_shards | Minimum number of parallel write goroutines. |
| max_samples_per_send | Maximum number of samples in one HTTP request. |
| batch_send_deadline | Maximum time to wait before sending a partial batch. |
| min_backoff | Initial wait time before retrying a failed send. |
| max_backoff | Maximum wait time between retries. |
| retry_on_http_429 | Retry when the remote endpoint returns HTTP 429 (Too Many Requests). |
We can watch queue health through the prometheus_remote_storage_queue_highest_sent_timestamp_seconds metric. If the queue falls behind, increase max_shards or capacity.

write_relabel_configs
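As a sketch, an alerting rule on that metric might fire when the newest sample we have shipped is too old (the 120-second threshold is illustrative, and comparing against wall-clock time is a rough heuristic that can misfire on idle queues):

```yaml
groups:
  - name: remote_write_health
    rules:
      - alert: RemoteWriteBehind
        # newest shipped sample is more than 2 minutes old
        expr: time() - prometheus_remote_storage_queue_highest_sent_timestamp_seconds > 120
        for: 10m
        labels:
          severity: warning
```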
```yaml
write_relabel_configs:
  - source_labels: [environment]
    regex: production
    action: keep
```

Filters which metrics we send to remote storage. It uses the same relabeling syntax as metric_relabel_configs. We use this to send only production metrics to long-term storage while keeping development metrics local.

Section 6: remote_read
The remote_read section tells Prometheus to query an external storage system when a PromQL query requests data outside the local retention window.

```yaml
remote_read:
  - url: https://metrics-store.internal/api/v1/read
    name: long_term_read
    remote_timeout: 1m
    read_recent: false
    required_matchers:
      environment: production
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-read-password.txt
    filter_external_labels: true
```

read_recent
```yaml
read_recent: false
```

When false, Prometheus only queries remote storage for time ranges outside the local retention period. When true, Prometheus always includes remote storage in every query, even for recent data. Leave this as false to avoid unnecessary remote read latency.

required_matchers
```yaml
required_matchers:
  environment: production
```

Prometheus only sends queries to remote storage when they include these label matchers. This prevents full-scan queries from hitting remote storage when they only target local data.

filter_external_labels
```yaml
filter_external_labels: true
```

When true, Prometheus automatically adds the external_labels as matchers when querying remote storage. This ensures we only read data that belongs to this Prometheus instance.

Reloading the Configuration
Every time we modify the prometheus.yml file, Prometheus needs to load the new configuration. We can trigger this without a restart in two ways.

Method 1: HTTP endpoint

```shell
curl -X POST http://localhost:9090/-/reload
```

We must start Prometheus with --web.enable-lifecycle for this to work.

Method 2: SIGHUP signal

```shell
kill -HUP $(pgrep prometheus)
```

Both methods reload prometheus.yml and all rule_files without dropping metrics or interrupting scrapes.

Validating the Configuration
Before we reload or restart, it’s always a good idea to validate the config file to prevent issues. For that, we use promtool:

```shell
promtool check config /etc/prometheus/prometheus.yml
```

We can also validate rule files:

```shell
promtool check rules /etc/prometheus/rules/mulesoft_alerts.yml
```

promtool ships with every Prometheus release. We should always run it before applying changes in production.

Summary
The prometheus.yml file controls everything. Here is the full map:

| Section | What We Define |
|---|---|
| global | Default scrape interval, timeout, evaluation interval, and external labels |
| alerting | Alertmanager endpoints, TLS, and authentication |
| rule_files | Paths to alerting and recording rule files |
| scrape_configs | Jobs, targets, paths, auth, TLS, relabeling, and metric filtering |
| remote_write | External long-term storage destination and queue settings |
| remote_read | External long-term storage query source |