Master the Prometheus Configuration File

After installing the Prometheus server and setting up one or more nodes with Node Exporter, the next step is to configure Prometheus to actively pull metrics from these targets. Prometheus utilizes a pull-based model, meaning you must explicitly define the targets to scrape in the configuration.

All configuration lives in the prometheus.yml file, typically located in the /etc/prometheus folder.

In this article, we’ll learn how to configure Prometheus to scrape metrics from multiple nodes effectively.


What Is Prometheus?

Prometheus works on a pull model. Instead of services pushing data to a central server, Prometheus reaches out and scrapes metrics from each target on a schedule.

Each target exposes an HTTP /metrics endpoint (although the path can be changed, as we’ll see later). Prometheus calls that endpoint, reads the data, and stores it. Each collection cycle is called a scrape.
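For context, the response body uses the Prometheus text exposition format. An illustrative payload (the metric names and values here are examples, not taken from a real server) looks like this:

```
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.47
# HELP jvm_memory_used_bytes Used bytes of a given JVM memory area.
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 4.2e+08
```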


The Configuration File: prometheus.yml

The Prometheus configuration file is a YAML file. By default, Prometheus looks for it at:

/etc/prometheus/prometheus.yml

We pass a custom path at startup with:

prometheus --config.file=/path/to/prometheus.yml

The file has six top-level sections:

| Section | Purpose |
| --- | --- |
| global | Default settings applied across the entire config |
| alerting | Connection to Alertmanager for alert routing |
| rule_files | Paths to recording and alerting rule files |
| scrape_configs | Defines what Prometheus scrapes and how |
| remote_write | Sends metrics to an external storage backend |
| remote_read | Reads metrics from an external storage backend |

We’ll cover every section in full detail below.


Section 1: global

The global section sets default values for the entire configuration. Any scrape_configs block can override these values locally.

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    environment: production
    region: us-east-1
    team: integration
  query_log_file: /var/log/prometheus/query.log

scrape_interval

scrape_interval: 15s

This sets how often Prometheus scrapes each target. Prometheus sends an HTTP GET to the /metrics endpoint every 15 seconds.

Valid time units: ms, s, m, h, d, w, y

Common values:

| Value | Use Case |
| --- | --- |
| 5s | High-frequency monitoring. Higher CPU and storage cost. |
| 15s | Standard default. Good balance for most services. |
| 30s | Lower-cost option for stable or low-priority services. |
| 60s | Coarse monitoring for infrastructure-level metrics. |

For a MuleSoft Mule Runtime running Standalone, 15s gives us enough resolution to detect thread pool saturation, heap spikes, and flow error rates without overwhelming storage.

scrape_timeout

scrape_timeout: 10s

This sets how long Prometheus waits for a target to respond before marking the scrape as failed. The timeout must always be less than or equal to scrape_interval. If the target does not respond within 10 seconds, Prometheus records a scrape failure.

A failed scrape appears in the up metric with a value of 0. We can alert on up == 0 to detect dead targets.
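As a sketch of that idea (rule files are covered in the rule_files section below), a minimal alerting rule on up == 0 might look like this; the group name, duration, and labels are placeholders:

```yaml
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
```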


evaluation_interval

evaluation_interval: 15s

This sets how often Prometheus evaluates our alerting and recording rules. Every 15 seconds, Prometheus runs every rule in our rule_files and checks whether any alert conditions are true.

Keep evaluation_interval equal to or less than scrape_interval. Evaluating rules more frequently than we collect data produces no benefit.


external_labels

external_labels:
  environment: production
  region: us-east-1
  team: integration

External labels attach to every time series and alert that leaves this Prometheus instance. They add context when we ship data to remote storage or when Alertmanager receives alerts.

We use external labels to identify which Prometheus instance produced the data. This is critical in multi-region or multi-environment setups.

For MuleSoft teams, useful external labels might include:

external_labels:
  environment: production
  mule_runtime_version: "4.6"
  business_unit: payments
  datacenter: aws-us-east-1


query_log_file

query_log_file: /var/log/prometheus/query.log

This tells Prometheus to log every PromQL query to a file. Each entry records the query text, duration, and timestamps.

We use this to debug slow queries and audit who is querying what. This option is optional. We usually omit it in development.


Section 2: alerting

The alerting section connects Prometheus to one or more Alertmanager instances. Alertmanager handles deduplication, grouping, silencing, and routing of alerts to notification channels like PagerDuty, Slack, or email.

alerting:
  alert_relabel_configs:
    - source_labels: [environment]
      target_label: env
      replacement: "$1"
  alertmanagers:
    - scheme: http
      timeout: 10s
      api_version: v2
      path_prefix: /
      static_configs:
        - targets:
            - alertmanager-01:9093
            - alertmanager-02:9093
      tls_config:
        ca_file: /etc/prometheus/certs/ca.crt
        cert_file: /etc/prometheus/certs/client.crt
        key_file: /etc/prometheus/certs/client.key
        insecure_skip_verify: false
      # Use basic_auth OR authorization, never both in the same config:
      basic_auth:
        username: prometheus
        password: secret
      authorization:
        type: Bearer
        credentials: my-token


alert_relabel_configs

alert_relabel_configs:
  - source_labels: [environment]
    target_label: env
    replacement: "$1"

This rewrites labels on alerts before Prometheus sends them to Alertmanager. We use it to normalize label names, drop sensitive labels, or rename inconsistent labels from different targets.

This uses the same relabeling syntax as metric_relabel_configs, which we cover in depth in the scrape_configs section.
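For example, a sketch that strips a hypothetical internal_ip label from every outgoing alert before Alertmanager sees it:

```yaml
alert_relabel_configs:
  - regex: internal_ip
    action: labeldrop
```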


alertmanagers

This block defines the Alertmanager instances Prometheus sends alerts to.


scheme

scheme: http

The protocol used to connect to Alertmanager. Use http or https. Always use https in production.


timeout

timeout: 10s

How long Prometheus waits for Alertmanager to accept an alert before giving up.


api_version

api_version: v2

The Alertmanager API version. Use v2 for all modern Alertmanager deployments (version 0.16+). The v1 API is deprecated.


path_prefix

path_prefix: /

A URL path prefix prepended to all Alertmanager API paths. Use this when Alertmanager sits behind a reverse proxy that adds a base path, such as /alertmanager/.


static_configs

static_configs:
  - targets:
      - alertmanager-01:9093
      - alertmanager-02:9093

Lists the Alertmanager instances by hostname and port. We list multiple instances for high availability. Prometheus sends alerts to all of them.


tls_config

tls_config:
  ca_file: /etc/prometheus/certs/ca.crt
  cert_file: /etc/prometheus/certs/client.crt
  key_file: /etc/prometheus/certs/client.key
  insecure_skip_verify: false

Configures TLS for the connection to Alertmanager.

| Directive | Purpose |
| --- | --- |
| ca_file | Path to the CA certificate that signed the Alertmanager server cert |
| cert_file | Path to the client certificate for mutual TLS |
| key_file | Path to the private key for the client certificate |
| insecure_skip_verify | Set to true to skip certificate validation. Never use in production. |


basic_auth and authorization

basic_auth:
  username: prometheus
  password: secret

authorization:
  type: Bearer
  credentials: my-token

Use one of these to authenticate to Alertmanager. Use basic_auth for username/password. Use authorization for bearer token authentication. Do not use both at the same time.

For production, store credentials in a file and reference it:

basic_auth:
  username: prometheus
  password_file: /etc/prometheus/alertmanager-password.txt


Section 3: rule_files

The rule_files section lists paths to files that contain recording rules and alerting rules.

rule_files:
  - /etc/prometheus/rules/alerts.yml
  - /etc/prometheus/rules/recording_rules.yml
  - /etc/prometheus/rules/mulesoft_*.yml

Prometheus accepts glob patterns. Every file matching the pattern loads at startup and reloads when we send SIGHUP or call the /-/reload endpoint.

Recording rules pre-compute expensive PromQL expressions and store the result as a new metric. We use them to reduce query time in dashboards.

Alerting rules define conditions that trigger alerts. When the condition is true for a defined duration, Prometheus fires the alert to Alertmanager.

A rule file looks like this:

groups:
  - name: mulesoft_alerts
    interval: 15s
    rules:
      - alert: MuleRuntimeDown
        expr: up{job="mule_runtime"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Mule Runtime is down"
          description: "Instance {{ $labels.instance }} has been down for more than 1 minute."
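A recording-rule group follows the same structure. As an illustrative sketch, this one pre-computes a 5-minute error rate into a new metric; the source metric name mule_flow_errors_total is hypothetical:

```yaml
groups:
  - name: mulesoft_recording
    interval: 15s
    rules:
      - record: job:mule_flow_errors:rate5m
        expr: rate(mule_flow_errors_total[5m])
```

Dashboards can then query job:mule_flow_errors:rate5m directly instead of re-evaluating the rate() expression on every refresh.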

We’ll discuss rule files in depth in a future post. For now, we focus on referencing them correctly in prometheus.yml.


Section 4: scrape_configs

The scrape_configs section is the heart of Prometheus. It defines every target Prometheus monitors.
Each entry in the list is a scrape job. A job groups related targets together. Prometheus assigns the job label to every metric it collects from that group.

scrape_configs:
  - job_name: mule_runtime
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    honor_labels: false
    honor_timestamps: true
    params:
      format: [prometheus]
    basic_auth:
      username: monitor
      password_file: /etc/prometheus/mule-password.txt
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      insecure_skip_verify: false
    static_configs:
      - targets:
          - mule-server-01:8081
          - mule-server-02:8081
        labels:
          environment: production
          datacenter: us-east
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: jvm_.*
        action: keep

We now break down every directive in a scrape job.


job_name

job_name: mule_runtime

A unique name for this scrape job. Prometheus attaches a job label with this value to every metric from this job. We use it to filter metrics in PromQL:

up{job="mule_runtime"}

Use descriptive, lowercase names with underscores. Good examples: mule_runtime, mule_healthcheck, postgres, nginx.


scrape_interval and scrape_timeout (per-job)

scrape_interval: 15s
scrape_timeout: 10s

These override the global values for this job only. We increase the interval for low-priority jobs. We decrease it for high-priority jobs where we need faster detection.


metrics_path

metrics_path: /metrics

The HTTP path Prometheus appends to the target address to collect metrics. The default is /metrics. Change it if our service exposes metrics on a different path.
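For example, a Spring Boot service instrumented with Micrometer typically serves metrics at /actuator/prometheus. A sketch of such a job (the job name and target address are placeholders):

```yaml
- job_name: spring_boot_app
  metrics_path: /actuator/prometheus
  static_configs:
    - targets:
        - app-server-01:8080
```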


scheme

scheme: http

The protocol used for scraping. Use http or https. Use https in production when the target supports it.


honor_labels

honor_labels: false

This controls what happens when a scraped metric already contains a label that Prometheus would add (like job or instance).

| Value | Behavior |
| --- | --- |
| false (default) | Prometheus overwrites labels from the target with its own labels. Safer. |
| true | Prometheus keeps the labels from the target as-is. Use with federation or push gateways. |

We always leave this as false for direct scraping of services.


honor_timestamps

honor_timestamps: true

When true, Prometheus respects timestamps embedded in the scraped metrics. When false, Prometheus ignores target timestamps and uses its own scrape time.

Use true only when our metrics source provides precise timestamps (like the Pushgateway). For most live services, use false or omit the directive.


params

params:
  format: [prometheus]

Appends URL query parameters to every scrape request. We use this when a target requires query parameters to return Prometheus-formatted output.

For example, some JMX exporters require:

params:
  target: [jmx-service:9999]
  module: [jmx_mule]


basic_auth

basic_auth:
  username: monitor
  password_file: /etc/prometheus/mule-password.txt

Sets HTTP Basic Authentication credentials for scraping a protected endpoint. Always use password_file instead of password in production. This prevents credentials from appearing in the config file.


authorization

authorization:
  type: Bearer
  credentials_file: /etc/prometheus/token.txt

Sets an HTTP Authorization header on every scrape request. Use this for token-based auth. The header value becomes Bearer <token>.


tls_config (per-job)

tls_config:
  ca_file: /etc/prometheus/certs/ca.crt
  cert_file: /etc/prometheus/certs/client.crt
  key_file: /etc/prometheus/certs/client.key
  server_name: mule-server-01.internal
  insecure_skip_verify: false

| Directive | Purpose |
| --- | --- |
| ca_file | CA certificate to verify the server's TLS certificate |
| cert_file | Client certificate for mutual TLS |
| key_file | Private key for the client certificate |
| server_name | Override the server name used for TLS verification. Useful when the hostname in the cert differs from the target address. |
| insecure_skip_verify | Skip TLS verification. Never use in production. |


static_configs

static_configs:
  - targets:
      - mule-server-01:8081
      - mule-server-02:8081
    labels:
      environment: production
      datacenter: us-east

Lists static scrape targets. Each target is a host:port string. Prometheus scrapes the metrics_path on each target.

We attach custom labels here. These labels appear on every metric scraped from these targets. They are useful for grouping targets by environment, datacenter, or team.


Service Discovery Alternatives to static_configs

static_configs works for small, stable environments. For dynamic infrastructure, Prometheus supports many service discovery mechanisms that automatically find targets.

| Discovery Type | Use Case |
| --- | --- |
| file_sd_configs | Read targets from a JSON or YAML file. We update the file; Prometheus auto-reloads. |
| consul_sd_configs | Discover services registered in HashiCorp Consul. |
| ec2_sd_configs | Discover AWS EC2 instances automatically. |
| kubernetes_sd_configs | Discover Kubernetes pods, nodes, and services. |
| dns_sd_configs | Discover targets through DNS SRV or A record lookups. |
| http_sd_configs | Fetch a target list from an HTTP endpoint. |
For example, for a MuleSoft deployment on AWS, ec2_sd_configs automatically finds all Mule Runtime EC2 instances tagged with a specific key:

- job_name: mule_runtime
  ec2_sd_configs:
    - region: us-east-1
      port: 8081
      filters:
        - name: tag:Role
          values:
            - mule-runtime
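file_sd_configs is often the simplest step up from static targets. As a sketch under assumed paths, the job watches a targets file that we (or an automation tool) can rewrite at any time, and Prometheus picks up changes without a reload:

```yaml
- job_name: mule_runtime
  file_sd_configs:
    - files:
        - /etc/prometheus/targets/mule_*.yml
      refresh_interval: 1m

# Contents of a matching targets file (hypothetical path
# /etc/prometheus/targets/mule_prod.yml):
#
# - targets:
#     - mule-server-01:8081
#     - mule-server-02:8081
#   labels:
#     environment: production
```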


relabel_configs

Relabeling transforms labels on scraped targets before the scrape happens. It runs before we connect to the target.
We use relabel_configs to:
  • Set the instance label to a human-readable value
  • Filter which targets to scrape
  • Extract information from labels and store it in a new label
relabel_configs:
  - source_labels: [__address__]
    regex: "(.+):\\d+"
    target_label: host
    replacement: "$1"
  - source_labels: [__meta_ec2_tag_Name]
    target_label: instance
  - source_labels: [__meta_ec2_tag_Environment]
    regex: production
    action: keep

Relabeling uses six core directives:

| Directive | Purpose |
| --- | --- |
| source_labels | One or more label names to read. Multiple values join with a separator. |
| separator | Character that joins multiple source_labels values. Default: ; |
| target_label | The label to write the result into. |
| regex | A regular expression applied to the joined source value. Default: (.*) |
| replacement | The value written to target_label. Use $1 to reference capture groups. Default: $1 |
| action | What to do with the match. Default: replace |


action Values

| Action | Behavior |
| --- | --- |
| replace | Replace target_label with replacement. Default. |
| keep | Keep the target only if regex matches. Drop all others. |
| drop | Drop the target if regex matches. Keep all others. |
| labelkeep | Keep only labels whose names match regex. Drop all others. |
| labeldrop | Drop labels whose names match regex. Keep all others. |
| labelmap | Copy labels whose names match regex to new labels. Use replacement to define the new name. |
| hashmod | Hash the source label and assign a modulo value. Used for horizontal sharding. |
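As an illustrative sketch of hashmod, two Prometheus servers can split one target list between them; each server keeps only its own shard (shard 0 shown here, with modulus 2):

```yaml
relabel_configs:
  # Hash the target address into a temporary __tmp label (stripped before storage)
  - source_labels: [__address__]
    modulus: 2
    target_label: __tmp_shard
    action: hashmod
  # Keep only targets in this server's shard; the second server uses regex: "1"
  - source_labels: [__tmp_shard]
    regex: "0"
    action: keep
```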


Special Labels in Relabeling

Prometheus exposes internal metadata as labels prefixed with __. We use these in relabel_configs:

| Label | Value |
| --- | --- |
| __address__ | The target address (host:port) |
| __scheme__ | The scrape scheme (http or https) |
| __metrics_path__ | The metrics path |
| __param_<name> | URL parameters for the scrape request |
| __meta_* | Metadata labels from service discovery (e.g., __meta_ec2_tag_Name) |

All __ labels are stripped before metrics are stored. Only labels we explicitly copy to non-__ labels survive.
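A common pattern is therefore to promote discovery metadata to permanent labels with labelmap before it is stripped. For example, this copies every EC2 tag label to a plain label, so __meta_ec2_tag_Role survives as ec2_tag_Role:

```yaml
relabel_configs:
  - regex: __meta_ec2_tag_(.+)
    replacement: ec2_tag_$1
    action: labelmap
```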

metric_relabel_configs

metric_relabel_configs runs after the scrape. It transforms or filters individual metric series before Prometheus stores them.

We use it to:

  • Drop metrics we do not need (saves storage)
  • Rename metric labels
  • Remove high-cardinality labels that cause storage bloat

metric_relabel_configs:
  - source_labels: [__name__]
    regex: "go_.*"
    action: drop
  - source_labels: [__name__]
    regex: "(jvm_.*|mule_.*|http_.*)"
    action: keep
  - regex: error_code
    action: labeldrop

Note that labeldrop (and labelkeep) take only a regex, which matches against label names; source_labels and target_label are not valid with these actions.


Section 5: remote_write

The remote_write section sends metrics to an external long-term storage system. Prometheus stores data locally by default for a limited retention period (15 days by default). For long-term storage, we ship data to systems like Thanos, Cortex, VictoriaMetrics, or Grafana Mimir.

remote_write:
  - url: https://metrics-store.internal/api/v1/push
    name: long_term_storage
    remote_timeout: 30s
    send_exemplars: true
    send_native_histograms: false

    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 5s
      retry_on_http_429: true

    metadata_config:
      send: true
      send_interval: 1m
      max_samples_per_send: 500

    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt

    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-write-password.txt

    write_relabel_configs:
      - source_labels: [environment]
        regex: production
        action: keep


url

url: https://metrics-store.internal/api/v1/push

The remote write endpoint. All Prometheus-compatible remote storage systems expose a standard endpoint for receiving data.


name

name: long_term_storage

A label applied to internal metrics for this remote write queue. Helps us distinguish between multiple remote write targets in monitoring dashboards.


remote_timeout

remote_timeout: 30s

How long Prometheus waits for the remote storage to accept data before marking the request as failed and retrying.


send_exemplars

send_exemplars: true

Exemplars are sample data points that link a metric to a specific trace ID. They are useful for connecting high latency metrics to distributed traces in tools like Grafana Tempo. Enable this only when our remote storage supports exemplars.


queue_config

queue_config:
  capacity: 10000
  max_shards: 200
  min_shards: 1
  max_samples_per_send: 500
  batch_send_deadline: 5s
  min_backoff: 30ms
  max_backoff: 5s
  retry_on_http_429: true

Prometheus buffers metrics in a queue before sending to remote storage. This section controls that queue.

| Directive | Purpose |
| --- | --- |
| capacity | Number of samples to buffer per shard. Increase to absorb traffic spikes. |
| max_shards | Maximum number of parallel write goroutines. Increase for high throughput. |
| min_shards | Minimum number of parallel write goroutines. |
| max_samples_per_send | Maximum number of samples in one HTTP request. |
| batch_send_deadline | Maximum time to wait before sending a partial batch. |
| min_backoff | Initial wait time before retrying a failed send. |
| max_backoff | Maximum wait time between retries. |
| retry_on_http_429 | Retry when the remote endpoint returns HTTP 429 (Too Many Requests). |

Monitor the prometheus_remote_storage_queue_highest_sent_timestamp_seconds metric. If the queue falls behind, increase max_shards or capacity.
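A rough PromQL sketch for that check, expressing the write lag in seconds by comparing that metric against Prometheus's wall clock (alert when the result stays high):

```
time() - prometheus_remote_storage_queue_highest_sent_timestamp_seconds
```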


write_relabel_configs

write_relabel_configs:
  - source_labels: [environment]
    regex: production
    action: keep

Filters which metrics we send to remote storage. Uses the same relabeling syntax as metric_relabel_configs. We use this to send only production metrics to long-term storage while keeping development metrics local.


Section 6: remote_read

The remote_read section tells Prometheus to query an external storage system when a PromQL query requests data outside the local retention window.

remote_read:
  - url: https://metrics-store.internal/api/v1/read
    name: long_term_read
    remote_timeout: 1m
    read_recent: false
    required_matchers:
      environment: production
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-read-password.txt
    filter_external_labels: true


read_recent

read_recent: false

When false, Prometheus only queries remote storage for time ranges outside the local retention period. When true, Prometheus always includes remote storage in every query, even for recent data. Leave this as false to avoid unnecessary remote read latency.


required_matchers

required_matchers:
  environment: production

Prometheus only sends queries to remote storage when they include these label matchers. This prevents full-scan queries from hitting remote storage when they only target local data.


filter_external_labels

filter_external_labels: true

When true, Prometheus automatically adds the external_labels as matchers when querying remote storage. This ensures we only read data that belongs to this Prometheus instance.


Reloading the Configuration

Every time we modify the prometheus.yml file, Prometheus needs to load the new configuration. We can do that without a restart using two methods.


Method 1: HTTP endpoint

curl -X POST http://localhost:9090/-/reload

We must start Prometheus with --web.enable-lifecycle for this to work.


Method 2: SIGHUP signal

kill -HUP $(pgrep prometheus)

Both methods reload prometheus.yml and all rule_files without dropping metrics or interrupting scrapes.


Validating the Configuration

Before we reload or restart, it’s always a good idea to validate our config file to prevent issues. For that, we can use promtool:

promtool check config /etc/prometheus/prometheus.yml

We can also validate rule files with:

promtool check rules /etc/prometheus/rules/mulesoft_alerts.yml

promtool ships with every Prometheus release. We should always run it before applying changes in production.


Summary

The prometheus.yml file controls everything. Here is the full map:

| Section | What We Define |
| --- | --- |
| global | Default scrape interval, timeout, evaluation interval, and external labels |
| alerting | Alertmanager endpoints, TLS, and authentication |
| rule_files | Paths to alerting and recording rule files |
| scrape_configs | Jobs, targets, paths, auth, TLS, relabeling, and metric filtering |
| remote_write | External long-term storage destination and queue settings |
| remote_read | External long-term storage query source |