Understanding the CNCF DoS Attack Tree in Kubernetes

A Denial of Service (DoS) attack is a deliberate attempt to block access to a system or service. It works by overwhelming the target with traffic or resource-heavy tasks. The goal is to slow down or crash applications, making them unusable. For Kubernetes (K8s), these attacks can bring our clusters to a halt. This affects everything running on them—from critical apps to internal tools.

That’s why, when we build our threat model for Kubernetes, it’s very important to understand how an attacker could perform a DoS attack against our K8s cluster: which vulnerabilities could they exploit to disrupt it?

To understand this in depth, a great tool we can use is the Attack Tree for DoS built by the CNCF Financial User Group. The attack tree is a graphical representation of the different steps that an attacker could take in a K8s cluster to carry out a DoS attack.

As we learnt in our previous post, the DoS Attack Tree follows a bottom-up approach: it shows the different paths an attacker could take to cause a DoS, starting from the exploitation of an existing vulnerability in our cluster. In particular, the attack tree described by the CNCF shows two approaches:

The first one explores the steps that an attacker can follow, starting from a compromised container in our cluster, until they exhaust the compute resources of the cluster.
The second one describes how an attacker who gains network access to our K8s environment can bring down the different K8s components and thereby disrupt the cluster.

Let’s see them in detail.

Left Branch - From gaining access to a container

When a container is compromised and an attacker gains access to it, there are a few different ways in which the attacker can DoS our environment:

Add Process to running pod

In this scenario, the attacker starts new processes inside a running pod, which increases its load and resource consumption (CPU, memory).
By overwhelming the containers with additional load, the attacker can make the containers and pods slow down, become unresponsive or even crash. Any of these outcomes would make our apps unavailable.
If the container has no resource limits defined (resources.limits in the pod spec), the attacker can:

Run processes that consume 100% of the CPU.
Allocate large amounts of memory (causing OOM kills for other pods).
Write endless data to volumes, consuming storage.

This overloads the node. Other pods on the same node may be evicted or fail to schedule, degrading application availability.
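As a minimal sketch of the resources.limits setting mentioned above, a pod spec could cap CPU, memory and ephemeral storage per container. The pod name, image and all values are hypothetical and only illustrate the shape of the configuration; suitable numbers depend on the workload.

```yaml
# Illustrative only: the name, image and quantities are assumptions, not taken from a real cluster.
apiVersion: v1
kind: Pod
metadata:
  name: limited-app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:                 # what the scheduler reserves for the container
          cpu: "250m"
          memory: "128Mi"
          ephemeral-storage: "256Mi"
        limits:                   # hard ceiling enforced at runtime
          cpu: "500m"             # CPU usage above this is throttled
          memory: "256Mi"         # exceeding this gets the container OOM-killed
          ephemeral-storage: "1Gi" # exceeding this can get the pod evicted
```

With limits like these, a runaway process is throttled or killed inside its own container instead of starving its neighbours. Note that resources.limits does not cap the number of processes a pod can start; that is governed separately by the kubelet's pod PID limit setting.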

Use Privileged Container to start/modify process on host (Root)

In this scenario, the attacker gains access to a privileged container—one that runs with root permissions and has access to the underlying host system. This level of access opens the door to dangerous capabilities.

The first thing an attacker can do with access to a privileged container is to import malicious code. To do so, the attacker will try to use built-in tools such as package managers (apt, yum), scripting runtimes, or network utilities (curl, wget) to quietly fetch and install malicious software. If these tools are not present in the compromised container, the attacker will look for another workload that has them. With root privileges, the attacker can also move laterally across the cluster: they might scan for other containers or pods and deploy malware into those as well. This multiplies the impact and makes detection harder, especially when the attacker relies on tools that already exist within the container image.
The threat extends beyond container boundaries. If the attacker discovers containers with host filesystem mounts, they can read and modify files on the node itself. This allows them to tamper with system-level files, such as service unit files or startup scripts, so that malicious processes start automatically whenever the system reboots, which establishes deep persistence.
If the container includes a mounted Docker socket (/var/run/docker.sock), the attacker gains full control over the Docker daemon. They can now bypass Kubernetes entirely, launching or altering containers directly through Docker commands.
Lastly, if the attacker finds a container with hostPID enabled, they can interfere with processes on the host. Let’s break it down. First, what is the host PID namespace? The PID (Process ID) namespace controls which processes a container can see. By default, a container can only see and interact with its own processes. But a pod can be configured with this setting in its manifest:

spec:
  hostPID: true

This setting shares the host’s PID namespace with the container. Any process inside the container can then see host processes with tools like ps and, given sufficient privileges, signal them with kill:

ps aux       # shows all host processes
kill -9 PID  # can send signals to host processes

If the container runs with elevated privileges (e.g., CAP_SYS_ADMIN) or as privileged: true, the attacker can break out of the container and access host namespaces—including the PID namespace.
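To make the opposite configuration concrete, here is a hedged sketch of a pod that avoids the settings discussed in this section: no privileged mode, no host namespaces, no host filesystem or Docker socket mounts. It is followed by a namespace label that asks the built-in Pod Security Admission controller to reject such settings. All names are hypothetical, and the sketch assumes Pod Security Admission is enabled in the cluster.

```yaml
# Illustrative only: names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
  namespace: team-a
spec:
  hostPID: false        # the default, stated explicitly: no view of host processes
  hostNetwork: false
  hostIPC: false
  containers:
    - name: app
      image: registry.example.com/app:1.0
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        readOnlyRootFilesystem: true   # may need relaxing for apps that write to disk
        capabilities:
          drop: ["ALL"]                # removes CAP_SYS_ADMIN and every other capability
  # No volumes section: nothing from the host, including /var/run/docker.sock, is mounted.
---
# Assumption: the Pod Security Admission controller is active.
# The "baseline" level rejects privileged containers, host namespaces and hostPath volumes.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    pod-security.kubernetes.io/enforce: baseline
```

The pod-level settings protect a single workload, while the namespace label is meant to stop anyone from deploying a privileged or hostPID pod in that namespace in the first place.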

Write New workloads into backend store (etcd)

If the attacker manages to get access to a container running on a master (control plane) node, and this container has privileged access to the host filesystem, they will be able to reach the folders where the client certificates of the Kube API server are stored. With these certificates they can authenticate to the etcd store directly, which means they can manipulate the resources of the cluster. This way, the attacker bypasses the K8s management layer and can cause damage by writing extra workloads that overload the cluster until it becomes unresponsive, by modifying configurations to create instability, or even by reading sensitive information.

Create/Scale deployments using API server

In this scenario, the attacker manages to interact with the Kube API server from a compromised container using valid credentials. If these credentials have enough privileges, the attacker will be able to create new workloads or scale up the existing ones until the resources in the cluster are exhausted.

How could this happen?
When Pods contact the API server, they authenticate as a particular ServiceAccount (for example, default). By default, every pod in Kubernetes gets a service account token mounted inside the container at:

/var/run/secrets/kubernetes.io/serviceaccount/token

If the container is compromised, the attacker can read this file and use the token to authenticate directly to the Kubernetes API. From that moment, and depending on the permissions granted to that service account, the attacker could throw our K8s cluster into chaos in different ways:

Exhausting compute resources: the attacker creates new workloads that end up eating all the resources, or modifies existing deployments by adding more replicas until the cluster can’t handle the demand on its compute resources.
Disrupting the scheduling of new workloads, preventing the cluster from creating any new pods.
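For workloads that never need to talk to the API server, one way to remove the token from the container is to disable automounting. The following is a minimal sketch; the service account, namespace, pod name and image are hypothetical.

```yaml
# Illustrative only: all names are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-no-api
  namespace: team-a
automountServiceAccountToken: false   # pods using this account get no token by default
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: team-a
spec:
  serviceAccountName: app-no-api
  automountServiceAccountToken: false # the pod-level setting takes precedence over the account's
  containers:
    - name: app
      image: registry.example.com/app:1.0
```

With this in place there is no token file under /var/run/secrets/kubernetes.io/serviceaccount/ for an attacker to read. Workloads that do need API access should keep the token but pair it with a narrowly scoped role.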

Create Autoscaling event within existing deployment

In this case, the attacker simulates increased load on the replicas of a deployment so that the autoscaling threshold is reached and the deployment scales out. Autoscaling will keep adding replicas until the resources in the cluster are exhausted.
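A hedged way to bound this scenario is to cap the autoscaler itself. The sketch below assumes a Deployment called web (a hypothetical name) and uses the autoscaling/v2 HorizontalPodAutoscaler. The replica counts and thresholds are illustrative values, not recommendations.

```yaml
# Illustrative only: the target name and all numbers are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: team-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 6                 # hard ceiling: forged load cannot scale past this
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2               # add at most 2 pods per minute
          periodSeconds: 60
```

The maxReplicas value limits how far simulated load can push the deployment, and the scaleUp policy slows the ramp so that monitoring has time to notice the spike.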

Right Branch - From gaining network access to the cluster

Another approach for a DoS attack is to use network attacks to throw our cluster into chaos, which is possible when the network of our K8s cluster is not properly protected.

The CNCF provides three categories of network attacks:

Exhaust Compute Resources

If the subnet where our K8s cluster is running is not secure enough, an attacker could DoS the Kubelet ports (10255, 10250 and 10248). That would make the affected worker node unavailable and hence reduce (and eventually exhaust) the compute resources in the cluster.
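Part of the Kubelet's exposure can be reduced in its own configuration. As a sketch, assuming the Kubelet is started with a configuration file, the following KubeletConfiguration turns off the read-only port (10255) and requires authentication and authorization on the main port (10250).

```yaml
# Sketch of a kubelet configuration file; how it is delivered to nodes depends on the distribution.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
readOnlyPort: 0          # 0 disables the unauthenticated read-only port (10255)
authentication:
  anonymous:
    enabled: false       # reject anonymous requests on port 10250
  webhook:
    enabled: true        # validate bearer tokens against the API server
authorization:
  mode: Webhook          # ask the API server whether the caller may perform the request
```

This shrinks what an unauthenticated caller can do, but it does not absorb a flood of traffic on its own. Restricting who can reach the node ports at the network level is still needed.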

Disrupt workloads in the cluster

In a network attack, the attacker could also target the ports of the different cluster components. There are multiple ways to disrupt the workloads in the cluster by targeting these critical ports:

Targeting the etcd ports (2379, 2380), attackers might launch a DoS attack that overloads those ports and prevents the etcd service from reaching quorum, which leaves the cluster unable to run new workloads.
Attacking the Kube API server ports (6443 and 8080) will leave the API server unable to communicate with the rest of the components in the cluster, and the cluster unable to operate.
There can also be DoS attacks against the Scheduler component (ports 10251 and 10259), which bring the scheduler down and therefore prevent any scheduling or rescheduling of workloads.
The same applies to the Kube Controller Manager component: an attack on ports 10252 and 10257 will make the cluster lose the ability to run the different controllers, and hence to watch and reconcile the status of cluster resources (deployments, replicaSets, ingresses...).

Disrupting Networking

Another way to cause a DoS in a compromised network is to disrupt the internal and external communication of the K8s cluster.

The attacker could target the kube proxy. Kube proxy is the service that allows communication between nodes in the cluster and between services and pods. If the attacker takes down this service or overloads ports 10256 and 10249, this communication is blocked, making our services and pods unavailable.
The attacker could also target port 53, used by the DNS service, and hence break name resolution, which would leave the services unable to find each other.
Lastly, an attacker could also flood the Container Network Interface, which will slow down or stop services in the cluster.

How to mitigate DoS risks

Set resource quotas and limits for each namespace to prevent excessive resource usage. This ensures that even if an attacker compromises one pod, they cannot exhaust all of the cluster’s resources.
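As an illustration of namespace-level quotas and limits, the sketch below combines a ResourceQuota, which caps the namespace as a whole, with a LimitRange, which gives default requests and limits to containers that declare none. The namespace name and every number are hypothetical.

```yaml
# Illustrative only: namespace and quantities are assumptions.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    pods: "20"               # total pods allowed in the namespace
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:               # applied as limits when a container sets none
        cpu: 500m
        memory: 256Mi
      defaultRequest:        # applied as requests when a container sets none
        cpu: 250m
        memory: 128Mi
```

The quota also bounds what a stolen service account token can do: attempts to create or scale workloads beyond these totals are rejected by the API server.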

Restrict service account permissions to limit potential attacks. For instance, we must secure the service accounts in our clusters so that, if one pod is compromised, it won’t be possible to create new pods or perform any administrative actions within the cluster from that pod.
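A least-privilege setup could look like the following sketch, where a hypothetical service account named app is only allowed to read ConfigMaps in its own namespace. Which resources and verbs a real workload needs is an assumption to be checked case by case.

```yaml
# Illustrative only: names are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: team-a
rules:
  - apiGroups: [""]                 # "" is the core API group
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"] # read-only: no create, update, patch or delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-config-reader
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: app
    namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: config-reader
```

Because the role grants no write verbs and nothing on pods or deployments, a token stolen from this workload cannot be used to create or scale anything.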

Use network policies and firewalls - With network policies we can control the traffic flow within the cluster, and with firewall rules we can restrict access to the control plane and the API server endpoints to internal and trusted components only.
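A common starting point, sketched here under the assumption that the cluster's network plugin enforces NetworkPolicy, is to deny all traffic in a namespace and then allow only what is needed, such as DNS lookups. The namespace name is hypothetical.

```yaml
# Illustrative only: requires a network plugin that implements NetworkPolicy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a
spec:
  podSelector: {}                      # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]   # no rules listed, so all traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: team-a
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # assumes DNS runs in kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Network policies apply to pod traffic. The node and control plane ports discussed earlier (Kubelet, etcd, API server) still have to be protected with firewall rules or cloud security groups.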

Monitor and alert on unusual activity for quick response - Even if we put all of these security mechanisms into practice, it’s always a good idea to monitor and audit our K8s clusters so that we can identify and alert on any suspicious activity, respond quickly and mitigate potential threats.
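For the auditing part, a possible sketch is an API server audit policy that records the operations this post is concerned with: changes to workloads and autoscalers, and pod creation. The selection of resources and levels is an assumption and should be tuned to each cluster.

```yaml
# Illustrative audit policy; the choice of resources and levels is an assumption.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"
rules:
  # Full detail for changes to workloads and autoscalers.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "apps"
        resources: ["deployments", "deployments/scale", "statefulsets", "daemonsets"]
      - group: "autoscaling"
        resources: ["horizontalpodautoscalers"]
  # Who created which pod, without the request body.
  - level: Metadata
    verbs: ["create"]
    resources:
      - group: ""
        resources: ["pods"]
  # Everything else is not logged in this sketch.
  - level: None
```

The policy file is passed to the API server with the --audit-policy-file flag, and the resulting log (for example written with --audit-log-path) can feed whatever alerting system we already use to flag sudden bursts of scale or create operations.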