
Tier-2 Kernel Enforcement

CloudTaser classifies kernel environments into two enforcement tiers based on eBPF capability. This page documents the Tier-2 threat model, the data-leak window bounds for secondary vectors, active mitigations, and customer deployment guidance.

Research basis

This threat model was researched and validated against live kernels as part of cloudtaser-ebpf#606 (2026-06-04). The worst-case window figures below are derived from measured kernel ring-buffer latency on GKE COS nodes.


Kernel Tier Definitions

Tier-1: CONFIG_BPF_KPROBE_OVERRIDE=y is available. All attack vectors are denied synchronously before the syscall executes, so there is no data-leak window.

Tier-2: BPF LSM is available but CONFIG_BPF_KPROBE_OVERRIDE is absent. This is the default for most managed Kubernetes environments: GKE COS, EKS Bottlerocket, standard Debian/Ubuntu kernels.

| Tier | Kernel capability | Examples | Primary enforcement |
|---|---|---|---|
| Tier-1 | CONFIG_BPF_KPROBE_OVERRIDE=y | Custom kernels, Azure Linux 3.0+, EKS AL2023 | Synchronous deny (all vectors) |
| Tier-2 | BPF LSM, no kprobe override | GKE COS, EKS Bottlerocket, Debian 12, Ubuntu 22.04 | Synchronous LSM deny (primary) + ReactiveKill (secondary) |

Check your kernel tier

# Tier-1: must return "CONFIG_BPF_KPROBE_OVERRIDE=y"
zcat /proc/config.gz | grep CONFIG_BPF_KPROBE_OVERRIDE

# Tier-2 LSM support
zcat /proc/config.gz | grep CONFIG_BPF_LSM

# cloudtaser CLI
cloudtaser-cli status --enforcement-tier
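The two grep checks above can be folded into one classification step. The following is a minimal sketch, not part of cloudtaser-cli: `classify_tier` is a hypothetical helper name, and the `unknown` result is an assumption for kernels that match neither tier definition on this page. It reads a kernel config on stdin, so it also works where /proc/config.gz is absent and the config lives under /boot.

```shell
# classify_tier: hypothetical helper (not part of cloudtaser-cli).
# Reads a kernel config on stdin and prints the tier as defined on this page.
classify_tier() {
  config=$(cat)
  if printf '%s\n' "$config" | grep -qx 'CONFIG_BPF_KPROBE_OVERRIDE=y'; then
    echo "tier-1"
  elif printf '%s\n' "$config" | grep -qx 'CONFIG_BPF_LSM=y'; then
    echo "tier-2"
  else
    # Assumption: this page does not define a tier for this case.
    echo "unknown"
  fi
}

# Example wiring, falling back to /boot when /proc/config.gz is missing:
#   if [ -r /proc/config.gz ]; then zcat /proc/config.gz
#   else cat "/boot/config-$(uname -r)"; fi | classify_tier
```

Note that CONFIG_BPF_LSM=y only shows the hook infrastructure is compiled in. The BPF LSM must also be active at boot, which can be confirmed by looking for `bpf` in /sys/kernel/security/lsm.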

Tier-2 Enforcement Model

On Tier-2 kernels, CloudTaser operates at two enforcement levels.

Synchronous LSM Denial (primary enforcement)

BPF LSM hooks intercept system calls before they execute. No data is read; the attacker receives an error immediately.

| Hook | Vectors blocked | Return value |
|---|---|---|
| lsm_ptrace_access_check | process_vm_readv, process_vm_writev, ptrace(PTRACE_ATTACH), ptrace(PTRACE_SEIZE) | -EPERM |
| lsm_file_open | Opens of /proc/<pid>/environ, /proc/<pid>/mem, /proc/<pid>/maps, /proc/<pid>/pagemap, /proc/<pid>/smaps, /proc/<pid>/stack, /proc/<pid>/syscall, /dev/mem, /dev/kmem, /proc/kcore | -EPERM |

Attacker calls process_vm_readv(target_pid, ...)
  → lsm_ptrace_access_check BPF hook fires BEFORE syscall executes
  → Hook checks caller against protected cgroup-array
  → Returns -EPERM
  → No memory is read
  → Event logged: VMREADV_DENIED

Primary vectors are zero-data-leak denials

process_vm_readv and /proc/<pid>/mem are the principal memory-read attack vectors. Both are denied synchronously by BPF LSM hooks on Tier-2. The attacker receives an error; no data is transferred.

Asynchronous ReactiveKill (secondary enforcement)

Eight syscall families have no BPF LSM hook in the supported kernel matrix. On Tier-2, the agent attaches via tracepoints, detects the violation at syscall-exit, and sends SIGKILL from userspace. The syscall completes before the signal arrives.

| Syscall family | Attack scenario | Mechanism |
|---|---|---|
| io_uring_* | Bypasses per-syscall eBPF hooks via submission queue | Tracepoint + SIGKILL |
| userfaultfd | Page-fault interception for controlled memory read | Tracepoint + SIGKILL |
| copy_file_range | In-kernel zero-copy between files | Tracepoint + SIGKILL |
| kcmp | Process comparison to fingerprint targets | Tracepoint + SIGKILL |
| process_madvise | Advise changes on another process's memory | Tracepoint + SIGKILL |
| init_module / finit_module | Kernel module load for privilege escalation | Tracepoint + SIGKILL |
| setns | Namespace escape | Tracepoint + SIGKILL |
| splice / tee / sendfile / vmsplice | Zero-copy data exfiltration | Tracepoint + SIGKILL |

ReactiveKill has a data-leak window

The syscall completes before SIGKILL is delivered. The window is bounded by the sum of ring-buffer flush latency, Go goroutine scheduling, and signal delivery, measured at approximately 5 ms on GKE COS nodes.


Worst-Case Data-Leak Window Analysis

Theoretical maximum

At sendfile(2) throughput (~10 GB/s kernel-to-kernel), a theoretical maximum of ~50 MB (10 GB/s × 5 ms) could be transferred in one syscall before the SIGKILL arrives.
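The bound is the product of throughput and window length. As a worked sketch using only the approximate figures quoted on this page (`leak_bound_mb` is a hypothetical helper name, not a CloudTaser tool):

```shell
# leak_bound_mb: hypothetical helper for the arithmetic above.
#   bytes = throughput (bytes/s) * window (s)
# Inputs are the approximate figures quoted on this page, not new measurements.
leak_bound_mb() {
  throughput_gb_per_s=$1   # e.g. 10 for ~10 GB/s
  window_ms=$2             # e.g. 5 for ~5 ms
  # GB/s * ms = 10^9 B/s * 10^-3 s = 10^6 B, so the product is already in MB.
  echo $(( throughput_gb_per_s * window_ms ))
}

leak_bound_mb 10 5   # prints 50
```

The same function shows how the bound scales: halving the window or the achievable throughput halves the theoretical maximum.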

Practical exposure

The theoretical maximum does not reflect realistic attack conditions:

  1. memfd_secret makes primary vectors unreachable. Secrets in the wrapper are stored in memfd_secret-backed memory (kernel 5.14+). This memory is removed from the kernel direct map — process_vm_readv returns EIO against these pages regardless of LSM enforcement. The LSM hook is defense-in-depth on top of hardware-level isolation.

  2. ReactiveKill vectors are secondary side-channels. The syscall families covered only by ReactiveKill (splice, sendfile, io_uring, etc.) are not direct memory-read paths. They require the attacker to already have data in a buffer they control. Exfiltrating wrapper heap requires the primary vectors first, and those are blocked by LSM.

  3. Heap zeroing closes the transient plaintext window. Wrapper v0.2+ zeros Go heap copies of secrets immediately after use. A successful read of wrapper heap during the fetch window returns zeroed or partial plaintext.

  4. Seccomp-bpf blocks vector creation at pod admission. The operator injects a RuntimeDefault seccomp profile that blocks process_vm_readv, pidfd_getfd, and userfaultfd at the kernel syscall filter level for attacker pods.
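Item 4 can be spot-checked from inside a pod by reading the Seccomp field of /proc/<pid>/status, where the kernel reports 0 (disabled), 1 (strict), or 2 (filter). The sketch below assumes nothing about the CloudTaser operator; `seccomp_mode` is a hypothetical helper name. It only confirms that some filter is attached, not which syscalls that filter blocks.

```shell
# seccomp_mode: hypothetical helper.
# Reads the text of /proc/<pid>/status on stdin and names the seccomp mode.
seccomp_mode() {
  # Match "Seccomp:" exactly so "Seccomp_filters:" on newer kernels is ignored.
  mode=$(awk '/^Seccomp:/ {print $2}')
  case "$mode" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}

# Example wiring inside a container:
#   seccomp_mode < /proc/self/status
```

A pod running under a RuntimeDefault profile would be expected to report `filter`.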

Window summary

| Scenario | Data-leak window | Notes |
|---|---|---|
| Primary vectors (process_vm_readv, /proc/<pid>/mem) | None (synchronous LSM denial) | BPF LSM hook fires before syscall |
| Secondary vectors (splice, sendfile, io_uring) | ~5 ms ReactiveKill window | memfd_secret pages still inaccessible inside window |
| Theoretical worst case (no mitigations) | ~50 MB per call | Not achievable with memfd_secret, seccomp, and heap zeroing active |

Active Mitigations on Tier-2

| Mitigation | Status | Effect |
|---|---|---|
| memfd_secret (kernel 5.14+) | Active by default | Hardware-level page hiding; process_vm_readv returns EIO against secret pages |
| Heap zeroing | Active (wrapper v0.2+) | Transient Go heap copies zeroed after use; reduces plaintext exposure during fetch window |
| Seccomp-bpf (RuntimeDefault) | Active via operator injection | Blocks process_vm_readv, pidfd_getfd, userfaultfd on attacker pods at admission |
| BPF LSM (primary enforcement) | Active on Tier-2 | Synchronous denial of all primary memory-read vectors |
| ReactiveKill (secondary enforcement) | Active on Tier-2 | ~5 ms kill window for secondary vectors |
| Kernel upgrade to Tier-1 | Optional | Restores synchronous denial for all vectors including secondary families |

Customer Guidance

Default (GKE, EKS, AKS on standard kernels)

Tier-2 enforcement provides synchronous denial for all primary secret-access vectors. The ~5ms ReactiveKill window applies only to secondary side-channel vectors that are not direct memory-read paths. Combined with memfd_secret and heap zeroing, this provides strong practical protection for regulated workloads.

Zero-tolerance posture

For workloads requiring synchronous denial on every vector without exception:

  1. Deploy on Tier-1 kernels (custom kernel build with CONFIG_BPF_KPROBE_OVERRIDE=y, or Azure Linux 3.0+ / EKS AL2023)
  2. Verify the enforcement tier: cloudtaser-cli status --enforcement-tier
  3. Confirm the agent reports enforcement_mode: full on its /status endpoint

Verify your enforcement tier

# Check agent enforcement mode
kubectl exec -n cloudtaser ds/cloudtaser-ebpf -- \
  wget -qO- http://localhost:8080/status | jq .enforcement_mode

# Expected on Tier-1: "full"
# Expected on Tier-2: "lsm"
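For the zero-tolerance posture, the check above can be turned into a pass/fail gate. This is a minimal sketch: `tier_for_mode` and `require_full` are hypothetical helper names, and the only enforcement_mode values assumed are the two documented above.

```shell
# tier_for_mode: hypothetical helper mapping an enforcement_mode value to its tier.
tier_for_mode() {
  case "$1" in
    full) echo "tier-1" ;;
    lsm)  echo "tier-2" ;;
    *)    echo "unknown"; return 1 ;;
  esac
}

# require_full: succeeds only when the reported mode corresponds to Tier-1.
require_full() {
  [ "$(tier_for_mode "$1")" = "tier-1" ]
}

# Example wiring (jq -r strips the JSON quotes shown in the expected output):
#   mode=$(kubectl exec -n cloudtaser ds/cloudtaser-ebpf -- \
#     wget -qO- http://localhost:8080/status | jq -r .enforcement_mode)
#   require_full "$mode" || echo "node is not Tier-1: $mode" >&2
```

Because kubectl exec against a DaemonSet selects a single pod, a cluster-wide check would need to repeat this for each agent pod.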

Enforcement tier is reported in audit events

Every enforcement event logged by the agent includes the enforcement_mode field. This allows compliance teams to verify that the declared tier matches the observed behaviour in audit logs.

