Tier-2 Kernel Enforcement
CloudTaser classifies kernel environments into two enforcement tiers based on eBPF capability. This page documents the Tier-2 threat model, the data-leak window bounds for secondary vectors, active mitigations, and customer deployment guidance.
Research basis
This threat model was researched and validated against live kernels as part of cloudtaser-ebpf#606 (2026-06-04). The worst-case window figures below are derived from measured kernel ring-buffer latency on GKE COS nodes.
Kernel Tier Definitions
Tier-1: the kernel is built with CONFIG_BPF_KPROBE_OVERRIDE=y. All attack vectors are denied synchronously, before the syscall executes, so there is no data-leak window.
Tier-2: BPF LSM is available but CONFIG_BPF_KPROBE_OVERRIDE is absent. This is the default for most managed Kubernetes environments, including GKE COS, EKS Bottlerocket, and standard Debian/Ubuntu kernels.
| Tier | Kernel capability | Examples | Primary enforcement |
|---|---|---|---|
| Tier-1 | CONFIG_BPF_KPROBE_OVERRIDE=y | Custom kernels, Azure Linux 3.0+, EKS AL2023 | Synchronous deny, all vectors |
| Tier-2 | BPF LSM, no kprobe override | GKE COS, EKS Bottlerocket, Debian 12, Ubuntu 22.04 | Synchronous LSM deny (primary) + ReactiveKill (secondary) |
Check your kernel tier
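A node's tier can be estimated from the kernel build config and the list of active LSMs. This is a minimal sketch, not CloudTaser's own detection logic: it assumes the config is exposed at `/proc/config.gz` or `/boot/config-$(uname -r)`, that an active BPF LSM appears as `bpf` in `/sys/kernel/security/lsm`, and the helper names `kernel_config_cat` and `classify_tier` are hypothetical. It applies only the criteria from the tier table.

```bash
#!/usr/bin/env bash
# Sketch only: estimate the CloudTaser enforcement tier of the current node.
# Assumptions: kernel config lives at /proc/config.gz or /boot/config-$(uname -r),
# and an active BPF LSM is listed as "bpf" in /sys/kernel/security/lsm.

# Print the running kernel's build config, or fail if it is not exposed.
kernel_config_cat() {
  if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz
  elif [ -r "/boot/config-$(uname -r)" ]; then
    cat "/boot/config-$(uname -r)"
  else
    echo "kernel config not found" >&2
    return 1
  fi
}

# classify_tier CONFIG_FILE LSM_FILE (hypothetical helper)
# Applies the tier table: kprobe override => tier-1, BPF LSM only => tier-2.
classify_tier() {
  local config_file="$1" lsm_file="$2"
  if grep -qx 'CONFIG_BPF_KPROBE_OVERRIDE=y' "$config_file" 2>/dev/null; then
    echo "tier-1"
  elif tr ',' '\n' 2>/dev/null <"$lsm_file" | grep -qx 'bpf'; then
    echo "tier-2"
  else
    echo "unsupported"
  fi
}

# Usage on a node:
#   classify_tier <(kernel_config_cat) /sys/kernel/security/lsm
```

The agent's own enforcement_mode report remains what CloudTaser itself uses; this script is only a pre-deployment sanity check.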
Tier-2 Enforcement Model
On Tier-2 kernels, CloudTaser operates at two enforcement levels.
Synchronous LSM Denial (primary enforcement)
BPF LSM hooks run in the kernel's permission checks, before the system call performs the requested access. No data is read; the attacker receives an error immediately.
| Hook | Vectors blocked | Return value |
|---|---|---|
| lsm_ptrace_access_check | process_vm_readv, process_vm_writev, ptrace(PTRACE_ATTACH), ptrace(PTRACE_SEIZE) | -EPERM |
| lsm_file_open | Opens of /proc/<pid>/environ, /proc/<pid>/mem, /proc/<pid>/maps, /proc/<pid>/pagemap, /proc/<pid>/smaps, /proc/<pid>/stack, /proc/<pid>/syscall, /dev/mem, /dev/kmem, /proc/kcore | -EPERM |
```text
Attacker calls process_vm_readv(target_pid, ...)
  → lsm_ptrace_access_check BPF hook fires BEFORE syscall executes
  → Hook checks caller against protected cgroup-array
  → Returns -EPERM
  → No memory is read
  → Event logged: VMREADV_DENIED
```
Primary vectors are zero-data-leak denials
process_vm_readv and /proc/<pid>/mem are the principal memory-read attack vectors. Both are denied synchronously by BPF LSM hooks on Tier-2. The attacker receives an error; no data is transferred.
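The file-open half of the primary enforcement can be spot-checked from a non-protected pod. The probe below is a hypothetical sketch, not a CloudTaser tool: it reads one byte from a path and classifies the result by the error text, relying on the fact that `-EPERM` surfaces as "Operation not permitted", whereas ordinary permission failures usually show up as "Permission denied" (`EACCES`). It cannot exercise `process_vm_readv`, and the target PID must be visible from the probing pod's PID namespace.

```bash
#!/usr/bin/env bash
# Hypothetical probe (not a CloudTaser tool): attempt to read one byte from a
# path and report how the kernel answered. Intended to be run from a
# non-protected pod against a protected process's /proc entries.
probe_read() {
  local path="$1" err
  # Capture only stderr; LC_ALL=C keeps strerror() text in English.
  if err=$(LC_ALL=C head -c 1 -- "$path" 2>&1 >/dev/null); then
    echo "READABLE"
  elif [[ "$err" == *"Operation not permitted"* ]]; then
    echo "DENIED_EPERM"   # -EPERM, the value the LSM hooks above return
  else
    echo "ERROR: $err"    # anything else, e.g. ENOENT or EACCES
  fi
}

# Usage (TARGET_PID is a placeholder for a protected process):
#   probe_read "/proc/$TARGET_PID/environ"
```

Per the hook table, a protected target on Tier-2 should yield `DENIED_EPERM`; `READABLE` would indicate the hook is not enforcing.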
Asynchronous ReactiveKill (secondary enforcement)
Eight syscall families have no BPF LSM hook in the supported kernel matrix. On Tier-2, the agent attaches via tracepoints, detects the violation at syscall-exit, and sends SIGKILL from userspace. The syscall completes before the signal arrives.
| Syscall family | Attack scenario | Mechanism |
|---|---|---|
| io_uring_* | Bypasses per-syscall eBPF hooks via the submission queue | Tracepoint + SIGKILL |
| userfaultfd | Page-fault interception for controlled memory read | Tracepoint + SIGKILL |
| copy_file_range | In-kernel zero-copy between files | Tracepoint + SIGKILL |
| kcmp | Process comparison to fingerprint targets | Tracepoint + SIGKILL |
| process_madvise | Advise changes on another process's memory | Tracepoint + SIGKILL |
| init_module / finit_module | Kernel module load for privilege escalation | Tracepoint + SIGKILL |
| setns | Namespace escape | Tracepoint + SIGKILL |
| splice / tee / sendfile / vmsplice | Zero-copy data exfiltration | Tracepoint + SIGKILL |
ReactiveKill has a data-leak window
The syscall completes before SIGKILL is delivered. The window is the sum of ring-buffer flush latency, Go goroutine scheduling, and signal delivery, and was measured at approximately 5 ms on GKE COS nodes.
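The detect-then-kill sequence explains where the window comes from: the violation is only observed after the syscall has returned. The sketch below models just the userspace reaction step. It is hypothetical: it assumes violations arrive as text lines of the form `VIOLATION <pid> <syscall>`, whereas this page indicates the real agent consumes a BPF ring buffer from Go.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the userspace half of ReactiveKill.
# Assumed input format: one "VIOLATION <pid> <syscall>" line per event.
reactive_kill() {
  local tag pid syscall
  while read -r tag pid syscall; do
    [ "$tag" = "VIOLATION" ] || continue
    # By the time this line is read, the offending syscall has already
    # returned; SIGKILL only stops whatever the process does next.
    if kill -KILL "$pid" 2>/dev/null; then
      echo "killed pid=$pid after $syscall"
    else
      echo "pid=$pid already exited"
    fi
  done
}
```

Every stage before `kill` (event emission, buffering, reading, scheduling) adds to the window, which is why the figure is bounded rather than zero.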
Worst-Case Data-Leak Window Analysis
Theoretical maximum
At sendfile(2) throughput (~10 GB/s kernel-to-kernel), a theoretical maximum of ~50 MB (10 GB/s × 5 ms) could be transferred in one syscall before the SIGKILL arrives.
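The bound is simply throughput multiplied by window length. A small helper makes it easy to recompute for other throughputs or measured windows; the helper name and the choice of decimal units are illustrative assumptions.

```bash
#!/usr/bin/env bash
# leak_bytes THROUGHPUT_BYTES_PER_SEC WINDOW_MS (hypothetical helper)
# Upper bound on bytes an attacker can move during a ReactiveKill window,
# assuming the stated throughput is sustained for the entire window.
leak_bytes() {
  local throughput_bps="$1" window_ms="$2"
  echo $(( throughput_bps * window_ms / 1000 ))
}

# 10 GB/s (decimal gigabytes) for 5 ms
leak_bytes $(( 10 * 1000 ** 3 )) 5   # prints 50000000, i.e. ~50 MB
```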
Practical exposure
The theoretical maximum does not reflect realistic attack conditions:
- memfd_secret makes primary vectors unreachable. Secrets in the wrapper are stored in memfd_secret-backed memory (kernel 5.14+). This memory is removed from the kernel direct map, so process_vm_readv returns EIO against these pages regardless of LSM enforcement. The LSM hook is defense-in-depth on top of this hardware-level isolation.
- ReactiveKill vectors are secondary side-channels. The vector families that lack an LSM hook (splice, sendfile, io_uring, etc.) are not direct memory-read paths. They require the attacker to already hold the data in a buffer they control; exfiltrating wrapper heap therefore requires a primary vector first, and LSM blocks those.
- Heap zeroing closes the transient plaintext window. Wrapper v0.2+ zeros Go heap copies of secrets immediately after use. A successful read of wrapper heap during the fetch window returns zeroed or partial plaintext.
- Seccomp-bpf blocks vector creation in attacker pods. At pod admission, the operator injects a RuntimeDefault seccomp profile that blocks process_vm_readv, pidfd_getfd, and userfaultfd at the kernel syscall-filter level.
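Whether a seccomp filter is attached to a process can be read from its `/proc/<pid>/status`, whose `Seccomp:` field is 0 (disabled), 1 (strict) or 2 (filter). The helper below is a hypothetical sketch; mode 2 confirms that some seccomp-bpf profile is active, not that it is the operator-injected RuntimeDefault profile or which syscalls it blocks.

```bash
#!/usr/bin/env bash
# seccomp_mode STATUS_FILE (hypothetical helper)
# Reads the "Seccomp:" field of a /proc/<pid>/status file.
seccomp_mode() {
  local mode
  mode=$(awk '$1 == "Seccomp:" { print $2 }' "$1")
  case "$mode" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;   # a seccomp-bpf profile is attached
    *) echo "unknown" ;;
  esac
}

# Usage against a pod (namespace and pod name are placeholders):
#   seccomp_mode <(kubectl exec -n <ns> <pod> -- cat /proc/1/status)
```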
Window summary
| Scenario | Data-leak window | Notes |
|---|---|---|
| Primary vectors (process_vm_readv, /proc/<pid>/mem) | None (synchronous LSM denial) | BPF LSM hook fires before the syscall |
| Secondary vectors (splice, sendfile, io_uring) | ~5 ms ReactiveKill window | memfd_secret pages remain inaccessible inside the window |
| Theoretical worst case (no mitigations) | ~50 MB per call | Not achievable in practice given memfd_secret, seccomp, and heap zeroing |
Active Mitigations on Tier-2
| Mitigation | Status | Effect |
|---|---|---|
| memfd_secret (kernel 5.14+) | Active by default | Hardware-level page hiding; process_vm_readv returns EIO against secret pages |
| Heap zeroing | Active (wrapper v0.2+) | Transient Go heap copies zeroed after use; reduces plaintext exposure during fetch window |
| Seccomp-bpf (RuntimeDefault) | Active via operator injection at pod admission | Blocks process_vm_readv, pidfd_getfd, and userfaultfd in attacker pods |
| BPF LSM (primary enforcement) | Active on Tier-2 | Synchronous denial of all primary memory-read vectors |
| ReactiveKill (secondary enforcement) | Active on Tier-2 | ~5 ms kill window for secondary vectors |
| Kernel upgrade to Tier-1 | Optional | Extends synchronous denial to all vectors, including the secondary families |
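The memfd_secret row depends on the running kernel version. A version comparison is a quick first filter only: the kernel must also be built with secretmem support (CONFIG_SECRETMEM) for the syscall to be usable, and the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# kernel_at_least MIN_VERSION KERNEL_RELEASE (hypothetical helper)
# Succeeds if KERNEL_RELEASE (e.g. from `uname -r`) is >= MIN_VERSION.
# Distribution suffixes such as "-1051-gke" or a trailing "+" are dropped first.
kernel_at_least() {
  local min="$1" release="${2%%[-+]*}"
  [ "$(printf '%s\n%s\n' "$min" "$release" | sort -V | head -n 1)" = "$min" ]
}

# memfd_secret needs kernel 5.14+; this checks the version only.
if kernel_at_least 5.14 "$(uname -r)"; then
  echo "kernel $(uname -r): new enough for memfd_secret"
else
  echo "kernel $(uname -r): older than 5.14, memfd_secret unavailable"
fi
```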
Customer Guidance
Default (GKE, EKS, AKS on standard kernels)
Tier-2 enforcement provides synchronous denial for all primary secret-access vectors. The ~5ms ReactiveKill window applies only to secondary side-channel vectors that are not direct memory-read paths. Combined with memfd_secret and heap zeroing, this provides strong practical protection for regulated workloads.
Zero-tolerance posture
For workloads requiring synchronous denial on every vector without exception:
- Deploy on Tier-1 kernels (a custom kernel build with CONFIG_BPF_KPROBE_OVERRIDE=y, or Azure Linux 3.0+ / EKS AL2023).
- Verify the enforcement tier: cloudtaser-cli status --enforcement-tier
- Confirm the agent reports enforcement_mode: full on its /status endpoint.
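Verify your enforcement tier
```bash
# Check agent enforcement mode.
# ds/cloudtaser-ebpf resolves to a single agent pod; if node pools run
# different kernels, repeat against each node's agent pod by name.
kubectl exec -n cloudtaser ds/cloudtaser-ebpf -- \
  wget -qO- http://localhost:8080/status | jq .enforcement_mode
# Expected on Tier-1: "full"
# Expected on Tier-2: "lsm"
```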
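For a zero-tolerance deployment, the check can be turned into a gate that fails a CI job or rollout script when the agent does not report the expected mode. This is a hypothetical sketch; it accepts the raw `jq` output, which keeps the JSON quotes, and strips them before comparing.

```bash
#!/usr/bin/env bash
# require_enforcement_mode EXPECTED ACTUAL (hypothetical helper)
# Fails unless ACTUAL matches EXPECTED. ACTUAL may be raw `jq` output such as
# "full" including the JSON quotes; the quotes are stripped before comparing.
require_enforcement_mode() {
  local expected="$1" actual
  actual=${2//\"/}
  if [ "$actual" = "$expected" ]; then
    echo "ok: enforcement_mode=$actual"
  else
    echo "fail: enforcement_mode=$actual (expected $expected)" >&2
    return 1
  fi
}

# Usage:
#   mode=$(kubectl exec -n cloudtaser ds/cloudtaser-ebpf -- \
#     wget -qO- http://localhost:8080/status | jq .enforcement_mode)
#   require_enforcement_mode full "$mode" || exit 1
```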
Enforcement tier is reported in audit events
Every enforcement event logged by the agent includes the enforcement_mode field. This allows compliance teams to verify that the declared tier matches the observed behaviour in audit logs.
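A compliance check over exported audit logs can confirm that every event carries the declared mode. The sketch below assumes, as an unverified illustration, that audit events are exported as one JSON object per line with a flat string `enforcement_mode` field; it uses `grep` and `sed` to stay dependency-free, though `jq` is more robust where available. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# observed_modes AUDIT_LOG (hypothetical helper)
# Assumes one JSON event per line with a string "enforcement_mode" field.
# Prints each distinct mode seen, sorted.
observed_modes() {
  grep -Eo '"enforcement_mode"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed -E 's/.*"([^"]*)"$/\1/' \
    | sort -u
}

# audit_matches_tier AUDIT_LOG EXPECTED_MODE (hypothetical helper)
# Succeeds only if at least one event exists and all report EXPECTED_MODE.
audit_matches_tier() {
  local modes
  modes=$(observed_modes "$1")
  [ -n "$modes" ] && [ "$modes" = "$2" ]
}
```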