GKE Deployment Guide
This guide covers deploying cloudtaser on GKE with the recommended configuration for maximum protection. With Ubuntu nodes and Confidential Computing enabled, you can achieve a protection score of 100/115 today, and 115/115 once cloudtaser-ebpf#175 re-routes the last remaining perf_event_open enforcement to a BPF LSM hook. See the protection-score reference for the full breakdown.
Why GKE Ubuntu + Confidential Nodes (today's recommendation)
GKE offers two node image types: Container-Optimized OS (COS) and Ubuntu. For the highest protection score available today, Ubuntu is the recommended choice because it provides synchronous syscall blocking on the override-allowed subset (15 of 16 syscalls).
| Feature | Ubuntu (linux-gke 6.8+) | COS (5.15 / 6.1 / 6.6) |
|---|---|---|
| CONFIG_BPF_KPROBE_OVERRIDE | Yes | No (upstream kernel default n; locked-down distro) |
| CONFIG_BPF_LSM | Yes | Yes (supported on COS, kernel-team-endorsed) |
| memfd_secret (kernel 5.14+) | Yes | Yes |
| Synchronous syscall blocking (kprobe path) | 15 of 16 syscalls today; kprobe_perf_event_open drops (see #175) | None (bpf_override_return() not available) |
| Synchronous syscall blocking (LSM path) | Available post-#174 | Available post-#174 |
| Wrapper hardening (dumpable=0, +5) | Yes (synchronous baseline independent of kprobe) | Yes (synchronous baseline independent of kprobe) |
| Today's protection-score ceiling | 100/115 | 85/115 |
| Post-#175 ceiling | 115/115 | 85/115 (still gated on #174) |
| Post-#174 ceiling | 115/115 | 100/115 (parity via LSM hook) |
Ubuntu gives you kprobe override on the override-allowed subset today. The CONFIG_BPF_KPROBE_OVERRIDE=y kernel config enables bpf_override_return(), which lets the eBPF agent prevent a syscall from executing before it completes. The remaining gap is kprobe_perf_event_open: do_sys_perf_event_open is not in the upstream kernel's ALLOW_ERROR_INJECTION allow-list, so this single kprobe will never load on stock kernels (any distro). cloudtaser-ebpf#175 tracks the migration of perf_event_open enforcement to bpf_lsm_perf_event_open, which closes the last 15-point gap on Ubuntu.
On GKE COS today, the eBPF kprobe-override path is unavailable. COS ships without CONFIG_BPF_KPROBE_OVERRIDE (upstream kernel default n; the COS team treats error_injection as a debug-only feature and does not enable it on a hardened production distro). Today, the eBPF agent's syscall blocking on COS runs in detect+kill mode (tracepoint detection followed by SIGKILL); the wrapper's dumpable=0 (+5) provides the synchronous baseline that is independent of kprobe override. BPF LSM hooks ARE supported on COS (verified CONFIG_BPF_LSM=y on cos-5.15 / 6.1 / 6.6 lakitu_defconfig), and are the kernel-team-endorsed path for synchronous policy in production. cloudtaser-ebpf#174 tracks the strategic migration to LSM hooks, which will bring COS / Bottlerocket / Talos to parity with Ubuntu's synchronous-block posture.
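To see which of the two paths a given node can use, the kernel options discussed above can be read from the node's kernel config. The following is a minimal sketch, not part of cloudtaser: the helper name kconfig_value is hypothetical, and it assumes you have a shell on the node and that the config is exposed at /boot/config-$(uname -r) or /proc/config.gz, which varies by image.

```shell
# Print the value of a kernel config option from a config file read on stdin.
# Prints "n" when the option is absent or commented out
# (for example "# CONFIG_BPF_KPROBE_OVERRIDE is not set").
kconfig_value() {
  option="$1"
  value=$(grep -E "^${option}=" | head -n 1 | cut -d= -f2)
  echo "${value:-n}"
}

# Example usage on a node (paths are assumptions; adjust for your image):
#   kconfig_value CONFIG_BPF_KPROBE_OVERRIDE < "/boot/config-$(uname -r)"
#   zcat /proc/config.gz | kconfig_value CONFIG_BPF_LSM
```

CONFIG_BPF_LSM=y only shows that the hook is compiled in. Whether bpf appears in the active LSM list can be read from /sys/kernel/security/lsm on the node.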
Confidential nodes give you hardware memory encryption. GKE Confidential Nodes use AMD SEV-SNP to encrypt VM memory at the hardware level. The hypervisor and cloud provider cannot read the memory contents. This closes the last remaining attack surface after all software protections are in place.
Step 1: Create the GKE Cluster
Create a cluster with Ubuntu nodes and Confidential Computing:
gcloud container clusters create cloudtaser-prod \
--region europe-west4 \
--num-nodes 3 \
--image-type UBUNTU_CONTAINERD \
--enable-confidential-nodes \
--machine-type n2d-standard-2 \
--workload-pool "$(gcloud config get-value project).svc.id.goog" \
--release-channel regular
Key flags:
| Flag | Purpose |
|---|---|
| --image-type UBUNTU_CONTAINERD | Ubuntu nodes with kprobe override support |
| --enable-confidential-nodes | AMD SEV-SNP memory encryption on all nodes |
| --machine-type n2d-standard-2 | N2D instances required for Confidential Computing (AMD EPYC) |
| --workload-pool | Workload Identity for GCP service account binding |
| --region europe-west4 | EU region for data residency |
N2D machine type required
Confidential Computing on GKE requires N2D (AMD EPYC) instances. Other machine families (N2, E2, C3) do not support AMD SEV-SNP.
Step 2: Connect the Cluster to Your OpenBao
Use the cloudtaser CLI to configure Kubernetes auth on your EU-hosted OpenBao:
# Connect to the cluster
gcloud container clusters get-credentials cloudtaser-prod --region europe-west4
# Connect the cluster to your vault
cloudtaser-cli target connect \
--secretstore-address https://vault.eu.example.com \
--secretstore-token hvs.YOUR_ROOT_TOKEN \
--auth-path kubernetes/gke-prod
This configures OpenBao's Kubernetes auth method to accept ServiceAccount JWTs from the GKE cluster.
Step 3: Install cloudtaser
Install the operator and eBPF daemonset via Helm:
helm repo add cloudtaser https://charts.cloudtaser.io
helm install cloudtaser cloudtaser/cloudtaser \
--namespace cloudtaser-system \
--create-namespace \
--set operator.secretstore.address=https://vault.eu.example.com \
--set ebpf.enabled=true \
--set ebpf.enforceMode=true
Or use the CLI:
cloudtaser-cli target install \
--secretstore-address https://vault.eu.example.com \
--ebpf \
--enforce
Verify the installation:
Expected: operator and eBPF daemonset pods in Running state.
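The verification command itself is not shown above. As a minimal sketch, assuming the cloudtaser-system namespace from the Helm command, the pod list can be filtered for anything that is not Running. The helper name pods_not_running is hypothetical; it relies on the standard column layout of kubectl get pods --no-headers (NAME, READY, STATUS, ...).

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name of
# every pod whose STATUS column (third field) is not "Running".
pods_not_running() {
  awk '$3 != "Running" { print $1 }'
}

# Example usage:
#   kubectl get pods --namespace cloudtaser-system --no-headers | pods_not_running
# No output means every listed pod reports Running.
```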
Step 4: Deploy a Protected Workload
Annotate your deployment with cloudtaser annotations:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/ebpf: "true"
        cloudtaser.io/secretstore-address: "https://vault.eu.example.com"
        cloudtaser.io/secretstore-role: "cloudtaser"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.2.3
Required Annotations
| Annotation | Required | Description |
|---|---|---|
| cloudtaser.io/inject | Yes | Enables cloudtaser injection ("true") |
| cloudtaser.io/ebpf | No | Enables eBPF runtime enforcement ("true") |
| cloudtaser.io/secretstore-address | Yes | URL of the EU-hosted OpenBao |
| cloudtaser.io/secretstore-role | Yes | OpenBao Kubernetes auth role name |
| cloudtaser.io/secret-paths | Yes | Comma-separated OpenBao secret paths |
| cloudtaser.io/env-map | Yes | Maps OpenBao fields to environment variable names |
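The env-map value pairs each OpenBao field with the environment variable it should populate, in the form field=ENV_VAR separated by commas. The sketch below only illustrates how to read that format; explain_env_map is a hypothetical helper and does not reflect how the operator parses the annotation.

```shell
# Split an env-map value into "field -> ENV_VAR" lines.
# Input format, as in the annotation above: field=ENV_VAR[,field=ENV_VAR...]
explain_env_map() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r field env_var; do
    printf '%s -> %s\n' "$field" "$env_var"
  done
}

explain_env_map "db_password=PGPASSWORD,api_key=API_KEY"
# db_password -> PGPASSWORD
# api_key -> API_KEY
```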
Apply the deployment:
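The apply command is not shown here. As a minimal sketch, assuming the manifest above is saved as deployment.yaml (a hypothetical filename), it can be applied and the rollout awaited with standard kubectl subcommands. The helper name apply_workload is illustrative only.

```shell
# Apply a manifest, then wait for the named Deployment to finish rolling out.
apply_workload() {
  manifest="$1"
  namespace="$2"
  name="$3"
  kubectl apply -f "$manifest" &&
    kubectl rollout status "deployment/$name" --namespace "$namespace" --timeout=120s
}

# Example usage (the filename is an assumption):
#   apply_workload deployment.yaml production myapp
```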
Step 5: Verify the Protection Score
Check the wrapper logs to confirm the protection score:
With Ubuntu + Confidential Nodes + eBPF enforcement, you should see (live demo on GKE Ubuntu, kernel 6.8.0-1042-gke):
[cloudtaser-wrapper] Protection score: 100/115
[cloudtaser-wrapper] memfd_secret: OK (+15)
[cloudtaser-wrapper] mlock: OK (+10)
[cloudtaser-wrapper] core_dump_exclusion: OK (+5)
[cloudtaser-wrapper] dumpable_disabled: OK (+5)
[cloudtaser-wrapper] token_protected: OK (+10)
[cloudtaser-wrapper] environ_scrubbed: OK (+5)
[cloudtaser-wrapper] getenv_interposer: OK (+10)
[cloudtaser-wrapper] ebpf_agent_connected: OK (+10)
[cloudtaser-wrapper] cpu_mitigations: OK (+5)
[cloudtaser-wrapper] ebpf_enforce_mode: OK (+15)
[cloudtaser-wrapper] ebpf_kprobes: PARTIAL (15/16 attached -- perf_event_open dropped)
[cloudtaser-wrapper] confidential_vm: OK (+10)
The 15-point gap on ebpf_kprobes is the upstream-kernel ALLOW_ERROR_INJECTION allow-list constraint described above. Ubuntu reaches 115/115 once cloudtaser-ebpf#175 lands (LSM hook re-route to bpf_lsm_perf_event_open), and COS reaches 100/115 once cloudtaser-ebpf#174 lands (BPF LSM migration).
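The headline score is the sum of the (+N) awards on the individual check lines, so it can be cross-checked directly from the log. The sketch below uses the sample output above; sum_score is a hypothetical helper, not a cloudtaser command, and it assumes every award appears as (+N) on its own line.

```shell
# Sum every "(+N)" award in wrapper log lines read on stdin.
# Lines without an award, such as the PARTIAL ebpf_kprobes line, add nothing.
sum_score() {
  sed -n 's/.*(+\([0-9][0-9]*\)).*/\1/p' | awk '{ total += $1 } END { print total + 0 }'
}

sum_score <<'EOF'
[cloudtaser-wrapper] memfd_secret: OK (+15)
[cloudtaser-wrapper] mlock: OK (+10)
[cloudtaser-wrapper] core_dump_exclusion: OK (+5)
[cloudtaser-wrapper] dumpable_disabled: OK (+5)
[cloudtaser-wrapper] token_protected: OK (+10)
[cloudtaser-wrapper] environ_scrubbed: OK (+5)
[cloudtaser-wrapper] getenv_interposer: OK (+10)
[cloudtaser-wrapper] ebpf_agent_connected: OK (+10)
[cloudtaser-wrapper] cpu_mitigations: OK (+5)
[cloudtaser-wrapper] ebpf_enforce_mode: OK (+15)
[cloudtaser-wrapper] ebpf_kprobes: PARTIAL (15/16 attached -- perf_event_open dropped)
[cloudtaser-wrapper] confidential_vm: OK (+10)
EOF
# prints 100
```

The eleven passing checks add up to 100, and the ebpf_kprobes check accounts for the remaining 15 of 115.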
No nodeSelector Needed
When all nodes in the cluster have the cloud.google.com/gke-confidential-nodes=true label (which they do when --enable-confidential-nodes is set at cluster creation), the operator auto-detects confidential node support. There is no need to add a nodeSelector to your workloads.
If you have a mixed cluster with both confidential and non-confidential node pools, the operator still detects the capability per-node and reports the confidential_vm check accordingly.
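In a mixed cluster it can help to list which nodes lack the label. A minimal sketch, assuming the label name given above and the standard -L (label column) option of kubectl get; the helper name nodes_without_confidential_label is hypothetical.

```shell
# Read `kubectl get nodes --no-headers -L cloud.google.com/gke-confidential-nodes`
# on stdin and print every node whose label column is not "true".
# With -L, the label value is the last column and is empty when the label is unset,
# so the last field is then some other column and the comparison still fails.
nodes_without_confidential_label() {
  awk '$NF != "true" { print $1 }'
}

# Example usage:
#   kubectl get nodes --no-headers -L cloud.google.com/gke-confidential-nodes \
#     | nodes_without_confidential_label
```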
Container Image Requirements
The getenv_interposer check (10 points) is an optional glibc-only enhancement. The LD_PRELOAD interposer (libcloudtaser.so) blocks getenv() from returning secrets on the heap by returning pointers to memfd_secret-backed memory instead. It does not activate on musl or statically linked binaries -- those use the default env-var delivery path, which works for all binaries without code changes.
| Base Image | getenv_interposer | Recommendation |
|---|---|---|
| Debian / Ubuntu | Supported | Recommended |
| Red Hat / Fedora | Supported | Recommended |
| Alpine (musl) | Not supported | Use debian-slim instead |
| Distroless (glibc) | Supported | Works |
| Distroless (static) | Not supported | Use glibc variant |
| Scratch (static binary) | Not supported | Uses default env-var delivery |
Switch from Alpine to Debian slim
If your application uses alpine as the base image, consider switching to debian:bookworm-slim or ubuntu:24.04 for a comparably small footprint with glibc support. This enables the optional getenv interposer and adds 10 points to your protection score. Alpine and musl-based images still receive secrets through the default env-var delivery path.
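Whether the interposer can activate depends on how the application binary is linked, which the output of ldd usually reveals. The sketch below is a heuristic based on common ldd output strings, not a cloudtaser feature. The helper name classify_libc is hypothetical, and it assumes ldd is available wherever you inspect the binary, which is not the case inside distroless or scratch images.

```shell
# Classify the output of `ldd <binary>` (read on stdin) as glibc, musl, or static.
classify_libc() {
  out=$(cat)
  case "$out" in
    *musl*) echo "musl" ;;
    *libc.so.6*) echo "glibc" ;;
    *"not a dynamic executable"* | *"statically linked"*) echo "static" ;;
    *) echo "unknown" ;;
  esac
}

# Example usage (the path is a placeholder):
#   ldd /path/to/app 2>&1 | classify_libc
```

Only a glibc result corresponds to the Supported rows in the table above.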
Full Workflow Example
End-to-end deployment from scratch:
# 1. Create the cluster
gcloud container clusters create cloudtaser-prod \
--region europe-west4 \
--num-nodes 3 \
--image-type UBUNTU_CONTAINERD \
--enable-confidential-nodes \
--machine-type n2d-standard-2 \
--workload-pool "$(gcloud config get-value project).svc.id.goog"
# 2. Get credentials
gcloud container clusters get-credentials cloudtaser-prod --region europe-west4
# 3. Connect to vault
cloudtaser-cli target connect \
--secretstore-address https://vault.eu.example.com \
--secretstore-token hvs.YOUR_ROOT_TOKEN
# 4. Install cloudtaser
cloudtaser-cli target install \
--secretstore-address https://vault.eu.example.com \
--ebpf --enforce
# 5. Discover workloads and generate migration plan
cloudtaser-cli target discover -o plan.yaml
# 6. Apply plan to vault (provision policies and roles)
cloudtaser-cli source apply-plan plan.yaml \
--openbao-addr https://vault.eu.example.com \
--token hvs.YOUR_ROOT_TOKEN
# 7. Populate secrets in vault
bao kv put secret/myapp/config db_password=supersecret api_key=sk-live-xxx
# 8. Verify secrets exist
cloudtaser-cli source verify-plan plan.yaml \
--openbao-addr https://vault.eu.example.com \
--token hvs.YOUR_ROOT_TOKEN
# 9. Migrate workloads
cloudtaser-cli target protect --plan plan.yaml \
--secretstore-address https://vault.eu.example.com \
--interactive
# 10. Verify protection scores
cloudtaser-cli target status --namespace production
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| confidential_vm: FAIL | Non-N2D machine type | Recreate node pool with --machine-type n2d-standard-2 --enable-confidential-nodes |
| ebpf_kprobes: FAIL (all probes drop) | COS node image (CONFIG_BPF_KPROBE_OVERRIDE=n) | Recreate node pool with --image-type UBUNTU_CONTAINERD. Reactive-kill on the kprobe path is in effect on COS until cloudtaser-ebpf#174 (BPF LSM migration) ships. |
| ebpf_kprobes: PARTIAL (1 of 16 dropped) | kprobe_perf_event_open dropped: do_sys_perf_event_open is not in the upstream ALLOW_ERROR_INJECTION allow-list | Expected on every stock kernel today. Tracked under cloudtaser-ebpf#175; resolved by migrating to bpf_lsm_perf_event_open. |
| getenv_interposer: FAIL | Alpine or musl-based image | Switch to a debian/ubuntu-based container image |
| ebpf_agent_connected: FAIL | eBPF daemonset not running | Check kubectl get ds -n cloudtaser-system |
| ebpf_enforce_mode: FAIL | Enforce mode not enabled | Set ebpf.enforceMode=true in Helm values |
What we recommend you also run
cloudtaser-ebpf does not occupy the entire BPF LSM stack. BPF LSM-based tools compose cleanly with cloudtaser — they hook different LSM call sites (network egress, file ACLs, capability drops, container lifecycle) and do not conflict with cloudtaser's syscall-blocking programs. Pairing cloudtaser with one of the following layers gives you defense-in-depth without instrumentation overlap:
- Tetragon: Cilium's runtime security observability and enforcement. Synchronous policy via BPF LSM hooks, fully supported on COS / Bottlerocket / Talos. Hooks process exec, file access, network connect, capability use. Composes cleanly with cloudtaser-ebpf (different hook points; no bpf_override_return collision).
- KubeArmor: runtime policy via BPF LSM and AppArmor / SELinux fallback. Strong on file-path policy and process whitelisting per container.
A forthcoming comparison page on cloudtaser.io will document the recommended pairings and the threat-model overlap explicitly — see cloudtaser-io-website#277.
Why this matters more on COS / Bottlerocket
Until cloudtaser-ebpf#174 ships and brings COS / Bottlerocket / Talos to synchronous-block parity, pairing cloudtaser with Tetragon or KubeArmor on those distros gives you BPF LSM-based synchronous enforcement now. On Ubuntu, the pairing is still valuable (cloudtaser blocks at the syscall level; Tetragon/KubeArmor add policy at the LSM level) but less load-bearing.
NetworkPolicy and NodeLocal DNSCache
If you use Kubernetes NetworkPolicy to restrict egress from protected namespaces, wrapper pods may fail with context deadline exceeded during the broker secret fetch. The root cause is GKE's NodeLocal DNSCache (node-local-dns), which runs as a hostNetwork: true DaemonSet listening on 169.254.20.10. Because it uses the host network, it does not match a namespaceSelector targeting kube-system — even though the node-local-dns pods live in the kube-system namespace.
The problem
A NetworkPolicy like this looks correct but silently blocks DNS when NodeLocal DNSCache is active:
# BROKEN: does not match hostNetwork pods
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
Pods resolve DNS via the node-local cache at 169.254.20.10, not via the kube-dns ClusterIP. The namespaceSelector rule only matches traffic to pod IPs in kube-system, which does not include host-network addresses.
The fix
Add an explicit egress rule for the NodeLocal DNSCache IP in addition to the kube-system selector (the kube-system rule is still needed as a fallback when NodeLocal DNSCache is not present or for TCP DNS to upstream):
egress:
  # NodeLocal DNSCache (hostNetwork, 169.254.20.10)
  - to:
      - ipBlock:
          cidr: 169.254.20.10/32
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
  # kube-dns fallback (pod network)
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
Self-check
Verify DNS works from within a restricted namespace:
kubectl run dns-test --rm -it --restart=Never \
--namespace <your-namespace> \
--image=busybox:1.36 -- nslookup kubernetes.default
If the command hangs or returns server can't find kubernetes.default: SERVFAIL, the NetworkPolicy is still blocking DNS.
Applies to any workload behind NetworkPolicy
This is not cloudtaser-specific: any pod behind a restrictive egress NetworkPolicy on GKE with NodeLocal DNSCache enabled will hit the same issue. Wrapper pods surface it early because the broker fetch is the first network call after pod start.
Reference: cloudtaser-demo#261
See also
- Platform Compatibility — full per-distro matrix (GKE / EKS / AKS / k3s / Talos)
- Protection Score Reference — all 12 checks explained
- Reverse-Connect Architecture — deploying without exposing your OpenBao
- Kubernetes Compatibility — full distribution matrix
- Enterprise Deployment — multi-cluster topology
References
- BPF LSM kernel documentation
- COS lakitu_defconfig (cos-6.6): CONFIG_BPF_LSM=y confirmed
- Linux ALLOW_ERROR_INJECTION allow-list (include/asm-generic/error-injection.h)
- Tetragon issue #1392: bpf_override_return not available on COS
- cloudtaser-ebpf#174: strategic migration to BPF LSM hooks
- cloudtaser-ebpf#175: re-route perf_event_open enforcement to bpf_lsm_perf_event_open