Admission Webhook Hardening

This page documents the security controls applied to the cloudtaser-operator admission webhook as of Epic #11 (operator-webhook-hardening, Gold tier). It is intended for operators and SREs who deploy CloudTaser into regulated environments and need to understand the webhook's attack surface, privilege scope, and failure modes.


RBAC tightening

ClusterRole scope reduction

The operator's ClusterRole was narrowed to the minimum permissions required for webhook operation.

Removed privileges:

  • list and watch on all cluster Secrets: the operator does not enumerate Secrets cluster-wide, so this permission was a legacy overgrant. Removing it closes a lateral read path for any process running under the operator's service account.
  • Wildcard verbs on pods/exec: only the specific verbs required for exec-validator functionality remain. A process that gains code execution under the operator's service account can no longer escalate to arbitrary pod exec across the cluster.

Corrected privileges:

  • configmaps: explicit get, list, watch, create, update, patch, delete verbs replace an implicit wildcard.
  • validatingwebhookconfigurations: corrected from update to patch — the operator patches the caBundle field in the webhook config; it does not replace the entire object.
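
A narrowed role like this is easy to regress, so a small check that flags any remaining wildcard verb can be useful in CI. The following is a minimal sketch only: PolicyRule is a hypothetical, trimmed-down stand-in for the Kubernetes RBAC rule type, and the rules in main are illustrative values taken from the description above, not the operator's actual manifest.

```go
package main

import "fmt"

// PolicyRule is a hypothetical, trimmed-down stand-in for an RBAC rule.
type PolicyRule struct {
	Resources []string
	Verbs     []string
}

// wildcardGrants returns every resource that is still granted the "*" verb.
func wildcardGrants(rules []PolicyRule) []string {
	var offenders []string
	for _, rule := range rules {
		for _, verb := range rule.Verbs {
			if verb == "*" {
				offenders = append(offenders, rule.Resources...)
				break
			}
		}
	}
	return offenders
}

func main() {
	// Illustrative rules mirroring the corrected privileges described above.
	rules := []PolicyRule{
		{Resources: []string{"configmaps"}, Verbs: []string{"get", "list", "watch", "create", "update", "patch", "delete"}},
		{Resources: []string{"validatingwebhookconfigurations"}, Verbs: []string{"patch"}},
	}
	fmt.Println("wildcard grants:", wildcardGrants(rules))
}
```

A rule such as pods/exec with the "*" verb would be reported by wildcardGrants, while the explicit verb lists above pass.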

Namespace exclusions removed

The validating webhook previously excluded kube-system and cloudtaser-system unconditionally. This exclusion allowed misconfigured CloudTaser CRs in those namespaces to proceed without admission validation. The exclusion is now removed: malformed CR specs in any namespace are rejected with a clear error message at admission time rather than being silently accepted and producing a runtime failure.


Error redaction guarantees

Vault response body redaction

The operator now redacts vault response bodies before including them in returned admission errors. Previously, a vault error response containing a token, policy dump, or secret path enumeration could surface in the kubectl apply output of the user who triggered injection. Admission errors now include only the HTTP status code and a structured message — never the raw vault response body.
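
One way to make this guarantee hard to break is to build the admission message from the status code alone, so the response body is never in scope when the message is formatted. The sketch below assumes a hypothetical helper name and message wording; it is not the operator's actual code.

```go
package main

import (
	"fmt"
	"net/http"
)

// redactedVaultError builds the message returned in the admission response.
// It accepts only the status code: the vault response body is never passed
// in, so it cannot leak into the error text. (Hypothetical helper.)
func redactedVaultError(statusCode int) string {
	return fmt.Sprintf("vault request failed: HTTP %d (%s)", statusCode, http.StatusText(statusCode))
}

func main() {
	// Prints: vault request failed: HTTP 403 (Forbidden)
	fmt.Println(redactedVaultError(http.StatusForbidden))
}
```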

/v1/bridge/init unseal output

The operator's bridge init handler has an "already initialized" branch that previously logged and echoed the raw unseal response from the bridge. That branch now returns a structured "already initialized" status without embedding any unseal material.

Annotation key warnings instead of silent drops

When a pod carries an unrecognized cloudtaser.io/* annotation (for example, a typo such as cloudtaser.io/vault-addres), the mutating webhook now emits an admission warning that is visible in kubectl apply output rather than silently ignoring the annotation. The pod is still admitted, because warnings are non-fatal, but a misspelled annotation that the user intended to act on is no longer dropped without notice.
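
The check can be sketched as a pure function over the pod's annotations. The allow-list below is an assumption: cloudtaser.io/vault-address is inferred from the typo example above, and the full set of supported keys is not listed on this page.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const annotationPrefix = "cloudtaser.io/"

// knownKeys is a hypothetical allow-list; the operator's real set of
// supported annotations is not documented on this page.
var knownKeys = map[string]bool{
	"cloudtaser.io/vault-address": true,
}

// annotationWarnings returns one warning per unrecognized cloudtaser.io/*
// annotation. Annotations outside the prefix are ignored.
func annotationWarnings(annotations map[string]string) []string {
	var warnings []string
	for key := range annotations {
		if strings.HasPrefix(key, annotationPrefix) && !knownKeys[key] {
			warnings = append(warnings, fmt.Sprintf("unrecognized annotation %q", key))
		}
	}
	sort.Strings(warnings) // map iteration order is random; keep output stable
	return warnings
}

func main() {
	pod := map[string]string{
		"cloudtaser.io/vault-addres": "https://vault.example:8200", // typo
		"app.kubernetes.io/name":     "demo",
	}
	for _, w := range annotationWarnings(pod) {
		fmt.Println(w)
	}
}
```

In a real webhook these strings would be placed in the warnings field of the admission response while the response itself still allows the pod.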


Cosign verifier hardening

The image signature verifier used by the operator's admission webhook received several hardening changes relevant to supply-chain security posture.

Dependency-injectable verifier

The verifier is now dependency-injectable for testing. The same change (#308) corrected how the payload is hashed: the previous implementation re-marshalled the cosign payload through a partial Go struct before hashing, which silently discarded optional fields and produced a hash that did not match the original payload bytes. The re-marshal path is removed; the verifier now operates on the raw payload bytes, which are what cosign actually signed.
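
The effect of re-marshalling through a partial struct can be reproduced with the standard library alone. This is an illustrative sketch: partialPayload and the sample payload are hypothetical and only mimic the shape of a signing payload with a field the struct does not model.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// partialPayload models only one top-level field, so anything else in the
// payload is dropped on unmarshal. (Hypothetical struct for illustration.)
type partialPayload struct {
	Critical json.RawMessage `json:"critical"`
}

// digest returns the hex-encoded SHA-256 of b.
func digest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// remarshalDigest reproduces the removed behaviour: decode into the partial
// struct, encode again, then hash the re-encoded bytes.
func remarshalDigest(raw []byte) (string, error) {
	var p partialPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", err
	}
	out, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return digest(out), nil
}

func main() {
	raw := []byte(`{"critical":{"type":"example"},"optional":{"creator":"ci"}}`)
	lossy, err := remarshalDigest(raw)
	if err != nil {
		panic(err)
	}
	// The "optional" field was discarded, so the digests differ.
	fmt.Println("digests match:", lossy == digest(raw))
}
```

Hashing raw directly is the only variant that is guaranteed to agree with a signature made over those exact bytes.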

Private-registry keychain authentication

The verifier now threads through the configured image pull keychain when calling the OCI registry for signature lookup. Previously the verifier used anonymous credentials regardless of whether the image itself was pulled from a private registry — this caused signature verification to fail with a 401 for any image not hosted on a public registry.

Per-call timeouts on registry operations

remote.Get and remote.Image calls in the verifier now carry a per-call context timeout. A hanging or slow registry can no longer cause the admission webhook to hold an admission request open indefinitely, which in a high-pod-churn scenario can exhaust webhook timeout budgets and degrade cluster operations.

Fail-closed on unreachable registry

When the registry is unreachable during signature verification, the webhook now denies admission rather than allowing the pod through unverified. This ensures the image verification policy is enforced even during transient registry connectivity issues — a network partition cannot be used to bypass signature checks.

Explicit Command in pod spec required

The webhook now enforces that pods subject to image verification carry an explicit command (the command field) in their container spec. A pod without an explicit command relies on the image's default entrypoint, which the webhook cannot verify against a cosign signature without performing a registry round-trip. Requiring the command in the pod spec keeps the admission decision deterministic.
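
The rule itself is a simple structural check on the pod spec. The sketch below uses a hypothetical Container type with only the fields needed here; it does not model which pods are subject to image verification.

```go
package main

import "fmt"

// Container is a hypothetical, minimal stand-in for a pod container spec.
type Container struct {
	Name    string
	Image   string
	Command []string
}

// containersMissingCommand returns the names of containers that rely on the
// image's default entrypoint because no command is set in the spec.
func containersMissingCommand(containers []Container) []string {
	var missing []string
	for _, c := range containers {
		if len(c.Command) == 0 {
			missing = append(missing, c.Name)
		}
	}
	return missing
}

func main() {
	pod := []Container{
		{Name: "app", Image: "registry.example/app:1.0", Command: []string{"/app/server"}},
		{Name: "sidecar", Image: "registry.example/sidecar:1.0"},
	}
	fmt.Println("missing command:", containersMissingCommand(pod))
}
```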


Cert rotation and zero-K8s-Secrets invariant

Legacy "kubernetes" backend removed

The operator previously included a certificate backend named "kubernetes" that wrote the webhook's TLS private key into a Kubernetes Secret object. This violated the CloudTaser product invariant that no sensitive material touches etcd or K8s Secrets.

The "kubernetes" backend is removed. The operator's webhook TLS certificate rotation path now operates entirely in process memory: certificates are generated and rotated in-memory, injected via the /v1/bridge/init endpoint, and never persisted to a K8s Secret, ConfigMap, or any etcd-backed object.

This change is transparent to users who deploy via the Helm chart — the chart defaults were updated in the same wave. Operators who configured webhookCertBackend: kubernetes explicitly in custom values files should remove that key; the backend no longer exists and the field is ignored with a startup warning.


Protection score honesty

The protection score reported by the operator's attestation system was adjusted in this wave to accurately reflect what is active versus what is merely detected.

No capability points for detect-only eBPF

Score checks for eBPF capabilities (ebpf_enforce_mode, ebpf_kprobes) no longer award points when the eBPF agent is present but running in detect-only mode. Previously, a daemonset that reported healthy but was configured for monitoring-only would still award full enforcement points, overstating the protection level.
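
The scoring rule can be sketched as follows. EBPFStatus, its Mode values, and the point value are hypothetical; only the rule itself (healthy but detect-only earns nothing) comes from the description above.

```go
package main

import "fmt"

// EBPFStatus is a hypothetical view of what the eBPF agent reports.
type EBPFStatus struct {
	AgentHealthy bool
	Mode         string // assumed values: "enforce" or "detect"
}

// enforcementPoints awards maxPoints only when the agent is healthy AND
// actually enforcing. A healthy agent in detect-only mode earns zero.
func enforcementPoints(s EBPFStatus, maxPoints int) int {
	if s.AgentHealthy && s.Mode == "enforce" {
		return maxPoints
	}
	return 0
}

func main() {
	fmt.Println(enforcementPoints(EBPFStatus{AgentHealthy: true, Mode: "detect"}, 10))  // 0
	fmt.Println(enforcementPoints(EBPFStatus{AgentHealthy: true, Mode: "enforce"}, 10)) // 10
}
```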

kprobes_active distinguished from DS-Running

The kprobes_active check now reads the actual kprobe attachment state from the eBPF agent's status endpoint rather than inferring it from daemonset pod readiness. A running pod that failed to attach its kprobe programs (due to kernel version mismatch or missing CAP_BPF) no longer reports kprobes_active: true.
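
A sketch of the distinction, assuming a hypothetical AgentStatus shape and placeholder probe names (the real status endpoint format is not described here): pod readiness is carried in the struct but deliberately not consulted.

```go
package main

import "fmt"

// AgentStatus is a hypothetical shape for the agent's status response.
type AgentStatus struct {
	PodReady        bool     // daemonset pod readiness; NOT used for the check
	AttachedKprobes []string // programs the agent reports as attached
}

// kprobesActive is true only when every required kprobe program is reported
// as attached. Readiness of the pod alone is never sufficient.
func kprobesActive(s AgentStatus, required []string) bool {
	attached := make(map[string]bool, len(s.AttachedKprobes))
	for _, name := range s.AttachedKprobes {
		attached[name] = true
	}
	for _, name := range required {
		if !attached[name] {
			return false
		}
	}
	return len(required) > 0
}

func main() {
	required := []string{"probe_a", "probe_b"} // placeholder names
	readyButDetached := AgentStatus{PodReady: true}
	fmt.Println("kprobes_active:", kprobesActive(readyButDetached, required))
}
```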

memfd_fd_cloexec check

A new score check, memfd_fd_cloexec, verifies that the file descriptor created by memfd_secret(2) has the close-on-exec flag (FD_CLOEXEC) set. A memfd without FD_CLOEXEC is inherited by child processes across exec, leaking the secret fd to every subprocess the wrapper spawns. This check awards points only when the flag is confirmed present.

CVE-2023-4911 mitigation scored

The ld_env_stripped check now awards points for the presence of the CVE-2023-4911 (Looney Tunables) mitigation. The wrapper strips LD_PRELOAD and LD_LIBRARY_PATH from the environment before exec-ing the application; the score now reflects this as an explicit defense against glibc GLIBC_TUNABLES privilege-escalation vectors.
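
Stripping loader variables before exec can be sketched as a filter over the environment slice. The helper name is hypothetical, and the block-list in main contains only the two variables named above. CVE-2023-4911 itself is triggered through the GLIBC_TUNABLES variable; whether the wrapper also strips that variable is not stated here, so the list is left as a parameter.

```go
package main

import (
	"fmt"
	"strings"
)

// stripEnv returns a copy of env (in "NAME=value" form) without any entry
// whose name appears in blocked. The input slice is not modified.
func stripEnv(env []string, blocked ...string) []string {
	deny := make(map[string]bool, len(blocked))
	for _, name := range blocked {
		deny[name] = true
	}
	out := make([]string, 0, len(env))
	for _, kv := range env {
		name, _, _ := strings.Cut(kv, "=")
		if !deny[name] {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	env := []string{"PATH=/usr/bin", "LD_PRELOAD=/tmp/evil.so", "LD_LIBRARY_PATH=/tmp"}
	fmt.Println(stripEnv(env, "LD_PRELOAD", "LD_LIBRARY_PATH"))
}
```

The filtered slice would then be passed as the environment of the exec call in place of the inherited one.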


Crash hardening

Nil VaultCertStore startup panic

A nil-pointer dereference was fixed in the operator startup path when VaultCertStore is not configured. Previously, starting the operator without a vault cert store configured would panic at the first webhook invocation rather than returning a clean error. The operator now guards this path and returns a structured admission error.

Missing CRD no longer causes cluster-wide pod-creation DoS

When the CloudTaserConfig CRD is absent from the cluster (for example, during a partial install or a CRD migration), the mutating webhook previously returned a 500 Internal Server Error for every pod creation across every namespace. How the API server treats that error depends on the webhook's failurePolicy: with Fail, which is the default in admissionregistration.k8s.io/v1, the pod is rejected; with Ignore, it is admitted without mutation. The webhook now detects the missing-CRD condition at startup and fails cleanly rather than producing a cluster-wide pod-creation disruption.


Summary table

| Area | Change | Issue |
|---|---|---|
| RBAC | Remove cluster-wide Secret list/watch | #240 |
| RBAC | Explicit configmaps verbs | #340 |
| RBAC | Remove pods/exec wildcard | #362 |
| RBAC | Correct update → patch on ValidatingWebhookConfiguration | #316 |
| RBAC | Remove kube-system/cloudtaser-system exclusion from validating webhook | #319 |
| Error redaction | Vault response body not included in admission errors | #345 |
| Error redaction | bridge/init already-initialized branch no longer echoes unseal output | #361 |
| Error redaction | Unrecognized annotation emits Warning instead of silent drop | #359 |
| Cosign | Dependency-injectable verifier, raw-bytes hash | #308 |
| Cosign | Private-registry keychain auth | #309 |
| Cosign | Per-call timeout on remote.Get/remote.Image | #365, #344 |
| Cosign | Fail-closed on unreachable registry | #320 |
| Cert rotation | Remove kubernetes backend; cert path respects zero-K8s-Secrets invariant | #366 |
| Score honesty | No points for detect-only eBPF | #325 |
| Score honesty | kprobes_active from attachment state, not DS readiness | #323 |
| Score honesty | memfd_fd_cloexec score check | #322 |
| Score honesty | CVE-2023-4911 ld_env_stripped mitigation scored | #321 |
| Crash hardening | Nil VaultCertStore startup panic fixed | #313 |
| Crash hardening | Missing CRD no longer causes cluster-wide pod DoS | #315 |