
Troubleshooting

Common issues and their solutions when running cloudtaser. This page covers diagnostics and fixes for the operator, wrapper, eBPF agent, and CLI.


Pods Stuck in Init

Symptom: Pod stays in Init:0/1 state. The init container that copies the wrapper binary has not completed.

Diagnosis:

kubectl describe pod <pod-name>
kubectl logs <pod-name> -c cloudtaser-init

Common causes:

1. Wrapper image not pullable

The init container pulls from ghcr.io/cloudtaser/cloudtaser-wrapper. Verify that image pull secrets are configured:

kubectl get pod <pod-name> -o jsonpath='{.spec.imagePullSecrets}'

If using a private registry, ensure imagePullSecrets is set in the Helm values.

2. EmptyDir volume mount failure

The wrapper is copied to a memory-backed emptyDir at /cloudtaser/. Check that the node has sufficient memory for the volume (the wrapper binary is approximately 10 MB).

3. Operator not running

If the operator is down, the mutating webhook may fail and block pod creation (default failurePolicy: Fail):

kubectl get pods -n cloudtaser-system

failurePolicy: Fail blocks pod creation

If the operator is not running, the API server rejects pod creation in the webhook's scope. Restart the operator, or temporarily set failurePolicy: Ignore in the MutatingWebhookConfiguration. With Ignore, pods start without injection, so use it only as an emergency measure.

Operator pod Ready does not imply beacon connected (operator v0.9.4+)

The operator's startup probe is decoupled from beacon connectivity, so the pod becomes Ready within seconds whether or not the bridge has connected to the beacon. Until the bridge connects, the webhook returns a clear error on pod creation attempts. This is expected during initial bootstrap: wait for the bridge to connect, then retry pod creation.


Wrapper Cannot Connect to OpenBao

Symptom: Pod starts but the application does not receive secrets. Container logs show OpenBao connection errors.

Diagnosis:

kubectl logs <pod-name> -c <container-name> | grep -i vault

Common causes:

1. OpenBao endpoint unreachable

Verify network connectivity from the pod to OpenBao:

kubectl exec <pod-name> -- wget -q --spider https://vault.eu.example.com/v1/sys/health

Check firewall rules, VPC peering, security groups, and NetworkPolicies.

2. Kubernetes auth not configured

The wrapper authenticates to OpenBao using the pod's ServiceAccount token. Verify the auth method is configured:

cloudtaser-cli target validate \
  --secretstore-address https://vault.eu.example.com \
  --secretstore-token hvs.YOUR_TOKEN

If auth is not configured, run cloudtaser-cli target connect to set it up.

3. Wrong OpenBao role

Verify the cloudtaser.io/secretstore-role annotation matches a role configured in OpenBao:

vault read auth/kubernetes/role/cloudtaser

4. ServiceAccount not bound

The OpenBao role must allow the pod's ServiceAccount. Check the bound_service_account_names and bound_service_account_namespaces in the OpenBao role.

5. TLS certificate error

If OpenBao uses a private CA, the wrapper cannot verify the certificate. Mount the CA bundle into the pod.

Quick connectivity test

Run cloudtaser-cli target validate --secretstore-address https://vault.eu.example.com to check OpenBao health, seal status, and Kubernetes auth configuration in one step.


eBPF Agent Not Starting

Symptom: cloudtaser-ebpf pods are in CrashLoopBackOff or not starting.

Diagnosis:

kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
kubectl describe daemonset -n cloudtaser-system cloudtaser-ebpf

Common causes:

1. Kernel too old

The eBPF agent requires Linux kernel 5.8+ with BTF (BPF Type Format) support. Check the node kernel:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'

Kernel version recommendations

For full protection (including memfd_secret), kernel 5.14+ is recommended. See Kernel Compatibility for the complete support matrix.
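To triage a whole node list against both thresholds, a small POSIX shell helper can compare each reported kernel release with 5.8 (the minimum) and 5.14 (recommended for memfd_secret). This is a sketch: `kernel_at_least` is a hypothetical helper, and it compares only major.minor, so a kernel that passes can still lack BTF or memfd_secret if they are disabled in its build configuration.

```shell
# Hypothetical helper: prints "yes" if kernel release $1 (e.g. "5.15.0-1049-gke")
# is at least major.minor $2 (e.g. "5.14"), otherwise "no".
# Only major.minor is compared; BTF and memfd_secret must still be enabled in the build.
kernel_at_least() {
  _kv_major=$(printf '%s\n' "$1" | cut -d. -f1)
  # Strip suffixes such as "-rc1" or "+" from the minor component.
  _kv_minor=$(printf '%s\n' "$1" | cut -d. -f2 | sed 's/[^0-9].*//')
  _min_major=${2%%.*}
  _min_minor=${2#*.}
  if [ "$_kv_major" -gt "$_min_major" ] ||
     { [ "$_kv_major" -eq "$_min_major" ] && [ "$_kv_minor" -ge "$_min_minor" ]; }; then
    echo yes
  else
    echo no
  fi
}

# Example usage, fed by the kubectl command above (not executed here):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}' |
#   while read -r node kver; do
#     echo "$node >=5.8: $(kernel_at_least "$kver" 5.8) >=5.14: $(kernel_at_least "$kver" 5.14)"
#   done
```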

2. BTF not available

The agent needs /sys/kernel/btf/vmlinux. GKE COS and Ubuntu nodes have this by default. Some custom AMIs may not:

kubectl exec -n cloudtaser-system <ebpf-pod> -- ls /sys/kernel/btf/vmlinux

3. Insufficient capabilities

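The eBPF agent requires privileged mode with the SYS_ADMIN, SYS_PTRACE, NET_ADMIN, and SYS_RESOURCE capabilities. Verify that Pod Security Admission is not blocking these (the namespace must allow the privileged Pod Security Standard), or, on clusters older than Kubernetes 1.25, that a PodSecurityPolicy is not blocking them.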

4. GKE Autopilot

Autopilot clusters do not allow privileged pods or hostPID. Use GKE Standard instead.

5. Fargate (EKS)

Fargate does not support DaemonSets or host-level access. Use managed or self-managed node groups.

Unsupported environments

GKE Autopilot and AWS Fargate are not compatible with the eBPF agent. The wrapper still provides secret injection, but runtime enforcement (blocking /proc reads, ptrace, etc.) is not available.


Secrets Not Injected

Symptom: Application starts but environment variables with secrets are missing.

Diagnosis:

# Check if injection annotation is present
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations}' | grep cloudtaser

# Check if wrapper is running as PID 1
kubectl exec <pod-name> -- cat /proc/1/cmdline | tr '\0' ' '

Common causes:

1. Missing inject annotation

The pod template must have cloudtaser.io/inject: "true". The annotation must be on the pod template, not on the Deployment itself:

spec:
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"  # HERE, not on the Deployment

2. Wrong env-map syntax

The cloudtaser.io/env-map annotation maps OpenBao fields to environment variables. Format: vault_field=ENV_VAR,field2=ENV_VAR2:

cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"

Common mistake: reversed order

The correct format is vault_field=ENV_VAR, not ENV_VAR=vault_field. A reversed mapping will silently fail to inject the expected secrets.
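Because a reversed mapping fails silently, it can help to lint the annotation value before applying a manifest. The sketch below assumes the common naming conventions (lowercase OpenBao field names, uppercase environment variable names), so it flags suspicious pairs rather than proving a mapping wrong; `check_env_map` is a hypothetical helper, not part of cloudtaser-cli.

```shell
# Hypothetical lint helper (not part of cloudtaser-cli). Assumes lowercase
# OpenBao field names and uppercase env var names.
# Prints one line per suspicious pair, or "ok" if none were found.
check_env_map() {
  if [ -z "$1" ]; then
    echo "empty env map"
    return 1
  fi
  _problems=0
  _old_ifs=$IFS
  IFS=,                                   # pairs are comma-separated
  for _pair in $1; do
    _field=${_pair%%=*}                   # text before the first "="
    _env=${_pair#*=}                      # text after the first "="
    if [ "$_field" = "$_pair" ] || [ -z "$_field" ] || [ -z "$_env" ]; then
      echo "malformed: $_pair"
      _problems=$((_problems + 1))
    elif printf '%s' "$_field" | grep -q '^[A-Z0-9_]*$' &&
         printf '%s' "$_env" | grep -q '[a-z]'; then
      # Uppercase on the left and lowercase on the right looks like ENV_VAR=vault_field.
      echo "possibly reversed: $_pair (expected vault_field=ENV_VAR)"
      _problems=$((_problems + 1))
    fi
  done
  IFS=$_old_ifs
  [ "$_problems" -eq 0 ] || return 1
  echo ok
}
```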

3. Wrong OpenBao path

Ensure cloudtaser.io/secret-paths uses the KV v2 data path:

# Correct:
cloudtaser.io/secret-paths: "secret/data/myapp/config"

# Wrong (missing data/ prefix for KV v2):
cloudtaser.io/secret-paths: "secret/myapp/config"
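The KV v2 rule is mechanical: the API path is the mount, then data/, then the logical path. A sketch of a per-entry check, assuming the KV v2 engine is mounted at a single path segment such as secret/ (nested mounts like kv/team-a/ would need the mount passed in explicitly); `check_kv2_path` is a hypothetical helper.

```shell
# Hypothetical helper: checks one cloudtaser.io/secret-paths entry for the KV v2
# "data/" segment. Assumes the KV v2 mount is a single path segment (e.g. "secret/").
check_kv2_path() {
  _mount=${1%%/*}                # first segment, e.g. "secret"
  _rest=${1#*/}                  # everything after the mount
  if [ "$_rest" = "$1" ]; then
    echo "not a mount/path pair: $1"
    return 1
  fi
  case $_rest in
    data/*) echo ok ;;
    *) echo "missing data/ segment, try: $_mount/data/$_rest"; return 1 ;;
  esac
}

# The annotation is a comma-separated list, so check each entry, e.g.:
# for p in $(echo "secret/data/a,secret/b" | tr ',' ' '); do check_kv2_path "$p"; done
```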

4. Namespace not in webhook scope

The operator does not inject pods in system namespaces (kube-system, kube-public, kube-node-lease) or in its own namespace, cloudtaser-system. Verify that your workload's namespace is not one of these.
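Taken together with the webhook namespaceSelector shown under Webhook Failures, the exclusion rule reduces to a fixed lookup. The sketch below mirrors only the namespaces listed on this page; `injection_excluded` is a hypothetical helper, and a namespaceSelector customized in your Helm values would change the answer.

```shell
# Hypothetical helper: classifies a namespace according to the exclusions
# documented on this page (default namespaceSelector + operator internal logic).
injection_excluded() {
  case $1 in
    kube-system|cloudtaser-system) echo "excluded (webhook namespaceSelector)" ;;
    kube-public|kube-node-lease)   echo "excluded (operator internal logic)" ;;
    *)                             echo eligible ;;
  esac
}
```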

5. Webhook not intercepting

Check the MutatingWebhookConfiguration:

kubectl get mutatingwebhookconfiguration cloudtaser-operator-webhook -o yaml

High Latency on Pod Startup

Symptom: Pods take significantly longer to start after cloudtaser injection.

Common causes:

1. OpenBao fetch time

The wrapper fetches secrets before starting the application. If OpenBao is slow (geographically distant, under load), this adds startup latency.

Reduce OpenBao latency

  • Deploy OpenBao in the same region as the cluster (still within the EU)
  • Reduce the number of secret paths per pod (fewer OpenBao API calls)
  • Ensure OpenBao is not sealed or in standby mode

2. Image entrypoint resolution

The operator resolves the container image entrypoint by querying the container registry. This adds latency on the first injection. The result is cached per image.

3. Init container image pull

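The first pod on a node pulls the wrapper image; subsequent pods use the cached image. Keep imagePullPolicy: IfNotPresent to avoid re-pulling. It is the Kubernetes default only when the image has a tag other than :latest or is pinned by digest; untagged and :latest images default to Always.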
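Kubernetes picks the default pull policy from the image reference itself: a digest or a tag other than :latest yields IfNotPresent, while :latest or no tag at all yields Always. The sketch below reproduces that defaulting rule for one image string, for example the wrapper image configured in your Helm values; `default_pull_policy` is a hypothetical helper and ignores an explicitly set imagePullPolicy.

```shell
# Hypothetical helper: prints the imagePullPolicy Kubernetes would default to
# for an image reference when the pod spec does not set one.
default_pull_policy() {
  case $1 in
    *@*) echo IfNotPresent; return ;;   # pinned by digest
  esac
  # Look only at the last path component so a registry port
  # (e.g. "localhost:5000/img") is not mistaken for a tag.
  _last=${1##*/}
  case $_last in
    *:latest) echo Always ;;
    *:*)      echo IfNotPresent ;;
    *)        echo Always ;;            # no tag means implicit :latest
  esac
}
```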


Protection Score Low

Symptom: cloudtaser-cli target status or cloudtaser-cli target audit reports a low protection score.

The protection score (max 65) reflects which defenses are active:

| Check | Points | Fix |
|---|---|---|
| memfd_secret | 15 | Use kernel 5.14+ on nodes |
| mlock | 10 | Ensure CAP_IPC_LOCK is available (or ulimit -l unlimited) |
| MADV_DONTDUMP | 5 | Automatic (requires wrapper v0.0.14+) |
| PR_SET_DUMPABLE(0) | 5 | Automatic (requires wrapper v0.0.14+) |
| Token protected | 10 | Automatic when memfd_secret or mlock is available |
| eBPF connected | 10 | Ensure eBPF daemonset is running on the node |
| Kprobes active | 10 | Requires kernel CONFIG_BPF_KPROBE_OVERRIDE |
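The score is a plain sum over the table. A sketch of that arithmetic, where the check names are hypothetical lowercase shorthand for the table rows rather than identifiers printed by cloudtaser-cli:

```shell
# Hypothetical helper: sums the points from the protection score table for
# each active check passed as an argument, and prints "<score>/65".
protection_score() {
  _score=0
  for _check in "$@"; do
    case $_check in
      memfd_secret) _score=$((_score + 15)) ;;
      mlock|token_protected|ebpf_connected|kprobes_active) _score=$((_score + 10)) ;;
      madv_dontdump|pr_set_dumpable) _score=$((_score + 5)) ;;
      *) echo "unknown check: $_check" >&2; return 1 ;;
    esac
  done
  echo "$_score/65"
}
```

Since CONFIG_BPF_KPROBE_OVERRIDE is not enabled on GKE, EKS, or AKS kernels, a managed-cloud node with every other defense active scores 55 of 65.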

To improve the score:

  1. Upgrade node kernel to 5.14+ for memfd_secret support (15 points). This is the single most impactful change.
  2. Ensure eBPF agent is running on every node (10 points):

    kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf
    
  3. Check kernel kprobe override support (10 points). Note: CONFIG_BPF_KPROBE_OVERRIDE is not enabled on any major cloud provider kernel (GKE, EKS, AKS). See Kernel Compatibility for details.


eBPF Enforcement Issues

Symptom: eBPF agent is running but enforcement events are not generated, or legitimate operations are being blocked.

Diagnosis:

# Check agent logs
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf

# Check enforce mode
kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf \
  -o jsonpath='{.spec.template.spec.containers[0].env}' | grep ENFORCE

Common causes:

1. Enforce mode disabled

If ENFORCE_MODE=false, the agent only logs events without blocking. Enable via Helm:

ebpf:
  enforceMode: true

2. Reactive kill fallback

On kernels without CONFIG_BPF_KPROBE_OVERRIDE (all major cloud providers), the agent uses reactive kill (SIGKILL after detection) instead of synchronous blocking. This is the expected behavior on GKE, EKS, and AKS.

Reactive kill is still effective

The race window between detection and SIGKILL is microseconds. An attacker reading /proc/pid/environ gets killed before they can exfiltrate the data over the network, because the network send is also monitored and blocked.

3. Application uses io_uring

cloudtaser blocks io_uring_setup() for protected processes because io_uring bypasses buffer-level monitoring. Applications requiring io_uring must use standard syscalls instead.


Ring Buffer Errors

Symptom: Agent logs show ring buffer read errors, or the health endpoint reports unhealthy.

Diagnosis:

kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf | grep "ring buffer"

Common causes:

1. Consecutive read failures

The agent tracks consecutive ring buffer read errors. After 10 consecutive failures, it marks itself unhealthy and stops the event reader:

{"level":"ERROR","msg":"ring buffer reader failed repeatedly, stopping","consecutive_errors":10}

This typically indicates the BPF ring buffer file descriptor has become invalid. Restart the agent pod:

kubectl rollout restart daemonset -n cloudtaser-system cloudtaser-ebpf
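The counter semantics matter when reading the logs: any successful read resets the streak, so intermittent errors never stop the reader; only an unbroken run of 10 failures does. A simulation sketch of that rule (not the agent's code), where each word of the input stands for one ring buffer read and `simulate_reader` is a hypothetical name:

```shell
# Simulation only: models the documented rule that the reader stops after
# 10 consecutive read errors. Each word of $1 is one read outcome, "ok" or "err".
MAX_CONSECUTIVE_ERRORS=10

simulate_reader() {
  _consecutive=0
  for _outcome in $1; do
    if [ "$_outcome" = ok ]; then
      _consecutive=0                              # any success resets the streak
    else
      _consecutive=$((_consecutive + 1))
      if [ "$_consecutive" -ge "$MAX_CONSECUTIVE_ERRORS" ]; then
        echo "unhealthy after $_consecutive consecutive errors"
        return 1
      fi
    fi
  done
  echo healthy
}
```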

2. Event channel full

If the agent logs "event channel full, dropping event", the user-space event processing pipeline cannot keep up with kernel event volume. This happens under extreme syscall load from monitored processes.

Mitigation:

  • Reduce the number of monitored PIDs
  • Disable LOG_ALL if enabled (it generates events for every syscall, not just violations)
  • Increase agent CPU limits to allow faster event processing

3. Ring buffer map not found

If the agent fails with "events ring buffer map not found", the BPF object file may be corrupted or incompatible with the running kernel. Verify the eBPF object path:

kubectl exec -n cloudtaser-system <ebpf-pod> -- ls -la /opt/cloudtaser/secret_monitor.o

Secret Region Protection Not Working

Symptom: The eBPF agent is running but does not detect access to protected process memory.

Diagnosis:

# Check if the agent has registered any protected pod cgroups
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf | grep "registered"

# Check if the wrapper is connecting to the agent
kubectl logs <pod-name> -c <container-name> | grep "ebpf"

Common causes:

1. Wrapper not registering with agent

The wrapper connects to the eBPF agent via gRPC (default: 0.0.0.0:9443) to register its PID and secret memory regions. If the socket path is not shared between the wrapper and agent, registration fails silently.

Verify the host path volume is mounted in both the eBPF daemonset and the protected pod:

kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf \
  -o jsonpath='{.spec.template.spec.volumes}' | python3 -m json.tool

2. BPF maps not populated

The agent maintains BPF maps for enforcement. In v0.1.59+ (Helm chart 1.0.61+), the pod-scope map is a BPF_MAP_TYPE_CGROUP_ARRAY; earlier versions used an inode-based monitored_cgroups HASH map. The maps relevant for diagnostics:

| Map | Purpose |
|---|---|
| protected_cgroups | Cgroup array for pod-scope enforcement (v0.1.59+, replaces monitored_cgroups HASH) |
| secret_regions | Memory address ranges containing secrets per PID |
| secret_content | 16-byte content prefixes for content-based leak detection |

If protected_cgroups is empty (or monitored_pids on pre-v0.1.59 agents), no enforcement occurs. Check agent logs for cgroup registration events.

3. Content-based matching disabled

If the agent logs "secret_content map not found, content-based matching disabled" at startup, the BPF object was compiled without the secret_content map. Content-based leak detection (matching secret data in write()/sendto() buffers) will not work, but cgroup-scope region protection still functions.

4. Global privilege escalation detection

If GLOBAL_PRIVESC_DETECT=true, the agent monitors kernel module and eBPF program loading from all PIDs on the node, not just protected pod cgroups. If this is disabled and you expect to see module_load or bpf_load events from non-protected processes, enable it:

env:
  - name: GLOBAL_PRIVESC_DETECT
    value: "true"

Webhook Failures

Symptom: Pod creation fails with errors referencing the cloudtaser webhook, or pods are created without injection despite having the correct annotations.

Diagnosis:

# Check webhook configuration exists
kubectl get mutatingwebhookconfiguration cloudtaser-operator-webhook

# Check webhook endpoint is reachable
kubectl get endpoints -n cloudtaser-system cloudtaser-operator-webhook

# Check operator logs for webhook errors
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "webhook\|error\|admission"

Common causes:

1. Certificate expired or CA bundle mismatch

The operator generates self-signed TLS certificates for the webhook. Where they are stored depends on the secretBackend setting:

  • vault (default): Certificates are stored in OpenBao at secret/cloudtaser/system/webhook-tls and served from memory. See Zero Kubernetes Secrets Architecture.
  • kubernetes (fallback): Certificates are stored in a Kubernetes Secret (cloudtaser-operator-certs).

The CA bundle is patched into the MutatingWebhookConfiguration at startup and rotated 30 days before expiry (certificates are valid for 1 year).

If the CA bundle in the webhook configuration does not match the certificate the operator is serving, the API server cannot verify the TLS connection and rejects webhook calls.

Diagnosis (OpenBao backend):

# Check operator logs for certificate errors
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "cert\|tls\|webhook"

Diagnosis (kubernetes backend):

# Check the cert secret exists
kubectl get secret -n cloudtaser-system cloudtaser-operator-certs

# Check cert expiry
kubectl get secret -n cloudtaser-system cloudtaser-operator-certs \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate

Fix: Restart the operator. It checks certificate expiry on startup and regenerates certificates if they expire within 30 days. It also patches the webhook configuration with the current CA bundle on every startup.

kubectl rollout restart deployment -n cloudtaser-system cloudtaser-operator

If using the kubernetes backend and the certificate Secret is corrupted, delete it and restart the operator. It will generate new certificates:

kubectl delete secret -n cloudtaser-system cloudtaser-operator-certs
kubectl rollout restart deployment -n cloudtaser-system cloudtaser-operator
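The renewal rule (1-year certificates, regenerated at startup when they expire within 30 days) reduces to one comparison. A sketch, assuming both timestamps are already Unix epoch seconds; `needs_rotation` is a hypothetical helper:

```shell
# Hypothetical helper: prints "rotate" if a certificate with notAfter $1
# expires within 30 days of time $2 (both Unix epoch seconds), else "keep".
ROTATION_WINDOW_SECONDS=$((30 * 24 * 60 * 60))   # 2592000 seconds

needs_rotation() {
  if [ $(( $1 - $2 )) -le "$ROTATION_WINDOW_SECONDS" ]; then
    echo rotate
  else
    echo keep
  fi
}
```

For a certificate read from the kubernetes backend as shown above, `openssl x509 -noout -checkend 2592000` performs the same 30-day test directly: it exits non-zero if the certificate expires within that many seconds.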

2. Namespace not in webhook scope

The MutatingWebhookConfiguration uses a namespaceSelector that excludes cloudtaser-system and kube-system:

namespaceSelector:
  matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values:
        - cloudtaser-system
        - kube-system

Pods in these namespaces are never intercepted by the webhook. Pods in kube-public and kube-node-lease are also excluded by the operator's internal logic.

Fix: Ensure your workload is in a non-system namespace.

3. Webhook timeout

The webhook has a default timeout of 10 seconds (operator.webhook.timeoutSeconds in Helm values). If the operator is under heavy load or the container registry is slow (the operator resolves image entrypoints by querying the registry), the webhook may time out.

Diagnosis:

# Check for timeout events
kubectl get events --field-selector reason=FailedCreate -A | grep cloudtaser

Fix: Increase the webhook timeout in Helm values. The Kubernetes API accepts admission webhook timeouts from 1 to 30 seconds, so 30 is the maximum:

operator:
  webhook:
    timeoutSeconds: 30

Or investigate why the operator is slow (check CPU/memory limits, registry latency).

4. Annotation format errors

The operator silently skips injection if annotations are malformed. Common mistakes:

| Mistake | Correct |
|---|---|
| cloudtaser.io/inject: true (no quotes) | cloudtaser.io/inject: "true" |
| Annotations on Deployment metadata | Annotations on spec.template.metadata |
| cloudtaser.io/env-map: "ENV_VAR=vault_field" | cloudtaser.io/env-map: "vault_field=ENV_VAR" |
| Missing OpenBao address | Set cloudtaser.io/secretstore-address or use a CloudTaserConfig CR |
| Missing secret paths | Set cloudtaser.io/secret-paths |

Diagnosis:

# Check if the pod has the injection status annotation
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.cloudtaser\.io/status}'

If the annotation is missing, the webhook did not process the pod. Check if the inject annotation is on the pod template (not the Deployment itself).

5. failurePolicy: Fail blocking all pods

The default failurePolicy is Fail, meaning if the webhook is unreachable (operator down, network issue), the API server rejects all pod creation in the webhook's scope.

Emergency fix:

# Switch to Ignore (pods start without injection)
kubectl patch mutatingwebhookconfiguration cloudtaser-operator-webhook \
  --type='json' \
  -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'

Ignore means no injection

With failurePolicy: Ignore, pods start without cloudtaser wrapper injection. Secrets will not be fetched from OpenBao. Use this only as an emergency measure and restore Fail after fixing the operator.

6. Bridge not yet connected to beacon

Symptom: The operator pod is Running and Ready, but pod creation fails with a webhook error indicating the broker is not connected or the bridge is unavailable.

This is expected during initial installation: the operator's startup probe is decoupled from beacon connectivity (operator v0.9.4+). The pod becomes Ready quickly, but the beacon P2P match between the operator and the bridge completes asynchronously. It typically takes under 500ms once both sides have connected, but it can only happen after the bridge has registered with the beacon.

Diagnosis:

# Check operator logs for beacon connection status
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "beacon\|bridge\|broker"

Look for log lines indicating the operational mTLS connection is established. If you see only init-phase messages and no operational connection, the bridge has not yet connected.

Common causes:

  • The bridge (cloudtaser-onprem) is not running or has not reached the beacon yet.
  • The beacon address in the Helm values (operator.broker.beacon.address) does not resolve to the correct host.
  • Network connectivity from the bridge side to the beacon is blocked.

Fix: Verify the bridge is running and can reach the beacon, then retry pod creation. Once the bridge connects, the P2P match completes in under 500ms and subsequent webhook calls succeed.

7. HA race condition on certificate creation

When running the operator with multiple replicas (operator.ha: true), all replicas attempt to load or create the certificate Secret on startup. The code handles this race correctly (using Kubernetes AlreadyExists error detection), but transient API server errors during startup may cause one replica to fail.

Fix: The failing replica will restart and successfully read the Secret created by the other replica. No manual intervention needed.


Operator Cannot Connect to OpenBao

Symptom: The operator pod starts but crashes or logs errors about OpenBao connectivity. The webhook does not serve TLS and pod creation may be blocked.

Diagnosis:

kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "vault\|secret backend"

Common causes:

1. OpenBao address not configured

The operator needs operator.vault.address set in Helm values when using secretBackend: vault (the default). If it is empty, the operator cannot connect.

Fix: Set the OpenBao address in Helm values:

operator:
  vault:
    address: "https://vault.eu.example.com"

Or set the environment variable VAULT_ADDR on the operator deployment.

2. Kubernetes auth not configured in OpenBao

The operator authenticates to OpenBao using Kubernetes auth. If the auth method is not enabled or the cloudtaser-operator role does not exist, authentication fails.

Diagnosis:

# Check if Kubernetes auth is enabled
vault auth list | grep kubernetes

# Check if the operator role exists
vault read auth/kubernetes/role/cloudtaser-operator

Fix: Run cloudtaser-cli source configure to create the required policy and auth role:

cloudtaser-cli source configure \
  --openbao-addr https://vault.eu.example.com \
  --token hvs.YOUR_ADMIN_TOKEN

3. ServiceAccount not bound to OpenBao role

The OpenBao Kubernetes auth role must allow the operator's ServiceAccount. If the ServiceAccount name or namespace does not match the role's bound_service_account_names and bound_service_account_namespaces, authentication is rejected.

Diagnosis:

vault read auth/kubernetes/role/cloudtaser-operator

Check that bound_service_account_names includes the operator's ServiceAccount name and bound_service_account_namespaces includes cloudtaser-system.

4. OpenBao is sealed

If OpenBao is sealed, all requests fail. The operator will log connection errors and retry.

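Fix: Unseal OpenBao by submitting unseal key shares until the unseal threshold is reached (the example assumes a threshold of three):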

vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>

5. Network connectivity

The operator pod must be able to reach OpenBao over the network. Common blockers include NetworkPolicies, firewall rules, VPC peering, and security groups.

Diagnosis:

kubectl exec -n cloudtaser-system deploy/cloudtaser-operator -- \
  wget -q --spider https://vault.eu.example.com/v1/sys/health

6. OpenBao TLS certificate not trusted

If OpenBao uses a private CA, the operator cannot verify the TLS certificate. Mount the CA certificate or set vault.caCert in Helm values. For development only, vault.tlsSkipVerify: true disables verification.

Do not use tlsSkipVerify in production

Disabling TLS verification allows man-in-the-middle attacks on the connection between the operator and OpenBao. Use it only for local development.

7. Falling back to in-memory secrets

If OpenBao connectivity cannot be restored immediately and pod creation is blocked, switch the operator to in-memory secrets as a temporary fallback:

helm upgrade cloudtaser cloudtaser/cloudtaser \
  --namespace cloudtaser-system \
  --set operator.secretBackend=inmemory \
  --reuse-values

The operator restarts, generates certificates, holds them in memory, and resumes serving the webhook. Switch back to vault once OpenBao connectivity is restored. See Zero Kubernetes Secrets Architecture for details on the OpenBao-based secret backend.


Wrapper Startup Failures

Symptom: Pod starts but the application container exits immediately or logs errors from the cloudtaser wrapper.

Diagnosis:

# Check wrapper startup logs
kubectl logs <pod-name> -c <container-name> 2>&1 | head -50

# Check if the wrapper binary exists
kubectl exec <pod-name> -c <container-name> -- ls -la /cloudtaser/wrapper

Common causes:

1. Missing required environment variables

The wrapper validates its configuration at startup and exits with an error if required variables are missing. Required variables:

| Variable | Source |
|---|---|
| VAULT_ADDR | From cloudtaser.io/secretstore-address annotation or CloudTaserConfig CR |
| CLOUDTASER_SECRET_PATHS | From cloudtaser.io/secret-paths annotation |
| CLOUDTASER_ORIGINAL_CMD | Resolved by operator from container image or pod spec |
| CLOUDTASER_ENV_MAP | From cloudtaser.io/env-map annotation |

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "configuration error"

Typical error messages:

configuration error: VAULT_ADDR is required
configuration error: CLOUDTASER_SECRET_PATHS is required (comma-separated list)
configuration error: CLOUDTASER_ORIGINAL_CMD is required
configuration error: CLOUDTASER_ENV_MAP is required

Fix: Ensure all required annotations are set on the pod template:

annotations:
  cloudtaser.io/inject: "true"
  cloudtaser.io/secretstore-address: "https://vault.eu.example.com:8200"
  cloudtaser.io/secretstore-role: "myapp"
  cloudtaser.io/secret-paths: "secret/data/myapp/config"
  cloudtaser.io/env-map: "password=PGPASSWORD,api_key=API_KEY"
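The startup validation amounts to a presence check on the four required variables. A sketch of that check in shell, assuming the operator sets them as container environment variables so that values can be copied from the injected container's env section in `kubectl get pod <pod-name> -o yaml`; `check_wrapper_env` is a hypothetical helper, and it does not validate the values themselves (URL syntax, path format, env-map syntax):

```shell
# Hypothetical helper: reports which of the four required wrapper variables
# are unset or empty in the current shell, mirroring the startup presence check.
check_wrapper_env() {
  _missing=""
  for _var in VAULT_ADDR CLOUDTASER_SECRET_PATHS CLOUDTASER_ORIGINAL_CMD CLOUDTASER_ENV_MAP; do
    eval "_val=\${$_var:-}"              # indirect lookup of the variable named by _var
    [ -n "$_val" ] || _missing="$_missing $_var"
  done
  if [ -n "$_missing" ]; then
    echo "missing:$_missing"
    return 1
  fi
  echo ok
}
```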

2. OpenBao authentication failure

The wrapper authenticates to OpenBao using the pod's Kubernetes ServiceAccount token. This can fail for several reasons.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep -i "vault\|auth\|login"

Common errors and fixes:

| Error | Cause | Fix |
|---|---|---|
| kubernetes login: ... connection refused | OpenBao endpoint unreachable | Check firewall rules, VPC peering, NetworkPolicies |
| kubernetes login: ... 403 Forbidden | Role not configured or SA not bound | Verify OpenBao role exists and allows the pod's ServiceAccount |
| kubernetes login returned no auth info | Auth method disabled or wrong mount path | Check vault auth list and the cloudtaser.io/secretstore-auth-path annotation |
| reading service account token: no such file | SA token not mounted | Check automountServiceAccountToken is not false |
| TLS handshake error | OpenBao uses a private CA | Mount the CA bundle or set cloudtaser.io/secretstore-tls-skip-verify: "true" (dev only) |

3. Secret fetch failure

After authentication, the wrapper fetches secrets from the configured paths.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "failed to fetch secrets"

Common errors:

| Error | Cause | Fix |
|---|---|---|
| secret not found: secret/data/myapp/config | Wrong path or secret does not exist | Verify path with vault kv get secret/myapp/config |
| permission denied | OpenBao policy does not allow reading this path | Update the OpenBao policy for the cloudtaser role |
| reading secret: ... 404 | Missing data/ prefix for KV v2 | Use secret/data/myapp/config not secret/myapp/config |

4. Env map parse failure

The CLOUDTASER_ENV_MAP format is vault_field=ENV_VAR. The wrapper exits if it cannot match OpenBao fields to the fetched secrets.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "failed to parse env map"

Fix: Verify that the field names in the env map match the keys in OpenBao secret:

# Check what keys exist in the secret
vault kv get -format=json secret/myapp/config | jq '.data.data | keys'

# Env map must reference these exact field names
# cloudtaser.io/env-map: "password=PGPASSWORD,username=PGUSER"
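To compare the env map against the secret in one step, the field names can be checked against the key list. A sketch, assuming the keys are supplied whitespace- or newline-separated, for example from `vault kv get -format=json secret/myapp/config | jq -r '.data.data | keys[]'`; `missing_fields` is a hypothetical helper:

```shell
# Hypothetical helper: $1 is the env-map annotation value, $2 the secret's keys
# (space- or newline-separated). Prints the OpenBao fields not present in $2.
missing_fields() {
  # Normalize the key list to one space-padded line for whole-word matching.
  _keys=" $(printf '%s' "$2" | tr '\n\t' '  ') "
  _missing=""
  _old_ifs=$IFS
  IFS=,                                     # env-map pairs are comma-separated
  for _pair in $1; do
    _field=${_pair%%=*}                     # OpenBao field name (left of "=")
    case $_keys in
      *" $_field "*) ;;                     # field exists in the secret
      *) _missing="$_missing $_field" ;;
    esac
  done
  IFS=$_old_ifs
  if [ -n "$_missing" ]; then
    echo "not in secret:$_missing"
    return 1
  fi
  echo ok
}
```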

5. Original command resolution failure

The operator resolves the container's original entrypoint (command + args) and sets it as CLOUDTASER_ORIGINAL_CMD and CLOUDTASER_ORIGINAL_ARGS. If the pod spec does not set command, the operator queries the container registry for the image's ENTRYPOINT (and its CMD, when args is also unset).

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "CLOUDTASER_ORIGINAL_CMD"

If CLOUDTASER_ORIGINAL_CMD is empty:

  • The operator could not resolve the entrypoint from the pod spec or the container registry
  • Check that imagePullSecrets are configured (the operator needs registry access)
  • Set the command explicitly in the pod spec

6. memfd_secret or mlock failure

The wrapper allocates protected memory for secrets. If CLOUDTASER_REQUIRE_MEMFD_SECRET=true or CLOUDTASER_REQUIRE_MLOCK=true is set but the kernel does not support the feature, the wrapper exits.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep -i "memfd_secret\|mlock\|FATAL"

Fix: Either upgrade the node kernel (5.14+ for memfd_secret) or remove the requirement flags. Without these flags, the wrapper logs a warning but continues with degraded protection.

7. Sealed mode: waiting for unseal

If cloudtaser.io/secretstore-auth-method: "token" is set but no token is provided, the wrapper starts in sealed mode. It is alive (liveness probe passes) but not ready (readiness probe returns 503) until a token is delivered via POST /v1/unseal from the operator's auth broker.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "sealed"

Look for:

{"level":"INFO","msg":"sealed: token auth selected but no token provided, waiting for POST /v1/unseal"}

Fix: Ensure the operator's auth broker is running and can reach the pod. The broker delivers tokens to sealed pods automatically. Check operator logs:

kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep "broker\|unseal"

8. Lease renewal failure

After initial secret fetch, the wrapper renews OpenBao leases periodically (default: every 30 seconds). If renewal fails, the wrapper re-fetches the secret.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "lease renewal\|token renewal"

If both renewal and re-fetch fail repeatedly, the application continues running with the last known secrets. The wrapper logs errors but does not kill the child process.
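Each renewal pass therefore has three possible outcomes. A sketch of that decision, with the renew and re-fetch steps passed in as commands (hypothetical stand-ins for the OpenBao calls the wrapper makes); `renewal_cycle` is a hypothetical name:

```shell
# Hypothetical sketch of one renewal pass: try to renew, fall back to a
# re-fetch, and otherwise keep the last known secrets (the child keeps running).
# $1 and $2 are commands; they are intentionally unquoted so "cmd arg" works.
renewal_cycle() {
  if $1; then
    echo renewed
  elif $2; then
    echo refetched
  else
    echo "kept last known secrets"
  fi
}
```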

9. Rotation strategy errors

The wrapper supports three rotation strategies when secrets change:

| Strategy | Behavior |
|---|---|
| restart | Kills and re-forks the child process with new env vars (default; works for every binary) |
| sighup | Sends SIGHUP to the child; relies on the application to re-read its secrets on the signal |
| none | Logs the change but does nothing |

The strategy is chosen per workload. restart is the default and works with every binary (static Go, musl, any libc) because it simply re-executes the child with the new environment. sighup is only appropriate for applications that have an explicit SIGHUP handler for config reload (nginx, postgres, some custom daemons); for most off-the-shelf workloads, keep the default.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "rotation\|SIGHUP"

Common CLI Errors

"failed to build kubeconfig"

The CLI cannot find or parse your kubeconfig. Ensure ~/.kube/config exists or pass --kubeconfig:

cloudtaser-cli target status --kubeconfig /path/to/kubeconfig

"OpenBao is sealed"

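The OpenBao instance is sealed and cannot serve requests. Unseal it by submitting unseal key shares until the unseal threshold is reached (the example assumes a threshold of three):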

vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>

"permission denied" on OpenBao operations

The OpenBao token used with cloudtaser-cli target connect needs admin-level permissions. Ensure its policies grant access to:

  • sys/auth/* (to enable and configure auth methods)
  • auth/kubernetes/* (to configure Kubernetes auth)
  • sys/policies/* (to create policies)

CrashLoopBackOff with "context deadline exceeded"

Symptom: Wrapper pod enters CrashLoopBackOff. Logs show context deadline exceeded during the broker secret fetch or OpenBao authentication.

Diagnosis:

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "context deadline exceeded"

Common causes:

1. NetworkPolicy blocking DNS on GKE (NodeLocal DNSCache)

On GKE clusters with NodeLocal DNSCache enabled, DNS traffic goes to 169.254.20.10 (a hostNetwork address), not to the kube-dns ClusterIP. A NetworkPolicy egress rule using namespaceSelector: kube-system does not match hostNetwork pods, so DNS is silently blocked and all name resolution times out.

Fix: Add an explicit ipBlock egress rule for 169.254.20.10/32 on port 53 in your NetworkPolicy. See the full explanation and YAML in the GKE Deployment Guide, section "NetworkPolicy and NodeLocal DNSCache".

2. OpenBao unreachable

The broker or OpenBao endpoint is down or unreachable from the pod network. Check connectivity:

kubectl exec <pod-name> -- wget -q --spider --timeout=5 https://vault.eu.example.com/v1/sys/health

3. Beacon not matched

If using P2P mode, the beacon may not have matched the operator and bridge yet. Check operator logs for beacon connection status. See Bridge not yet connected to beacon above.


Getting Help

  1. Check component logs:

    kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator
    kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
    kubectl logs <pod-name> -c <container-name>
    
  2. Run validation:

    cloudtaser-cli target validate --secretstore-address https://vault.eu.example.com
    
  3. Run a full audit:

    cloudtaser-cli target audit --secretstore-address https://vault.eu.example.com