Troubleshooting¶
Common issues and their solutions when running cloudtaser. This page covers diagnostics and fixes for the operator, wrapper, eBPF agent, and CLI.
Pods Stuck in Init¶
Symptom: Pod stays in Init:0/1 state. The init container that copies the wrapper binary has not completed.
Diagnosis:
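As a minimal sketch of this inspection step, the helper below prints the pod events and the logs of each init container. The function name inspect_init_pod is hypothetical, and the init container names are read from the pod spec rather than assumed.

```shell
# Hypothetical helper: show why a pod is stuck in Init.
# $1 = pod name, $2 = namespace (defaults to "default")
inspect_init_pod() {
  pod="$1"; ns="${2:-default}"
  # The Events section at the end of the output usually names the failing
  # step (image pull, volume mount, webhook rejection).
  kubectl describe pod "$pod" -n "$ns"
  # Look up the init container names, then print the logs of each one.
  for c in $(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.initContainers[*].name}'); do
    kubectl logs "$pod" -n "$ns" -c "$c"
  done
}
```

Usage: inspect_init_pod my-app-7d9f prod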
Common causes:
1. Wrapper image not pullable¶
The init container pulls from ghcr.io/cloudtaser/cloudtaser-wrapper. Verify that image pull secrets are configured:
If using a private registry, ensure imagePullSecrets is set in the Helm values.
2. EmptyDir volume mount failure¶
The wrapper is copied to a memory-backed emptyDir at /cloudtaser/. Check that the node has sufficient memory for the volume (the wrapper binary is approximately 10 MB).
3. Operator not running¶
If the operator is down, the mutating webhook may fail and block pod creation (default failurePolicy: Fail):
failurePolicy: Fail blocks pod creation
If the operator is not running, the API server rejects pod creation in the webhook's scope. Restart the operator or temporarily set failurePolicy: Ignore in the MutatingWebhookConfiguration. With Ignore, pods start without injection, so use it only as an emergency measure.
Operator pod Ready does not imply beacon connected (operator v0.9.4+)
The operator's startup probe is decoupled from beacon connectivity. The pod becomes Ready in seconds regardless of whether the bridge has connected to the beacon yet. If the bridge has not connected, the operator pod will be Ready but the webhook will return a clear error on pod creation attempts. This is expected during initial bootstrap — wait for the bridge to connect, then retry pod creation.
Wrapper Cannot Connect to OpenBao¶
Symptom: Pod starts but the application does not receive secrets. Container logs show OpenBao connection errors.
Diagnosis:
Common causes:
1. OpenBao endpoint unreachable¶
Verify network connectivity from the pod to OpenBao:
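One possible check, mirroring the operator-side wget probe used later on this page. It assumes the application image ships wget, which many minimal images do not; the helper name check_openbao_from_pod is hypothetical.

```shell
# Hypothetical helper: probe the OpenBao health endpoint from inside a pod.
# $1 = pod name, $2 = OpenBao address, $3 = namespace (defaults to "default")
check_openbao_from_pod() {
  kubectl exec "$1" -n "${3:-default}" -- \
    wget -q --spider "$2/v1/sys/health"
}
```

Usage: check_openbao_from_pod my-app-7d9f https://vault.eu.example.com prod. The health endpoint answers with a non-200 status when OpenBao is sealed or in standby, so a failed probe does not always mean the network path is blocked.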
Check firewall rules, VPC peering, security groups, and NetworkPolicies.
2. Kubernetes auth not configured¶
The wrapper authenticates to OpenBao using the pod's ServiceAccount token. Verify the auth method is configured:
cloudtaser-cli target validate \
--secretstore-address https://vault.eu.example.com \
--secretstore-token hvs.YOUR_TOKEN
If auth is not configured, run cloudtaser-cli target connect to set it up.
3. Wrong OpenBao role¶
Verify the cloudtaser.io/secretstore-role annotation matches a role configured in OpenBao:
4. ServiceAccount not bound¶
The OpenBao role must allow the pod's ServiceAccount. Check the bound_service_account_names and bound_service_account_namespaces in the OpenBao role.
5. TLS certificate error¶
If OpenBao uses a private CA, the wrapper cannot verify the certificate. Mount the CA bundle into the pod.
Quick connectivity test
Run cloudtaser-cli target validate --secretstore-address https://vault.eu.example.com to check OpenBao health, seal status, and Kubernetes auth configuration in one step.
eBPF Agent Not Starting¶
Symptom: cloudtaser-ebpf pods are in CrashLoopBackOff or not starting.
Diagnosis:
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
kubectl describe daemonset -n cloudtaser-system cloudtaser-ebpf
Common causes:
1. Kernel too old¶
The eBPF agent requires Linux kernel 5.8+ with BTF (BPF Type Format) support. Check the node kernel:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
Kernel version recommendations
For full protection (including memfd_secret), kernel 5.14+ is recommended. See Kernel Compatibility for the complete support matrix.
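The two thresholds above (5.8 for the agent, 5.14 for memfd_secret) can be checked with a small comparison helper. This is a sketch: the function name kernel_at_least is hypothetical, and it only compares the major and minor numbers of the release string.

```shell
# Hypothetical helper: succeed if a kernel release is at least major.minor.
# $1 = release string (as printed by `uname -r` or the kubectl query above)
# $2 = minimum version in "major.minor" form, e.g. 5.14
kernel_at_least() {
  ver="${1%%-*}"                 # drop distro suffix such as -1034-gke
  major="${ver%%.*}"
  rest="${ver#*.}"
  minor="${rest%%.*}"
  minor="${minor%%[!0-9]*}"      # drop trailing non-digits such as "+"
  min_major="${2%%.*}"
  min_minor="${2#*.}"
  [ "$major" -gt "$min_major" ] ||
    { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }
}

# Example: classify the kernel of the machine this runs on.
if kernel_at_least "$(uname -r)" 5.14; then
  echo "kernel supports memfd_secret"
elif kernel_at_least "$(uname -r)" 5.8; then
  echo "kernel supports the eBPF agent, but not memfd_secret"
else
  echo "kernel is too old for the eBPF agent"
fi
```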
2. BTF not available¶
The agent needs /sys/kernel/btf/vmlinux. GKE COS and Ubuntu nodes have this by default. Some custom AMIs may not:
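A minimal sketch of the check, meant to be run on the node itself (for example over SSH or from a node debug shell). The helper name has_btf is hypothetical.

```shell
# Hypothetical helper: succeed if the BTF file is readable.
# $1 = path to check; defaults to the location the agent reads.
has_btf() {
  [ -r "${1:-/sys/kernel/btf/vmlinux}" ]
}

if has_btf; then
  echo "BTF available"
else
  echo "BTF missing: /sys/kernel/btf/vmlinux is not readable"
fi
```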
3. Insufficient capabilities¶
The eBPF agent requires privileged mode with SYS_ADMIN, SYS_PTRACE, NET_ADMIN, and SYS_RESOURCE capabilities. Verify that Pod Security Admission (or PodSecurityPolicy on clusters older than Kubernetes 1.25) is not blocking these.
4. GKE Autopilot¶
Autopilot clusters do not allow privileged pods or hostPID. Use GKE Standard instead.
5. Fargate (EKS)¶
Fargate does not support DaemonSets or host-level access. Use managed or self-managed node groups.
Unsupported environments
GKE Autopilot and AWS Fargate are not compatible with the eBPF agent. The wrapper still provides secret injection, but runtime enforcement (blocking /proc reads, ptrace, etc.) is not available.
Secrets Not Injected¶
Symptom: Application starts but environment variables with secrets are missing.
Diagnosis:
# Check if injection annotation is present
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations}' | grep cloudtaser
# Check if wrapper is running as PID 1
kubectl exec <pod-name> -- cat /proc/1/cmdline | tr '\0' ' '
Common causes:
1. Missing inject annotation¶
The pod template must have cloudtaser.io/inject: "true". The annotation must be on the pod template, not on the Deployment itself:
2. Wrong env-map syntax¶
The cloudtaser.io/env-map annotation maps OpenBao fields to environment variables. Format: vault_field=ENV_VAR,field2=ENV_VAR2:
Common mistake: reversed order
The correct format is vault_field=ENV_VAR, not ENV_VAR=vault_field. A reversed mapping will silently fail to inject the expected secrets.
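Because a reversed mapping fails silently, it can help to lint the annotation value before deploying. The sketch below is not part of cloudtaser; check_env_map is a hypothetical helper. It verifies the pair syntax and prints a warning when the right-hand side contains lower-case letters, which is only a heuristic for a reversed pair.

```shell
# Hypothetical linter for the value of cloudtaser.io/env-map.
# $1 = annotation value, e.g. "password=PGPASSWORD,api_key=API_KEY"
check_env_map() {
  status=0
  old_ifs=$IFS; IFS=,
  for pair in $1; do
    field="${pair%%=*}"; env_var="${pair#*=}"
    case "$pair" in
      *=*) ;;
      *) echo "malformed pair (no '='): $pair"; status=1; continue ;;
    esac
    case "$env_var" in
      ""|[[:digit:]]*|*[!_[:alnum:]]*)
        echo "invalid environment variable name: $env_var"; status=1 ;;
      *[[:lower:]]*)
        echo "warning: $field -> $env_var (lower-case target; is the pair reversed?)" ;;
      *)
        echo "$field -> $env_var" ;;
    esac
  done
  IFS=$old_ifs
  return $status
}
```

Usage: check_env_map "password=PGPASSWORD,api_key=API_KEY" prints one line per mapping and returns 0.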
3. Wrong OpenBao path¶
Ensure cloudtaser.io/secret-paths uses the KV v2 data path:
# Correct:
cloudtaser.io/secret-paths: "secret/data/myapp/config"
# Wrong (missing data/ prefix for KV v2):
cloudtaser.io/secret-paths: "secret/myapp/config"
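The same rule can be checked mechanically. This sketch assumes the KV v2 mount name is a single path segment (such as secret); is_kv2_data_path is a hypothetical helper, not a cloudtaser command.

```shell
# Hypothetical helper: succeed if the segment after the mount name is "data".
# Assumes a single-segment mount name, e.g. "secret".
is_kv2_data_path() {
  rest="${1#*/}"            # drop the mount segment
  case "$rest" in
    data/?*) return 0 ;;
    *)       return 1 ;;
  esac
}

is_kv2_data_path "secret/data/myapp/config" && echo "ok: KV v2 data path"
is_kv2_data_path "secret/myapp/config" || echo "missing data/ segment"
```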
4. Namespace not in webhook scope¶
The operator only injects pods outside the system namespaces (kube-system, kube-public, kube-node-lease) and its own cloudtaser-system namespace. Verify your namespace is not excluded.
5. Webhook not intercepting¶
Check the MutatingWebhookConfiguration:
High Latency on Pod Startup¶
Symptom: Pods take significantly longer to start after cloudtaser injection.
Common causes:
1. OpenBao fetch time¶
The wrapper fetches secrets before starting the application. If OpenBao is slow (geographically distant, under load), this adds startup latency.
Reduce OpenBao latency
- Deploy OpenBao in the same region as the cluster (still within the EU)
- Reduce the number of secret paths per pod (fewer OpenBao API calls)
- Ensure OpenBao is not sealed or in standby mode
2. Image entrypoint resolution¶
The operator resolves the container image entrypoint by querying the container registry. This adds latency on the first injection. The result is cached per image.
3. Init container image pull¶
The first pod on a node pulls the wrapper image. Subsequent pods use the cached image. Use imagePullPolicy: IfNotPresent (default) to avoid re-pulling.
Protection Score Low¶
Symptom: cloudtaser-cli target status or cloudtaser-cli target audit reports a low protection score.
The protection score (max 65) reflects which defenses are active:
| Check | Points | Fix |
|---|---|---|
| memfd_secret | 15 | Use kernel 5.14+ on nodes |
| mlock | 10 | Ensure CAP_IPC_LOCK is available (or ulimit -l unlimited) |
| MADV_DONTDUMP | 5 | Automatic (requires wrapper v0.0.14+) |
| PR_SET_DUMPABLE(0) | 5 | Automatic (requires wrapper v0.0.14+) |
| Token protected | 10 | Automatic when memfd_secret or mlock is available |
| eBPF connected | 10 | Ensure eBPF daemonset is running on the node |
| Kprobes active | 10 | Requires kernel CONFIG_BPF_KPROBE_OVERRIDE |
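The score is the sum of the points of the active checks. The sketch below reproduces that arithmetic; the function name and the lower-case check identifiers are hypothetical labels chosen for illustration, not output of the CLI.

```shell
# Hypothetical calculator: sum the points from the table above.
# Each argument names one active check; unknown names are ignored.
protection_score() {
  score=0
  for check in "$@"; do
    case "$check" in
      memfd_secret) score=$((score + 15)) ;;
      mlock|token_protected|ebpf_connected|kprobes_active) score=$((score + 10)) ;;
      madv_dontdump|pr_set_dumpable) score=$((score + 5)) ;;
    esac
  done
  echo "$score"
}

# All seven checks active: 15 + 10 + 5 + 5 + 10 + 10 + 10
protection_score memfd_secret mlock madv_dontdump pr_set_dumpable \
  token_protected ebpf_connected kprobes_active    # prints 65

# Same node without kprobe override support
protection_score memfd_secret mlock madv_dontdump pr_set_dumpable \
  token_protected ebpf_connected                   # prints 55
```

Under these numbers, a node whose kernel lacks CONFIG_BPF_KPROBE_OVERRIDE tops out at 55 of 65.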
To improve the score:
- Upgrade node kernel to 5.14+ for memfd_secret support (15 points). This is the single most impactful change.
- Ensure the eBPF agent is running on every node (10 points). Compare the desired and ready pod counts of the cloudtaser-ebpf DaemonSet in cloudtaser-system.
- Check kernel kprobe override support (10 points). Note: CONFIG_BPF_KPROBE_OVERRIDE is not enabled on any major cloud provider kernel (GKE, EKS, AKS). See Kernel Compatibility for details.
eBPF Enforcement Issues¶
Symptom: eBPF agent is running but enforcement events are not generated, or legitimate operations are being blocked.
Diagnosis:
# Check agent logs
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
# Check enforce mode
kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf \
-o jsonpath='{.spec.template.spec.containers[0].env}' | grep ENFORCE
Common causes:
1. Enforce mode disabled¶
If ENFORCE_MODE=false, the agent only logs events without blocking. Enable via Helm:
2. Reactive kill fallback¶
On kernels without CONFIG_BPF_KPROBE_OVERRIDE (all major cloud providers), the agent uses reactive kill (SIGKILL after detection) instead of synchronous blocking. This is the expected behavior on GKE, EKS, and AKS.
Reactive kill is still effective
The race window between detection and SIGKILL is microseconds. An attacker reading /proc/pid/environ gets killed before they can exfiltrate the data over the network, because the network send is also monitored and blocked.
3. Application uses io_uring¶
cloudtaser blocks io_uring_setup() for protected processes because io_uring bypasses buffer-level monitoring. Applications requiring io_uring must use standard syscalls instead.
Ring Buffer Errors¶
Symptom: Agent logs show ring buffer read errors, or the health endpoint reports unhealthy.
Diagnosis:
Common causes:
1. Consecutive read failures¶
The agent tracks consecutive ring buffer read errors. After 10 consecutive failures, it marks itself unhealthy and stops the event reader:
This typically indicates the BPF ring buffer file descriptor has become invalid. Restart the agent pod:
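A minimal sketch of the restart, using the namespace and DaemonSet name that appear elsewhere on this page. The helper name restart_ebpf_agent is hypothetical. With a pod name it deletes only that pod and lets the DaemonSet controller recreate it; without one it restarts the agent on every node.

```shell
# Hypothetical helper: restart one agent pod, or the whole DaemonSet.
# $1 = agent pod name (optional)
restart_ebpf_agent() {
  if [ -n "$1" ]; then
    # The DaemonSet controller schedules a replacement on the same node.
    kubectl delete pod -n cloudtaser-system "$1"
  else
    kubectl rollout restart daemonset -n cloudtaser-system cloudtaser-ebpf &&
      kubectl rollout status daemonset -n cloudtaser-system cloudtaser-ebpf --timeout=120s
  fi
}
```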
2. Event channel full¶
If the agent logs "event channel full, dropping event", the user-space event processing pipeline cannot keep up with kernel event volume. This happens under extreme syscall load from monitored processes.
Mitigation:
- Reduce the number of monitored PIDs
- Disable LOG_ALL if enabled (it generates events for every syscall, not just violations)
- Increase agent CPU limits to allow faster event processing
3. Ring buffer map not found¶
If the agent fails with "events ring buffer map not found", the BPF object file may be corrupted or incompatible with the running kernel. Verify the eBPF object path:
Secret Region Protection Not Working¶
Symptom: The eBPF agent is running but does not detect access to protected process memory.
Diagnosis:
# Check if the agent has registered any protected pod cgroups
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf | grep "registered"
# Check if the wrapper is connecting to the agent
kubectl logs <pod-name> -c <container-name> | grep "ebpf"
Common causes:
1. Wrapper not registering with agent¶
The wrapper connects to the eBPF agent via gRPC (default: 0.0.0.0:9443) to register its PID and secret memory regions. If the socket path is not shared between the wrapper and agent, registration fails silently.
Verify the host path volume is mounted in both the eBPF daemonset and the protected pod:
kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf \
-o jsonpath='{.spec.template.spec.volumes}' | python3 -m json.tool
2. BPF maps not populated¶
The agent maintains BPF maps for enforcement. In v0.1.59+ (helm 1.0.61+) the pod-scope map is a BPF_MAP_TYPE_CGROUP_ARRAY; earlier versions used an inode-based monitored_cgroups HASH. The maps relevant for diagnostics:
| Map | Purpose |
|---|---|
| protected_cgroups | Cgroup array for pod-scope enforcement (v0.1.59+, replaces monitored_cgroups HASH) |
| secret_regions | Memory address ranges containing secrets per PID |
| secret_content | 16-byte content prefixes for content-based leak detection |
If protected_cgroups is empty (or monitored_cgroups on pre-v0.1.59 agents), no enforcement occurs. Check agent logs for cgroup registration events.
3. Content-based matching disabled¶
If the agent logs "secret_content map not found, content-based matching disabled" at startup, the BPF object was compiled without the secret_content map. Content-based leak detection (matching secret data in write()/sendto() buffers) will not work, but cgroup-scope region protection still functions.
4. Global privilege escalation detection¶
If GLOBAL_PRIVESC_DETECT=true, the agent monitors kernel module and eBPF program loading from all PIDs on the node, not just protected pod cgroups. If this is disabled and you expect to see module_load or bpf_load events from non-protected processes, enable it:
Webhook Failures¶
Symptom: Pod creation fails with errors referencing the cloudtaser webhook, or pods are created without injection despite having the correct annotations.
Diagnosis:
# Check webhook configuration exists
kubectl get mutatingwebhookconfiguration cloudtaser-operator-webhook
# Check webhook endpoint is reachable
kubectl get endpoints -n cloudtaser-system cloudtaser-operator-webhook
# Check operator logs for webhook errors
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "webhook\|error\|admission"
Common causes:
1. Certificate expired or CA bundle mismatch¶
The operator generates self-signed TLS certificates for the webhook. Where they are stored depends on the secretBackend setting:
- vault (default): Certificates are stored in OpenBao at secret/cloudtaser/system/webhook-tls and served from memory. See Zero Kubernetes Secrets Architecture.
- kubernetes (fallback): Certificates are stored in a Kubernetes Secret (cloudtaser-operator-certs).
The CA bundle is patched into the MutatingWebhookConfiguration at startup and rotated 30 days before expiry (certificates are valid for 1 year).
If the CA bundle in the webhook configuration does not match the certificate the operator is serving, the API server cannot verify the TLS connection and rejects webhook calls.
Diagnosis (OpenBao backend):
# Check operator logs for certificate errors
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "cert\|tls\|webhook"
Diagnosis (kubernetes backend):
# Check the cert secret exists
kubectl get secret -n cloudtaser-system cloudtaser-operator-certs
# Check cert expiry
kubectl get secret -n cloudtaser-system cloudtaser-operator-certs \
-o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
Fix: Restart the operator. It checks certificate expiry on startup and regenerates certificates if they expire within 30 days. It also patches the webhook configuration with the current CA bundle on every startup.
If using the kubernetes backend and the certificate Secret is corrupted, delete it and restart the operator. It will generate new certificates:
kubectl delete secret -n cloudtaser-system cloudtaser-operator-certs
kubectl rollout restart deployment -n cloudtaser-system cloudtaser-operator
2. Namespace not in webhook scope¶
The MutatingWebhookConfiguration uses a namespaceSelector that excludes cloudtaser-system and kube-system:
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: NotIn
values:
- cloudtaser-system
- kube-system
Pods in these namespaces are never intercepted by the webhook. Pods in kube-public and kube-node-lease are also excluded by the operator's internal logic.
Fix: Ensure your workload is in a non-system namespace.
3. Webhook timeout¶
The webhook has a default timeout of 10 seconds (operator.webhook.timeoutSeconds in Helm values). If the operator is under heavy load or the container registry is slow (the operator resolves image entrypoints by querying the registry), the webhook may time out.
Diagnosis:
# Check for timeout events
kubectl get events --field-selector reason=FailedCreate -A | grep cloudtaser
Fix: Increase the webhook timeout in Helm values:
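As a sketch, the setting can be applied with the same release name, chart, and namespace used by the helm upgrade command elsewhere on this page. The helper name set_webhook_timeout is hypothetical. Kubernetes accepts webhook timeouts between 1 and 30 seconds, so the helper rejects anything outside that range.

```shell
# Hypothetical helper: set operator.webhook.timeoutSeconds via Helm.
# $1 = timeout in seconds (Kubernetes allows 1 to 30)
set_webhook_timeout() {
  case "$1" in
    ""|*[![:digit:]]*) echo "timeout must be a whole number" >&2; return 2 ;;
  esac
  if [ "$1" -lt 1 ] || [ "$1" -gt 30 ]; then
    echo "timeout must be between 1 and 30 seconds" >&2
    return 2
  fi
  helm upgrade cloudtaser cloudtaser/cloudtaser \
    --namespace cloudtaser-system \
    --set operator.webhook.timeoutSeconds="$1" \
    --reuse-values
}
```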
Or investigate why the operator is slow (check CPU/memory limits, registry latency).
4. Annotation format errors¶
The operator silently skips injection if annotations are malformed. Common mistakes:
| Mistake | Correct |
|---|---|
| cloudtaser.io/inject: true (no quotes) | cloudtaser.io/inject: "true" |
| Annotations on Deployment metadata | Annotations on spec.template.metadata |
| cloudtaser.io/env-map: "ENV_VAR=vault_field" | cloudtaser.io/env-map: "vault_field=ENV_VAR" |
| Missing OpenBao address | Set cloudtaser.io/secretstore-address or use a CloudTaserConfig CR |
| Missing secret paths | Set cloudtaser.io/secret-paths |
Diagnosis:
# Check if the pod has the injection status annotation
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.cloudtaser\.io/status}'
If the annotation is missing, the webhook did not process the pod. Check if the inject annotation is on the pod template (not the Deployment itself).
5. failurePolicy: Fail blocking all pods¶
The default failurePolicy is Fail, meaning if the webhook is unreachable (operator down, network issue), the API server rejects all pod creation in the webhook's scope.
Emergency fix:
# Switch to Ignore (pods start without injection)
kubectl patch mutatingwebhookconfiguration cloudtaser-operator-webhook \
--type='json' \
-p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'
Ignore means no injection
With failurePolicy: Ignore, pods start without cloudtaser wrapper injection. Secrets will not be fetched from OpenBao. Use this only as an emergency measure and restore Fail after fixing the operator.
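Restoring Fail uses the same patch with the opposite value. The sketch below wraps both directions in one helper; the function name set_failure_policy is hypothetical, while the webhook configuration name and JSON path are the ones used in the emergency fix above.

```shell
# Hypothetical helper: switch the webhook failurePolicy.
# $1 = Fail or Ignore
set_failure_policy() {
  case "$1" in
    Fail|Ignore) ;;
    *) echo "usage: set_failure_policy Fail|Ignore" >&2; return 2 ;;
  esac
  kubectl patch mutatingwebhookconfiguration cloudtaser-operator-webhook \
    --type='json' \
    -p="[{\"op\":\"replace\",\"path\":\"/webhooks/0/failurePolicy\",\"value\":\"$1\"}]"
}
```

Usage: set_failure_policy Ignore during the outage, then set_failure_policy Fail once the operator is healthy again.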
6. Bridge not yet connected to beacon¶
Symptom: The operator pod is Running and Ready, but pod creation fails with a webhook error indicating the broker is not connected or the bridge is unavailable.
This is expected during initial installation: the operator's startup probe is decoupled from beacon connectivity (operator v0.9.4+). The pod becomes Ready quickly, but the beacon P2P match between the operator and the bridge completes asynchronously — typically within 500ms once both sides have connected, but only after the bridge has registered with the beacon.
Diagnosis:
# Check operator logs for beacon connection status
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "beacon\|bridge\|broker"
Look for log lines indicating the operational mTLS connection is established. If you see only init-phase messages and no operational connection, the bridge has not yet connected.
Common causes:
- The bridge (cloudtaser-onprem) is not running or has not reached the beacon yet.
- The beacon address in the Helm values (operator.broker.beacon.address) does not resolve to the correct host.
- Network connectivity from the bridge side to the beacon is blocked.
Fix: Verify the bridge is running and can reach the beacon, then retry pod creation. Once the bridge connects, the P2P match completes in under 500ms and subsequent webhook calls succeed.
7. HA race condition on certificate creation¶
When running the operator with multiple replicas (operator.ha: true), all replicas attempt to load or create the certificate Secret on startup. The code handles this race correctly (using Kubernetes AlreadyExists error detection), but transient API server errors during startup may cause one replica to fail.
Fix: The failing replica will restart and successfully read the Secret created by the other replica. No manual intervention needed.
Operator Cannot Connect to OpenBao¶
Symptom: The operator pod starts but crashes or logs errors about OpenBao connectivity. The webhook does not serve TLS and pod creation may be blocked.
Diagnosis:
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep -i "vault\|secret backend"
Common causes:
1. OpenBao address not configured¶
The operator needs vault.address set in Helm values when using secretBackend: vault (the default). If it is empty, the operator cannot connect.
Fix: Set the OpenBao address in Helm values:
Or set the environment variable VAULT_ADDR on the operator deployment.
2. Kubernetes auth not configured in OpenBao¶
The operator authenticates to OpenBao using Kubernetes auth. If the auth method is not enabled or the cloudtaser-operator role does not exist, authentication fails.
Diagnosis:
# Check if Kubernetes auth is enabled
vault auth list | grep kubernetes
# Check if the operator role exists
vault read auth/kubernetes/role/cloudtaser-operator
Fix: Run cloudtaser-cli source configure to create the required policy and auth role:
cloudtaser-cli source configure \
--openbao-addr https://vault.eu.example.com \
--token hvs.YOUR_ADMIN_TOKEN
3. ServiceAccount not bound to OpenBao role¶
The OpenBao Kubernetes auth role must allow the operator's ServiceAccount. If the ServiceAccount name or namespace does not match the role's bound_service_account_names and bound_service_account_namespaces, authentication is rejected.
Diagnosis:
Check that bound_service_account_names includes the operator's ServiceAccount name and bound_service_account_namespaces includes cloudtaser-system.
4. OpenBao is sealed¶
If OpenBao is sealed, all requests fail. The operator will log connection errors and retry.
Fix: Unseal OpenBao:
5. Network connectivity¶
The operator pod must be able to reach OpenBao over the network. Common blockers include NetworkPolicies, firewall rules, VPC peering, and security groups.
Diagnosis:
kubectl exec -n cloudtaser-system deploy/cloudtaser-operator -- \
wget -q --spider https://vault.eu.example.com/v1/sys/health
6. OpenBao TLS certificate not trusted¶
If OpenBao uses a private CA, the operator cannot verify the TLS certificate. Mount the CA certificate or set vault.caCert in Helm values. For development only, vault.tlsSkipVerify: true disables verification.
Do not use tlsSkipVerify in production
Disabling TLS verification allows man-in-the-middle attacks on the connection between the operator and OpenBao. Use it only for local development.
7. Falling back to in-memory secrets¶
If OpenBao connectivity cannot be restored immediately and pod creation is blocked, switch the operator to in-memory secrets as a temporary fallback:
helm upgrade cloudtaser cloudtaser/cloudtaser \
--namespace cloudtaser-system \
--set operator.secretBackend=inmemory \
--reuse-values
The operator will restart, generate certificates, hold them in memory, and resume serving the webhook. Switch back to vault after resolving OpenBao connectivity. See Zero Kubernetes Secrets Architecture for details on the OpenBao-based secret backend.
Wrapper Startup Failures¶
Symptom: Pod starts but the application container exits immediately or logs errors from the cloudtaser wrapper.
Diagnosis:
# Check wrapper startup logs
kubectl logs <pod-name> -c <container-name> 2>&1 | head -50
# Check if the wrapper binary exists
kubectl exec <pod-name> -c <container-name> -- ls -la /cloudtaser/wrapper
Common causes:
1. Missing required environment variables¶
The wrapper validates its configuration at startup and exits with an error if required variables are missing. Required variables:
| Variable | Source |
|---|---|
| VAULT_ADDR | From cloudtaser.io/secretstore-address annotation or CloudTaserConfig CR |
| CLOUDTASER_SECRET_PATHS | From cloudtaser.io/secret-paths annotation |
| CLOUDTASER_ORIGINAL_CMD | Resolved by operator from container image or pod spec |
| CLOUDTASER_ENV_MAP | From cloudtaser.io/env-map annotation |
Diagnosis:
Typical error messages:
configuration error: VAULT_ADDR is required
configuration error: CLOUDTASER_SECRET_PATHS is required (comma-separated list)
configuration error: CLOUDTASER_ORIGINAL_CMD is required
configuration error: CLOUDTASER_ENV_MAP is required
Fix: Ensure all required annotations are set on the pod template:
annotations:
cloudtaser.io/inject: "true"
cloudtaser.io/secretstore-address: "https://vault.eu.example.com:8200"
cloudtaser.io/secretstore-role: "myapp"
cloudtaser.io/secret-paths: "secret/data/myapp/config"
cloudtaser.io/env-map: "password=PGPASSWORD,api_key=API_KEY"
2. OpenBao authentication failure¶
The wrapper authenticates to OpenBao using the pod's Kubernetes ServiceAccount token. This can fail for several reasons.
Diagnosis:
Common errors and fixes:
| Error | Cause | Fix |
|---|---|---|
| kubernetes login: ... connection refused | OpenBao endpoint unreachable | Check firewall rules, VPC peering, NetworkPolicies |
| kubernetes login: ... 403 Forbidden | Role not configured or SA not bound | Verify OpenBao role exists and allows the pod's ServiceAccount |
| kubernetes login returned no auth info | Auth method disabled or wrong mount path | Check vault auth list and the cloudtaser.io/secretstore-auth-path annotation |
| reading service account token: no such file | SA token not mounted | Check automountServiceAccountToken is not false |
| TLS handshake error | OpenBao uses a private CA | Mount the CA bundle or set cloudtaser.io/secretstore-tls-skip-verify: "true" (dev only) |
3. Secret fetch failure¶
After authentication, the wrapper fetches secrets from the configured paths.
Diagnosis:
Common errors:
| Error | Cause | Fix |
|---|---|---|
| secret not found: secret/data/myapp/config | Wrong path or secret does not exist | Verify path with vault kv get secret/myapp/config |
| permission denied | OpenBao policy does not allow reading this path | Update the OpenBao policy for the cloudtaser role |
| reading secret: ... 404 | Missing data/ prefix for KV v2 | Use secret/data/myapp/config not secret/myapp/config |
4. Env map parse failure¶
The CLOUDTASER_ENV_MAP format is vault_field=ENV_VAR. The wrapper exits if it cannot match OpenBao fields to the fetched secrets.
Diagnosis:
Fix: Verify that the field names in the env map match the keys in the OpenBao secret:
# Check what keys exist in the secret
vault kv get -format=json secret/myapp/config | jq '.data.data | keys'
# Env map must reference these exact field names
# cloudtaser.io/env-map: "password=PGPASSWORD,username=PGUSER"
5. Original command resolution failure¶
The operator resolves the container's original entrypoint (command + args) and sets it as CLOUDTASER_ORIGINAL_CMD and CLOUDTASER_ORIGINAL_ARGS. If the pod spec does not set a command, the operator queries the container registry for the image's ENTRYPOINT and CMD.
Diagnosis:
If CLOUDTASER_ORIGINAL_CMD is empty:
- The operator could not resolve the entrypoint from the pod spec or the container registry
- Check that imagePullSecrets are configured (the operator needs registry access)
- Set the command explicitly in the pod spec
6. memfd_secret or mlock failure¶
The wrapper allocates protected memory for secrets. If CLOUDTASER_REQUIRE_MEMFD_SECRET=true or CLOUDTASER_REQUIRE_MLOCK=true is set but the kernel does not support the feature, the wrapper exits.
Diagnosis:
Fix: Either upgrade the node kernel (5.14+ for memfd_secret) or remove the requirement flags. Without these flags, the wrapper logs a warning but continues with degraded protection.
7. Sealed mode: waiting for unseal¶
If cloudtaser.io/secretstore-auth-method: "token" is set but no token is provided, the wrapper starts in sealed mode. It is alive (liveness probe passes) but not ready (readiness probe returns 503) until a token is delivered via POST /v1/unseal from the operator's auth broker.
Diagnosis:
Look for:
{"level":"INFO","msg":"sealed: token auth selected but no token provided, waiting for POST /v1/unseal"}
Fix: Ensure the operator's auth broker is running and can reach the pod. The broker delivers tokens to sealed pods automatically. Check operator logs:
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator | grep "broker\|unseal"
8. Lease renewal failure¶
After initial secret fetch, the wrapper renews OpenBao leases periodically (default: every 30 seconds). If renewal fails, the wrapper re-fetches the secret.
Diagnosis:
If both renewal and re-fetch fail repeatedly, the application continues running with the last known secrets. The wrapper logs errors but does not kill the child process.
9. Rotation strategy errors¶
The wrapper supports three rotation strategies when secrets change:
| Strategy | Behavior |
|---|---|
| restart | Kills and re-forks the child process with new env vars (default; works for every binary) |
| sighup | Sends SIGHUP to the child; relies on the application to re-read its secrets on the signal |
| none | Logs the change but does nothing |
The strategy is chosen per workload. restart is the default and is compatible with every binary (static Go, musl, any libc) because it simply re-executes the child with the new environment. sighup is only appropriate for applications that have an explicit SIGHUP handler for config reload (nginx, postgres, some custom daemons). For most off-the-shelf workloads, leave the default.
Diagnosis:
Common CLI Errors¶
"failed to build kubeconfig"¶
The CLI cannot find or parse your kubeconfig. Ensure ~/.kube/config exists or pass --kubeconfig:
"OpenBao is sealed"¶
The OpenBao instance is sealed and cannot serve requests. Unseal it:
"permission denied" on OpenBao operations¶
The OpenBao token used with cloudtaser-cli target connect requires admin-level permissions. Ensure it has policies for:
- sys/auth/* (to enable and configure auth methods)
- auth/kubernetes/* (to configure Kubernetes auth)
- sys/policies/* (to create policies)
CrashLoopBackOff with "context deadline exceeded"¶
Symptom: Wrapper pod enters CrashLoopBackOff. Logs show context deadline exceeded during the broker secret fetch or OpenBao authentication.
Diagnosis:
Common causes:
1. NetworkPolicy blocking DNS on GKE (NodeLocal DNSCache)¶
On GKE clusters with NodeLocal DNSCache enabled, DNS traffic goes to 169.254.20.10 (a hostNetwork address), not to the kube-dns ClusterIP. A NetworkPolicy egress rule using namespaceSelector: kube-system does not match hostNetwork pods — DNS is silently blocked and all name resolution times out.
Fix: Add an explicit ipBlock egress rule for 169.254.20.10/32 on port 53 in your NetworkPolicy. See the full explanation and YAML in GKE Deployment Guide — NetworkPolicy and NodeLocal DNSCache.
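For orientation, the helper below prints one possible shape of such a rule as a standalone NetworkPolicy. It is a sketch, not the manifest from the GKE guide: the policy name allow-nodelocal-dns, the empty podSelector (all pods in the namespace), and the function name are assumptions. NetworkPolicies are additive, so the extra policy adds the DNS allowance to whatever egress rules already apply.

```shell
# Hypothetical helper: print a NetworkPolicy allowing DNS to NodeLocal DNSCache.
# $1 = namespace (defaults to "default")
nodelocal_dns_policy() {
  ns="${1:-default}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nodelocal-dns
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 169.254.20.10/32
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

Usage: nodelocal_dns_policy my-namespace | kubectl apply -f -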
2. OpenBao unreachable¶
The broker or OpenBao endpoint is down or unreachable from the pod network. Check connectivity:
3. Beacon not matched¶
If using P2P mode, the beacon may not have matched the operator and bridge yet. Check operator logs for beacon connection status. See Bridge not yet connected to beacon above.
Getting Help¶
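- Check component logs: kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator (use cloudtaser-ebpf in the label for the agent)
- Run validation: cloudtaser-cli target validate --secretstore-address https://vault.eu.example.com
- Run a full audit: cloudtaser-cli target audit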