Troubleshooting Decision Trees¶

Interactive diagnostic flowcharts for the most common CloudTaser issues. Each tree guides you from a symptom to a diagnosis with exact commands to run at each step.

For the full reference of all symptoms and causes, see Troubleshooting.

Quick Check Commands¶

Before following a decision tree, run these five commands to diagnose the most common issues:

# 1. Is the operator running?
kubectl get pods -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator

# 2. Is the webhook registered?
kubectl get mutatingwebhookconfiguration cloudtaser-operator-webhook

# 3. Is the pod annotated for injection?
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.cloudtaser\.io/inject}'

# 4. Is the wrapper running as PID 1?
kubectl exec <pod-name> -- cat /proc/1/cmdline | tr '\0' ' '

# 5. What do the wrapper logs say?
kubectl logs <pod-name> -c <container-name> 2>&1 | head -30

Sidecar Not Injected¶

Your pod starts without the CloudTaser wrapper. The original entrypoint runs directly instead of /cloudtaser/wrapper.

graph TD
    A[Pod starts without CloudTaser wrapper] --> B{Is cloudtaser.io/inject: true<br/>on the pod template?}
    B -->|No| B1[Add annotation to pod template<br/>not the Deployment metadata]
    B -->|Yes| C{Is the operator running?}
    C -->|No| C1[Start the operator:<br/>kubectl rollout status -n cloudtaser-system<br/>deploy/cloudtaser-operator]
    C -->|Yes| D{Is the MutatingWebhookConfiguration<br/>present?}
    D -->|No| D1[Redeploy via Helm:<br/>helm upgrade cloudtaser ...]
    D -->|Yes| E{Does the webhook have a<br/>valid CA bundle?}
    E -->|No| E1[Restart operator to regenerate certs:<br/>kubectl rollout restart -n cloudtaser-system<br/>deploy/cloudtaser-operator]
    E -->|Yes| F{Is the namespace excluded?}
    F -->|Yes| F1[Move workload to a non-system namespace<br/>kube-system, kube-public, and<br/>kube-node-lease are excluded]
    F -->|No| G[Check operator logs for errors]

    style B1 fill:#2d5016,color:#fff
    style C1 fill:#2d5016,color:#fff
    style D1 fill:#2d5016,color:#fff
    style E1 fill:#2d5016,color:#fff
    style F1 fill:#2d5016,color:#fff
    style G fill:#7a4100,color:#fff

Step-by-step details¶

Is the annotation on the pod template?¶

The cloudtaser.io/inject: "true" annotation must be on the pod template inside spec.template.metadata.annotations, not on the Deployment's top-level metadata.annotations.

kubectl get deployment <name> -o jsonpath='{.spec.template.metadata.annotations}' | grep cloudtaser

Expected output: cloudtaser.io/inject: "true" present.

Is the operator running?¶

kubectl get pods -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator

Expected: at least one pod in Running state with 1/1 ready containers.

Is the MutatingWebhookConfiguration present?¶

kubectl get mutatingwebhookconfiguration cloudtaser-operator-webhook -o yaml

Expected: the resource exists with a caBundle field that is non-empty.

Does the webhook have a valid CA bundle?¶

kubectl get mutatingwebhookconfiguration cloudtaser-operator-webhook \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | openssl x509 -noout -dates

Expected: notAfter is in the future. If expired, the operator auto-rotates certs every 24 hours. Restart the operator to force immediate rotation.

Is the namespace excluded?¶

The operator skips injection in system namespaces: kube-system, kube-public, kube-node-lease.

kubectl get pod <pod-name> -o jsonpath='{.metadata.namespace}'

Wrapper Fails to Start¶

The pod starts but the wrapper container exits or enters CrashLoopBackOff.

graph TD
    A[Wrapper crashes or fails to start] --> B{Check wrapper logs}
    B --> C{VAULT_ADDR is required?}
    C -->|Yes| C1[Set cloudtaser.io/vault-address<br/>annotation on pod template]
    C -->|No| D{Cannot connect to Vault?}
    D -->|Yes| D1[Check network connectivity<br/>and firewall rules]
    D -->|No| E{Auth method error?}
    E -->|Yes| E1[Verify vault K8s auth is configured<br/>and role exists]
    E -->|No| F{CLOUDTASER_ENV_MAP is required?}
    F -->|Yes| F1[Set cloudtaser.io/env-map annotation]
    F -->|No| G{Secret path not found?}
    G -->|Yes| G1[Check vault path uses data/ prefix<br/>for KV v2]
    G -->|No| H[Check full wrapper logs for details]

    style C1 fill:#2d5016,color:#fff
    style D1 fill:#2d5016,color:#fff
    style E1 fill:#2d5016,color:#fff
    style F1 fill:#2d5016,color:#fff
    style G1 fill:#2d5016,color:#fff
    style H fill:#7a4100,color:#fff

Step-by-step details¶

Check wrapper logs¶

The wrapper logs to stderr in JSON format. Look for the structured error message:

kubectl logs <pod-name> -c <container-name> 2>&1 | head -20

The wrapper prints a human-readable error block when vault connection fails:

=== CloudTaser Wrapper Error ===
Cannot connect to Vault: ...

Vault address missing¶

The wrapper requires VAULT_ADDR which is set by the operator from the cloudtaser.io/vault-address annotation or a CloudTaserConfig CR.

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].env}' | grep VAULT_ADDR

Network connectivity¶

Test from the pod:

kubectl exec <pod-name> -- wget -q --spider https://vault.eu.example.com/v1/sys/health

If this fails, check NetworkPolicies, firewall rules, and VPC peering.

Auth method configuration¶

For Kubernetes auth (the default), verify:

# Check auth method is enabled
vault auth list

# Check role exists and is bound to the service account
vault read auth/kubernetes/role/<role-name>

The role's bound_service_account_names and bound_service_account_namespaces must match the pod's ServiceAccount.

KV v2 path format¶

For KV v2 secret engines, the path must include data/:

# Correct
cloudtaser.io/secret-paths: "secret/data/myapp/config"

# Wrong (missing data/ prefix)
cloudtaser.io/secret-paths: "secret/myapp/config"

Secrets Not in Memory¶

The wrapper starts and the application runs, but secret environment variables are missing.

graph TD
    A[Application running but secrets missing] --> B{Is wrapper running as PID 1?}
    B -->|No| B1[Pod was not injected.<br/>See Sidecar Not Injected tree above]
    B -->|Yes| C{Check wrapper logs for<br/>secrets loaded message}
    C -->|Not found| C1[Wrapper failed to fetch secrets.<br/>See Wrapper Fails to Start tree above]
    C -->|Found| D{Is env-map syntax correct?}
    D -->|No| D1[Fix: vault_field=ENV_VAR<br/>not ENV_VAR=vault_field]
    D -->|Yes| E{Does the vault path contain<br/>the expected fields?}
    E -->|No| E1[Verify secret fields in vault:<br/>vault kv get secret/myapp/config]
    E -->|Yes| F{Is the app reading env vars<br/>correctly?}
    F -->|No| F1[Check app reads standard env vars<br/>not Kubernetes Secret references]
    F -->|Yes| G[Check if wrapper scrubbed env<br/>before app started]

    style B1 fill:#7a4100,color:#fff
    style C1 fill:#7a4100,color:#fff
    style D1 fill:#2d5016,color:#fff
    style E1 fill:#2d5016,color:#fff
    style F1 fill:#2d5016,color:#fff
    style G fill:#7a4100,color:#fff

Step-by-step details¶

Check wrapper is PID 1¶

kubectl exec <pod-name> -- cat /proc/1/cmdline | tr '\0' ' '

Expected output should start with /cloudtaser/wrapper.

Check wrapper logs for success¶

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "secrets loaded"

Expected: "msg":"secrets loaded, starting child process","command":"...","secret_count":N where N is greater than 0.

Check env-map syntax¶

The format is vault_field=ENV_VAR, not ENV_VAR=vault_field. Multiple mappings are comma-separated:

# Correct
cloudtaser.io/env-map: "password=PGPASSWORD,username=PGUSER"

# Wrong (reversed)
cloudtaser.io/env-map: "PGPASSWORD=password"

Common mistake: reversed order

The correct format is vault_field=ENV_VAR. A reversed mapping silently fails to inject the expected secrets.

Verify secret fields in vault¶

vault kv get secret/myapp/config

Check that the field names in the vault output match the left side of the env-map mappings.

Protection Score Is 0¶

The cloudtaser status or wrapper logs report a protection score of 0/120 (0%).

graph TD
    A[Protection score is 0] --> B{Is the wrapper running?}
    B -->|No| B1[No score without wrapper.<br/>See Sidecar Not Injected tree]
    B -->|Yes| C{Check wrapper logs for<br/>protection score line}
    C --> D{memfd_secret unavailable?}
    D -->|Yes| D1[Upgrade to kernel 5.14+<br/>worth 15 points]
    D -->|No| E{mlock failed?}
    E -->|Yes| E1[Add CAP_IPC_LOCK capability<br/>or increase RLIMIT_MEMLOCK<br/>worth 10 points]
    E -->|No| F{eBPF agent not connected?}
    F -->|Yes| F1[Deploy cloudtaser-ebpf daemonset<br/>worth 10 points]
    F -->|No| G{ebpf_enforce_mode inactive?}
    G -->|Yes| G1[Set ebpf.enforceMode: true<br/>in Helm values<br/>worth 15 points]
    G -->|No| H[Review all checks in wrapper logs]

    style B1 fill:#7a4100,color:#fff
    style D1 fill:#2d5016,color:#fff
    style E1 fill:#2d5016,color:#fff
    style F1 fill:#2d5016,color:#fff
    style G1 fill:#2d5016,color:#fff
    style H fill:#7a4100,color:#fff

Protection score breakdown¶

The wrapper calculates a protection score (max 120) at startup. Each check logs whether it is active or missing:

Check	Points	Requirement
`memfd_secret`	15	Linux 5.14+ with `CONFIG_SECRETMEM=y`
`mlock`	10	`CAP_IPC_LOCK` and sufficient `RLIMIT_MEMLOCK`
`core_dump_exclusion`	5	`MADV_DONTDUMP` (Linux 3.4+, automatic)
`dumpable_disabled`	5	`PR_SET_DUMPABLE(0)` (automatic)
`token_protected`	10	memfd_secret or mlock available for token storage
`environ_scrubbed`	5	Wrapper scrubs `/proc/1/environ` after fork+exec (automatic on Linux)
`getenv_interposer`	10	`libcloudtaser.so` LD_PRELOAD interposer active
`ebpf_agent_connected`	10	eBPF daemonset running on the node
`cpu_mitigations`	5	CPU vulnerability mitigations enabled
`ebpf_enforce_mode`	15	eBPF enforce mode active (`ebpf.enforceMode: true`)
`ebpf_kprobes`	15	Kernel `CONFIG_BPF_KPROBE_OVERRIDE=y` for synchronous blocking
`confidential_vm`	10	AMD SEV-SNP or Intel TDX hardware memory encryption

kubectl logs <pod-name> -c <container-name> 2>&1 | grep "protection"

Each missing check includes a fix field with the remediation step.

eBPF Agent Errors¶

The cloudtaser-ebpf daemonset pods are crashing or not starting.

graph TD
    A[eBPF agent not starting] --> B{Check agent logs}
    B --> C{Kernel version < 5.8?}
    C -->|Yes| C1[Upgrade node kernel to 5.8+<br/>5.14+ recommended for full features]
    C -->|No| D{BTF not available?}
    D -->|Yes| D1[Enable BTF: check for<br/>/sys/kernel/btf/vmlinux on nodes]
    D -->|No| E{Insufficient capabilities?}
    E -->|Yes| E1[eBPF agent requires privileged mode<br/>with SYS_ADMIN SYS_PTRACE<br/>NET_ADMIN SYS_RESOURCE]
    E -->|No| F{BPF object file missing?}
    F -->|Yes| F1[Verify /opt/cloudtaser/secret_monitor.o<br/>exists in the agent container]
    F -->|No| G{Ring buffer errors?}
    G -->|Yes| G1[Restart agent:<br/>kubectl rollout restart daemonset<br/>-n cloudtaser-system cloudtaser-ebpf]
    G -->|No| H[Check for PodSecurityPolicy<br/>or GKE Autopilot restrictions]

    style C1 fill:#2d5016,color:#fff
    style D1 fill:#2d5016,color:#fff
    style E1 fill:#2d5016,color:#fff
    style F1 fill:#2d5016,color:#fff
    style G1 fill:#2d5016,color:#fff
    style H fill:#7a4100,color:#fff

Step-by-step details¶

Check kernel version¶

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'

Required: 5.8+ for basic eBPF support. Recommended: 5.14+ for memfd_secret.

Check BTF availability¶

kubectl exec -n cloudtaser-system <ebpf-pod> -- ls /sys/kernel/btf/vmlinux

GKE COS and Ubuntu nodes have BTF by default. Some custom AMIs may not.

Check capabilities¶

The eBPF agent requires privileged mode. If a PodSecurityPolicy or PodSecurityStandard blocks this, the agent cannot attach BPF programs.

kubectl describe pod -n cloudtaser-system <ebpf-pod> | grep -A5 "Security"

Check agent health endpoint¶

kubectl exec -n cloudtaser-system <ebpf-pod> -- wget -qO- http://localhost:9090/healthz

The agent publishes status at /status:

kubectl exec -n cloudtaser-system <ebpf-pod> -- wget -qO- http://localhost:9090/status

Expected: {"enforce_mode":true,"kprobes_active":false,"tracepoint_count":N,"kprobe_count":N}.

Unsupported environments

GKE Autopilot and AWS Fargate do not support the eBPF agent. Autopilot disallows privileged pods and hostPID. Fargate does not support DaemonSets. The wrapper still provides secret injection in these environments, but runtime enforcement is not available.

Vault Authentication Failing¶

The wrapper logs show authentication errors when connecting to Vault/OpenBao.

graph TD
    A[Vault authentication failing] --> B{Is VAULT_ADDR set correctly?}
    B -->|No| B1[Set cloudtaser.io/vault-address<br/>with full URL including port]
    B -->|Yes| C{Is Vault reachable?}
    C -->|No| C1[Check network: firewall rules<br/>NetworkPolicies VPC peering]
    C -->|Yes| D{Is Vault sealed?}
    D -->|Yes| D1[Unseal Vault with threshold<br/>number of unseal keys]
    D -->|No| E{Is K8s auth method enabled?}
    E -->|No| E1[Run cloudtaser connect<br/>or enable manually via vault CLI]
    E -->|Yes| F{Does the auth role exist?}
    F -->|No| F1[Create role with bound SA:<br/>vault write auth/kubernetes/role/name ...]
    F -->|Yes| G{Is the ServiceAccount bound?}
    G -->|No| G1[Update role bound_service_account_names<br/>and bound_service_account_namespaces]
    G -->|Yes| H[Check Vault audit logs for<br/>detailed auth failure reason]

    style B1 fill:#2d5016,color:#fff
    style C1 fill:#2d5016,color:#fff
    style D1 fill:#2d5016,color:#fff
    style E1 fill:#2d5016,color:#fff
    style F1 fill:#2d5016,color:#fff
    style G1 fill:#2d5016,color:#fff
    style H fill:#7a4100,color:#fff

Step-by-step details¶

Check vault health¶

kubectl exec <pod-name> -- wget -qO- https://vault.eu.example.com/v1/sys/health

Expected: {"initialized":true,"sealed":false,...}. If sealed is true, unseal vault first.

Check auth method¶

vault auth list

Expected: a kubernetes/ entry in the auth methods list.

Check role configuration¶

vault read auth/kubernetes/role/<role-name>

Verify bound_service_account_names and bound_service_account_namespaces include the pod's ServiceAccount and namespace.

Use the CLI validator¶

cloudtaser validate --vault-address https://vault.eu.example.com

This checks vault health, seal status, auth configuration, and role bindings in one step.

S3 Proxy Encryption Errors¶

The S3 proxy sidecar fails to encrypt or decrypt objects.

graph TD
    A[S3 proxy encryption errors] --> B{Check proxy logs}
    B --> C{Transit engine not mounted?}
    C -->|Yes| C1[Enable Transit engine:<br/>vault secrets enable transit]
    C -->|No| D{Transit key not found?}
    D -->|Yes| D1[Create key:<br/>vault write transit/keys/name type=aes256-gcm96]
    D -->|No| E{Permission denied on encrypt/decrypt?}
    E -->|Yes| E1[Update Vault policy to allow<br/>transit/encrypt/* and transit/decrypt/*]
    E -->|No| F{Vault unreachable from proxy?}
    F -->|Yes| F1[Check network and vault address<br/>annotation on the pod]
    F -->|No| G[Check proxy logs for Vault API errors]

    style C1 fill:#2d5016,color:#fff
    style D1 fill:#2d5016,color:#fff
    style E1 fill:#2d5016,color:#fff
    style F1 fill:#2d5016,color:#fff
    style G fill:#7a4100,color:#fff

Step-by-step details¶

Check proxy logs¶

kubectl logs <pod-name> -c cloudtaser-s3-proxy

Verify Transit engine¶

vault secrets list | grep transit

Expected: transit/ appears in the list.

Verify Transit key¶

vault read transit/keys/<key-name>

Expected: key details including type and latest_version.

Check policy¶

The Vault policy for the S3 proxy must include:

path "transit/encrypt/*" {
  capabilities = ["create", "update"]
}
path "transit/decrypt/*" {
  capabilities = ["create", "update"]
}

Getting More Help¶

If the decision trees above do not resolve your issue:

Collect component logs:

kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
kubectl logs <pod-name> -c <container-name>

Run the full validation:

cloudtaser validate --vault-address https://vault.eu.example.com

Run a full audit:

cloudtaser audit --vault-address https://vault.eu.example.com

See the full Troubleshooting reference for additional symptoms and causes.