Production Deployment Guide¶
This guide covers production hardening for CloudTaser deployments. It assumes you have completed the basic installation of the operator, eBPF agent, and optionally the S3 encryption proxy.
Multi-Cloud Support¶
CloudTaser is tested on the three major managed Kubernetes platforms. Each has specific requirements and considerations.
GKE Cluster Requirements¶
| Requirement | Value |
|---|---|
| Cluster type | GKE Standard (not Autopilot) |
| Kubernetes | 1.28+ |
| Node image | Container-Optimized OS (COS) or Ubuntu |
| Kernel | 5.15+ (COS and Ubuntu both qualify) |
GKE-Specific Configuration¶
Workload Identity -- Not required for CloudTaser itself: the operator makes no GCP API calls and authenticates to vault using Kubernetes auth only. If your cluster enforces Workload Identity, bind the operator's ServiceAccount to a GCP service account with no IAM roles granted.
Private clusters -- The vault endpoint must be reachable from the cluster's VPC. Use VPC peering or Cloud VPN to connect to your EU vault. Add the vault endpoint to the master authorized networks if using a private control plane.
Binary Authorization -- The operator and wrapper images are signed. Configure Binary Authorization to allow images from ghcr.io/skipopsltd/*.
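An exemption for the CloudTaser registry path can be expressed in the Binary Authorization policy; the excerpt below is a sketch to adapt into your existing policy, not a complete policy file:

```yaml
# Policy excerpt: admit CloudTaser images by name pattern.
# Merge into your existing Binary Authorization policy before importing it.
admissionWhitelistPatterns:
- namePattern: ghcr.io/skipopsltd/*
```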
EKS Cluster Requirements¶
| Requirement | Value |
|---|---|
| Node groups | Managed or self-managed (not Fargate) |
| Kubernetes | 1.28+ |
| AMI | Amazon Linux 2023 or Ubuntu 22.04 |
| Kernel | 5.15+ (AL2023: 6.1+, Ubuntu 22.04: 5.15+) |
EKS-Specific Configuration¶
IRSA / Pod Identity -- Not required for the operator or eBPF agent. CloudTaser authenticates to vault using Kubernetes auth, not AWS IAM. IRSA or Pod Identity is only needed if the S3 proxy requires access to AWS S3 buckets.
VPC connectivity -- Ensure your EU vault is reachable from the EKS VPC via VPN, Transit Gateway, or a public endpoint with TLS.
Security Groups -- Allow outbound HTTPS (port 443) from worker nodes to the vault endpoint. Also allow intra-cluster traffic on port 8199 (gRPC between eBPF agent and operator).
AKS Cluster Requirements¶
| Requirement | Value |
|---|---|
| Node pools | Regular (not Virtual Nodes / ACI) |
| Kubernetes | 1.28+ |
| Node image | Ubuntu 22.04 or Azure Linux (Mariner) |
| Kernel | 5.15+ |
AKS-Specific Configuration¶
Azure AD Pod Identity / Workload Identity -- Not required. CloudTaser uses Kubernetes auth to vault, not Azure AD. Only needed if the S3 proxy accesses Azure Blob Storage.
Private endpoint -- If using AKS private cluster, ensure vault is reachable from the VNet via VNet peering or Azure VPN Gateway.
NSG rules -- Allow outbound HTTPS to the vault endpoint from node pool subnets. Allow intra-cluster traffic on port 8199.
Network Policies¶
CloudTaser requires specific network connectivity between its components and external services. Apply network policies to restrict traffic to only what is necessary.
Required Connectivity¶
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| Application pods | Vault endpoint | 443 | HTTPS | Secret fetching by wrapper |
| Operator pod | K8s API server | 443 | HTTPS | Webhook serving, pod watching |
| eBPF agent | Operator pod | 8199 | gRPC | PID registration for protected processes |
| Operator pod | Container registries | 443 | HTTPS | Entrypoint resolution |
| S3 proxy sidecar | Upstream S3 endpoint | 443 | HTTPS | Object storage access (if S3 proxy enabled) |
| S3 proxy sidecar | Vault endpoint | 443 | HTTPS | Transit encrypt/decrypt operations |
Auto-Applied Policies¶
The CloudTaser operator automatically applies egress NetworkPolicies to namespaces containing protected pods. These policies restrict protected pods to only reach:
- The configured vault endpoint (HTTPS/443)
- The Kubernetes API server (for service account token exchange)
- DNS (UDP/TCP 53)
Auto-applied policies are created as `cloudtaser-egress-<namespace>` NetworkPolicy resources and are reconciled by the operator's NetworkPolicy controller.
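The auto-applied policy has roughly this shape. This is an illustrative sketch only -- the operator generates the real resource, and the namespace and vault IP below are example values:

```yaml
# Illustrative shape of an operator-generated egress policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cloudtaser-egress-protected-workloads   # cloudtaser-egress-<namespace>
  namespace: protected-workloads
spec:
  podSelector: {}          # selector for protected pods (actual selector set by operator)
  policyTypes:
  - Egress
  egress:
  - to:                    # vault endpoint over HTTPS
    - ipBlock:
        cidr: 203.0.113.10/32   # example vault IP
    ports:
    - protocol: TCP
      port: 443
  - ports:                 # DNS resolution
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```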
Manual NetworkPolicy Example¶
For additional control, apply explicit network policies:
```yaml
# Allow application pods to reach the vault
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-vault-egress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: <VAULT_IP>/32
    ports:
    - protocol: TCP
      port: 443
---
# Allow eBPF agent to reach the operator
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ebpf-to-operator
  namespace: cloudtaser-system
spec:
  podSelector:
    matchLabels:
      app: cloudtaser-ebpf
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: cloudtaser-operator
    ports:
    - protocol: TCP
      port: 8199
```
Generate Policies with CLI¶
The CloudTaser CLI can generate network policies tailored to your environment:
Apply the generated policies:
RBAC Hardening¶
The Helm chart creates the necessary RBAC resources automatically. This section covers additional hardening for production environments.
Operator ClusterRole¶
The operator requires cluster-wide permissions:
| Resource | Verbs | Purpose |
|---|---|---|
| `api.cloudtaser.io` CRDs | Full CRUD | Manage CloudTaserConfigs and SecretMappings |
| `secrets` | get, list, watch, create, update | Webhook TLS certificates |
| `pods` | get, list, watch | Injection decisions |
| `serviceaccounts` | get, list, watch | Identity validation |
| `mutatingwebhookconfigurations` | get, patch | Self-managed webhook |
| `apps/deployments` | get, list, watch, patch | Workload management |
eBPF Agent ClusterRole¶
The eBPF agent requires minimal permissions:
| Resource | Verbs | Purpose |
|---|---|---|
| `pods` | get, list, watch | Discover monitored PIDs |
| `nodes` | get | Identify the current node |
The agent runs as a privileged DaemonSet with `hostPID: true` and requires the `SYS_ADMIN`, `SYS_PTRACE`, `NET_ADMIN`, and `SYS_RESOURCE` capabilities.
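The security settings above translate into a pod spec along these lines. This is a sketch of the relevant fields only -- the Helm chart renders the actual DaemonSet manifest:

```yaml
# Excerpt: security-relevant fields of the eBPF agent DaemonSet pod spec.
spec:
  hostPID: true                    # agent must see host PIDs to monitor processes
  containers:
  - name: cloudtaser-ebpf
    securityContext:
      privileged: true
      capabilities:
        add:
        - SYS_ADMIN       # load BPF programs
        - SYS_PTRACE      # inspect monitored processes
        - NET_ADMIN       # attach network hooks
        - SYS_RESOURCE    # raise memlock limits for BPF maps
```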
Restrict Pod Read Access¶
CloudTaser stores configuration in pod annotations (vault paths, environment variable mappings, rotation strategy) but never stores secret values in Kubernetes. However, annotation metadata reveals your secret infrastructure -- vault paths, role names, and key mappings. Restricting pod read access is a defense-in-depth measure.
What annotations expose (and what they do not)
Annotations contain only configuration: vault endpoint URLs, auth role names, KV paths, and env-var mappings. They tell an observer where secrets live, but not what the secret values are. An attacker with only pod read access cannot retrieve actual credentials.
Audit existing RBAC:
```bash
# Check if the default service account can list pods
kubectl auth can-i list pods --as=system:serviceaccount:default:default

# Cluster-wide audit
kubectl auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:default:default
```
If any non-operator service account returns yes, apply restrictive RBAC:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: protected-workloads
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: protected-workloads
subjects:
- kind: ServiceAccount
  name: cloudtaser-operator
  namespace: cloudtaser-system
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
Use Separate Namespaces for Protected Workloads¶
Isolate CloudTaser-protected workloads in dedicated namespaces:
```bash
kubectl create namespace protected-workloads
kubectl label namespace protected-workloads cloudtaser.io/protected=true
```
Benefits:
- RBAC `Role` and `RoleBinding` objects are namespace-scoped, simplifying access control
- NetworkPolicies (auto-applied by the operator) are namespace-scoped
- Audit logging can be filtered by namespace
Resource Limits¶
Configure appropriate resource requests and limits for all CloudTaser components in production.
Operator¶
| Resource | Request | Limit | Notes |
|---|---|---|---|
| CPU | 50m | 200m | Increases during high pod creation rates |
| Memory | 64Mi | 128Mi | Stable; cache size depends on watched resources |
eBPF Agent (per node)¶
| Resource | Request | Limit | Notes |
|---|---|---|---|
| CPU | 100m | 500m | Higher during initial BPF program loading |
| Memory | 128Mi | 512Mi | BPF maps consume memory proportional to monitored PIDs |
Wrapper (per injected pod)¶
The wrapper runs inside each protected workload container and adds minimal overhead:
| Resource | Overhead |
|---|---|
| Memory | ~5-10 MB additional RSS |
| CPU | Negligible (idle after initial vault fetch; wakes for lease renewal) |
| Startup latency | 50-200ms (depends on vault response time) |
S3 Proxy (per injected pod)¶
| Resource | Request | Limit | Notes |
|---|---|---|---|
| CPU | 50m | 200m | Higher during encryption-heavy workloads |
| Memory | 32Mi | 128Mi | Scales with concurrent request count |
Override defaults in your Helm values:
```yaml
operator:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
ebpf:
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi
```
High Availability¶
Operator HA¶
For production, run the operator with multiple replicas and leader election:
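The HA settings referenced in the production checklist can be enabled through Helm values. A sketch under the key names the checklist uses (`operator.ha`, `operator.leaderElect`, `replicaCount`) -- verify the exact keys against your chart version:

```yaml
# Helm values sketch for operator HA (key names per the production checklist).
operator:
  ha: true            # enables PodDisruptionBudget and anti-affinity
  leaderElect: true   # only the elected leader serves the webhook
replicaCount: 3       # replicas spread across nodes/zones
```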
In HA mode:
- 3 replicas are deployed with pod anti-affinity across nodes
- Leader election ensures only one replica serves the webhook at a time
- Failover is automatic -- if the leader pod is evicted or crashes, another replica takes over within seconds
- Replicas are spread across availability zones when possible
Pod Disruption Budget
The Helm chart creates a PodDisruptionBudget in HA mode that ensures at least 1 replica is always available during voluntary disruptions (node drains, upgrades).
eBPF Agent HA¶
The eBPF agent runs as a DaemonSet and is inherently HA -- one instance per node. It uses `priorityClassName: system-node-critical` to ensure scheduling even under resource pressure.
Node drain considerations
When draining a node for maintenance, the eBPF agent on that node will be evicted. Pods on that node lose runtime enforcement until the agent is rescheduled. Plan maintenance windows accordingly and drain nodes one at a time.
Vault HA¶
Vault HA is outside the scope of CloudTaser but is strongly recommended for production:
- OpenBao / Vault Enterprise -- Use integrated Raft storage with 3+ nodes across availability zones
- OpenBao OSS -- Use an external storage backend (Consul, PostgreSQL) with multiple vault instances behind a load balancer
Monitoring and Alerting¶
Operator Metrics¶
The operator exposes Prometheus metrics on port 8080:
| Metric | Type | Description |
|---|---|---|
| `controller_runtime_reconcile_total` | Counter | Reconciliation counts by controller and result |
| `controller_runtime_reconcile_errors_total` | Counter | Failed reconciliations |
| `cloudtaser_webhook_injection_total` | Counter | Injection count by status (success, error, skipped) |
Example Prometheus scrape config:
```yaml
- job_name: cloudtaser-operator
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - cloudtaser-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    regex: cloudtaser-operator
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    regex: "8080"
    action: keep
```
eBPF Agent Health¶
The eBPF agent exposes HTTP health endpoints on port 9090:
| Endpoint | Purpose |
|---|---|
| `GET /healthz` | Liveness probe -- agent process is running |
| `GET /readyz` | Readiness probe -- BPF programs are loaded and monitoring |
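If you manage the DaemonSet manifest yourself, these endpoints map directly onto container probes. A sketch (the Helm chart may already configure equivalents; the delay value is illustrative):

```yaml
# Probe wiring for the agent's health endpoints on port 9090.
livenessProbe:
  httpGet:
    path: /healthz
    port: 9090
readinessProbe:
  httpGet:
    path: /readyz
    port: 9090
  initialDelaySeconds: 5   # allow time for BPF program loading
```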
Recommended Alerts¶
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cloudtaser-alerts
  namespace: cloudtaser-system
spec:
  groups:
  - name: cloudtaser
    rules:
    - alert: CloudTaserOperatorDown
      expr: |
        kube_deployment_status_replicas_available{
          deployment="cloudtaser-operator",
          namespace="cloudtaser-system"
        } == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: CloudTaser operator has no available replicas
    - alert: CloudTaserEbpfAgentMissing
      expr: |
        kube_daemonset_status_number_ready{
          daemonset="cloudtaser-ebpf",
          namespace="cloudtaser-system"
        }
        <
        kube_daemonset_status_desired_number_scheduled{
          daemonset="cloudtaser-ebpf",
          namespace="cloudtaser-system"
        }
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: eBPF agent not running on all nodes
    - alert: CloudTaserWebhookErrors
      expr: |
        rate(cloudtaser_webhook_injection_total{status="error"}[5m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: CloudTaser webhook injection errors detected
    - alert: CloudTaserLowProtectionScore
      expr: |
        cloudtaser_workload_protection_score < 50
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Workload protection score below threshold
```
Grafana Dashboard¶
Key panels for a CloudTaser monitoring dashboard:
- Operator health -- Replica count, restart count, reconciliation rate
- Injection rate -- Successful vs. failed injections over time
- eBPF coverage -- Nodes with healthy agent / total nodes
- Protection scores -- Per-workload protection score heatmap
- Vault latency -- P50/P95/P99 secret fetch latency from wrapper metrics
- S3 proxy throughput -- Encrypted objects per second, encryption latency
TLS Certificate Management¶
Webhook TLS¶
The operator generates a self-signed CA and server certificate at startup. The CA bundle is injected into the MutatingWebhookConfiguration automatically. Certificates are stored in an emptyDir volume and regenerated on pod restart.
For production, provide your own certificates via a Kubernetes Secret:
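Referencing the secret from the Helm chart presumably happens through a values key; the key name below (`operator.webhook.certSecret`) is a hypothetical illustration of the shape -- check your chart's values reference for the actual key:

```yaml
# Hypothetical Helm values key pointing the webhook at a provided TLS secret.
operator:
  webhook:
    certSecret: cloudtaser-webhook-certs   # name of the Secret created below
```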
The secret must contain `tls.crt` and `tls.key`:
```bash
kubectl create secret tls cloudtaser-webhook-certs \
  --cert=webhook.crt \
  --key=webhook.key \
  --namespace cloudtaser-system
```
cert-manager integration
If you use cert-manager, create a Certificate resource that targets the webhook service:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cloudtaser-webhook
  namespace: cloudtaser-system
spec:
  secretName: cloudtaser-webhook-certs
  dnsNames:
  - cloudtaser-operator.cloudtaser-system.svc
  - cloudtaser-operator.cloudtaser-system.svc.cluster.local
  issuerRef:
    name: cluster-issuer
    kind: ClusterIssuer
```
Vault TLS¶
The wrapper validates the vault server certificate on every connection. If your vault uses a private CA, mount the CA bundle into workload pods:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/vault-address: "https://vault.eu.example.com"
        cloudtaser.io/vault-role: "cloudtaser"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "password=PGPASSWORD"
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        volumeMounts:
        - name: vault-ca
          mountPath: /etc/ssl/certs/vault-ca.crt
          subPath: ca.crt
      volumes:
      - name: vault-ca
        configMap:
          name: vault-ca-bundle
```
Create the ConfigMap containing the CA certificate:
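A minimal manifest for the referenced ConfigMap looks like this (the PEM body is a placeholder for your actual vault CA certificate):

```yaml
# ConfigMap holding the vault CA bundle, mounted by protected pods above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vault-ca-bundle
  namespace: default        # must match the workload's namespace
data:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ... your vault CA certificate PEM ...
    -----END CERTIFICATE-----
```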
Production Checklist¶
Use this checklist before going live with CloudTaser in production.
Pre-production checklist
Infrastructure
- [ ] Vault hosted in EU region with TLS enabled
- [ ] Vault HA configured (3+ nodes with Raft or external storage)
- [ ] Network connectivity verified between cluster and vault
- [ ] Kubernetes cluster running 1.28+ with kernel 5.15+
Operator
- [ ] HA mode enabled (`operator.ha: true`, `replicaCount: 3`)
- [ ] Leader election enabled (`operator.leaderElect: true`)
- [ ] Resource limits configured appropriately for workload volume
- [ ] Webhook `failurePolicy: Fail` (default; do not change to `Ignore` in production)
- [ ] Webhook TLS certificates managed (self-signed or cert-manager)
eBPF Agent
- [ ] DaemonSet running on all nodes (`DESIRED == READY`)
- [ ] `enforceMode: true` (not audit-only)
- [ ] `reactiveKill: true` for high-security workloads
- [ ] All nodes running kernel 5.15+ for full feature set
Security
- [ ] NetworkPolicies applied (auto or manual)
- [ ] RBAC hardened -- pod read access restricted to operators only
- [ ] Protected workloads in dedicated namespaces
- [ ] Vault audit logging enabled
Monitoring
- [ ] Prometheus scraping operator metrics (port 8080)
- [ ] eBPF agent health endpoints monitored (port 9090)
- [ ] Alerting rules configured for operator down, eBPF missing, webhook errors
- [ ] Protection score monitoring active
Validation
- [ ] `cloudtaser validate` passes all checks
- [ ] `cloudtaser audit` shows expected coverage
- [ ] Test secret injection with a sample workload before rolling out to production services
Next Steps¶
- Security Model -- Understand the full threat model and trust boundaries
- Protection Score -- How workload protection scores are calculated
- Key Rotation -- Configure and manage secret rotation
- Troubleshooting -- Diagnose and resolve common issues
- Compliance Mapping -- GDPR, NIS2, DORA, and Schrems II alignment