Production Deployment Guide¶
This guide covers production hardening for CloudTaser deployments. It assumes you have completed the basic installation of the operator, eBPF agent, and optionally the S3 encryption proxy.
Multi-Cloud Support¶
CloudTaser is tested on the three major managed Kubernetes platforms. Each has specific requirements and considerations.
GKE Cluster Requirements¶
| Requirement | Value |
|---|---|
| Cluster type | GKE Standard (not Autopilot) |
| Kubernetes | 1.28+ |
| Node image | Container-Optimized OS (COS) or Ubuntu |
| Kernel | 5.15+ (COS and Ubuntu both qualify) |
GKE-Specific Configuration¶
Workload Identity -- Not required for CloudTaser itself: the operator makes no GCP API calls and authenticates to vault using Kubernetes auth only. If your cluster enforces Workload Identity, bind the operator's ServiceAccount to a GCP service account with no IAM roles granted.
Private clusters -- The vault endpoint must be reachable from the cluster's VPC. Use VPC peering or Cloud VPN to connect to your EU vault. Add the vault endpoint to the master authorized networks if using a private control plane.
Binary Authorization -- The operator and wrapper images are signed. Configure Binary Authorization to allow images from ghcr.io/skipopsltd/*.
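An exemption for the CloudTaser registry path can be expressed in the Binary Authorization policy; the excerpt below is a sketch to adapt into your existing policy, not a complete policy file:

```yaml
# Policy excerpt: admit CloudTaser images by name pattern.
# Merge into your existing Binary Authorization policy before importing it.
admissionWhitelistPatterns:
- namePattern: ghcr.io/skipopsltd/*
```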
EKS Cluster Requirements¶
| Requirement | Value |
|---|---|
| Node groups | Managed or self-managed (not Fargate) |
| Kubernetes | 1.28+ |
| AMI | Amazon Linux 2023 or Ubuntu 22.04 |
| Kernel | 5.15+ (AL2023: 6.1+, Ubuntu 22.04: 5.15+) |
EKS-Specific Configuration¶
IRSA / Pod Identity -- Not required for the operator or eBPF agent. CloudTaser authenticates to vault using Kubernetes auth, not AWS IAM. IRSA or Pod Identity is only needed if the S3 proxy requires access to AWS S3 buckets.
VPC connectivity -- Ensure your EU vault is reachable from the EKS VPC via VPN, Transit Gateway, or a public endpoint with TLS.
Security Groups -- Allow outbound HTTPS (port 443) from worker nodes to the vault endpoint. Also allow intra-cluster traffic on port 8199 (gRPC between eBPF agent and operator).
AKS Cluster Requirements¶
| Requirement | Value |
|---|---|
| Node pools | Regular (not Virtual Nodes / ACI) |
| Kubernetes | 1.28+ |
| Node image | Ubuntu 22.04 or Azure Linux (Mariner) |
| Kernel | 5.15+ |
AKS-Specific Configuration¶
Azure AD Pod Identity / Workload Identity -- Not required. CloudTaser uses Kubernetes auth to vault, not Azure AD. Only needed if the S3 proxy accesses Azure Blob Storage.
Private endpoint -- If using AKS private cluster, ensure vault is reachable from the VNet via VNet peering or Azure VPN Gateway.
NSG rules -- Allow outbound HTTPS to the vault endpoint from node pool subnets. Allow intra-cluster traffic on port 8199.
Network Policies¶
CloudTaser requires specific network connectivity between its components and external services. Apply network policies to restrict traffic to only what is necessary.
Required Connectivity¶
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| Application pods | Vault endpoint | 443 | HTTPS | Secret fetching by wrapper |
| Operator pod | K8s API server | 443 | HTTPS | Webhook serving, pod watching |
| eBPF agent | Operator pod | 8199 | gRPC | PID registration for protected processes |
| Operator pod | Container registries | 443 | HTTPS | Entrypoint resolution |
| S3 proxy sidecar | Upstream S3 endpoint | 443 | HTTPS | Object storage access (if S3 proxy enabled) |
| S3 proxy sidecar | Vault endpoint | 443 | HTTPS | Transit encrypt/decrypt operations |
Auto-Applied Policies¶
The CloudTaser operator automatically applies egress NetworkPolicies to namespaces containing protected pods. These policies restrict protected pods to only reach:
- The configured vault endpoint (HTTPS/443)
- The Kubernetes API server (for service account token exchange)
- DNS (UDP/TCP 53)
Auto-applied policies are created as `cloudtaser-egress-<namespace>` NetworkPolicy resources and are reconciled by the operator's NetworkPolicy controller.
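The auto-applied policy has roughly this shape. This is an illustrative sketch only -- the operator generates the real resource, and the namespace and vault IP below are example values:

```yaml
# Illustrative shape of an operator-generated egress policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cloudtaser-egress-protected-workloads   # cloudtaser-egress-<namespace>
  namespace: protected-workloads
spec:
  podSelector: {}          # selector for protected pods (actual selector set by operator)
  policyTypes:
  - Egress
  egress:
  - to:                    # vault endpoint over HTTPS
    - ipBlock:
        cidr: 203.0.113.10/32   # example vault IP
    ports:
    - protocol: TCP
      port: 443
  - ports:                 # DNS resolution
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```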
Manual NetworkPolicy Example¶
For additional control, apply explicit network policies:
```yaml
# Allow application pods to reach the vault
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-vault-egress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: <VAULT_IP>/32
    ports:
    - protocol: TCP
      port: 443
---
# Allow eBPF agent to reach the operator
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ebpf-to-operator
  namespace: cloudtaser-system
spec:
  podSelector:
    matchLabels:
      app: cloudtaser-ebpf
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: cloudtaser-operator
    ports:
    - protocol: TCP
      port: 8199
```
Generate Policies with CLI¶
The CloudTaser CLI can generate network policies tailored to your environment:
Apply the generated policies:
RBAC Hardening¶
The Helm chart creates the necessary RBAC resources automatically. This section covers additional hardening for production environments.
Operator ClusterRole¶
The operator requires cluster-wide permissions:
| Resource | Verbs | Purpose |
|---|---|---|
| `api.cloudtaser.io` CRDs | Full CRUD | Manage CloudTaserConfigs and SecretMappings |
| `secrets` | get, list, watch, create, update | Webhook TLS certificates |
| `pods` | get, list, watch | Injection decisions |
| `serviceaccounts` | get, list, watch | Identity validation |
| `mutatingwebhookconfigurations` | get, patch | Self-managed webhook |
| `apps/deployments` | get, list, watch, patch | Workload management |
eBPF Agent ClusterRole¶
The eBPF agent requires minimal permissions:
| Resource | Verbs | Purpose |
|---|---|---|
| `pods` | get, list, watch | Discover monitored PIDs |
| `nodes` | get | Identify the current node |
The agent runs as a privileged DaemonSet with `hostPID: true` and requires the `SYS_ADMIN`, `SYS_PTRACE`, `NET_ADMIN`, and `SYS_RESOURCE` capabilities.
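The security settings above translate into a pod spec along these lines. This is a sketch of the relevant fields only -- the Helm chart renders the actual DaemonSet manifest:

```yaml
# Excerpt: security-relevant fields of the eBPF agent DaemonSet pod spec.
spec:
  hostPID: true                    # agent must see host PIDs to monitor processes
  containers:
  - name: cloudtaser-ebpf
    securityContext:
      privileged: true
      capabilities:
        add:
        - SYS_ADMIN       # load BPF programs
        - SYS_PTRACE      # inspect monitored processes
        - NET_ADMIN       # attach network hooks
        - SYS_RESOURCE    # raise memlock limits for BPF maps
```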
Restrict Pod Read Access¶
CloudTaser stores configuration in pod annotations (vault paths, environment variable mappings, rotation strategy) but never stores secret values in Kubernetes. However, annotation metadata reveals your secret infrastructure -- vault paths, role names, and key mappings. Restricting pod read access is a defense-in-depth measure.
What annotations expose (and what they do not)
Annotations contain only configuration: vault endpoint URLs, auth role names, KV paths, and env-var mappings. They tell an observer where secrets live, but not what the secret values are. An attacker with only pod read access cannot retrieve actual credentials.
Audit existing RBAC:
```bash
# Check if the default service account can list pods
kubectl auth can-i list pods --as=system:serviceaccount:default:default

# Cluster-wide audit
kubectl auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:default:default
```
If any non-operator service account returns yes, apply restrictive RBAC:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: protected-workloads
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: protected-workloads
subjects:
- kind: ServiceAccount
  name: cloudtaser-operator
  namespace: cloudtaser-system
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
Use Separate Namespaces for Protected Workloads¶
Isolate CloudTaser-protected workloads in dedicated namespaces:
```bash
kubectl create namespace protected-workloads
kubectl label namespace protected-workloads cloudtaser.io/protected=true
```
Benefits:
- RBAC `Role` and `RoleBinding` objects are namespace-scoped, simplifying access control
- NetworkPolicies (auto-applied by the operator) are namespace-scoped
- Audit logging can be filtered by namespace
Resource Limits¶
Configure appropriate resource requests and limits for all CloudTaser components in production.
Operator¶
| Resource | Request | Limit | Notes |
|---|---|---|---|
| CPU | 50m | 200m | Increases during high pod creation rates |
| Memory | 64Mi | 128Mi | Stable; cache size depends on watched resources |
eBPF Agent (per node)¶
| Resource | Request | Limit | Notes |
|---|---|---|---|
| CPU | 100m | 500m | Higher during initial BPF program loading |
| Memory | 128Mi | 512Mi | BPF maps consume memory proportional to monitored PIDs |
Wrapper (per injected pod)¶
The wrapper runs inside each protected workload container and adds minimal overhead:
| Resource | Overhead |
|---|---|
| Memory | ~5-10 MB additional RSS |
| CPU | Negligible (idle after initial vault fetch; wakes for lease renewal) |
| Startup latency | 50-200ms (depends on vault response time) |
S3 Proxy (per injected pod)¶
| Resource | Request | Limit | Notes |
|---|---|---|---|
| CPU | 50m | 200m | Higher during encryption-heavy workloads |
| Memory | 32Mi | 128Mi | Scales with concurrent request count |
Override defaults in your Helm values:
```yaml
operator:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
ebpf:
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 1Gi
```
High Availability¶
Operator HA¶
For production, run the operator with multiple replicas and leader election:
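The HA settings referenced in the production checklist can be enabled through Helm values. A sketch under the key names the checklist uses (`operator.ha`, `operator.leaderElect`, `replicaCount`) -- verify the exact keys against your chart version:

```yaml
# Helm values sketch for operator HA (key names per the production checklist).
operator:
  ha: true            # enables PodDisruptionBudget and anti-affinity
  leaderElect: true   # only the elected leader serves the webhook
replicaCount: 3       # replicas spread across nodes/zones
```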
In HA mode:
- 3 replicas are deployed with pod anti-affinity across nodes
- Leader election ensures only one replica serves the webhook at a time
- Failover is automatic -- if the leader pod is evicted or crashes, another replica takes over within seconds
- Replicas are spread across availability zones when possible
Pod Disruption Budget
The Helm chart creates a PodDisruptionBudget in HA mode that ensures at least 1 replica is always available during voluntary disruptions (node drains, upgrades).
eBPF Agent HA¶
The eBPF agent runs as a DaemonSet and is inherently HA -- one instance per node. It uses `priorityClassName: system-node-critical` to ensure scheduling even under resource pressure.
Node drain considerations
When draining a node for maintenance, the eBPF agent on that node will be evicted. Pods on that node lose runtime enforcement until the agent is rescheduled. Plan maintenance windows accordingly and drain nodes one at a time.
Vault HA¶
Vault HA is outside the scope of CloudTaser but is strongly recommended for production:
- OpenBao / Vault Enterprise -- Use integrated Raft storage with 3+ nodes across availability zones
- OpenBao OSS -- Use an external storage backend (Consul, PostgreSQL) with multiple vault instances behind a load balancer
Monitoring and Alerting¶
Operator Metrics¶
The operator exposes Prometheus metrics on port 8080:
| Metric | Type | Description |
|---|---|---|
| `controller_runtime_reconcile_total` | Counter | Reconciliation counts by controller and result |
| `controller_runtime_reconcile_errors_total` | Counter | Failed reconciliations |
| `cloudtaser_webhook_injection_total` | Counter | Injection count by status (success, error, skipped) |
Example Prometheus scrape config:
```yaml
- job_name: cloudtaser-operator
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - cloudtaser-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    regex: cloudtaser-operator
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    regex: "8080"
    action: keep
```
eBPF Agent Health¶
The eBPF agent exposes HTTP health endpoints on port 9090:
| Endpoint | Purpose |
|---|---|
| `GET /healthz` | Liveness probe -- agent process is running |
| `GET /readyz` | Readiness probe -- BPF programs are loaded and monitoring |
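If you manage the DaemonSet manifest yourself, these endpoints map directly onto container probes. A sketch (the Helm chart may already configure equivalents; the delay value is illustrative):

```yaml
# Probe wiring for the agent's health endpoints on port 9090.
livenessProbe:
  httpGet:
    path: /healthz
    port: 9090
readinessProbe:
  httpGet:
    path: /readyz
    port: 9090
  initialDelaySeconds: 5   # allow time for BPF program loading
```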
Recommended Alerts¶
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cloudtaser-alerts
  namespace: cloudtaser-system
spec:
  groups:
  - name: cloudtaser
    rules:
    - alert: CloudTaserOperatorDown
      expr: |
        kube_deployment_status_replicas_available{
          deployment="cloudtaser-operator",
          namespace="cloudtaser-system"
        } == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: CloudTaser operator has no available replicas
    - alert: CloudTaserEbpfAgentMissing
      expr: |
        kube_daemonset_status_number_ready{
          daemonset="cloudtaser-ebpf",
          namespace="cloudtaser-system"
        }
        <
        kube_daemonset_status_desired_number_scheduled{
          daemonset="cloudtaser-ebpf",
          namespace="cloudtaser-system"
        }
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: eBPF agent not running on all nodes
    - alert: CloudTaserWebhookErrors
      expr: |
        rate(cloudtaser_webhook_injection_total{status="error"}[5m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: CloudTaser webhook injection errors detected
    - alert: CloudTaserLowProtectionScore
      expr: |
        cloudtaser_workload_protection_score < 50
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Workload protection score below threshold
```
Grafana Dashboard¶
Key panels for a CloudTaser monitoring dashboard:
- Operator health -- Replica count, restart count, reconciliation rate
- Injection rate -- Successful vs. failed injections over time
- eBPF coverage -- Nodes with healthy agent / total nodes
- Protection scores -- Per-workload protection score heatmap
- Vault latency -- P50/P95/P99 secret fetch latency from wrapper metrics
- S3 proxy throughput -- Encrypted objects per second, encryption latency
TLS Certificate Management¶
Webhook TLS¶
The operator generates a self-signed CA and server certificate at startup. The CA bundle is injected into the MutatingWebhookConfiguration automatically. Certificates are stored in an emptyDir volume and regenerated on pod restart.
For production, provide your own certificates via a Kubernetes Secret:
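Referencing the secret from the Helm chart presumably happens through a values key; the key name below (`operator.webhook.certSecret`) is a hypothetical illustration of the shape -- check your chart's values reference for the actual key:

```yaml
# Hypothetical Helm values key pointing the webhook at a provided TLS secret.
operator:
  webhook:
    certSecret: cloudtaser-webhook-certs   # name of the Secret created below
```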
The secret must contain `tls.crt` and `tls.key`:
```bash
kubectl create secret tls cloudtaser-webhook-certs \
  --cert=webhook.crt \
  --key=webhook.key \
  --namespace cloudtaser-system
```
cert-manager integration
If you use cert-manager, create a Certificate resource that targets the webhook service:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cloudtaser-webhook
  namespace: cloudtaser-system
spec:
  secretName: cloudtaser-webhook-certs
  dnsNames:
  - cloudtaser-operator.cloudtaser-system.svc
  - cloudtaser-operator.cloudtaser-system.svc.cluster.local
  issuerRef:
    name: cluster-issuer
    kind: ClusterIssuer
```
Vault TLS¶
The wrapper validates the vault server certificate on every connection. If your vault uses a private CA, mount the CA bundle into workload pods:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/vault-address: "https://vault.eu.example.com"
        cloudtaser.io/vault-role: "cloudtaser"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "password=PGPASSWORD"
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        volumeMounts:
        - name: vault-ca
          mountPath: /etc/ssl/certs/vault-ca.crt
          subPath: ca.crt
      volumes:
      - name: vault-ca
        configMap:
          name: vault-ca-bundle
```
Create the ConfigMap containing the CA certificate:
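A minimal manifest for the referenced ConfigMap looks like this (the PEM body is a placeholder for your actual vault CA certificate):

```yaml
# ConfigMap holding the vault CA bundle, mounted by protected pods above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vault-ca-bundle
  namespace: default        # must match the workload's namespace
data:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ... your vault CA certificate PEM ...
    -----END CERTIFICATE-----
```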
Production Checklist¶
Use this checklist before going live with CloudTaser in production.
Pre-production checklist
Infrastructure
- [ ] Vault hosted in EU region with TLS enabled
- [ ] Vault HA configured (3+ nodes with Raft or external storage)
- [ ] Network connectivity verified between cluster and vault
- [ ] Kubernetes cluster running 1.28+ with kernel 5.15+
Operator
- [ ] HA mode enabled (`operator.ha: true`, `replicaCount: 3`)
- [ ] Leader election enabled (`operator.leaderElect: true`)
- [ ] Resource limits configured appropriately for workload volume
- [ ] Webhook `failurePolicy: Fail` (default; do not change to `Ignore` in production)
- [ ] Webhook TLS certificates managed (self-signed or cert-manager)
eBPF Agent
- [ ] DaemonSet running on all nodes (`DESIRED == READY`)
- [ ] `enforceMode: true` (not audit-only)
- [ ] `reactiveKill: true` for high-security workloads
- [ ] All nodes running kernel 5.15+ for full feature set
Security
- [ ] NetworkPolicies applied (auto or manual)
- [ ] RBAC hardened -- pod read access restricted to operators only
- [ ] Protected workloads in dedicated namespaces
- [ ] Vault audit logging enabled
Monitoring
- [ ] Prometheus scraping operator metrics (port 8080)
- [ ] eBPF agent health endpoints monitored (port 9090)
- [ ] Alerting rules configured for operator down, eBPF missing, webhook errors
- [ ] Protection score monitoring active
Validation
- [ ] `cloudtaser validate` passes all checks
- [ ] `cloudtaser audit` shows expected coverage
- [ ] Test secret injection with a sample workload before rolling out to production services
Next Steps¶
- Security Model -- Understand the full threat model and trust boundaries
- Protection Score -- How workload protection scores are calculated
- Key Rotation -- Configure and manage secret rotation
- Troubleshooting -- Diagnose and resolve common issues
- Compliance Mapping -- GDPR, NIS2, DORA, and Schrems II alignment