
Upgrade and Rollback Procedures

How to upgrade CloudTaser components with zero downtime and roll back if something goes wrong.


Prerequisites

  • helm v3.x installed
  • kubectl access to the target cluster
  • The CloudTaser Helm chart available. The chart is published to an OCI registry, and helm repo add does not accept oci:// URLs; with Helm 3.8+ pull or reference the chart directly:
helm pull oci://ghcr.io/cloudtaser/charts/cloudtaser --version 0.4.34
The commands below use the cloudtaser/cloudtaser repo shorthand; if the chart is not mirrored into an HTTPS chart repository, substitute the full oci://ghcr.io/cloudtaser/charts/cloudtaser reference.

Upgrade Strategy

CloudTaser consists of four independently-versioned components deployed by a single Helm chart:

Component    Chart Value          Current Default
Operator     operator.image.tag   v0.5.14-amd64
Wrapper      wrapper.image.tag    v0.0.31-amd64
eBPF Agent   ebpf.image.tag       v0.1.21-amd64
S3 Proxy     s3proxy.image.tag    v0.2.7-amd64

The Helm chart version (currently 0.4.34) tracks independently from component versions. Upgrading the chart may update one or more component images.
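To prevent a chart upgrade from silently changing component images, the tags can be pinned explicitly in values.yaml. A minimal sketch using the chart values from the table above (the tags shown are the current defaults):

```yaml
# values.yaml -- pin component images so chart upgrades don't change them
operator:
  image:
    tag: "v0.5.14-amd64"
wrapper:
  image:
    tag: "v0.0.31-amd64"
ebpf:
  image:
    tag: "v0.1.21-amd64"
s3proxy:
  image:
    tag: "v0.2.7-amd64"
```

With tags pinned, upgrading the chart only changes templates and defaults; image bumps become deliberate edits to this file.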


Zero-Downtime Upgrade

Step 1: Check current versions

helm list -n cloudtaser-system
kubectl get deployment -n cloudtaser-system cloudtaser-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

Step 2: Review the new chart version

# Check available chart versions
helm search repo cloudtaser --versions

# Diff the values between current and new chart
helm diff upgrade cloudtaser cloudtaser/cloudtaser \
  -n cloudtaser-system \
  -f values.yaml

Install the helm-diff plugin

helm diff is provided by a plugin (helm plugin install https://github.com/databus23/helm-diff). It shows the exact Kubernetes resource diffs before anything is applied.

Step 3: Upgrade the Helm release

helm upgrade cloudtaser cloudtaser/cloudtaser \
  -n cloudtaser-system \
  -f values.yaml \
  --version <target-chart-version>
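For extra safety, helm upgrade also supports --atomic, which waits for resources to become ready and automatically rolls the release back if the upgrade fails or times out. A sketch (the 10-minute timeout is an arbitrary example, not a CloudTaser recommendation):

```shell
# Wrap the upgrade with --atomic so a failed or hung upgrade rolls
# itself back instead of leaving the release half-applied.
safe_upgrade() {
  # $1 = target chart version, e.g. 0.4.35
  helm upgrade cloudtaser cloudtaser/cloudtaser \
    -n cloudtaser-system \
    -f values.yaml \
    --version "$1" \
    --atomic --wait --timeout 10m
}

# Usage: safe_upgrade <target-chart-version>
```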

Step 4: Verify the upgrade

# Check operator is running
kubectl rollout status deployment/cloudtaser-operator -n cloudtaser-system

# Check eBPF daemonset is running on all nodes
kubectl rollout status daemonset/cloudtaser-ebpf -n cloudtaser-system

# Verify webhook is responding
kubectl get mutatingwebhookconfiguration cloudtaser-operator-webhook -o yaml | head -20

Component Upgrade Order

When upgrading individual components (setting specific image tags), follow this order:

  1. eBPF agent first -- the agent is backward compatible with older wrapper versions. Upgrading it first ensures enforcement is active during the transition.
  2. Operator second -- the operator generates injection patches. A new operator version may inject new annotations or environment variables that the wrapper needs.
  3. Wrapper last -- wrapper upgrades require pod restarts (the wrapper binary is copied via the init container). The new wrapper image is used on the next pod creation.
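The order above can be scripted as three incremental helm upgrades, one image tag at a time. The tags in the example calls are hypothetical next versions, and --reuse-values keeps every other chart setting untouched:

```shell
# Upgrade one component image at a time, in the documented order.
upgrade_component() {
  # $1 = chart value path (e.g. ebpf.image.tag), $2 = new image tag
  helm upgrade cloudtaser cloudtaser/cloudtaser \
    -n cloudtaser-system \
    --reuse-values \
    --set "$1=$2"
}

# Example calls, in order (tags are illustrative):
# upgrade_component ebpf.image.tag     v0.1.22-amd64   # 1. agent first
# upgrade_component operator.image.tag v0.5.15-amd64   # 2. operator second
# upgrade_component wrapper.image.tag  v0.0.32-amd64   # 3. wrapper last
```

Verify each component (rollout status, as in Step 4 above) before moving on to the next one.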

Wrapper upgrades require pod restarts

The wrapper binary is copied into each pod's emptyDir volume by an init container at pod creation time. Existing running pods continue using the old wrapper binary until they are restarted. To roll out a new wrapper version:

# Restart all deployments with CloudTaser injection (repeat the same pattern
# for statefulsets/daemonsets if you inject those workload types)
kubectl get deployments --all-namespaces \
  -o jsonpath='{range .items[?(@.spec.template.metadata.annotations.cloudtaser\.io/inject=="true")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' | \
  while read dep; do
    kubectl rollout restart deployment "${dep##*/}" -n "${dep%%/*}"
  done

CRD Upgrades

The Helm chart includes CRDs for CloudTaserConfig and SecretMapping. Helm does not upgrade CRDs automatically after initial install.

To upgrade CRDs:

# Apply CRDs from the new chart version
kubectl apply -f https://raw.githubusercontent.com/cloudtaser/cloudtaser-helm/main/charts/cloudtaser/crds/api.cloudtaser.io_cloudtaserconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/cloudtaser/cloudtaser-helm/main/charts/cloudtaser/crds/api.cloudtaser.io_secretmappings.yaml

Apply CRDs before upgrading the Helm release

If the new operator version references new CRD fields, apply the CRDs first. Otherwise the operator may fail to start because the API server rejects unknown fields.


Certificate Rotation During Upgrade

The operator manages its own webhook TLS certificates. During an upgrade:

  • Certificates are stored in a Kubernetes Secret (cloudtaser-operator-certs). All operator replicas share the same certificate.
  • The operator checks certificate expiry every 24 hours and rotates certificates 30 days before expiry.
  • On startup, the operator patches the MutatingWebhookConfiguration and ValidatingWebhookConfiguration with the current CA bundle.
  • Broker mTLS certificates (stored in cloudtaser-broker-tls) follow the same rotation schedule.

No manual certificate action is needed during upgrades. The new operator pod reads the existing certificate Secret and patches the webhook configurations automatically.
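To see when the current webhook certificate expires (and hence when the next automatic rotation is due), the certificate can be read out of the Secret. Note the tls.crt key name is an assumption; check the actual keys in cloudtaser-operator-certs:

```shell
# Print the notAfter date of a PEM certificate read from stdin.
cert_expiry() {
  openssl x509 -noout -enddate
}

# Against the cluster (Secret key name "tls.crt" is an assumption):
# kubectl get secret cloudtaser-operator-certs -n cloudtaser-system \
#   -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_expiry
```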


Rollback

Helm rollback

# List revision history
helm history cloudtaser -n cloudtaser-system

# Roll back to the previous revision
helm rollback cloudtaser -n cloudtaser-system

# Roll back to a specific revision
helm rollback cloudtaser <revision-number> -n cloudtaser-system

Verify rollback

kubectl rollout status deployment/cloudtaser-operator -n cloudtaser-system
kubectl rollout status daemonset/cloudtaser-ebpf -n cloudtaser-system

Rollback considerations

  • Operator rollback -- New pods are injected with the previous operator logic; existing pods are unaffected. Restart affected deployments if needed.
  • eBPF rollback -- Previous enforcement behavior is restored on all nodes. No pod restart needed.
  • Wrapper rollback -- Running pods keep the current wrapper; new pods get the old wrapper. Restart deployments to pick up the old wrapper binary.
  • CRD rollback -- CRDs are not managed by Helm rollback. Manually reapply old CRD versions if fields were removed.

CRDs are not rolled back by Helm

helm rollback does not revert CRD changes. If a CRD was updated with new fields, rolling back the chart leaves the new CRD in place. This is generally safe because CRDs are additive, but verify compatibility.


Database Migration Considerations

SaaS Control Plane

The CloudTaser SaaS control plane (cloudtaser-saas) uses an in-memory tenant store. There are no database migrations required when upgrading the SaaS component.

Database Proxy

The database proxy (cloudtaser-db-proxy) does not have its own database. It proxies PostgreSQL connections and performs transparent encryption/decryption. Upgrading the proxy is safe because:

  • The encrypted value format is versioned (currently version 1). New proxy versions can always read values encrypted by older versions.
  • The encryption key lives in Vault Transit, not in the proxy. Key continuity is guaranteed by Vault.
  • The proxy is stateless. Restart it at any time.

However, if a new proxy version changes the encryption format:

  1. The new format version byte ensures old values remain readable
  2. New writes use the new format
  3. Rolling back to an older proxy that does not understand the new format fails: it cannot decrypt values written by the new version
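The version-byte idea can be illustrated with a tiny sketch. The layout below is an assumption for illustration only, not the actual CloudTaser wire format: the first byte of each encrypted value names its format version, so a reader can detect values written in a format it predates.

```shell
# Print the format version (the first byte of the value) as a decimal.
# NOTE: illustrative layout only -- not the real CloudTaser format.
format_version() {
  head -c1 | od -An -tu1 | tr -d ' '
}

# A hypothetical version-1 value: byte 0x01 followed by the ciphertext.
printf '\001ciphertext' | format_version   # prints: 1
```

An old proxy that only knows version 1 would reject (rather than mis-decrypt) a value whose first byte it does not recognize.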

Test proxy upgrades in staging first

Deploy the new proxy version in a staging environment and verify both reads (of existing encrypted data) and writes work correctly before upgrading production.


Canary Upgrade

For large clusters, consider a canary upgrade using namespace selectors:

Step 1: Deploy the new operator version to a canary namespace

Override the webhook namespace selector to target only the canary namespace:

# canary-values.yaml
operator:
  image:
    tag: "v0.6.0-amd64"
wrapper:
  image:
    tag: "v0.0.32-amd64"

Step 2: Restart a test workload in the canary namespace

kubectl rollout restart deployment/test-app -n canary

Step 3: Verify protection score

cloudtaser status -n canary
cloudtaser audit --vault-address https://vault.eu.example.com -n canary

Step 4: Roll out to all namespaces

Once verified, upgrade the Helm release for the full cluster.


Emergency: Disable Injection

If an upgrade causes pod creation failures, disable the webhook temporarily:

# Option 1: Set failurePolicy to Ignore (pods start without injection)
kubectl patch mutatingwebhookconfiguration cloudtaser-operator-webhook \
  --type='json' \
  -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'

# Option 2: Delete the webhook configuration entirely (emergency only)
kubectl delete mutatingwebhookconfiguration cloudtaser-operator-webhook

Disabling the webhook removes protection

Setting failurePolicy: Ignore means new pods start without CloudTaser injection. Existing running pods continue to operate normally. Restore the webhook after resolving the issue.

After resolving the issue, restore the webhook:

helm upgrade cloudtaser cloudtaser/cloudtaser -n cloudtaser-system -f values.yaml