
Performance and Benchmarks

A reasonable security reviewer's next question after "does it protect my secrets?" is "what does it cost?" This page answers that honestly. We do not yet publish formal, versioned benchmarks across every workload shape -- we will, and the roadmap below names the tracker -- but we do know where the overhead lives, how big it is in order-of-magnitude terms, and what you can measure on your own today.

No invented numbers

This page deliberately avoids numeric claims (e.g. "X% throughput overhead") beyond what can be stated qualitatively and verified independently. Anything more precise will be published alongside a reproducible harness, the raw data, and the hardware profile.


Where overhead actually lives

CloudTaser's overhead comes from five independent sources. They compose additively, and each is either a one-time cost (paid at pod start) or a steady-state cost (paid per request).

1. Pod start -- wrapper bootstrap + secret fetch

The wrapper binary starts as PID 1 inside the container. On first start it opens an mTLS connection to the bridge (via the broker and beacon if you're using the P2P path), authenticates, fetches its scoped secrets, writes them into memfd_secret pages, and execs your application. Order of magnitude: ~1-2 seconds additional pod-start time versus a naked exec.

Steady state -- once the pod is running and secrets are in memfd -- there is no ongoing wrapper cost.
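To put a number on this for your own cluster, the wrapper's contribution shows up as the gap between pod creation and Ready. A minimal sketch -- the kubectl lines are commented examples against a hypothetical pod name, the timestamp values are placeholders standing in for their output, and GNU date is assumed:

```shell
# Pull the two timestamps from a running pod, e.g.:
#   CREATED=$(kubectl get pod myapp-0 -o jsonpath='{.metadata.creationTimestamp}')
#   READY=$(kubectl get pod myapp-0 -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')
# Placeholder values so the arithmetic below runs standalone:
CREATED="2024-01-01T12:00:00Z"
READY="2024-01-01T12:00:07Z"

# Convert to epoch seconds and subtract (GNU date)
start=$(date -d "$CREATED" +%s)
ready=$(date -d "$READY" +%s)
echo "pod start-to-ready: $((ready - start))s"
```

Run the same measurement against an identical pod without the wrapper; the difference between the two deltas is the wrapper's one-time bootstrap cost.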

2. eBPF agent attach/detach

When the eBPF daemonset starts (or when a new node joins), kprobe/tracepoint programs are compiled and attached to the relevant kernel functions. Order of magnitude: <5 ms per attach at daemonset start. This happens once per node, not per pod.

Steady-state cost for common syscalls is below measurement noise on a typical workload: the probe executes entirely in-kernel, the inline filter checks a handful of fields against a BPF map, and the hook returns. We have never observed a workload where kprobe overhead dominated syscall latency.

3. LD_PRELOAD getenv() interposer

The wrapper injects an LD_PRELOAD shim so that getenv("API_KEY") returns a pointer into memfd_secret instead of reading /proc/self/environ. The shim adds ~100 ns per getenv() call (a map lookup plus a compare). This is only noticeable if your application calls getenv() in a hot path, which is rare -- most applications cache environment lookups at startup.

If you care about this: getenv() is a libc call, not a syscall, so strace won't see it. Run ltrace -c -e getenv against your workload for a representative window to get the call count.
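As a back-of-envelope check, even an unusually getenv()-hot service stays well under a percent of one core. A sketch assuming the ~100 ns per-call figure above and a hypothetical call rate:

```shell
# Estimate interposer CPU cost: calls/sec * ~100 ns per call.
# 10,000 getenv() calls/sec is already an unusually hot rate.
CALLS_PER_SEC=10000
NS_PER_CALL=100
awk -v c="$CALLS_PER_SEC" -v n="$NS_PER_CALL" \
  'BEGIN { printf "interposer cost: %.4f%% of one core\n", c * n / 1e9 * 100 }'
```

Plug in the call count your own trace reports; anything under a few hundred thousand calls per second rounds to noise.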

4. S3 proxy encrypt/decrypt

The S3 proxy sits in the client-to-bucket path and wraps every PUT/GET in AES-GCM. AES-GCM is AES-NI-accelerated on any x86_64 CPU from the last decade, so the encryption itself is effectively free at typical cloud object throughput.
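You can confirm the acceleration assumption on your own nodes in one line; this is a generic Linux CPU-flag check, not a CloudTaser tool:

```shell
# Check that the CPU exposes AES-NI (the "aes" flag in /proc/cpuinfo).
# With the flag present, AES-GCM runs at multiple GB/s per core.
if grep -q '\baes\b' /proc/cpuinfo 2>/dev/null; then
  echo "AES-NI available: AES-GCM cost is negligible"
else
  echo "no AES-NI: expect measurable AES-GCM overhead"
fi
```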

Order of magnitude:

  • Large objects (> ~64 KiB): throughput is dominated by network + S3 backend, not AES-GCM. Expect effective throughput to match direct-to-S3 within noise.
  • Small objects: the proxy hop itself (one TLS connection, one HMAC round) adds ~2-5 ms latency per request. For high-fan-out small-object workloads (sessions, feature flags, thumbnails), that's the dimension to measure.
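For the small-object case, per-request latency percentiles are the number to collect. A sketch: the curl loop in the comments targets a hypothetical proxy endpoint, and the sample values below stand in for its output so the percentile math runs standalone:

```shell
# Collect per-request latency against the proxy, e.g. (hypothetical endpoint):
#   for i in $(seq 200); do
#     curl -s -o /dev/null -w '%{time_total}\n' "https://s3-proxy.internal/bucket/small-object"
#   done > latencies.txt
# Sample data (seconds) so the block below runs standalone:
printf '%s\n' 0.003 0.004 0.002 0.005 0.004 0.003 0.006 0.004 0.003 0.004 > latencies.txt

# Nearest-rank p50/p99 over the sorted samples
sort -n latencies.txt | awk '
  { v[NR] = $1 }
  END { printf "p50: %.3fs  p99: %.3fs\n", v[idx(NR, 0.50)], v[idx(NR, 0.99)] }
  function idx(n, p,  i) { i = int(n * p); if (i < n * p) i++; return i }'
```

Run the same loop direct-to-S3 and compare the two p99s; the difference is the proxy hop.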

5. Beacon relay path

When you connect via the P2P beacon (the default for most deployments), the control-plane path for each secret fetch traverses:

operator-broker (in cluster)  ->  beacon relay  ->  bridge  ->  OpenBao

That's two TLS hops beyond the direct-dial baseline. Latency depends entirely on the geography of the beacon and bridge relative to your cluster:

  • Same-region beacon + bridge: ~10 ms additional round-trip.
  • Cross-continent: ~50-80 ms additional round-trip.

Critically, this cost is paid once per pod lifetime, at wrapper bootstrap. Secret fetches during normal operation happen in-memory from memfd -- no beacon round-trip.
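Because it is a one-time cost, the round-trip amortizes to almost nothing over a pod's life. A sketch using the worst-case figure above and an assumed one-hour pod lifetime:

```shell
# Amortize the one-time beacon round-trip over a pod's lifetime.
# 80 ms is the cross-continent worst case above; 1 h lifetime is an assumption.
BOOTSTRAP_MS=80
LIFETIME_S=3600
awk -v b="$BOOTSTRAP_MS" -v l="$LIFETIME_S" \
  'BEGIN { printf "beacon cost: %.5f%% of pod lifetime\n", b / 1000 / l * 100 }'
```

Even pods that churn every few minutes stay in the hundredths-of-a-percent range.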


What's coming -- tracker

We will publish formal benchmarks on a rolling basis against a documented test harness. The tracker for this work is:

Specifically:

| Benchmark | Tool | Scope |
| --- | --- | --- |
| Wrapper microbenchmarks | Go testing + fortio | getenv() interposer latency, memfd_secret read, fork+execve cost |
| eBPF probe overhead | bpftool prog profile | Per-attached-probe cycles per invocation; impact on common syscalls |
| S3 proxy throughput | vegeta + wrk | Plaintext vs encrypted PUT/GET across object-size classes (1 KiB, 64 KiB, 1 MiB, 16 MiB, 128 MiB) |
| End-to-end secret fetch | Fortio + Prometheus histograms | P50/P90/P99 from cloudtaser-cli target install through first pod secret-ready |
| Beacon relay steady-state | perf + Prometheus | Throughput of a saturated beacon (connections/sec, bytes/sec) with 3 replicas, HA topology |

All runs will be published with:

  • Hardware profile (CPU family, VM SKU, kernel version)
  • Raw data (CSV, Prometheus dumps)
  • Reproducible harness (Terraform + scripts) so you can run the same numbers against your own substrate
  • CI-sourced data streaming into a rolling dashboard

When we will NOT publish numbers

We will not publish benchmark numbers on synthetic workloads that don't resemble any customer shape. The failure mode is "CloudTaser: 2.3% overhead" as a marketing quote against a hello-world microservice running on an idle m7i.metal, which tells a prospect nothing useful.

What we will publish is workload-typed benchmarks:

  • Web API (p50/p99 request latency, RPS at a given tail percentile)
  • Batch job (end-to-end runtime for a fixed input)
  • DB-backed service (SELECT-heavy and INSERT-heavy profiles separately)
  • High-fan-out small-object workload (thumbnails, feature flags, session reads against the S3 proxy)

How customers can measure today

You don't need our benchmarks to understand your own overhead. Below are runbook snippets that produce real numbers for your workload.

strace-based syscall timing

# Find the wrapper PID inside your target pod
PID=$(kubectl exec -n <ns> <pod> -- pidof cloudtaser-wrapper)

# Observe syscall breakdown for 30s (strace needs CAP_SYS_PTRACE in the container)
kubectl exec -n <ns> <pod> -- \
  timeout -s INT 30 strace -c -f -p "$PID" 2>&1 | \
  tee strace-summary.txt

Look for elevated openat / read / close activity that doesn't match your application pattern -- that's the wrapper footprint. For a dormant workload, it should be near-zero post-bootstrap.

eBPF agent overhead

# From the eBPF agent daemonset pod (or any node with bpftool)
kubectl exec -n cloudtaser-system <ebpf-agent-pod> -- \
  bpftool prog profile id <prog-id> duration 10 cycles instructions

Compare cycles/invocation against your baseline (syscall rate under normal load). bpftool prog show lists programs; pick the kprobe IDs for the functions your workload exercises.

Prometheus metrics from the operator

The operator exposes Prometheus metrics on /metrics (port :8080 by default). Scrape and graph:

  • cloudtaser_webhook_injections_total -- count of webhook-mutated pods (webhook hot path)
  • cloudtaser_webhook_latency_seconds -- webhook decision-making latency histogram
  • cloudtaser_broker_unseal_requests_total -- broker secret-fetch request count
  • cloudtaser_bridge_request_duration_seconds -- histogram of bridge round-trip latencies (the "how far is my secret source" number)
  • cloudtaser_bridge_connected -- gauge (0/1); should sit at 1
  • cloudtaser_bridge_proxy_requests_total -- count of admin-proxy requests through the bridge
  • cloudtaser_attestation_reports_total -- count of attestation reports (CC substrate only)
  • cloudtaser_admission_signature_rejections_total -- count of pods rejected by the signed-image admission policy

These are the metrics most useful for a performance story: histogram the bridge duration, look for tail regressions after upgrades, alert on bridge_connected == 0.
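The two queries that matter most can be written directly as PromQL. A sketch, assuming the metric names above are exported with standard histogram _bucket series:

```promql
# p99 bridge round-trip over the last 5 minutes
histogram_quantile(0.99,
  rate(cloudtaser_bridge_request_duration_seconds_bucket[5m]))

# Alert expression: bridge disconnected
cloudtaser_bridge_connected == 0
```

Graph the first over a deploy window to catch tail regressions after upgrades; wire the second into an alert rule.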

End-to-end demo timing

Run the interactive demo or cloudtaser-cli target install against a staging cluster and time each step:

time cloudtaser-cli source register ...       # typically <5s
time cloudtaser-cli target install ...        # typically 30-60s for full operator+eBPF+webhook rollout
time kubectl rollout status deploy/myapp      # pod bootstrap incl. wrapper secret fetch

These are the numbers you'll quote in your own operational runbook.


Pulling it together

For a typical web-API workload on commodity compute with a same-region beacon:

  • Pod start is dominated by the wrapper's ~1-2 s secret fetch -- one-time per pod.
  • Request-path latency is essentially unchanged (eBPF is free, getenv() is rarely hot, secrets are in-memory).
  • Object-storage workloads see per-request latency added only on small objects; large-object throughput is unaffected.
  • Fetching a fresh secret (rotation, emergency rebinding) costs one beacon-bridge round-trip plus OpenBao lookup -- tens of milliseconds, not hundreds.

If your workload has a specific shape that doesn't match the above (e.g., millions of getenv() calls per second, or 100 KB objects at 50k RPS), reach out -- we will profile with you and either publish the numbers or fix the overhead.