Memory Isolation Landscape

Once readers understand what memfd_secret does for the secret pages, a common question is: "What does it actually take to protect my entire process memory, not just secrets? What are my options, and which does CloudTaser use?"

This page is the honest answer. It surveys the kernel primitives, hardware boundaries, and emerging substrates that together make up "memory protection" on modern Linux. It tells you which ones CloudTaser uses today, which ones we deliberately do not integrate with, and which ones compose cleanly with the product without any code change on our side.

It is a companion to the Sovereign Deployment Decision Guide (which tells you which substrate to pick) and the Shared Responsibility Model (which tells you where CloudTaser's responsibility ends and yours begins).


The threat model spectrum

Different adversaries live on different rungs of the stack. Each rung has a different primitive that defends against it, and CloudTaser's posture at each rung is different.

| Threat model | What protects against it | How CloudTaser uses it | Notes |
|---|---|---|---|
| Guest root / ptrace / /proc/<pid>/mem / process_vm_readv on the secret pages | eBPF runtime enforcement + memfd_secret(2) | We use both. | Our primary protection layer today. eBPF blocks the syscalls; memfd_secret removes the pages from the kernel's direct map, so even a root that evades eBPF finds nothing to read. |
| Kernel modules reading kernel memory (LKM rootkit on the node) | memfd_secret(2) (removes pages from the direct map) | We use it. | Effective on Linux 5.14+ with CONFIG_SECRETMEM=y. memfd_secret makes the pages unreachable even from ring 0. |
| Host kernel debug paths (/dev/mem, /dev/kmem, kdump) | memfd_secret(2) | We use it. | The same direct-map unmapping closes these paths. eBPF additionally blocks writes to /dev/mem and /proc/* for defense in depth. |
| Swap-to-disk leakage (anonymous pages paged out to a swapfile) | mlock(2) | We use it. | The wrapper pins secret pages in physical RAM. Strict mode refuses to start if RLIMIT_MEMLOCK is insufficient. |
| Core dumps capturing process memory to disk | MADV_DONTDUMP + PR_SET_DUMPABLE(0) | We use both. | Excludes secret regions from any dump the kernel may produce; the wrapper process itself is marked non-dumpable. |
| In-process memory corruption (heap overflow inside the app reading into a buffer adjacent to a secret buffer) | Guard pages + mprotect toggles + canary values (sodium_malloc-style) | Not yet. Tracked as the cloudtaser-wrapper#93 hardening issue. | Today's answer: eBPF catches the resulting exfiltration syscall. Tomorrow's answer: the secret buffer itself is harder to reach via adjacent corruption. |
| Working heap of the customer app (ordinary process memory: rows from a DB, decrypted bytes in transit through the app, JSON being assembled) | Confidential-compute substrate (SEV-SNP / TDX / Nitro Enclaves / ARM CCA) | Depends on the customer's substrate choice. | Not in memfd_secret; this is ordinary guest RAM. Per the Sovereign Deployment Decision Guide, a CC target SKU is what closes this rung. |
| Hypervisor reads of guest RAM (compelled or compromised host, CSP insider, foreign-government compulsion) | AMD SEV-SNP / Intel TDX / ARM CCA (confidential-compute VMs) | CloudTaser runs cleanly inside; no code change. | Required for the hypervisor-sovereignty story. The Sovereign Deployment Decision Guide walks through provider SKUs. |
| Hypervisor reads at a smaller granularity than the whole VM (process- or pod-granularity enclaves) | Intel SGX enclaves + Gramine / Occlum / SCONE / Enarx, or CNCF Confidential Containers (CoCo) | We do not integrate with SGX enclaves; CoCo is a substrate on top of which CloudTaser runs transparently. | See the design decisions below for the rationale. CoCo is tracked as a future consideration, not currently supported as a first-class target. |

The "wrap whole process memory" question

The specific question people ask next is: "Is there a bubble around the entire process, not just the secret pages?"

A short honest answer:

  • On commodity Linux, no. There is no kernel or userspace primitive that encrypts all process memory from the hypervisor on commodity compute. Full stop. Anyone claiming otherwise is selling you something that does not exist.
  • CC VMs are the whole-VM bubble. Guest RAM is encrypted end-to-end with per-VM keys the hypervisor does not hold. Examples: AWS Nitro Enclaves (a related but distinct isolation model) and AMD SEV-SNP-enabled EC2 instances, GCP Confidential VMs (SEV-SNP generally available, TDX in preview), Azure confidential VM series (DCasv5 / ECasv5 on SEV-SNP, DCesv5 / ECesv5 on TDX). This is the substrate we recommend via the Sovereign Deployment Decision Guide.
  • Process-granularity enclaves (Intel SGX + LibOS wrappers Gramine / Occlum / SCONE / Enarx). These exist and are production-grade for narrow workloads. But Intel's roadmap is moving away from SGX on client parts, and on server parts the story is shifting toward TDX at the VM level. SGX has small EPC budgets, measurable performance hits, and attestation complexity that rivals building CC from scratch. CloudTaser does not integrate with SGX enclaves. This is a deliberate product decision -- the whole-VM CC path is cleaner and broader.
  • Pod-granularity enclaves (CNCF Confidential Containers, CoCo). CoCo runs a confidential VM per pod, with the pod definition launched into the CC-protected boundary. It is CC-VM under the hood -- the same AMD SEV-SNP / Intel TDX substrate, just cut at a pod grain instead of a whole-node grain. CloudTaser runs cleanly inside CoCo when it is available on your platform. Azure has CoCo in preview on AKS; GKE and EKS have it on the roadmap. When CoCo GAs on managed K8s we will document the pattern explicitly. No code change on our side.
  • Per-page memory encryption (Intel TME-MK) without SGX / TDX. Total Memory Encryption Multi-Key is a feature of recent Intel Xeon silicon: each physical page can be encrypted with a different key. Useful against swap-snapshot and cold-boot leakage. But the keys are held by firmware and the hypervisor. TME-MK alone does not defend against a compelled provider. It is not a substitute for CC VMs.
  • Homomorphic encryption (FHE) / Secure Multi-Party Computation (MPC). Compute on ciphertext without ever decrypting. For general workloads you pay a 10^4 to 10^6x slowdown, which rules it out for operational systems. Production-usable only for narrow patterns today (private information retrieval, specific analytics queries, some ML inference). Not a general "wrap memory" answer, and not a path CloudTaser is on.

What CloudTaser provides + what the customer brings

This table is the short version; the full treatment is in Shared Responsibility.

| Adversary | CloudTaser's coverage | What the customer brings |
|---|---|---|
| Guest-root adversaries on commodity compute (kubelet-root, node-root, sidecar-container root) | Complete answer: memfd_secret + eBPF + mlock + dump controls. | Nothing specific. |
| Hypervisor-level adversaries (provider insiders, a compelled cloud operator, host-root on the bare metal) | CloudTaser composes with CC; no code change needed. | Pick a CC substrate. Attestation and key-release policy are the audit artefacts. |
| In-process memory corruption (adjacent-buffer heap overflow) | Today: eBPF catches the resulting exfiltration syscall. Tomorrow: wrapper#93 hardening adds guard pages + mprotect toggles + canaries. | Follow normal memory-safe coding discipline. Rust and managed languages shrink the surface further. |
| Customer-app bugs leaking their own data (logging a token, sending a DB row to a webhook, handing a JWT to a third-party SDK) | Not in scope; this is application-layer discipline. | See Shared Responsibility for the boundary. |

Design decisions we have deliberately not made

Because we are asked about them often enough that silence is misleading, these are on-the-record "no" decisions, not "not yet" decisions.

SGX / Gramine / Occlum / SCONE / Enarx integration -- not pursued

CloudTaser does not integrate with SGX enclaves, and we do not ship a LibOS-wrapped enclave binary (no Gramine manifest, no Occlum image, no SCONE-compiled wrapper, no Enarx keep).

Rationale:

  • Intel's public roadmap is moving SGX off client parts and shifting the enclave story on server parts toward TDX at the VM level. Building on SGX today means building on a substrate the hardware vendor is stepping away from.
  • SGX EPC (enclave page cache) budgets are small on production SKUs. For a general-purpose workload that wants to hold a real working set, EPC pressure produces real performance cliffs.
  • The attestation and key-release complexity of an SGX-wrapped binary is comparable to the complexity of launching into a CC VM with remote attestation. We would rather pay that complexity once, at the VM boundary, and get a broader protection envelope, than pay it per-process at a narrower grain.
  • Customers who need sub-VM isolation inside a shared cluster should look at Confidential Containers (CoCo), which runs a confidential VM per pod -- same protection grain as SGX in the ways that matter, on the same substrate Intel is actually investing in.

Userspace TME-MK keyring management -- not pursued

Without SGX or TDX, TME-MK's keys are held by firmware and the hypervisor. The protection against a compelled provider is therefore cosmetic. We would rather point customers at CC VMs, where the keying story actually holds, than build a userspace keyring around a primitive whose threat-model coverage does not include our actual adversary.


What is on our hardening roadmap (active)

These are "yes" decisions -- work that is in flight, recently shipped, or in the current planning horizon.

  • cloudtaser-wrapper#93 -- sodium_malloc-pattern defense-in-depth inside the wrapper. Guard pages on either side of every secret buffer, mprotect toggle so pages are PROT_NONE except during the very short window a reader is using them, canary values to detect adjacent-buffer corruption. Hardens the in-process-corruption rung of the table above.
  • cloudtaser-operator#233 -- already shipped. Admission-policy bundle enforces Cosign-signed images at admission time; see Supply-Chain Evidence for how to verify end-to-end.
  • cloudtaser-beacon#43 -- cert sync via gossip. Operational-robustness improvement for the P2P connectivity layer; see Beacon Trust Model.

Related pages

  • Sovereign Deployment Decision Guide -- which CC substrate to pick and why region labels alone do not establish sovereignty.
  • Shared Responsibility Model -- what memfd_secret does and does not cover, and what your application code is still on the hook for.
  • Supply-Chain Evidence -- how to verify what CloudTaser ships before it runs in your cluster.
  • Memory Protection -- the mechanism-level walkthrough of memfd_secret, mlock, and dump controls as they ship in the wrapper.
  • Security Model -- trust boundaries, threat model, and what CloudTaser does and does not protect against.