Trusted CA Injection in Enterprise Kubernetes
Custom CAs are everywhere in enterprise Kubernetes — TLS inspection proxies, internal PKI, self-signed services. Containers don't trust them out of the box. Here is why every common workaround breaks at scale, what runtime-level injection via containerd NRI actually does in production, and the concrete code paths and failure modes we hit getting cainjekt operational across our managed fleet.
Open source at github.com/natrontech/cainjekt
What's inside
1. The trust store problem at scale
2. Four workarounds and exactly why they break
3. Runtime-level injection: the case for NRI
4. Inside cainjekt: the three-phase pipeline
5. What the OCI hook writes per OS distribution
6. Why Java is special
7. Production hardening that the upstream MVP lacked
8. Operating cainjekt: metrics, alerts, debugging
9. Comparison with cert-manager trust-manager and mutating webhooks
10. Lessons from the field
Every enterprise Kubernetes cluster ends up with at least one custom CA somewhere in the picture: a TLS inspection proxy on the egress path, an internal PKI for service-to-service mTLS, or a self-signed certificate on a legacy admin endpoint that nobody got around to fixing. Often all three.
Every container that makes an HTTPS call needs to trust those CAs, and out of the box, no container does.
This is not a Kubernetes problem in the strict sense. It is a "containers ship with public CAs only" problem that becomes a Kubernetes problem the moment you run more than a handful of services. After running managed Kubernetes for Swiss enterprises since 2021, we have watched teams attempt every variation of every workaround. They all break in the same ways, just on different timelines.
This whitepaper walks through the problem in concrete terms — actual file paths, actual environment variables, actual code paths — and then explains how runtime-level CA injection via containerd's Node Resource Interface (NRI) actually solves it. We open-sourced our implementation as cainjekt, a fork and substantial production hardening of an existing community project, so the technical details below map directly to inspectable code.
The trust store problem at scale
A modern container image ships with a CA bundle from its base distribution. The exact location depends on the distro:
| Distribution | Trust store file | Anchor directory for individual CAs |
|---|---|---|
| Debian / Ubuntu | /etc/ssl/certs/ca-certificates.crt | /usr/local/share/ca-certificates/ |
| RHEL / Fedora / CentOS | /etc/pki/tls/certs/ca-bundle.crt | /etc/pki/ca-trust/source/anchors/ |
| openSUSE / SLES | /etc/ssl/ca-bundle.pem | /etc/pki/trust/anchors/ |
| Alpine | /etc/ssl/certs/ca-certificates.crt | /usr/local/share/ca-certificates/ |
| Arch | /etc/ssl/certs/ca-certificates.crt | /etc/ca-certificates/trust-source/anchors/ |
What is in those bundles: every public CA your container will ever encounter on the public internet. What is not in them: a single byte of your enterprise's internal trust roots.
The result is the error every platform engineer has stared at thousands of times:
x509: certificate signed by unknown authority

In a small environment you can paper over this. Add a CA to a couple of images, restart, move on. In a Kubernetes platform with a hundred services across multiple teams, this becomes a constant operational tax. Every new deployment is a potential trust failure. Every CA rotation is a fleet-wide cascade of restarts. Every team that joins the platform discovers the problem fresh and reaches for whichever workaround is closest at hand.
Worse, the bundle file is only half the story. Modern language runtimes have their own CA configuration, often independent of the OS file:
| Runtime | Honors OS trust store? | Override variable |
|---|---|---|
| OpenSSL / curl / wget | Yes | SSL_CERT_FILE |
| Go (crypto/tls) | Reads OS store on first use, and honors SSL_CERT_FILE | SSL_CERT_FILE |
| Python (requests, urllib3) | Often uses certifi, ignores OS store | REQUESTS_CA_BUNDLE, SSL_CERT_FILE |
| Node.js | No, ships its own bundle | NODE_EXTRA_CA_CERTS (additive) |
| Java (JVM) | No, uses its own keystore (cacerts) | JAVA_TOOL_OPTIONS with -Djavax.net.ssl.trustStore=... |
| Ruby (OpenSSL) | Yes | SSL_CERT_FILE |
Solving the trust problem fully means handling all of this: the OS file, the per-language defaults, and the runtime-specific overrides. Solving it partially means leaving silent failures scattered across services that will surface only when the wrong code path runs in production.
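As a sketch of what "handling all of this" means on the environment side, an injector ends up setting at least the following variables. The helper name and bundle path here are illustrative, not cainjekt's actual code:

```go
package main

import "fmt"

// caEnv returns the environment variables a runtime-level injector would set
// so the major language runtimes pick up an injected PEM bundle. The bundle
// path is illustrative -- it is wherever the injector mounted the file.
// Java is deliberately absent: the JVM wants a keystore, not a PEM file
// (see the runtime table above).
func caEnv(bundlePath string) []string {
	return []string{
		"SSL_CERT_FILE=" + bundlePath,       // OpenSSL, curl, Go, Ruby
		"REQUESTS_CA_BUNDLE=" + bundlePath,  // Python requests / urllib3
		"NODE_EXTRA_CA_CERTS=" + bundlePath, // Node.js (additive to its own bundle)
	}
}

func main() {
	for _, e := range caEnv("/etc/ssl/certs/injected-bundle.pem") {
		fmt.Println(e)
	}
}
```

Each variable covers a different runtime family; missing any one of them reproduces exactly the "curl works but the app fails" class of partial fix described below.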
The two layers, concretely:

- The OS trust store: a file in the container filesystem. Path and format depend on the distribution.
- The language runtime: has its own trust behaviour, configured via environment variables, often independent of the OS trust store.

A complete solution has to cover both layers.
Four workarounds and exactly why they break
- Bake the CA into every container image
- Mount the CA as a file in the container via a ConfigMap
- Patch the trust store from an init container before the app starts
- Mutate the Pod spec at admission time
Workaround 1: bake the CA into every image
The most direct approach. The team that maintains the base image adds the company CA via update-ca-certificates. Every downstream image inherits it. Done.
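The base-image layer in question is typically just a few lines. This fragment is illustrative (the certificate filename is made up; paths are the Debian ones from the table above):

```dockerfile
# Illustrative base-image layer baking in the company CA.
FROM debian:bookworm-slim
# Anchor directory for individual CAs on Debian/Ubuntu; file must end in .crt
COPY company-root-ca.crt /usr/local/share/ca-certificates/company-root-ca.crt
# Regenerates /etc/ssl/certs/ca-certificates.crt from all anchors
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates \
    && update-ca-certificates
```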
It works until you hit an image whose base you do not control. It breaks when the CA rotates and you need to rebuild every image in the company. It breaks when a third-party Helm chart pulls a vendor image you cannot rebuild without forking. It breaks when somebody ships a new microservice straight from node:20-alpine and forgets the wrapper image.
Even when it does work, the concrete failure mode is silent: the OS bundle is updated, but a Python service uses certifi, a Node service uses its own bundle, a Java service uses cacerts. The OS store update papers over some calls and not others. Engineers spend hours tracking down "why does curl work but the Java connection fails" before realising the OS update was never the right fix.
Image baking couples your trust infrastructure to your build infrastructure. They have completely different lifecycles, and the mismatch produces failures at unpredictable boundaries.
Workaround 2: mount the CA via a volume
A ConfigMap or Secret contains the CA bundle. Every Pod spec mounts it. The application is configured to point at the mounted path.
Three problems compound. First, the OS trust store is not modified — only a file appears on disk. Tools that hard-code the system path (curl, wget, most language standard libraries reading /etc/ssl/certs/) do not pick it up unless the image was built knowing about the mount. Second, every Pod spec everywhere needs the mount. This is where mutating webhooks get reached for, which becomes workaround 4. Third, the per-language env vars are still missing — a bare volume mount does not configure NODE_EXTRA_CA_CERTS or JAVA_TOOL_OPTIONS.
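In manifest form the workaround looks roughly like this (ConfigMap name, key, and mount path are all illustrative). Note that nothing here touches /etc/ssl/certs/, and only one of the per-runtime variables is set:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: example/app:latest
      env:
        - name: SSL_CERT_FILE          # helps OpenSSL/Go/Ruby, nothing else
          value: /etc/custom-ca/ca-bundle.pem
      volumeMounts:
        - name: custom-ca
          mountPath: /etc/custom-ca
          readOnly: true
  volumes:
    - name: custom-ca
      configMap:
        name: company-ca-bundle        # illustrative ConfigMap holding the PEM
```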
Workaround 3: init containers that patch the trust store
A clever variant. An init container starts before the main container, has a writable filesystem, and runs update-ca-certificates (or appends directly to the trust store) before the application runs.
It works if the main container has a writable rootfs. It does not work if your platform's security policy enforces readOnlyRootFilesystem: true — and if it does not enforce that, you have a different platform problem. It also doubles your startup time and adds another point of failure. And it still does not handle language runtimes that ignore the OS store.
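One common shape of this pattern stages a regenerated bundle into a shared emptyDir and overlay-mounts it into the main container (all names, images, and paths are illustrative; the init image needs the ca-certificates package):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  volumes:
    - name: company-ca
      configMap:
        name: company-ca-bundle
    - name: merged-certs
      emptyDir: {}
  initContainers:
    - name: merge-ca
      image: debian:bookworm-slim      # illustrative; must provide update-ca-certificates
      command:
        - sh
        - -c
        - |
          cp /company-ca/*.crt /usr/local/share/ca-certificates/
          update-ca-certificates
          cp -rL /etc/ssl/certs/. /merged/
      volumeMounts:
        - name: company-ca
          mountPath: /company-ca
        - name: merged-certs
          mountPath: /merged
  containers:
    - name: app
      image: example/app:latest
      volumeMounts:
        - name: merged-certs
          mountPath: /etc/ssl/certs    # overlays the image's own bundle directory
          readOnly: true
```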
Workaround 4: mutating admission webhooks
Once teams realise that the volume-mount workaround needs to be applied to every Pod spec consistently, a mutating webhook is the next escalation. The webhook injects the volume mount, the environment variables, and sometimes an init container into every Pod spec at admission time.
This works in the sense that the work happens automatically. But the webhook is now a new piece of platform infrastructure with its own availability requirements. It runs synchronously in the critical path of every Pod creation API call. If the webhook fails, Pods do not get CAs. If the webhook has a bug, every Pod is affected. If the webhook needs an upgrade, you are upgrading a piece of infrastructure that mutates every Pod across the cluster. It is also operating at the Kubernetes API level, not the container runtime level — so it inherits all the limitations of the volume-mount and env-var approach, just automated.
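For reference, the availability coupling lives in the webhook registration itself (names here are illustrative); failurePolicy is the single field that decides whether a webhook outage blocks every Pod creation or silently skips injection:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: ca-injector                    # illustrative
webhooks:
  - name: inject.ca.example.dev
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: ca-injector
        namespace: platform
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE"]
    # "Fail" blocks every Pod create while the webhook is down;
    # "Ignore" lets Pods through without CAs instead. Both are bad defaults.
    failurePolicy: Fail
```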
The pattern across all four: each one is a reasonable response to the immediate problem. None of them solves the problem at the right level.
Runtime-level injection: the case for NRI
The right level is the container runtime. By the time the API server has processed the Pod and the kubelet has handed it to containerd, the trust problem has nothing to do with Kubernetes anymore. It is purely about the container's filesystem and environment, both of which the runtime controls.
Containerd added the Node Resource Interface (NRI) for exactly this kind of integration: a stable, supported plugin API that lets components observe and adjust container lifecycle events without forking the runtime, without running webhooks on the API critical path, and without requiring elevated cluster-wide privileges.
An NRI plugin runs as a DaemonSet on every node. When containerd creates a container, the plugin receives a CreateContainer event with the OCI spec, the pod metadata, and the ability to inject hooks, mounts, and environment variables before the container starts. The Pod spec is never touched. There is no admission webhook on the request path. The opt-in is a single annotation per pod (or per namespace).
Crucially, NRI gives the plugin access to the container lifecycle, not just the Pod. That distinction matters: a Pod has multiple containers (init, main, sidecars), and we want to be precise about which ones get injection and which do not. A service-mesh sidecar like istio-proxy has its own trust configuration and should not be touched.
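The per-container decision an NRI plugin makes in its CreateContainer handler can be sketched like this. The types below are simplified local stand-ins for the NRI API shapes, and the annotation key and sidecar list are illustrative, not cainjekt's actual configuration:

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the pod and container data an NRI plugin
// receives in its CreateContainer handler.
type PodSandbox struct {
	Namespace   string
	Annotations map[string]string
}

type Container struct {
	Name string
}

// Illustrative opt-in annotation key; not cainjekt's actual key.
const optInAnnotation = "example.dev/inject-ca"

// Service-mesh sidecars manage their own trust and are never touched.
var skippedSidecars = map[string]bool{
	"istio-proxy":   true,
	"linkerd-proxy": true,
}

// shouldInject models the per-container decision: the pod must opt in via
// annotation, and known sidecars are excluded even when it does.
func shouldInject(pod *PodSandbox, ctr *Container) bool {
	if strings.ToLower(pod.Annotations[optInAnnotation]) != "true" {
		return false
	}
	return !skippedSidecars[ctr.Name]
}

func main() {
	pod := &PodSandbox{
		Namespace:   "payments",
		Annotations: map[string]string{optInAnnotation: "true"},
	}
	fmt.Println(shouldInject(pod, &Container{Name: "app"}))         // true
	fmt.Println(shouldInject(pod, &Container{Name: "istio-proxy"})) // false
}
```

The real plugin returns a container adjustment (mounts, env vars, OCI hooks) when this predicate is true; the point of the sketch is that the granularity is the container, not the Pod.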