Trusted CA Injection in Enterprise Kubernetes
Custom CAs are everywhere in enterprise Kubernetes — TLS inspection proxies, internal PKI, self-signed services. Containers don't trust them out of the box. Here is why every common workaround breaks at scale, what runtime-level injection via containerd NRI actually does in production, and the concrete code paths and failure modes we hit getting cainjekt operational across our managed fleet.
Open source at github.com/natrontech/cainjekt
What's inside
1. The trust store problem at scale
2. Four workarounds and exactly why they break
3. Runtime-level injection: the case for NRI
4. Inside cainjekt: the three-phase pipeline
5. What the OCI hook writes per OS distribution
6. Why Java is special
7. Production hardening that the upstream MVP lacked
8. Operating cainjekt: metrics, alerts, debugging
9. Comparison with cert-manager trust-manager and mutating webhooks
10. Lessons from the field
Every enterprise Kubernetes cluster ends up with at least one custom CA somewhere in the picture: a TLS inspection proxy on the egress path, an internal PKI for service-to-service mTLS, or a self-signed certificate on a legacy admin endpoint that nobody got around to fixing. Often all three.
Every container that makes an HTTPS call needs to trust those CAs, and out of the box, no container does.
This is not a Kubernetes problem in the strict sense. It is a "containers ship with public CAs only" problem that becomes a Kubernetes problem the moment you run more than a handful of services. After running managed Kubernetes for Swiss enterprises since 2021, we have watched teams attempt every variation of every workaround. They all break in the same ways, just on different timelines.
This whitepaper walks through the problem in concrete terms — actual file paths, actual environment variables, actual code paths — and then explains how runtime-level CA injection via containerd's Node Resource Interface (NRI) actually solves it. We open-sourced our implementation as cainjekt, a fork and substantial production hardening of an existing community project, so the technical details below map directly to inspectable code.
The trust store problem at scale
A modern container image ships with a CA bundle from its base distribution. The exact location depends on the distro:
| Distribution | Trust store file | Anchor directory for individual CAs |
|---|---|---|
| Debian / Ubuntu | /etc/ssl/certs/ca-certificates.crt | /usr/local/share/ca-certificates/ |
| RHEL / Fedora / CentOS | /etc/pki/tls/certs/ca-bundle.crt | /etc/pki/ca-trust/source/anchors/ |
| openSUSE / SLES | /etc/ssl/ca-bundle.pem | /etc/pki/trust/anchors/ |
| Alpine | /etc/ssl/certs/ca-certificates.crt | /usr/local/share/ca-certificates/ |
| Arch | /etc/ssl/certs/ca-certificates.crt | /etc/ca-certificates/trust-source/anchors/ |
What is in those bundles: every public CA your container will ever encounter on the public internet. What is not in them: a single byte of your enterprise's internal trust roots.
The result is the error every platform engineer has stared at thousands of times:
x509: certificate signed by unknown authority

In a small environment you can paper over this. Add a CA to a couple of images, restart, move on. In a Kubernetes platform with a hundred services across multiple teams, this becomes a constant operational tax. Every new deployment is a potential trust failure. Every CA rotation is a fleet-wide cascade of restarts. Every team that joins the platform discovers the problem fresh and reaches for whichever workaround is closest at hand.
Worse, the bundle file is only half the story. Modern language runtimes have their own CA configuration, often independent of the OS file:
| Runtime | Honors OS trust store? | Override variable |
|---|---|---|
| OpenSSL / curl / wget | Yes | SSL_CERT_FILE |
| Go (crypto/tls) | Reads OS store on first use, and honors SSL_CERT_FILE | SSL_CERT_FILE |
| Python (requests, urllib3) | Often uses certifi, ignores OS store | REQUESTS_CA_BUNDLE, SSL_CERT_FILE |
| Node.js | No, ships its own bundle | NODE_EXTRA_CA_CERTS (additive) |
| Java (JVM) | No, uses its own keystore (cacerts) | JAVA_TOOL_OPTIONS with -Djavax.net.ssl.trustStore=... |
| Ruby (OpenSSL) | Yes | SSL_CERT_FILE |
Solving the trust problem fully means handling all of this: the OS file, the per-language defaults, and the runtime-specific overrides. Solving it partially means leaving silent failures scattered across services that will surface only when the wrong code path runs in production.
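As a sketch of what "handling all of this" means on the environment side, an injector ends up setting at least the following variables. The helper name and bundle path here are illustrative, not cainjekt's actual code:

```go
package main

import "fmt"

// caEnv returns the environment variables a runtime-level injector would set
// so the major language runtimes pick up an injected PEM bundle. The bundle
// path is illustrative -- it is wherever the injector mounted the file.
// Java is deliberately absent: the JVM wants a keystore, not a PEM file
// (see the runtime table above).
func caEnv(bundlePath string) []string {
	return []string{
		"SSL_CERT_FILE=" + bundlePath,       // OpenSSL, curl, Go, Ruby
		"REQUESTS_CA_BUNDLE=" + bundlePath,  // Python requests / urllib3
		"NODE_EXTRA_CA_CERTS=" + bundlePath, // Node.js (additive to its own bundle)
	}
}

func main() {
	for _, e := range caEnv("/etc/ssl/certs/injected-bundle.pem") {
		fmt.Println(e)
	}
}
```

Each variable covers a different runtime family; missing any one of them reproduces exactly the "curl works but the app fails" class of partial fix described below.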
The two layers, concretely:

- The OS trust store: a file in the container filesystem. Path and format depend on the distribution.
- The language runtime: has its own trust behaviour, configured via environment variables, often independent of the OS trust store.

A complete solution has to cover both layers.
Four workarounds and exactly why they break
- Bake the CA into every container image
- Mount the CA as a file in the container via a ConfigMap
- Patch the trust store from an init container before the app starts
- Mutate the Pod spec at admission time
Workaround 1: bake the CA into every image
The most direct approach. The team that maintains the base image adds the company CA via update-ca-certificates. Every downstream image inherits it. Done.
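The base-image layer in question is typically just a few lines. This fragment is illustrative (the certificate filename is made up; paths are the Debian ones from the table above):

```dockerfile
# Illustrative base-image layer baking in the company CA.
FROM debian:bookworm-slim
# Anchor directory for individual CAs on Debian/Ubuntu; file must end in .crt
COPY company-root-ca.crt /usr/local/share/ca-certificates/company-root-ca.crt
# Regenerates /etc/ssl/certs/ca-certificates.crt from all anchors
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates \
    && update-ca-certificates
```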
It works until you hit an image whose base you do not control. It breaks when the CA rotates and you need to rebuild every image in the company. It breaks when a third-party Helm chart pulls a vendor image you cannot rebuild without forking. It breaks when somebody ships a new microservice straight from node:20-alpine and forgets the wrapper image.
Even when it does work, the concrete failure mode is silent: the OS bundle is updated, but a Python service uses certifi, a Node service uses its own bundle, a Java service uses cacerts. The OS store update papers over some calls and not others. Engineers spend hours tracking down "why does curl work but the Java connection fails" before realising the OS update was never the right fix.
Image baking couples your trust infrastructure to your build infrastructure. They have completely different lifecycles, and the mismatch produces failures at unpredictable boundaries.
Workaround 2: mount the CA via a volume
A ConfigMap or Secret contains the CA bundle. Every Pod spec mounts it. The application is configured to point at the mounted path.
Three problems compound. First, the OS trust store is not modified — only a file appears on disk. Tools that hard-code the system path (curl, wget, most language standard libraries reading /etc/ssl/certs/) do not pick it up unless the image was built knowing about the mount. Second, every Pod spec everywhere needs the mount. This is where mutating webhooks get reached for, which becomes workaround 4. Third, the per-language env vars are still missing — a bare volume mount does not configure NODE_EXTRA_CA_CERTS or JAVA_TOOL_OPTIONS.
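In manifest form the workaround looks roughly like this (ConfigMap name, key, and mount path are all illustrative). Note that nothing here touches /etc/ssl/certs/, and only one of the per-runtime variables is set:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: example/app:latest
      env:
        - name: SSL_CERT_FILE          # helps OpenSSL/Go/Ruby, nothing else
          value: /etc/custom-ca/ca-bundle.pem
      volumeMounts:
        - name: custom-ca
          mountPath: /etc/custom-ca
          readOnly: true
  volumes:
    - name: custom-ca
      configMap:
        name: company-ca-bundle        # illustrative ConfigMap holding the PEM
```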
Workaround 3: init containers that patch the trust store
A clever variant. An init container starts before the main container, has a writable filesystem, and runs update-ca-certificates (or appends directly to the trust store) before the application runs.
It works if the main container has a writable rootfs. It does not work if your platform's security policy enforces readOnlyRootFilesystem: true — and if it does not enforce that, you have a different platform problem. It also doubles your startup time and adds another point of failure. And it still does not handle language runtimes that ignore the OS store.
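One common shape of this pattern stages a regenerated bundle into a shared emptyDir and overlay-mounts it into the main container (all names, images, and paths are illustrative; the init image needs the ca-certificates package):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  volumes:
    - name: company-ca
      configMap:
        name: company-ca-bundle
    - name: merged-certs
      emptyDir: {}
  initContainers:
    - name: merge-ca
      image: debian:bookworm-slim      # illustrative; must provide update-ca-certificates
      command:
        - sh
        - -c
        - |
          cp /company-ca/*.crt /usr/local/share/ca-certificates/
          update-ca-certificates
          cp -rL /etc/ssl/certs/. /merged/
      volumeMounts:
        - name: company-ca
          mountPath: /company-ca
        - name: merged-certs
          mountPath: /merged
  containers:
    - name: app
      image: example/app:latest
      volumeMounts:
        - name: merged-certs
          mountPath: /etc/ssl/certs    # overlays the image's own bundle directory
          readOnly: true
```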
Workaround 4: mutating admission webhooks
Once teams realise that the volume-mount workaround needs to be applied to every Pod spec consistently, a mutating webhook is the next escalation. The webhook injects the volume mount, the environment variables, and sometimes an init container into every Pod spec at admission time.
This works in the sense that the work happens automatically. But the webhook is now a new piece of platform infrastructure with its own availability requirements. It runs synchronously in the critical path of every Pod creation API call. If the webhook fails, Pods do not get CAs. If the webhook has a bug, every Pod is affected. If the webhook needs an upgrade, you are upgrading a piece of infrastructure that mutates every Pod across the cluster. It is also operating at the Kubernetes API level, not the container runtime level — so it inherits all the limitations of the volume-mount and env-var approach, just automated.
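For reference, the availability coupling lives in the webhook registration itself (names here are illustrative); failurePolicy is the single field that decides whether a webhook outage blocks every Pod creation or silently skips injection:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: ca-injector                    # illustrative
webhooks:
  - name: inject.ca.example.dev
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: ca-injector
        namespace: platform
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE"]
    # "Fail" blocks every Pod create while the webhook is down;
    # "Ignore" lets Pods through without CAs instead. Both are bad defaults.
    failurePolicy: Fail
```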
The pattern across all four: each one is a reasonable response to the immediate problem. None of them solves the problem at the right level.
Runtime-level injection: the case for NRI
The right level is the container runtime. By the time the API server has processed the Pod and the kubelet has handed it to containerd, the trust problem has nothing to do with Kubernetes anymore. It is purely about the container's filesystem and environment, both of which the runtime controls.
Containerd added the Node Resource Interface (NRI) for exactly this kind of integration: a stable, supported plugin API that lets components observe and adjust container lifecycle events without forking the runtime, without running webhooks on the API critical path, and without requiring elevated cluster-wide privileges.
An NRI plugin runs as a DaemonSet on every node. When containerd creates a container, the plugin receives a CreateContainer event with the OCI spec, the pod metadata, and the ability to inject hooks, mounts, and environment variables before the container starts. The Pod spec is never touched. There is no admission webhook on the request path. The opt-in is a single annotation per pod (or per namespace).
Crucially, NRI gives the plugin access to the container lifecycle, not just the Pod. That distinction matters: a Pod has multiple containers (init, main, sidecars), and we want to be precise about which ones get injection and which do not. A service-mesh sidecar like istio-proxy has its own trust configuration and should not be touched.
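The per-container decision an NRI plugin makes in its CreateContainer handler can be sketched like this. The types below are simplified local stand-ins for the NRI API shapes, and the annotation key and sidecar list are illustrative, not cainjekt's actual configuration:

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the pod and container data an NRI plugin
// receives in its CreateContainer handler.
type PodSandbox struct {
	Namespace   string
	Annotations map[string]string
}

type Container struct {
	Name string
}

// Illustrative opt-in annotation key; not cainjekt's actual key.
const optInAnnotation = "example.dev/inject-ca"

// Service-mesh sidecars manage their own trust and are never touched.
var skippedSidecars = map[string]bool{
	"istio-proxy":   true,
	"linkerd-proxy": true,
}

// shouldInject models the per-container decision: the pod must opt in via
// annotation, and known sidecars are excluded even when it does.
func shouldInject(pod *PodSandbox, ctr *Container) bool {
	if strings.ToLower(pod.Annotations[optInAnnotation]) != "true" {
		return false
	}
	return !skippedSidecars[ctr.Name]
}

func main() {
	pod := &PodSandbox{
		Namespace:   "payments",
		Annotations: map[string]string{optInAnnotation: "true"},
	}
	fmt.Println(shouldInject(pod, &Container{Name: "app"}))         // true
	fmt.Println(shouldInject(pod, &Container{Name: "istio-proxy"})) // false
}
```

The real plugin returns a container adjustment (mounts, env vars, OCI hooks) when this predicate is true; the point of the sketch is that the granularity is the container, not the Pod.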