April 2, 2026 · 15 min read

Managed Kubernetes Is Not Managed Kubernetes

How we built a unified managed Kubernetes platform across Natron Cloud, Flex Stack, and customer hyperscalers - and why the cluster itself is only 20% of the work.

By Adrian Berger

Every major cloud provider offers "managed Kubernetes". Azure has AKS. Google has GKE. And when you spin one up, you get a control plane, some nodes, and a kubeconfig. Congratulations, you have Kubernetes.

What you do not have is a platform.

We have been running managed Kubernetes for Swiss enterprises since 2021, across our own infrastructure and on customer hyperscalers. The single biggest lesson: the gap between a running cluster and a production-ready platform is enormous. This post explains what we actually deliver when we say "Managed Kubernetes", how our platform works across different infrastructure targets, and which batteries we include that you would otherwise spend months building yourself.

The cluster is 20% of the work

When most people hear "Managed Kubernetes", they think about the control plane. Someone else runs etcd, the API server, the scheduler. Nodes get provisioned. Updates get applied. That part is solved. AKS does it, GKE does it, we do it on Natron Cloud.

But a cluster without platform services is like a server without an operating system. You can run things on it, but you will spend most of your time solving problems that have nothing to do with your actual application:

  • Who manages TLS certificates? Who rotates them before they expire at 2 AM on a Saturday?
  • Where do logs go? Can you actually find the error that caused last night's outage?
  • What happens when someone deploys a container running as root with no resource limits?
  • How do you restore a stateful workload after an accidental kubectl delete?
  • Who gets paged when a node runs out of disk?

These are not edge cases. These are Tuesday.

Capability                     AKS / GKE   Natron
Control plane                  ✓           ✓
Node provisioning              ✓           ✓
K8s version upgrades           ✓           ✓
CNI with network policies      –           ✓
Automated TLS certificates     –           ✓
Centralized logging            –           ✓
Metrics & alerting             –           ✓
Backup & restore               –           ✓
Policy enforcement             –           ✓
GitOps delivery                –           ✓
Secret management              –           ✓
24/7 operations team           –           ✓

One platform, three infrastructure targets

We built our platform to run identically across three deployment models. Same tooling, same automation, same SLA guarantees. The only difference is where the base cluster runs.

Natron Platform Stack (identical everywhere):
Cilium CNI · cert-manager · Ingress · Prometheus · Grafana · Loki · Alertmanager · Velero · Blackbox Exporter

Running on one of three bases:

  • Natron Cloud (our infrastructure) - base: Proxmox VE + Ceph
  • Flex Stack (your hardware) - base: dedicated hardware
  • BYOC (your hyperscaler) - base: AKS / GKE

Natron Cloud is our own infrastructure. Swiss datacenters, Proxmox VE virtualization, Ceph storage, full data sovereignty. We provision the cluster from bare metal up. This is where we have the most control and where most of our Swiss customers start.

Natron Flex Stack takes the same stack and deploys it on dedicated hardware, either in our datacenter or yours. Same Proxmox, same Ceph, same Kubernetes platform on top. The difference: your hardware, your isolation, predictable costs. We have customers in regulated industries who need this level of physical separation.

Bring Your Own Cloud is for customers who already have Azure or GCP contracts. Here, we use AKS or GKE as the base cluster. The customer keeps their existing cloud relationship and billing. We deploy our full platform stack on top.

And this is where it gets interesting.

Why managed Kubernetes on a hyperscaler is not enough

When we onboard a customer running AKS or GKE, the first thing we hear is: "We already have managed Kubernetes, we just need help with operations." Then we look at their cluster and find:

  • No centralized logging. Logs go to the cloud provider's log sink, which costs a fortune at scale and is painful to query.
  • No backup strategy. The cloud provider backs up etcd (the cluster state), but not your PersistentVolumes, not your Helm releases, not your CRDs.
  • Certificates managed manually or with a cronjob someone wrote two years ago.
  • No network policies. Every pod can talk to every other pod. One compromised container means lateral movement across the entire cluster.
  • Monitoring is "we have the cloud provider's dashboard" which shows CPU and memory but nothing application-specific.

The cloud provider's "managed" means: we run the control plane, the rest is your problem. Our "managed" means: we run everything you need to sleep at night.

The batteries we include

Every Kubernetes cluster we manage, regardless of the infrastructure target, ships with the same platform stack. We organize it in three tiers: Basic, Premium, and Enterprise.

Basic (every cluster, day one):
Cilium CNI · NGINX Ingress · cert-manager · Prometheus · Grafana · Loki · Alertmanager · Velero Backups · Blackbox Exporter

Premium (GitOps & policy enforcement):
ArgoCD · External Secrets · Kyverno

Enterprise (custom integrations):
Custom observability · Custom networking · Custom storage · Custom operators

Basic (included in every cluster)

These are non-negotiable. Every cluster gets them from day one:

Networking: Cilium. eBPF-based CNI with network policies, transparent encryption, and observability. Not the default kubenet or Azure CNI. Cilium gives us L7 visibility and consistent network policy behavior across all our infrastructure targets.
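
A minimal sketch of the kind of L7-aware policy this enables; the namespace, labels, port, and paths are hypothetical:

  # Allow only specific HTTP requests from the frontend to the backend.
  apiVersion: cilium.io/v2
  kind: CiliumNetworkPolicy
  metadata:
    name: backend-l7-allow
    namespace: shop
  spec:
    endpointSelector:
      matchLabels:
        app: backend
    ingress:
      - fromEndpoints:
          - matchLabels:
              app: frontend
        toPorts:
          - ports:
              - port: "8080"
                protocol: TCP
            rules:
              http:
                - method: GET
                  path: "/healthz"
                - method: GET
                  path: "/api/.*"

Once this policy selects the backend, any ingress traffic it does not match, including other HTTP methods to the same port, is dropped.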

Ingress: NGINX Ingress Controller + cert-manager. Automated TLS with Let's Encrypt or internal CAs. Certificate rotation is automatic. No more expired certificates taking down your customer-facing services.
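
A minimal sketch of that automation, assuming a Let's Encrypt HTTP-01 setup; the email, hostname, and secret names are placeholders:

  apiVersion: cert-manager.io/v1
  kind: ClusterIssuer
  metadata:
    name: letsencrypt-prod
  spec:
    acme:
      server: https://acme-v02.api.letsencrypt.org/directory
      email: ops@example.ch
      privateKeySecretRef:
        name: letsencrypt-account-key
      solvers:
        - http01:
            ingress:
              class: nginx
  ---
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: shop
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
  spec:
    ingressClassName: nginx
    tls:
      - hosts: ["shop.example.ch"]
        secretName: shop-tls   # issued and renewed by cert-manager
    rules:
      - host: shop.example.ch
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: shop
                  port:
                    number: 80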

Observability: Prometheus, Grafana, Loki, Alertmanager. Full metrics, logging, and alerting stack. Not the cloud provider's pay-per-query offering. A dedicated stack running inside the cluster with sane defaults, pre-built dashboards, and alert rules that actually mean something.
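
One example of an alert rule that "actually means something": page when a volume is about to fill up, not after it has. A Prometheus Operator sketch with an illustrative threshold:

  apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    name: platform-storage
    namespace: monitoring
  spec:
    groups:
      - name: storage
        rules:
          - alert: PersistentVolumeFillingUp
            expr: |
              kubelet_volume_stats_available_bytes
                / kubelet_volume_stats_capacity_bytes < 0.10
            for: 15m
            labels:
              severity: critical
            annotations:
              summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} has less than 10% free space"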

Backup: Velero. Scheduled backups of cluster state, persistent volumes, and configurations. Tested restore procedures. When something goes wrong, and it will, you can recover without guessing.
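
A minimal Velero schedule along those lines; the cron expression and retention are illustrative, not our production values:

  apiVersion: velero.io/v1
  kind: Schedule
  metadata:
    name: nightly
    namespace: velero
  spec:
    schedule: "0 2 * * *"     # every night at 02:00
    template:
      includedNamespaces: ["*"]
      snapshotVolumes: true   # back up PersistentVolumes, not just cluster state
      ttl: 720h               # keep backups for 30 days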

External monitoring: Blackbox Exporter. We probe your endpoints from outside the cluster. If your ingress goes down, we know before your customers do.
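
Expressed as a Prometheus Operator Probe, this looks like the sketch below; the endpoint and exporter address are placeholders, and an external probe would point at an exporter running outside the cluster:

  apiVersion: monitoring.coreos.com/v1
  kind: Probe
  metadata:
    name: shop-uptime
    namespace: monitoring
  spec:
    module: http_2xx          # expect an HTTP 2xx response
    prober:
      url: blackbox-exporter.monitoring.svc:9115
    targets:
      staticConfig:
        static:
          - https://shop.example.ch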

Premium (for teams that want GitOps and policy enforcement)

ArgoCD. GitOps-based continuous delivery. Every change goes through Git, gets reviewed, and is reconciled automatically. Drift detection catches manual changes and reverts them. This is not optional tooling. It is how we manage the platform itself.
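
A sketch of an Application with exactly that drift-reverting behavior; the repository URL and path are hypothetical:

  apiVersion: argoproj.io/v1alpha1
  kind: Application
  metadata:
    name: platform-observability
    namespace: argocd
  spec:
    project: platform
    source:
      repoURL: https://git.example.ch/platform/clusters.git
      targetRevision: main
      path: observability
    destination:
      server: https://kubernetes.default.svc
      namespace: monitoring
    syncPolicy:
      automated:
        prune: true      # remove resources deleted from Git
        selfHeal: true   # revert manual changes (drift detection)
      syncOptions:
        - CreateNamespace=true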

External Secrets Operator. Connect to Vault, Azure Key Vault, GCP Secret Manager, or AWS Secrets Manager. Secrets stay where they belong and get synced into the cluster automatically.
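
A minimal ExternalSecret sketch, assuming a ClusterSecretStore named azure-keyvault has been configured separately; all names are placeholders:

  apiVersion: external-secrets.io/v1beta1
  kind: ExternalSecret
  metadata:
    name: db-credentials
    namespace: shop
  spec:
    refreshInterval: 1h          # re-sync from the external store hourly
    secretStoreRef:
      kind: ClusterSecretStore
      name: azure-keyvault
    target:
      name: db-credentials       # the Kubernetes Secret that gets created
    data:
      - secretKey: password
        remoteRef:
          key: shop-db-password  # entry in the external store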

Kyverno. Policy engine for admission control, mutation, and resource generation. We wrote an entire blog post about why we chose Kyverno over OPA Gatekeeper. It enforces security baselines, resource constraints, and organizational policies without requiring your developers to know they exist.
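
For illustration, a small policy in the spirit of the "container running as root with no resource limits" question from earlier; this is a generic sketch, not one of our production policies:

  apiVersion: kyverno.io/v1
  kind: ClusterPolicy
  metadata:
    name: require-resource-limits
  spec:
    validationFailureAction: Enforce   # reject at admission instead of just warning
    rules:
      - name: check-limits
        match:
          any:
            - resources:
                kinds: ["Pod"]
        validate:
          message: "All containers must set CPU and memory limits."
          pattern:
            spec:
              containers:
                - resources:
                    limits:
                      memory: "?*"   # any non-empty value
                      cpu: "?*"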

Enterprise (for complex, multi-team environments)

Custom integrations for non-standard requirements. Existing observability systems (Datadog, Splunk, Dynatrace) that need to be fed. Specific storage backends. Complex network topologies. Multi-tenant setups where different teams need different policies, quotas, and access controls.

This is where our Platform Design engagement comes in. We design the tenancy model, the guardrails, and the onboarding workflow for your specific organization.

Same automation, different base

The key architectural decision: our platform layer is decoupled from the infrastructure layer. We use the same Helm charts, the same ArgoCD apps, the same monitoring configurations whether the underlying cluster runs on Natron Cloud, Flex Stack, AKS, or GKE.

When we onboard a BYOC customer on Azure, we:

01 Connect - get access to the AKS / GKE cluster
02 Bootstrap - deploy the platform stack (Cilium, cert-manager, observability)
03 GitOps - ArgoCD manages the platform from Git
04 Backup - Velero to a customer storage account
05 Alerting - route to Natron ops + the customer's on-call (see the sketch below)
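
Step 05 in practice could look like this Alertmanager routing sketch; the label, receivers, and endpoints are hypothetical:

  route:
    receiver: natron-ops              # platform alerts page our operations team
    routes:
      - matchers:
          - team = "application"      # application alerts go to the customer
        receiver: customer-oncall
  receivers:
    - name: natron-ops
      webhook_configs:
        - url: https://ops.example.ch/alerts
    - name: customer-oncall
      email_configs:
        - to: oncall@customer.example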

From that point, the cluster is managed the same way as every other cluster in our fleet. Same runbooks, same escalation paths, same SLA.

This also means migration between infrastructure targets is realistic. We have moved customers from self-managed clusters to Natron Cloud, from Natron Cloud to Flex Stack (when they needed dedicated hardware), and from hyperscaler-managed to our own infrastructure (when data sovereignty became a requirement). The workloads move. The platform layer stays the same.

What this looks like in practice

A real example: A Swiss financial services company came to us running three self-managed Kubernetes clusters on Azure. Each cluster had been set up by a different team at different times. One used Flannel for networking, one used Azure CNI, one used Calico. Logging was inconsistent. Backups existed for one cluster. Monitoring was "we look at Azure Monitor sometimes".

We consolidated to two AKS clusters with our full platform stack. Standardized networking on Cilium. Deployed consistent observability. Set up Velero backups to Azure Blob Storage. Implemented Kyverno policies for their compliance requirements. Connected External Secrets Operator to their existing Azure Key Vault.
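
The Velero-to-Blob piece of that setup reduces to a BackupStorageLocation along these lines; the resource group, storage account, and container names are placeholders:

  apiVersion: velero.io/v1
  kind: BackupStorageLocation
  metadata:
    name: default
    namespace: velero
  spec:
    provider: azure
    objectStorage:
      bucket: velero-backups          # the blob container
    config:
      resourceGroup: rg-velero
      storageAccount: stvelerobackups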

Six months later, their team spends zero time on cluster operations. They deploy through ArgoCD, manage namespaces and access self-service through our platform tooling, and check dashboards in Grafana. When they hit a complex issue (a networking problem they cannot explain, a performance degradation under load, a deployment that fails in production but works in staging), they reach out to us for third-level support. Our engineers bring years of experience running and troubleshooting container platforms across dozens of clusters. That is what managed Kubernetes means.

The part everyone underestimates: long-term maintenance

This is the conversation we have most often. A team evaluates our platform and says: "We can install Cilium and cert-manager ourselves. We do not need a managed service for that." They are right. Installation is the easy part. The question is what happens in month 6, month 12, month 36.

The initial setup is a weekend project. The maintenance is a full-time job nobody signed up for.

Here is what actually happens when you self-manage platform components:

Deprecations catch you off guard. Kubernetes deprecates APIs every release. Your Ingress manifests break because networking.k8s.io/v1beta1 is gone. cert-manager v1.12 changes the way ClusterIssuers work. The Prometheus Operator renames its CRDs. Each of these requires reading changelogs, updating manifests, testing, and rolling out. Across how many clusters? With what test coverage?
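
Concretely, that Ingress change was not a find-and-replace. In v1beta1 the backend was serviceName/servicePort; in networking.k8s.io/v1 the backend is restructured and pathType is mandatory:

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: shop
  spec:
    rules:
      - host: shop.example.ch
        http:
          paths:
            - path: /
              pathType: Prefix       # new required field in v1
              backend:
                service:             # was: serviceName: shop / servicePort: 80
                  name: shop
                  port:
                    number: 80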

Security patches pile up. A CVE hits Cilium. Another one hits ingress-nginx. A third one affects the Go runtime that half your operators are built on. Each patch needs evaluation (does it affect us?), testing (does it break anything?), and rollout (coordinated, not all-at-once). When you are responsible for 15 platform components, each with its own release cycle, the patch backlog grows faster than anyone expected.

Nobody owns it. The engineer who set up Prometheus left six months ago. The person who configured Velero backups moved to a different team. The ingress controller was installed by a consultant during the initial setup. Now there is an incident, the dashboard is empty, and nobody knows which Helm values were used or why that specific configuration was chosen. There is no runbook because nobody wrote one.

Incidents expose the gaps. When production is on fire at 2 AM, your team triages the application. But the root cause is a platform component. The PersistentVolume is full because nobody configured Loki retention. The ingress is returning 502s because the cert-manager renewal failed silently three days ago. The network policy is blocking traffic because the last Cilium upgrade changed the default behavior. These are not application bugs. They are platform operations failures.
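
The Loki case is a one-stanza fix, which is exactly why it gets missed. A sketch for a single-binary Loki, with illustrative values:

  limits_config:
    retention_period: 720h            # keep logs 30 days, then delete
  compactor:
    working_directory: /loki/compactor
    retention_enabled: true           # without this, logs accumulate forever
    delete_request_store: filesystem  # required by recent Loki versions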

Migration becomes impossible. You installed Pod Security Policies in 2022. Kubernetes deprecated them in favor of Pod Security Standards. You installed Traefik as your ingress controller, but now your requirements need features only NGINX or Envoy provide. You are running Calico for CNI but need Cilium for eBPF-based network policies. Each of these migrations is a project that requires planning, testing, and execution. But nobody in the team has capacity because they are busy shipping features. So the technical debt grows, and the platform slowly becomes a liability instead of an enabler.

The real cost is not the tools. It is the operational processes around them. Patch management. Upgrade planning. Deprecation tracking. Incident runbooks. Backup testing. Monitoring of the monitoring. These are the things that separate "we installed it" from "we operate it". And they are the things that quietly fall apart when the team that set them up moves on to other priorities.

A real example happening right now: the NGINX Ingress Controller. The community-maintained ingress-nginx project, which has been the default ingress controller for most Kubernetes clusters for years, is being deprecated. The project has struggled with maintainer bandwidth, slow CVE response times, and an increasingly outdated architecture. F5 has stepped in with the F5 NGINX Ingress Controller, a commercially backed, actively maintained replacement with better performance, native support for NGINX Plus features, and a clear long-term roadmap.

If you self-manage your ingress controller, this deprecation lands on your desk. You need to evaluate the replacement, understand the configuration differences (they are not a 1:1 mapping), plan the migration path, test every Ingress resource and annotation your applications use, coordinate the cutover with your development teams, and execute the migration without downtime. For a team that is already busy shipping features, this is weeks of unplanned work.

For our customers, this is our problem, not theirs. We are already planning and executing the migration to the F5 NGINX Ingress Controller across our entire fleet. We evaluate which customer configurations need adjustment, we test the new controller against each cluster's specific Ingress resources and annotations, and we coordinate the migration timeline with each customer. Some clusters have straightforward setups and migrate quickly. Others have custom annotations, rate limiting rules, or complex routing that needs careful attention. We handle both. The customer gets a notification, a migration window, and a validated result. Not a surprise when their ingress stops working because a community project went unmaintained.

This is exactly the kind of migration that never happens when nobody owns the platform. The deprecation announcement goes into a backlog. Someone creates a ticket. The ticket sits for six months because there is always something more urgent. Then a CVE hits the old controller, there is no patch, and now it is an emergency migration instead of a planned one.

We track deprecations, CVEs, and breaking changes across our entire fleet. When cert-manager v1.15 changes the ClusterIssuer spec, we update it across every cluster we manage, test it against each customer's configuration, and roll it out in a coordinated wave. When a Cilium CVE drops, we evaluate, patch, and deploy within our SLA window. Not because any individual cluster is special, but because this is all we do.

The real differentiator: operations, not installation

Installing tools (Day 1):

  helm install prometheus       5 min
  helm install grafana          5 min
  helm install cert-manager     3 min
  helm install velero           5 min
  helm install cilium          10 min

Anyone can do this in an afternoon.

Operating a platform (Day 2 to Day 1460):

  • Refine alert rules across 50+ clusters
  • Upgrade Kubernetes every month
  • Patch CVEs across all platform components
  • Handle 3 AM incidents with context
  • Test backup restores continuously

This is what we do.

Installing these tools is not hard. Most of them have Helm charts. You can set up Prometheus and Grafana in an afternoon. The hard part is everything that comes after:

  • What do you alert on? We have refined our alert rules over four years across dozens of clusters. We know which alerts are noise and which ones mean someone needs to look immediately.
  • What happens at 3 AM? We have an operations team. Not a ticketing system. Not a chatbot. Engineers who know your cluster, your workloads, and your architecture.
  • How do you upgrade? Kubernetes releases every four months. Each upgrade needs testing against your workloads, your operators, your policies. We do this continuously.
  • What about the platform components? Cilium, cert-manager, ArgoCD, Kyverno, and every other component has its own release cycle, its own breaking changes, its own CVEs. We track and apply these across our entire fleet.
  • Who plans the migration when a component reaches end of life? We do. We have migrated customers from Calico to Cilium, from Pod Security Policies to Kyverno, from self-managed Prometheus to our standardized observability stack, and we are currently migrating our entire fleet from the deprecated community ingress-nginx to the F5 NGINX Ingress Controller. Each migration is planned, tested, and executed without downtime.

We manage the boring parts so your team can focus on shipping features.

Get started

If you are evaluating managed Kubernetes providers in Switzerland, or struggling to operationalize the cluster you already have, we should talk. We offer a free initial consultation where we look at your current setup and discuss what a managed platform would look like for your specific requirements.

Browse our Kubernetes platform tiers to see what is included at each level. Check our customer references to see who runs on our platform today. Or schedule a call directly.


About the author

Adrian Berger

Platform Engineer at Natron Tech, operating managed Kubernetes clusters for Swiss enterprises across cloud and on-premise infrastructure.

The cluster is the easy part. Everything around it is what keeps your workloads running.
