Matthias Winzeler

Founder

Building the Cloud Native Data Center – Part 2: Why On-Prem Kubernetes is Hard


Running Kubernetes on-prem is far harder than running it in the cloud. Let's find out why.

In the first part of this blog post series, we introduced the idea of the Cloud Native Data Center: using Kubernetes as the common base for all kinds of modern workloads, whether in the cloud or in your own data center.

Let’s quickly revisit the big picture:

The Cloud Native Data Center

In the next articles of this series, we’ll start working on the first piece of the puzzle, the foundational layer:

How can we run Kubernetes on-premises?

The Cloud Native Data Center - Kubernetes

Be warned: running Kubernetes in your own data center is not easy:

Kubernetes itself is a powerful platform that demands expertise across multiple layers (networking, security, storage). Where it becomes truly challenging, though, is when you try to integrate it into an existing data center, with all its traditional systems, established processes and people.

Let’s delve into why.

What is Kubernetes, Anyway?

When people talk about “Kubernetes”, they often mean the whole ecosystem around it, not just the core software. This ecosystem spans many different areas, from networking and storage to developer platforms and observability.

To make sense of it, let’s break it down into layers:

Kubernetes-Overview

In this article, we’ll focus solely on the lowest, blue layer: building a ready-but-empty Kubernetes cluster made up of Control Plane and Worker Nodes.

Once this foundation is in place, you can start adding the higher layers of the platform on top.

So, What’s Inside?

Let’s look at the main components that make up the Control Plane and Worker Nodes of a Kubernetes cluster:

Kubernetes-Overview

Control Plane

The control plane manages the cluster’s state and resources (Workloads, Nodes, …) and consists of the following components:

kube-apiserver, kube-scheduler, kube-controller-manager

These are the core Golang processes of Kubernetes:

  • kube-apiserver: exposes the HTTP API to all clients
  • kube-scheduler: assigns workloads to workers
  • kube-controller-manager: runs the background controllers that power features like Deployments

In isolation, these are relatively simple to operate: distribute the binaries, configure them, and start the processes. If they crash, restart them; to upgrade, roll out new binaries.
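
For example, the kube-apiserver exposes /livez and /readyz health endpoints that you can wire into whatever monitoring you already run. Below is a minimal Go sketch of such a probe; the address is a placeholder, 6443 is only the default secure port, and depending on your cluster’s RBAC configuration you may need to send credentials rather than probing anonymously.

```go
// healthprobe.go: a minimal sketch that probes a kube-apiserver's health
// endpoints. The address is a placeholder; 6443 is the default secure port.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	apiserver := "https://10.0.0.10:6443" // hypothetical control plane address

	client := &http.Client{
		Timeout: 5 * time.Second,
		// Skipping TLS verification keeps the sketch short; in practice,
		// trust the cluster CA instead.
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}

	for _, path := range []string{"/livez", "/readyz"} {
		resp, err := client.Get(apiserver + path)
		if err != nil {
			fmt.Printf("%s: unreachable (%v)\n", path, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%s: %s\n", path, resp.Status)
	}
}
```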

Load Balancer

For production use, you’ll want to run multiple instances of the control plane for high availability. This requires a load balancer in front of the kube-apiserver.

In cloud environments, this is easy, as the providers usually offer virtual load balancers out-of-the-box. On-premises, this is a bit harder to achieve. You have several options:

  • Use a software load balancer: If you use VMs for your control plane servers, the virtualization platform might offer software load balancers (for example VMware’s NSX or OpenStack’s LBaaS).
  • Use a hardware load balancer: Your company might own expensive hardware load balancing appliances that you could use. Be aware that they are usually the responsibility of a different team which might not offer self-service.
  • Build your own load balancer: There is plenty of tech available to build a simple load balancer yourself, for example HAProxy with keepalived. There are usually some prerequisites to be met on the network side (for example, keepalived is based on VRRP, which requires a shared L2 broadcast domain), but this approach is certainly sufficient if you’re willing to put in some engineering hours; a minimal sketch of the idea follows after this list.
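
To illustrate the do-it-yourself option, here is a toy round-robin TCP proxy in Go that spreads incoming connections across several kube-apiserver instances. The backend addresses are made up, and it deliberately leaves out health checks and failover, which is precisely what HAProxy and keepalived would add on top.

```go
// tcp-lb.go: a toy round-robin TCP proxy in front of several kube-apiserver
// instances. No health checks, no failover -- an illustration of the concept,
// not a production load balancer. Backend addresses are hypothetical.
package main

import (
	"io"
	"log"
	"net"
	"sync/atomic"
)

var backends = []string{"10.0.0.10:6443", "10.0.0.11:6443", "10.0.0.12:6443"}
var next uint64

func main() {
	ln, err := net.Listen("tcp", ":6443")
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go proxy(client)
	}
}

func proxy(client net.Conn) {
	defer client.Close()
	// Pick the next backend in round-robin order.
	backend := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
	server, err := net.Dial("tcp", backend)
	if err != nil {
		log.Printf("backend %s unavailable: %v", backend, err)
		return
	}
	defer server.Close()
	// Pipe bytes in both directions until either side closes.
	go io.Copy(server, client)
	io.Copy(client, server)
}
```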

The Datastore (etcd)

The kube-apiserver stores all cluster state in a key-value database called etcd.

Running etcd reliably is one of the hardest parts of operating Kubernetes. You essentially have to become its DBA:

  • You must back it up regularly, keep copies off-site, and test restores
  • etcd needs a quorum (a majority of its members) to stay writable, which is why you run an odd number of nodes; this is tricky if you only have two physical data center sites (like many companies traditionally do) and want to stay available during a site outage
  • Loss of quorum (due to network partitions, failed upgrades, …) can require painful manual recovery steps

We’ve found that etcd is often the most operationally sensitive part of the control plane.
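
To make the backup task concrete, here is a minimal sketch using the official etcd Go client (go.etcd.io/etcd/client/v3) to stream a snapshot of the keyspace into a local file. The endpoint is a placeholder, a production etcd will require client TLS certificates, and copying the snapshot off-site and testing restores is still on you; etcdctl snapshot save does the same job from the shell.

```go
// etcd-backup.go: a minimal backup sketch using the etcd v3 Go client.
// It streams a snapshot of the keyspace to a local file; endpoint and TLS
// setup are placeholders and must be adapted to your environment.
package main

import (
	"context"
	"io"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		// Hypothetical endpoint; production etcd requires client TLS certs.
		Endpoints:   []string{"https://10.0.0.10:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// Stream a consistent snapshot of the full keyspace.
	rc, err := cli.Snapshot(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	out, err := os.Create("etcd-snapshot.db")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, rc); err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot written to etcd-snapshot.db")
}
```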

Resource Requirements

Control plane nodes are relatively lightweight: usually around 4 vCPUs and 16 GB RAM each is sufficient. etcd’s data usually stays below 10 GB.

Because of this, most companies start with three “stacked” control plane nodes (etcd + kube-* processes on the same machines):

Kubernetes-Overview

As your cluster grows, you can scale the nodes up and eventually move etcd onto its own dedicated servers.

Worker Nodes

Once the control plane is in place, you can add worker nodes to run your workloads.

Compared to the control plane, workers are much simpler: they mainly run the kubelet (which connects to the control plane and launches your containers). They hold no critical cluster state, so they can be added or replaced easily.

For sizing, pick whatever fits your workload profile, but since Kubernetes was built for horizontal scaling, it works best with many smaller nodes rather than a few large ones.
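
Because workers come and go, a quick view of which nodes are registered and Ready is one of the first things you’ll want. Here is a small client-go sketch that lists nodes with their Ready condition and kubelet version; it assumes a kubeconfig at the default ~/.kube/config location.

```go
// nodes.go: a small client-go sketch that lists the cluster's nodes with
// their Ready condition and kubelet version. Assumes a kubeconfig at the
// default path.
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	home, _ := os.UserHomeDir()
	kubeconfig := filepath.Join(home, ".kube", "config")

	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes.Items {
		ready := "Unknown"
		for _, cond := range n.Status.Conditions {
			if cond.Type == "Ready" {
				ready = string(cond.Status)
			}
		}
		fmt.Printf("%-30s ready=%s kubelet=%s\n", n.Name, ready, n.Status.NodeInfo.KubeletVersion)
	}
}
```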

Day 2 Operations

Once the cluster is up and running, the work is not finished. There are a bunch of day 2 operations required to keep a cluster up-to-date and healthy.

Some of the ongoing tasks your teams will need to handle:

  • Frequent updates

    • Kubernetes releases monthly patch versions and new minor versions about every 4 months
    • etcd gets regular updates as well
    • The underlying operating system (kernel, glibc, systemd) and container runtime need security patches
  • Coordinated rollouts

    • Upgrades must be applied in a non-disruptive, rolling fashion
    • Ideally automated, as the release cadence is high
  • Monitoring and scaling

    • Like any distributed system, all components must be continuously monitored
    • You must react quickly to performance bottlenecks or capacity limits
  • Certificate and secret rotation

    • Kubernetes has various internal certificates and credentials that expire and must be rotated on schedule (see the expiry-check sketch below)

All of these steps must be done without disrupting running workloads and often across dozens of clusters.
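
Certificate expiry is the item that most often catches teams by surprise. The sketch below walks a certificate directory and prints how long each certificate remains valid, which is easy to wire into monitoring; the /etc/kubernetes/pki path assumes a kubeadm-style layout and will differ on other distributions (recent kubeadm versions also ship kubeadm certs check-expiration for the same purpose).

```go
// certcheck.go: a minimal sketch that walks a certificate directory
// (/etc/kubernetes/pki assumes a kubeadm-style layout -- adjust for your
// distribution) and prints how long each certificate is still valid.
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

func main() {
	dir := "/etc/kubernetes/pki"
	filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".crt") {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return nil
		}
		block, _ := pem.Decode(data)
		if block == nil {
			return nil
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil
		}
		fmt.Printf("%-55s expires in %4.0f days (%s)\n",
			path, time.Until(cert.NotAfter).Hours()/24, cert.NotAfter.Format("2006-01-02"))
		return nil
	})
}
```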

To get a sense of the scope, here’s an overview of the main components and their lifecycle actions:

Kubernetes Lifecycle Action

How Many of Those Clusters Do We Need?

Now we’ve seen what it takes to run a single cluster. You might already have guessed the bad news: one Kubernetes cluster won’t be enough for your entire company.

Kubernetes’ built-in multi-tenancy is fairly weak (see the official documentation), so in practice you’ll need multiple clusters.

Some common reasons:

  • You want to separate non-prod and prod clusters, so you can test platform upgrades safely
  • You want to isolate certain special workloads that don’t play well in shared clusters (for example vendor software that needs privileged node access or custom kernel modules)
  • At some point, you decide a shared cluster has become too big to manage (and too big to fail), so you want to split it up

While we’re at it, why not give every team or project its own cluster? This doesn’t scale:

  1. Each cluster has a fixed minimum resource footprint (at least three control plane nodes and more than one worker node), which makes small clusters very inefficient
  2. Most developer teams don’t have the skills to operate clusters, so your central platform team would have to run all of them, and they won’t be able to manage hundreds or thousands of clusters

Most companies therefore end up with something between ten and a few hundred clusters, organized like this:

Kubernetes-Overview

There are tools that can improve Kubernetes’ multi-tenancy (like OpenShift, vCluster, or Capsule), which can help reduce the number of clusters you need.

But even with these, you should plan from the beginning to run more than one cluster. That means investing early in automation.
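
To give a feel for what that automation can look like, here is a small Go sketch that loops over every context in a kubeconfig and reports the Kubernetes version each cluster is running, so clusters lagging behind stand out immediately. It assumes all clusters are reachable through a single kubeconfig at the default path; a real fleet tool would of course do much more.

```go
// fleet-versions.go: a sketch of fleet-wide automation. It iterates over
// every context in a kubeconfig and reports each cluster's Kubernetes
// version. Assumes all clusters are reachable via one kubeconfig file.
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	home, _ := os.UserHomeDir()
	raw, err := clientcmd.LoadFromFile(filepath.Join(home, ".kube", "config"))
	if err != nil {
		panic(err)
	}

	for name := range raw.Contexts {
		cfg, err := clientcmd.NewNonInteractiveClientConfig(*raw, name, &clientcmd.ConfigOverrides{}, nil).ClientConfig()
		if err != nil {
			fmt.Printf("%-25s error: %v\n", name, err)
			continue
		}
		clientset, err := kubernetes.NewForConfig(cfg)
		if err != nil {
			fmt.Printf("%-25s error: %v\n", name, err)
			continue
		}
		version, err := clientset.Discovery().ServerVersion()
		if err != nil {
			fmt.Printf("%-25s unreachable: %v\n", name, err)
			continue
		}
		fmt.Printf("%-25s %s\n", name, version.GitVersion)
	}
}
```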

Hosted Control Planes – the Solution?

Recently, a new trend has emerged in the Kubernetes community: Hosted Control Planes, running Kubernetes control planes inside Kubernetes itself.

This approach is elegant because it uses Kubernetes’ own capabilities to solve many of the challenges we discussed:

  • Orchestration and lifecycle management
  • Load balancing and failover
  • Scaling and monitoring
  • Multi-cluster operations

Some examples:

  • Kamaji: open source, automates the full lifecycle of Kubernetes control planes inside Kubernetes
  • HyperShift: Red Hat’s move towards hosted control planes in OpenShift

Hint: If you want to dive into the details of hosted control planes, check out our dedicated blog post on HCPs.

Using the Hosted Control Plane model can reduce operational effort once you have a management cluster in place. In on-prem environments, this introduces a bootstrapping problem:

How do you deploy and manage that first “host” cluster? The one that will run all the others?

Hosted Control Planes Concept

That initial cluster still has to be provisioned, scaled, monitored, and upgraded like any normal cluster.

Summary

Running Kubernetes on-premises is hard. Essentially, you have to run everything the cloud normally hides from you:

  • Control Plane: API server, scheduler, controller manager, load balancers, etcd backups, quorum handling
  • Worker Nodes: OS patching, scaling, and capacity planning
  • Day 2 ops: constant upgrades, monitoring, certificate rotations, and incident response
  • Cluster sprawl: weak multi-tenancy means dozens or hundreds of clusters

This complexity is what many on-prem Kubernetes projects underestimate. And it’s why choosing the right platform matters.

Continue reading

This article is part of a series. In the next part, we’ll look at which on-prem Kubernetes stacks are out there, and how they compare.

  • Part 1: Has the Cloud Delivered on Its Promise?
  • Part 2: Why on-prem Kubernetes is hard (This post)
  • Part 3: Choosing your on-prem Kubernetes stack (coming soon)
  • Part 4: Taming the network jungle (coming soon)
  • Part 5: Dude, where is my storage? (coming soon)
  • Part 6: Bringing Data(bases) into Kubernetes (coming soon)
  • Part 7: Running VM workloads on Kubernetes (coming soon)