Boban Glisovic

Founder

From Metal to Kubernetes Worker

11 min read

How we take a bare metal server from power-on to a Kubernetes worker: with immutable images, stateless design, and a failsafe workflow for both bootstrap and day-2 operations.

At meltcloud, we work in environments that begin at the hardware layer. In our previous post, we looked at why Kubernetes on bare metal is compelling and the challenges that come with it.

Here’s how we tackle one of the problems: turning a bare metal server into a Kubernetes worker with immutable images, stateless design, and a failsafe lifecycle.

The goal: apply cloud-native principles all the way down to physical servers, so Kubernetes workers behave like containers: immutable, stateless, and easy to replace.

TL;DR

To run bare-metal Kubernetes workers like containers, we leverage Unified Kernel Images (UKIs) with embedded, machine-specific configuration. This enables atomic updates with automatic rollback. Enrolling new workers only requires booting a minimal ISO, either via physical media, BMC virtual media, or the network.

Why Bare-Metal Kubernetes Needs a Different Approach

At first glance, running Kubernetes workers directly on bare metal seems simple: install an OS, configure a container runtime and the kubelet, and join it to the cluster.

Let’s take a look at the involved components:

Components of a Bare Metal Kubernetes Worker
What makes up a Bare Metal Kubernetes Worker

In practice, keeping all these parts up to date, healthy, and consistent over time is a demanding operational task. Each machine must go through a set of lifecycle steps:

  • Enrollment: preparing firmware, partitions, and initial installation, ideally in an automated fashion
  • OS updates: rolling out kernel, systemd, and base dependencies
  • Kubernetes updates: aligning kubelet and container runtime with the control plane

If handled manually, these tasks often lead to configuration drift. Some servers miss patches, others fail to rejoin after reboot, and debugging becomes painful. Replacing a failed node often takes longer than it should.

Day 2 operations on the Worker
Required day 2 operations

Cloud environments solved this years ago with immutability and golden images: updates are applied by replacing VM images, not patching instances in place.

On bare metal, however, you don’t have a cloud or virtualization layer to abstract image management. Instead, you’re stuck with vendor-specific tooling, home-grown Ansible playbooks, or manual patching.

That’s the gap we want to address:

Bringing the same image-based, immutable lifecycle used in the cloud directly to physical servers.

An Immutable, Composable, and Reproducible OS

Let’s start with the first layer:

The operating system is the foundation of any Kubernetes worker. The ecosystem of container-optimized Linux distributions is growing fast, with projects such as Talos Linux, Bottlerocket, Flatcar Container Linux, and meta-distributions like Kairos OS.

All share similar principles: immutability, composability, and reproducibility. Many draw on concepts from the UAPI Group for how to boot and run a modern, stateless Linux OS.

Our guiding principle: the OS should behave like a container.

To achieve that, we need:

  • A Linux distribution that can run entirely from a ramdisk
    → We use Flatcar, which offers immutability, a strong ecosystem, declarative configuration via Ignition, and flexible extension through systemd-sysext.

  • A way to package and version the complete OS
    → We use Unified Kernel Images (UKIs): a full Linux OS packed into a single EFI binary.

  • Injectable machine-specific configuration
    → UKI addons let us version, sign, and embed per-node configuration, guaranteeing reproducibility.

  • An automatic way to provision/enroll the servers
    → We automatically enroll servers by booting from an enrollment ISO and installing to disk.

  • Failsafe update & rollback
    → Using systemd-boot with automatic boot assessment, we can update safely and recover automatically.

Why Flatcar?

Flatcar is minimal but has everything we need: a full Linux kernel, plus common drivers and services like NFS, iSCSI, multipathd, and NVIDIA support that are often required in enterprise environments, while still staying immutable and lightweight.

That makes it well suited for Kubernetes workers, which need to stay consistent without losing essential functionality.

Crafting the Machine Image

Now that we have settled on a Linux distribution, we can use Unified Kernel Images (UKIs) to package the Linux operating system and all its dependencies into a single binary:

Machine Image UKI
Composition of the Machine Image UKI binary

The following parts make up the final UKI (machine-image.efi):

  • kernel.vmlinuz: The upstream Flatcar Linux kernel
  • initrd.cpio.gz: The upstream Flatcar initrd (also contains the container runtime)
  • kubelet.cpio.gz: Kubernetes components, provided as an additional initrd.

Building the UKI EFI is straightforward with systemd’s ukify:

ukify build \
  --linux kernel.vmlinuz \
  --initrd initrd.cpio.gz \
  --initrd kubelet.cpio.gz \
  --cmdline "console=ttyS0" \
  --output machine-image.efi

Supplying Configuration

Even in a stateless OS model, workers need configuration: network settings, kubelet bootstrap tokens and endpoints, and so on. Flatcar uses Ignition (basically a cloud-init on steroids) to declaratively provide configuration.
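As an illustration of such declarative configuration, a minimal node config in Butane’s Flatcar variant (transpiled to Ignition JSON with the `butane` tool) could look like the following; the hostname and enabled unit are placeholders, not our actual node config:

```yaml
variant: flatcar
version: 1.0.0
storage:
  files:
    - path: /etc/hostname
      overwrite: true
      contents:
        inline: worker-01
systemd:
  units:
    - name: kubelet.service
      enabled: true
```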

While it would be possible to fetch the Ignition configuration over the network during boot, this has the following disadvantages:

  • Dependency on network availability during boot
  • Configuration cannot be signed and validated against system trust (required for Secure or Trusted Boot)

Instead, we embed configuration as a UKI addon, producing what’s called a USI (Unified System Image): an OS image with its configuration baked in.

USI
USI: UKI + embedded configuration

The result: Once the server powers on and executes the USI, it directly boots into its operational Kubernetes state, with no further provisioning steps!
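Such an addon can be produced with systemd’s ukify as well: invoked without `--linux`, it builds a signed addon PE instead of a full UKI. The sketch below is illustrative only; key/certificate paths are placeholders, and exactly how the per-node Ignition payload is packed into our addon is simplified here:

```shell
ukify build \
  --cmdline "flatcar.first_boot=1" \
  --secureboot-private-key db.key \
  --secureboot-certificate db.crt \
  --output config.addon.efi
```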

Enrollment: Getting USIs onto Bare Metal

Now that we have our USIs ready, how do we get these binaries installed onto the bare metal servers? Unlike in the cloud or with virtualization, there’s no hypervisor or API to inject images into instances.

It turns out that even in 2025, the most supported and reliable way to install a bare metal server is still the same one we’ve used for decades to install a desktop Linux distribution:

  • Booting from a live ISO
  • Installing the image to disk

To achieve this, we first create an enrollment .iso that contains a live Linux system and an installation script that installs the images.

Then, the enrollment process starts:

  1. Booting the .iso: this essentially boils down to either attaching a USB stick/BMC virtual media or booting from UEFI HTTP Boot or iPXE (check out our Docs for details)
  2. Linux starts the installation script, which creates the EFI System Partition (ESP) on the disk and installs the boot loader (systemd-boot) and the USI onto that partition.
  3. Finally, the script sets the boot order to the disk (thankfully, we can do this from Linux userspace via efibootmgr) and reboots the server into the new image.
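The file-placement part of step 2 can be sketched as a small shell function. This is a simplified illustration: partition creation, mkfs.vfat, and the efibootmgr call are omitted, and the file names are illustrative rather than our actual installer’s:

```shell
# Lay out the ESP file tree for one image revision.
install_to_esp() {
  esp="$1"; bootloader="$2"; usi="$3"; addon="$4"; rev="$5"

  # systemd-boot at the default UEFI fallback path
  mkdir -p "$esp/EFI/BOOT"
  cp "$bootloader" "$esp/EFI/BOOT/bootx64.efi"

  # Boot entry pointing at the USI revision
  mkdir -p "$esp/loader/entries"
  printf 'title Machine Image %s\nefi /images/%s.efi\n' "$rev" "$rev" \
    > "$esp/loader/entries/1.conf"

  # The USI itself, plus its config addon in <uki>.efi.extra.d/
  mkdir -p "$esp/images/$rev.efi.extra.d"
  cp "$usi" "$esp/images/$rev.efi"
  cp "$addon" "$esp/images/$rev.efi.extra.d/config.addon.efi"
}
```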

If you are curious what the whole process looks like for our platform, check our Documentation on Enrollment.

Let’s take a look at what has been installed into the ESP during enrollment:

ESP
Contents of EFI system partition (ESP)
  1. We store systemd-boot at the path /EFI/BOOT/bootx64.efi. This is the default path that the UEFI firmware boots once the server is powered on.
  2. systemd-boot will then scan the directory /loader/entries for boot entries, sort them and select the first entry, 1.conf.
  3. 1.conf points to our machine-image UKI in /images/revision.efi. On execution, it will automatically include the /images/revision.efi.extra.d/config.addon.efi and boot into the final system.

If the boot fails, systemd-boot will fall back to 0.conf (pointing to our recovery image), thanks to its automatic boot assessment.
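Concretely, the two boot entries could look like this (revision names illustrative). Note that with boot counting enabled, systemd-boot tracks the remaining attempts in the primary entry’s filename, e.g. 1+3.conf for three tries left:

```
# /loader/entries/1.conf -- primary entry (named e.g. 1+3.conf with
# boot counting enabled; the counter decreases on every failed boot)
title  Machine Image
efi    /images/revision.efi

# /loader/entries/0.conf -- recovery entry, no counter: always bootable
title  Recovery
efi    /images/recovery.efi
```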

Atomic Updates and Rollback

With this setup in place, atomic updates become simple:

  • Updating binaries or config is just writing a new USI to the ESP and pointing the bootloader to it: no in-place changes.
  • Rollback is as easy as rebooting into a previous revision.

We use a small agent on the server to install new USIs, but in theory, since it’s just dropping files onto the ESP, an Ansible playbook or a simple script could do the same.
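In shell terms, an update under this scheme boils down to a few file operations; the mount point and revision names below are illustrative, not what our agent literally runs:

```shell
ESP=/boot   # wherever the ESP is mounted (illustrative)

# Stage the new revision next to the old one -- no in-place changes
cp machine-image-v2.efi "$ESP/images/v2.efi"
mkdir -p "$ESP/images/v2.efi.extra.d"
cp config.addon.efi "$ESP/images/v2.efi.extra.d/"

# Add a counted boot entry that sorts ahead of the current one
cat > "$ESP/loader/entries/2+3.conf" <<'EOF'
title Machine Image v2
efi   /images/v2.efi
EOF

# Rollback: remove the new entry (or let boot assessment exhaust its
# tries) and reboot into the previous revision
```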

This enables safe, unattended updates for bare-metal machines.

A Note on BMC Automation

Projects like MAAS, Tinkerbell, or Metal³ integrate with the BMC APIs directly to automate power cycles and ISO mounting.

In practice, we found this unreliable: BMC APIs vary by vendor, expose only partial features, and require a lot of custom engineering for each vendor.

A simpler approach is to stick to one thing every server supports: booting an ISO.

For other tasks like firmware patching, we recommend using BMC APIs, but they’re better treated as out-of-band IaC tasks (for example, by using a vendor-specific Terraform provider or their management tools) rather than part of the core provisioning flow.


Key Takeaways

To wrap it up, here are the approaches that work best for us:

  • Use image-based OS builds so bare metal nodes stay consistent instead of drifting over time.
  • Package everything into UKIs/USIs to make each node reproducible and self-contained.
  • Enroll machines via a bootable ISO, which works reliably across different hardware.
  • Apply updates as new images with rollback, so upgrades are safe and fully automated.

We’d love to hear how others are herding their bare-metal Kubernetes workers: reach out to us and share your stories!