HPK allows HPC users to run their own private Kubernetes "mini Cloud" on a typical HPC cluster and then issue commands to it using Kubernetes-native tools.
To deploy, copy the scripts/ folder contents to your HPC account under ~/hpk/ and run:
```shell
cd hpk
sbatch --nodes=3 hpk.slurm
```

Then configure and use `kubectl`:

```shell
export KUBECONFIG=${HOME}/.hpk/kubeconfig
kubectl get nodes
```

Users run a Slurm command to deploy one rootless container per cluster node, which we call a bubble (using the `hpk-bubble` image). One bubble acts as the Kubernetes control plane, while the others act as worker nodes; together they form the Kubernetes cluster. Each bubble runs an instance of K3s alongside the HPK-specific kubelet (`hpk-kubelet`), which is implemented with the Virtual Kubelet framework.
For external networking, bubbles use slirp4netns; for internal overlay networking, they run Calico and communicate over BGP-routed paths (each host forwards TCP port 17900 to its bubble to enable the peer-to-peer BGP mesh).
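Since each host forwards TCP port 17900 into its bubble, a quick sanity check when bubbles fail to peer is whether anything on the host is listening on that port. The following is a minimal sketch with a hypothetical helper, `port_is_listening`, that reads `ss -ltn` output on stdin. It assumes the forward shows up as an ordinary listening TCP socket on the host, which this README does not specify.

```shell
#!/bin/sh
# Hypothetical helper, not part of HPK: succeed if the `ss -ltn` output read
# on stdin lists a listening socket whose local address ends in ":<port>".
port_is_listening() {
  # Field 4 of each `ss -ltn` line is "Local Address:Port",
  # e.g. 0.0.0.0:17900, [::]:17900 or *:17900; the header line never matches.
  awk -v suffix=":$1" '
    substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }
  '
}
```

For example, run `ss -ltn | port_is_listening 17900 && echo "BGP forward is listening"` on each host allocated to the job. Matching on the `:<port>` suffix keeps port 1790 from matching 17900.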
Inside the bubble, an Apptainer wrapper (`hpktainer`) spawns "pods" (using the `hpk-pause` image, derived from `hpktainer-base`). Each pod is a container that receives a unique network address in its bubble's Calico subnet and hosts the user's application containers.
All pod containers use a bridge-free, point-to-point L3 routing layout at the bubble level. With the proper routing rules, they route traffic directly to pods running in other bubbles (via Calico-routed interfaces over BGP) and to the outside world.
The pod network stack is implemented in userspace using a pair of TAP interfaces: one in the nested container and one in the bubble. The two are connected by two instances of `hpk-net-daemon`, one on each side, which forward traffic over a UNIX socket created in a shared folder.
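To make the bridge-free, point-to-point layout more concrete, the sketch below prints (rather than runs) the iproute2 commands such a setup would typically need for one pod. It follows the routing pattern Calico uses for its own workloads (a /32 address, a link-local fake gateway answered by proxy ARP, and a /32 host route on the node side), not HPK's actual code; the function name `print_p2p_routes`, the interface names and the addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not HPK's code: print (do not run) the iproute2 commands
# for a bridge-free, point-to-point link between one pod and its bubble,
# following the routing pattern Calico uses for its workloads.
#   $1 = pod IP from the bubble's Calico block
#   $2 = TAP device inside the pod, $3 = its peer TAP device inside the bubble
print_p2p_routes() {
  pod_ip=$1 pod_dev=$2 bubble_dev=$3
  echo "# inside the pod (nested container)"
  echo "ip tuntap add dev $pod_dev mode tap"
  echo "ip addr add $pod_ip/32 dev $pod_dev"
  echo "ip link set $pod_dev up"
  # Calico-style fake next hop: a link-local gateway reachable only via the TAP.
  echo "ip route add 169.254.1.1 dev $pod_dev scope link"
  echo "ip route add default via 169.254.1.1 dev $pod_dev"
  echo "# inside the bubble"
  echo "ip tuntap add dev $bubble_dev mode tap"
  echo "ip link set $bubble_dev up"
  # Let the bubble side answer the pod's ARP requests for 169.254.1.1.
  echo "sysctl -w net.ipv4.conf.$bubble_dev.proxy_arp=1"
  # A /32 host route per pod lets the bubble's kernel deliver traffic to it.
  echo "ip route add $pod_ip/32 dev $bubble_dev"
}
```

For example, `print_p2p_routes 10.244.1.5 tap0 tap-pod1` prints the commands for a pod whose TAP device is `tap0`. Because there is no bridge and no shared subnet on the link, every packet leaves the pod through its TAP device and is routed by the bubble's kernel.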
HPK implements a 4-level distributed architecture.
- Level 1: Host Node (Slurm Worker)
  - The physical node managed by Slurm.
  - Executes `hpk.slurm`, which launches the bubble.
- Level 2: Bubble (Node Overlay)
  - Implemented in the `hpk-bubble` container.
  - An Apptainer instance acting as a virtual node.
  - Runs K3s (the base Kubernetes distribution) and Calico (L3 routing and BGP peering). The first bubble, which acts as the Kubernetes control plane, also runs etcd to support Calico.
  - Runs the local `hpk-kubelet`, which registers itself as a node in the K3s cluster.
  - Connects to other bubbles via a BGP mesh network (Calico, BGP port 17900).
- Level 3: Pod
  - Implemented in the `hpk-pause` container.
  - Spawned by `hpk-kubelet` via `hpktainer`.
  - Each Pod is an Apptainer container with its own network namespace, connected to the bubble's point-to-point routing interface.
  - The Pod's entrypoint is the `hpk-pause` binary, which acts as a "pause container" to hold the network namespace and capture application container signals.
- Level 4: Application Container
  - User application containers spawned by `hpk-pause`.
  - These run within the same network namespace as the Level 3 Pod.
  - They share the Pod's IP address and can communicate over `localhost`.
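The Level 4 property that application containers share the Pod's IP address can be exercised with an ordinary two-container Pod. The script below only writes the manifest to a file. The pod name `localhost-demo`, the public images and port 8080 are illustrative choices, and the sketch assumes that the HPK cluster can pull these images and runs multi-container Pods as a standard Kubernetes node would.

```shell
#!/bin/sh
# Write an illustrative two-container Pod manifest: "client" reaches "server"
# over localhost because both containers share the Pod's network namespace.
set -eu
manifest=${1:-localhost-demo.yaml}
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo
spec:
  restartPolicy: Never
  containers:
  - name: server
    image: docker.io/library/python:3-alpine
    command: ["python3", "-m", "http.server", "8080"]
  - name: client
    image: docker.io/library/busybox:latest
    command: ["sh", "-c", "sleep 10; wget -qO- http://localhost:8080/ > /dev/null && echo reachable"]
EOF
echo "wrote $manifest"
```

Apply it with `kubectl apply -f localhost-demo.yaml`. After a few seconds, `kubectl logs localhost-demo -c client` should print `reachable` if the containers share the Pod's network namespace as described. Remove the Pod with `kubectl delete pod localhost-demo`.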
All binaries are built and embedded in container images. The deployment script uses these images.
To build, run:
```shell
make
```

This uses `docker buildx` to build the images with multi-architecture support (amd64/arm64) and push them to the configured registry (default: `docker.io/chazapis`). You can override the registry:
```shell
REGISTRY=myregistry.io/user make
```

Note for developers: you can also build the binaries locally for testing with `make binaries`. They are placed in `bin/`.
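After a build with a custom `REGISTRY`, you can confirm that each pushed image is a multi-architecture manifest list with `docker buildx imagetools inspect`. The hypothetical helper below only prints the image references to check. The image names are the ones this README mentions (`hpk-bubble`, `hpk-pause`, `hpktainer-base`), which may not be the complete set the Makefile builds, and the default `latest` tag is an assumption based on the `hpktainer-base:latest` reference in the Vagrant instructions.

```shell
#!/bin/sh
# Hypothetical helper: print image references for a registry prefix.
# Image names are those mentioned in this README; the default tag is assumed.
#   $1 = registry prefix (default docker.io/chazapis), $2 = tag (default latest)
hpk_image_refs() {
  registry=${1:-docker.io/chazapis}
  tag=${2:-latest}
  for image in hpk-bubble hpk-pause hpktainer-base; do
    echo "$registry/$image:$tag"
  done
}
```

For example, `hpk_image_refs myregistry.io/user | xargs -n1 docker buildx imagetools inspect` inspects each image in turn; for a successful multi-architecture push, each listing should include both `linux/amd64` and `linux/arm64` platforms.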
You can test the setup locally using the provided Vagrant environment, which simulates a multi-node cluster with VMs. It creates a 2-node cluster (`controller` and `node`) running Ubuntu 24.04 with Slurm pre-installed. To start it:
```shell
cd vagrant
vagrant up
vagrant reload # Required to apply security settings (disables AppArmor)
```

The VMs use mDNS for name resolution and are reachable as `controller.local` and `node.local`.
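Because the VM names are resolved over mDNS, it is worth confirming that your host resolves both of them before using `ssh` or `scp`; on Linux hosts, `.local` lookups typically require Avahi together with the `nss-mdns` NSS module. The sketch below is a hypothetical helper, `check_resolves`, built on `getent hosts`, which goes through the system resolver and therefore through NSS modules such as `mdns`. It assumes a Linux host where `getent` is available.

```shell
#!/bin/sh
# Hypothetical helper: report whether each hostname resolves through the
# system resolver (NSS), which is where .local mDNS lookups happen on Linux.
# Returns non-zero if any name fails to resolve.
check_resolves() {
  status=0
  for host in "$@"; do
    if addr=$(getent hosts "$host"); then
      echo "ok: $addr"
    else
      echo "unresolved: $host" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_resolves controller.local node.local` should print one `ok:` line per VM once both are up.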
Option A: For production testing (using published images)
Upload the project scripts to the controller node:
```shell
# From the repository root on your host
ssh -o StrictHostKeyChecking=no vagrant@controller.local "mkdir -p ~/hpk" # Password is 'vagrant'
scp -r -o StrictHostKeyChecking=no scripts/* vagrant@controller.local:~/hpk/ # Password is 'vagrant'
```

Option B: For development (using local images)
For rapid iteration during development, build and deploy images directly to the VMs:
```shell
make develop
```

This will:
- Build all images locally for your current architecture
- Export them as `.tar` files
- Copy them to both VMs at `~/.hpk/images/`
- Copy the `scripts/` directory to the controller at `~/hpk/`
- Remove old `.sif` files to ensure fresh builds are used
To use the local images, set `HPK_DEV=1` in the shell where you submit the Slurm job.
Connect to the controller and submit the Slurm job:
```shell
ssh -o StrictHostKeyChecking=no vagrant@controller.local # Password is 'vagrant'
export HPK_DEV=1 # If using development mode/local images
cd ~/hpk
sbatch --nodes=2 hpk.slurm
```

This will launch one controller bubble and one node bubble on the Vagrant VMs.
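`sbatch` returns as soon as the job is queued, so the bubbles may take a while to come up. A minimal sketch for waiting, assuming the cluster's kubeconfig eventually appears at `${HOME}/.hpk/kubeconfig` (the path used in the deployment instructions) in the home directory of the node where you run it: the hypothetical `wait_for_file` function polls until a file exists and is non-empty, or gives up after a timeout.

```shell
#!/bin/sh
# Hypothetical helper: poll until a file exists and is non-empty.
#   $1 = path, $2 = timeout in seconds (default 300),
#   $3 = poll interval in seconds (default 5)
# Returns 0 once the file is present, 1 if the timeout is reached first.
wait_for_file() {
  path=$1 timeout=${2:-300} interval=${3:-5} waited=0
  while [ ! -s "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $path" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example: `wait_for_file "${HOME}/.hpk/kubeconfig" 600 && KUBECONFIG="${HOME}/.hpk/kubeconfig" kubectl get nodes`. While waiting, `squeue -u "$USER"` shows whether the job is still pending or already running. A non-empty kubeconfig does not by itself guarantee that the API server is ready, so the first `kubectl` call may still need a retry.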
Once running, you can connect to the controller bubble:
```shell
apptainer shell instance://bubble1
```

Inside the bubble, you can run nested containers:
```shell
hpktainer run docker://docker.io/chazapis/hpktainer-base:latest /bin/sh
```

And verify connectivity:
```shell
ip addr show tap0 # Should show Calico IP
ping 8.8.8.8 # External access
```
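To check pod-to-pod routing across bubbles as well, you can look up the address of a nested container in one bubble and ping it from a nested container in the other (for example, one started via `instance://bubble2`, assuming the second bubble follows the same naming as `bubble1`). The hypothetical helper `first_ipv4` below extracts the first IPv4 address from `ip -o -4 addr show` output. It needs `awk`, and its availability inside the container image is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the first IPv4 address (without the prefix length)
# found in `ip -o -4 addr show [DEV]` output read from stdin.
first_ipv4() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "inet") { split($(i + 1), parts, "/"); print parts[1]; exit }
  }'
}
```

For example, run `ip -o -4 addr show tap0 | first_ipv4` in the nested container of the first bubble, then `ping -c 3 <that address>` from a nested container in the other bubble. If the BGP mesh is working, the reply travels over the Calico-routed path between bubbles.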