Kubernetes Node Management
This example manages the OS-level configuration of machines running Kubernetes: the container runtime, kubelet, kernel tuning, CNI, and firewall rules. A rolling drain-and-reboot workflow handles OS patching one node at a time so that workloads stay available.
Vigo manages the OS layer underneath Kubernetes. It does not manage Kubernetes resources (deployments, configmaps, services) — that's the domain of kubectl apply and GitOps tools like ArgoCD or Flux.
Containerd Module
Installs and configures the container runtime.
stockpile/modules/k8s/containerd.vgo:
name: k8s-containerd
vars:
  containerd_version: "1.7.*"
resources:
  - name: containerd-package
    type: package
    package: containerd.io
    state: present
  - name: containerd-config-dir
    type: directory
    path: /etc/containerd
    owner: root
    group: root
    mode: "0755"
  - name: containerd-config
    type: file
    target_path: /etc/containerd/config.toml
    owner: root
    group: root
    mode: "0644"
    content: |
      version = 2
      [plugins."io.containerd.grpc.v1.cri"]
        sandbox_image = "registry.k8s.io/pause:3.10"
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
    depends_on:
      - containerd-config-dir
    notify:
      - containerd-service
  - name: containerd-service
    type: service
    service: containerd
    state: running
    enabled: true
    depends_on:
      - containerd-package
Kubelet Module
Configures the kubelet daemon and its systemd unit.
stockpile/modules/k8s/kubelet.vgo:
name: k8s-kubelet
depends_on:
  - k8s-containerd
vars:
  kubelet_cluster_dns: "10.96.0.10"
  kubelet_cluster_domain: "cluster.local"
  kubelet_max_pods: "110"
resources:
  - name: k8s-packages-debian
    type: package
    package: kubelet
    state: present
    when: "os_family('debian')"
  - name: k8s-packages-rhel
    type: package
    package: kubelet
    state: present
    when: "!os_family('debian')"
  - name: kubeadm-package-debian
    type: package
    package: kubeadm
    state: present
    when: "os_family('debian')"
  - name: kubeadm-package-rhel
    type: package
    package: kubeadm
    state: present
    when: "!os_family('debian')"
  - name: kubectl-package-debian
    type: package
    package: kubectl
    state: present
    when: "os_family('debian')"
  - name: kubectl-package-rhel
    type: package
    package: kubectl
    state: present
    when: "!os_family('debian')"
  - name: kubelet-config
    type: file
    target_path: /var/lib/kubelet/config.yaml
    owner: root
    group: root
    mode: "0644"
    content: |
      apiVersion: kubelet.config.k8s.io/v1beta1
      kind: KubeletConfiguration
      clusterDNS:
        - {{ .Vars.kubelet_cluster_dns }}
      clusterDomain: {{ .Vars.kubelet_cluster_domain }}
      maxPods: {{ .Vars.kubelet_max_pods }}
      cgroupDriver: systemd
      containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
      rotateCertificates: true
      serverTLSBootstrap: true
    notify:
      - kubelet-service
  - name: kubelet-service
    type: service
    service: kubelet
    state: running
    enabled: true
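The containerd config sets `SystemdCgroup = true` and the kubelet config sets `cgroupDriver: systemd`; the kubelet and the container runtime need to agree on the cgroup driver. A minimal sketch of a post-apply check is shown here. The function name `check_cgroup_drivers` is a hypothetical helper and not part of Vigo or Kubernetes, and the default paths are simply the two `target_path` values used in the modules.

```shell
#!/bin/bash
# Sketch: confirm that containerd and the kubelet both select the systemd cgroup driver.
# check_cgroup_drivers is a hypothetical helper; paths default to the files managed above.
check_cgroup_drivers() {
  local containerd_cfg="${1:-/etc/containerd/config.toml}"
  local kubelet_cfg="${2:-/var/lib/kubelet/config.yaml}"

  # containerd side: the runc options table must set SystemdCgroup = true
  if ! grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$containerd_cfg"; then
    echo "containerd: SystemdCgroup is not true in $containerd_cfg"
    return 1
  fi

  # kubelet side: cgroupDriver must be systemd
  if ! grep -Eq '^[[:space:]]*cgroupDriver:[[:space:]]*systemd[[:space:]]*$' "$kubelet_cfg"; then
    echo "kubelet: cgroupDriver is not systemd in $kubelet_cfg"
    return 1
  fi

  echo "cgroup drivers match (systemd)"
}
```

Calling `check_cgroup_drivers` with no arguments inspects the live files on a node; passing two paths makes it usable against rendered test fixtures.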
Kernel Tuning Module
Required kernel parameters and modules for Kubernetes networking and performance.
stockpile/modules/k8s/kernel-tuning.vgo:
name: k8s-kernel-tuning
resources:
  - name: br-netfilter-module
    type: file
    target_path: /etc/modules-load.d/k8s.conf
    owner: root
    group: root
    mode: "0644"
    content: |
      overlay
      br_netfilter
    notify:
      - load-kernel-modules
  - name: load-kernel-modules
    type: exec
    command: "modprobe overlay && modprobe br_netfilter"
    unless: "lsmod | grep -q br_netfilter && lsmod | grep -q overlay"
  - name: sysctl-bridge-nf-call-iptables
    type: sysctl
    key: net.bridge.bridge-nf-call-iptables
    value: "1"
    depends_on:
      - load-kernel-modules
  - name: sysctl-bridge-nf-call-ip6tables
    type: sysctl
    key: net.bridge.bridge-nf-call-ip6tables
    value: "1"
    depends_on:
      - load-kernel-modules
  - name: sysctl-ip-forward
    type: sysctl
    key: net.ipv4.ip_forward
    value: "1"
  - name: sysctl-overcommit
    type: sysctl
    key: vm.overcommit_memory
    value: "1"
  - name: sysctl-conntrack-max
    type: sysctl
    key: net.netfilter.nf_conntrack_max
    value: "131072"
    when: "has_kernel_module('nf_conntrack')"
  - name: sysctl-inotify-watches
    type: sysctl
    key: fs.inotify.max_user_watches
    value: "524288"
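To confirm on a node that the sysctl values above actually took effect, the keys can be read back from `/proc/sys`. The following is a rough sketch under that assumption; `sysctl_key_to_path` and `check_sysctl` are hypothetical helper names, not Vigo commands. The bridge keys only exist once `br_netfilter` is loaded, which is why a missing key is reported as skipped rather than failed.

```shell
#!/bin/bash
# Sketch: map a sysctl key to its /proc/sys path and compare it with an expected value.
# Both functions are hypothetical helpers for manual verification.
sysctl_key_to_path() {
  # net.ipv4.ip_forward -> /proc/sys/net/ipv4/ip_forward
  # Keys whose components themselves contain dots (e.g. VLAN interface names) are not handled.
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

check_sysctl() {
  local key="$1" expected="$2" path actual
  path="$(sysctl_key_to_path "$key")"
  if [ ! -r "$path" ]; then
    echo "SKIP $key (not present; kernel module may not be loaded)"
    return 0
  fi
  actual="$(cat "$path")"
  if [ "$actual" = "$expected" ]; then
    echo "OK   $key = $actual"
  else
    echo "FAIL $key = $actual (expected $expected)"
    return 1
  fi
}
```

For example, `check_sysctl net.ipv4.ip_forward 1` and `check_sysctl fs.inotify.max_user_watches 524288` mirror two of the resources in the module.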
CNI Module
Drops CNI configuration files. The actual CNI plugin (Calico, Cilium, Flannel) is typically deployed via a DaemonSet, but the host-level config directory and any bridge config must exist.
stockpile/modules/k8s/cni.vgo:
name: k8s-cni
depends_on:
  - k8s-kernel-tuning
resources:
  - name: cni-bin-dir
    type: directory
    path: /opt/cni/bin
    owner: root
    group: root
    mode: "0755"
  - name: cni-config-dir
    type: directory
    path: /etc/cni/net.d
    owner: root
    group: root
    mode: "0755"
  - name: cni-loopback-config
    type: file
    target_path: /etc/cni/net.d/99-loopback.conf
    owner: root
    group: root
    mode: "0644"
    content: |
      {
        "cniVersion": "1.0.0",
        "name": "lo",
        "type": "loopback"
      }
Firewall Module
Opens the ports required by Kubernetes components.
stockpile/modules/k8s/firewall.vgo:
name: k8s-firewall
vars:
  # Not referenced below; the fw-k8s-api rule hard-codes port 6443.
  k8s_api_port: "6443"
resources:
  # Control plane ports
  # Note: k8s-kubelet installs kubeadm on every node, so has_executable('kubeadm')
  # is also true on workers and does not by itself identify a control plane node.
  - name: fw-k8s-api
    type: firewall
    port: 6443
    protocol: tcp
    action: accept
    when: "has_executable('kubeadm')"
  - name: fw-etcd-client
    type: firewall
    port: 2379
    protocol: tcp
    action: accept
    when: "has_executable('etcd')"
  - name: fw-etcd-peer
    type: firewall
    port: 2380
    protocol: tcp
    action: accept
    when: "has_executable('etcd')"
  # Worker node ports
  - name: fw-kubelet-api
    type: firewall
    port: 10250
    protocol: tcp
    action: accept
  - name: fw-nodeport-range
    type: firewall
    port: "30000:32767"
    protocol: tcp
    action: accept
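The `fw-nodeport-range` rule uses a `low:high` port range covering the default Kubernetes NodePort range. As an illustration of how such a range is read, here is a small sketch that tests whether a given port falls inside it. It assumes the range is inclusive at both ends, which the source does not state explicitly; `port_in_range` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: test whether a port lies inside a "low:high" range (assumed inclusive).
# port_in_range is a hypothetical helper, useful when checking a Service's nodePort
# against the range opened by the firewall module.
port_in_range() {
  local port="$1" range="$2" low high
  low="${range%%:*}"    # text before the colon
  high="${range##*:}"   # text after the colon
  [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]
}
```

For instance, `port_in_range 30080 30000:32767 && echo covered` reports that a Service exposed on NodePort 30080 is covered by the rule.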
Role Definition
stockpile/roles/k8s-node.vgo:
name: k8s-node
modules:
  - k8s-kernel-tuning
  - k8s-containerd
  - k8s-cni
  - k8s-kubelet
  - k8s-firewall
Node Assignment
stockpile/envoys/nodes.vgo:
envoys:
  - match: "k8s-cp-*.example.com"
    environment: production
    roles: [k8s-node]
    vars:
      kubelet_max_pods: "110"
  - match: "k8s-worker-*.example.com"
    environment: production
    roles: [k8s-node]
    vars:
      kubelet_max_pods: "250"
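The two `match` entries assign the same role to both node types but override `kubelet_max_pods` per hostname pattern. The sketch below mimics that lookup with shell glob matching, purely as an illustration. It assumes Vigo's `match` patterns behave like shell globs, which the source does not confirm, and `max_pods_for` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: resolve kubelet_max_pods for a hostname, mirroring the two envoy entries.
# Assumes match patterns behave like shell globs; max_pods_for is a hypothetical helper.
max_pods_for() {
  case "$1" in
    k8s-cp-*.example.com)     echo 110 ;;
    k8s-worker-*.example.com) echo 250 ;;
    *)                        echo "no match"; return 1 ;;
  esac
}
```

Under that assumption, `max_pods_for k8s-worker-07.example.com` yields the worker override, while a hostname outside both patterns receives no assignment.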
Rolling Drain-and-Reboot Workflow
Before a node reboots for OS patching, it must be drained so that its pods are evicted and rescheduled onto other nodes rather than killed by the reboot. This workflow drains each node, reboots it, waits for it to rejoin the cluster, and uncordons it, rolling across the fleet in batches with health checks.
Task Definitions
tasks/k8s-drain.yaml:
name: k8s-drain
description: Drain a Kubernetes node for maintenance
timeout: 300
parameters:
  node_name:
    type: string
    required: true
    description: Kubernetes node name to drain
script: |
  #!/bin/bash
  set -euo pipefail
  # --force also deletes pods that no controller manages; those pods are not recreated.
  kubectl drain "$PARAM_NODE_NAME" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout=240s \
    --force
tasks/k8s-uncordon.yaml:
name: k8s-uncordon
description: Uncordon a Kubernetes node after maintenance
timeout: 60
parameters:
  node_name:
    type: string
    required: true
    description: Kubernetes node name to uncordon
script: |
  #!/bin/bash
  set -euo pipefail
  kubectl uncordon "$PARAM_NODE_NAME"
Workflow Definition
stockpile/workflows/k8s-rolling-reboot.yaml:
name: k8s-rolling-reboot
description: Drain, reboot, and uncordon Kubernetes nodes in rolling batches
steps:
  - name: drain
    task: k8s-drain
    target: "k8s-worker-*.example.com"
    params:
      node_name: "{{.hostname}}"
    timeout: 300
    batch_size: "1"
    max_failures: "1"
    # Passes when at least one node in the cluster reports Ready.
    health_check: "kubectl get nodes -o json | jq -e '[.items[] | select(.status.conditions[] | select(.type==\"Ready\" and .status==\"True\"))] | length > 0'"
  - name: reboot
    command: "shutdown -r now"
    target: "k8s-worker-*.example.com"
    timeout: 30
    batch_size: "1"
  - name: wait-ready
    command: "kubectl wait --for=condition=Ready node/$(hostname) --timeout=300s"
    target: "k8s-worker-*.example.com"
    timeout: 360
    batch_size: "1"
    # Same check as in the drain step: at least one node must be Ready.
    health_check: "kubectl get nodes -o json | jq -e '[.items[] | select(.status.conditions[] | select(.type==\"Ready\" and .status==\"True\"))] | length > 0'"
  - name: uncordon
    task: k8s-uncordon
    target: "k8s-worker-*.example.com"
    params:
      node_name: "{{.hostname}}"
    timeout: 60
    batch_size: "1"
Run with:
vigocli workflow run k8s-rolling-reboot
Or target a subset:
vigocli workflow run k8s-rolling-reboot --target "k8s-worker-0[1-3].example.com"
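To make the intended per-node sequence concrete (drain, reboot, wait for Ready, uncordon), the sketch below prints the commands for each node in order without executing anything. It is a dry-run illustration only: `rolling_reboot_plan` is a hypothetical helper, the node names are placeholders, and how Vigo itself schedules steps across batches is not described in the source.

```shell
#!/bin/bash
# Sketch: print the per-node command sequence that the workflow is described as performing.
# rolling_reboot_plan is a hypothetical helper; it only prints and never runs the commands.
rolling_reboot_plan() {
  local node
  for node in "$@"; do
    # 1. Evict pods (same flags as the k8s-drain task)
    echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data --timeout=240s --force"
    # 2. Reboot; this command runs on the node itself
    echo "run on $node: shutdown -r now"
    # 3. Block until the node reports Ready again
    echo "kubectl wait --for=condition=Ready node/$node --timeout=300s"
    # 4. Allow scheduling on the node again
    echo "kubectl uncordon $node"
  done
}
```

Running `rolling_reboot_plan k8s-worker-01.example.com k8s-worker-02.example.com` prints all four commands for the first node before any command for the second, which is the ordering a one-node-at-a-time roll implies.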
Execution Order
The module DAG ensures that each module runs only after the modules it depends on:
- k8s-kernel-tuning — kernel modules and sysctl params (no dependencies)
- k8s-containerd — container runtime (no dependencies)
- k8s-cni — CNI config directories (depends on kernel-tuning)
- k8s-kubelet — kubelet config and service (depends on containerd)
- k8s-firewall — open required ports (no dependencies)
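The dependency edges listed above can be fed to the standard `tsort` utility to see one valid execution order. This is only an illustration of the ordering constraint, not how Vigo resolves its DAG internally; `module_order` is a hypothetical helper, and `tsort` may print any order that satisfies the edges.

```shell
#!/bin/bash
# Sketch: derive one valid module order from the dependency edges using tsort.
# module_order is a hypothetical helper; Vigo's own resolver may choose a different valid order.
module_order() {
  # Each line is "prerequisite dependent".
  # A module paired with itself is included in the output without adding a constraint.
  tsort <<'EOF'
k8s-kernel-tuning k8s-kernel-tuning
k8s-containerd k8s-containerd
k8s-firewall k8s-firewall
k8s-kernel-tuning k8s-cni
k8s-containerd k8s-kubelet
EOF
}
```

Whatever order is printed, `k8s-kernel-tuning` appears before `k8s-cni` and `k8s-containerd` before `k8s-kubelet`; the position of `k8s-firewall` is unconstrained.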
What Vigo Manages vs. What It Doesn't
| Vigo manages (OS layer) | Use other tools for (cluster layer) |
|---|---|
| containerd install + config | Pod deployments, DaemonSets |
| kubelet config + service | ConfigMaps, Secrets (k8s API objects) |
| Kernel params (sysctl, modules) | Helm charts, Kustomize overlays |
| CNI host directories | CNI plugin DaemonSet (Calico, Cilium) |
| Firewall rules for k8s ports | Network policies |
| OS patching + reboot coordination | Cluster upgrades (kubeadm upgrade) |
| Node certificates (host-level PKI) | ServiceAccount tokens, RBAC |