Cluster Autoscaler for Hetzner Cloud
This guide explains how to configure cluster-autoscaler for automatic node scaling in Hetzner Cloud with Talos Linux.
Prerequisites
- Hetzner Cloud account with API token
- hcloud CLI installed
- Existing Talos Kubernetes cluster
- Networking Mesh and Local CCM configured
Step 1: Create Talos Image in Hetzner Cloud
Hetzner doesn’t support direct image uploads, so we need to create a snapshot via a temporary server.
1.1 Generate Schematic ID
Create a schematic at factory.talos.dev with required extensions:
curl -s -X POST https://factory.talos.dev/schematics \
-H "Content-Type: application/json" \
-d '{
"customization": {
"systemExtensions": {
"officialExtensions": [
"siderolabs/qemu-guest-agent",
"siderolabs/amd-ucode",
"siderolabs/amdgpu-firmware",
"siderolabs/bnx2-bnx2x",
"siderolabs/drbd",
"siderolabs/i915-ucode",
"siderolabs/intel-ice-firmware",
"siderolabs/intel-ucode",
"siderolabs/qlogic-firmware",
"siderolabs/zfs"
]
}
}
}'
Save the returned id as SCHEMATIC_ID.
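If jq is installed, the returned id can be captured straight into a variable. This is a sketch: the sample RESPONSE below stands in for the real factory.talos.dev reply, which has the same {"id": "..."} shape; in practice, pipe the curl command above into jq.

```shell
# Sketch, assuming jq is installed. In practice:
#   RESPONSE=$(curl -s -X POST https://factory.talos.dev/schematics ... )
RESPONSE='{"id": "<schematic-id>"}'          # sample response for illustration
SCHEMATIC_ID=$(echo "$RESPONSE" | jq -r '.id')
echo "$SCHEMATIC_ID"
```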
Note
siderolabs/qemu-guest-agent is required for Hetzner Cloud. Add other extensions
(zfs, drbd, etc.) as needed for your workloads.
1.2 Configure hcloud CLI
export HCLOUD_TOKEN="<your-hetzner-api-token>"
1.3 Create temporary server in rescue mode
# Create server (without starting)
hcloud server create \
--name talos-image-builder \
--type cpx22 \
--image ubuntu-24.04 \
--location fsn1 \
--ssh-key <your-ssh-key-name> \
--start-after-create=false
# Enable rescue mode and start
hcloud server enable-rescue --type linux64 --ssh-key <your-ssh-key-name> talos-image-builder
hcloud server poweron talos-image-builder
1.4 Write Talos image to disk
# Get server IP
SERVER_IP=$(hcloud server ip talos-image-builder)
# SSH into rescue mode and write image
ssh root@$SERVER_IP
# Inside rescue mode:
wget -O- "https://factory.talos.dev/image/${SCHEMATIC_ID}/<talos-version>/hcloud-amd64.raw.xz" \
| xz -d \
| dd of=/dev/sda bs=4M status=progress
sync
exit
1.5 Create snapshot and cleanup
# Power off and create snapshot
hcloud server poweroff talos-image-builder
hcloud server create-image --type snapshot --description "Talos <talos-version>" talos-image-builder
# Get snapshot ID (save this for later)
hcloud image list --type snapshot
# Delete temporary server
hcloud server delete talos-image-builder
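To avoid copying the snapshot ID by hand into Step 5, it can be captured from hcloud's JSON output. This is a sketch assuming jq is installed and that hcloud's -o json flag is available; the sample below mimics the shape of the hcloud image list output.

```shell
# Sketch, assuming jq. In practice:
#   SNAPSHOTS=$(hcloud image list --type snapshot -o json)
SNAPSHOTS='[{"id": 123456789, "type": "snapshot", "description": "Talos <talos-version>"}]'
SNAPSHOT_ID=$(echo "$SNAPSHOTS" | jq -r '.[0].id')
echo "$SNAPSHOT_ID"   # use this value for HCLOUD_IMAGE in Step 5
```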
Step 2: Create Hetzner vSwitch (Optional but Recommended)
Create a private network for communication between nodes:
# Create network
hcloud network create --name cozystack-vswitch --ip-range 10.100.0.0/16
# Add subnet for your region (network zone eu-central covers FSN1, NBG1, and HEL1)
hcloud network add-subnet cozystack-vswitch \
--type cloud \
--network-zone eu-central \
--ip-range 10.100.0.0/24
Step 3: Create Talos Machine Config
From your cluster repository, generate a worker config file:
talm template -t templates/worker.yaml --offline --full > nodes/hetzner.yaml
Then edit nodes/hetzner.yaml for Hetzner workers:
- Add Hetzner location metadata (see Networking Mesh):
  machine:
    nodeAnnotations:
      kilo.squat.ai/location: hetzner-cloud
      kilo.squat.ai/persistent-keepalive: "20"
    nodeLabels:
      topology.kubernetes.io/zone: hetzner-cloud
- Set the public Kubernetes API endpoint:
  Change cluster.controlPlane.endpoint to the public API server address (for example https://<public-api-ip>:6443). You can find this address in your kubeconfig or publish it via ingress.
- Remove discovered installer/network sections:
  Delete the machine.install and machine.network sections from this file.
- Set the external cloud provider for the kubelet (see Local CCM):
  machine:
    kubelet:
      extraArgs:
        cloud-provider: external
- Fix node IP subnet detection:
  Set machine.kubelet.nodeIP.validSubnets to your vSwitch subnet (for example 10.100.0.0/24).
- (Optional) Add registry mirrors to avoid Docker Hub rate limiting:
  machine:
    registries:
      mirrors:
        docker.io:
          endpoints:
            - https://mirror.gcr.io
Result should include at least:
machine:
nodeAnnotations:
kilo.squat.ai/location: hetzner-cloud
kilo.squat.ai/persistent-keepalive: "20"
nodeLabels:
topology.kubernetes.io/zone: hetzner-cloud
kubelet:
nodeIP:
validSubnets:
- 10.100.0.0/24 # replace with your vSwitch subnet
extraArgs:
cloud-provider: external
registries:
mirrors:
docker.io:
endpoints:
- https://mirror.gcr.io
cluster:
controlPlane:
endpoint: https://<public-api-ip>:6443
All other settings (cluster tokens, CA, extensions, etc.) remain the same as the generated template.
Step 4: Create Kubernetes Secrets
4.1 Create secret with Hetzner API token
kubectl -n cozy-cluster-autoscaler-hetzner create secret generic hetzner-credentials \
--from-literal=token=<your-hetzner-api-token>
4.2 Create secret with Talos machine config
The machine config must be base64-encoded:
# Encode the machine config from Step 3 as single-line base64
# (GNU coreutils; on macOS use `base64 -i nodes/hetzner.yaml -o worker.b64`)
base64 -w 0 nodes/hetzner.yaml > worker.b64
# Create secret
kubectl -n cozy-cluster-autoscaler-hetzner create secret generic talos-config \
--from-file=cloud-init=worker.b64
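Note that Kubernetes base64-encodes secret data once more for storage, so the value the autoscaler receives after the API server decodes it is exactly the single-encoded file you created above. A quick local sketch of the round-trip (GNU coreutils base64 assumed, using a stand-in file rather than your real config):

```shell
# Create a tiny stand-in machine config and encode it once, as in 4.2.
printf 'machine:\n  kubelet: {}\n' > /tmp/demo.yaml
base64 -w 0 /tmp/demo.yaml > /tmp/demo.b64
# Kubernetes encodes the secret value a second time for storage;
# decoding twice therefore restores the original YAML.
base64 -w 0 /tmp/demo.b64 | base64 -d | base64 -d
```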
Step 5: Deploy Cluster Autoscaler
Create the Package resource:
apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
name: cozystack.cluster-autoscaler-hetzner
spec:
variant: default
components:
cluster-autoscaler-hetzner:
values:
cluster-autoscaler:
autoscalingGroups:
- name: workers-fsn1
minSize: 0
maxSize: 10
instanceType: cpx22
region: FSN1
extraEnv:
HCLOUD_IMAGE: "<snapshot-id>"
HCLOUD_SSH_KEY: "<ssh-key-name>"
HCLOUD_NETWORK: "cozystack-vswitch"
HCLOUD_PUBLIC_IPV4: "true"
HCLOUD_PUBLIC_IPV6: "false"
extraEnvSecrets:
HCLOUD_TOKEN:
name: hetzner-credentials
key: token
HCLOUD_CLOUD_INIT:
name: talos-config
key: cloud-init
Apply:
kubectl apply -f package.yaml
Step 6: Test Autoscaling
Create a deployment with pod anti-affinity to force scale-up:
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-autoscaler
spec:
replicas: 5
selector:
matchLabels:
app: test-autoscaler
template:
metadata:
labels:
app: test-autoscaler
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: test-autoscaler
topologyKey: kubernetes.io/hostname
containers:
- name: nginx
image: nginx
resources:
requests:
cpu: "100m"
memory: "128Mi"
If you have fewer nodes than replicas, the autoscaler will create new Hetzner servers.
Step 7: Verify
# Check autoscaler logs
kubectl -n cozy-cluster-autoscaler-hetzner logs \
deployment/cluster-autoscaler-hetzner-hetzner-cluster-autoscaler -f
# Check nodes
kubectl get nodes -o wide
# Verify node labels and internal IP
kubectl get node <node-name> --show-labels
Expected result for autoscaled nodes:
- Internal IP from vSwitch range (e.g., 10.100.0.2)
- Label kilo.squat.ai/location=hetzner-cloud
Configuration Reference
Environment Variables
| Variable | Description | Required |
|---|---|---|
| HCLOUD_TOKEN | Hetzner API token | Yes |
| HCLOUD_IMAGE | Talos snapshot ID | Yes |
| HCLOUD_CLOUD_INIT | Base64-encoded machine config | Yes |
| HCLOUD_NETWORK | vSwitch network name/ID | No |
| HCLOUD_SSH_KEY | SSH key name/ID | No |
| HCLOUD_FIREWALL | Firewall name/ID | No |
| HCLOUD_PUBLIC_IPV4 | Assign public IPv4 | No (default: true) |
| HCLOUD_PUBLIC_IPV6 | Assign public IPv6 | No (default: false) |
Hetzner Server Types
| Type | vCPU | RAM | Good for |
|---|---|---|---|
| cpx22 | 2 | 4GB | Small workloads |
| cpx32 | 4 | 8GB | General purpose |
| cpx42 | 8 | 16GB | Medium workloads |
| cpx52 | 16 | 32GB | Large workloads |
| ccx13 | 2 dedicated | 8GB | CPU-intensive |
| ccx23 | 4 dedicated | 16GB | CPU-intensive |
| ccx33 | 8 dedicated | 32GB | CPU-intensive |
| cax11 | 2 ARM | 4GB | ARM workloads |
| cax21 | 4 ARM | 8GB | ARM workloads |
Note
Some older server types (cpx11, cpx21, etc.) may be unavailable in certain regions.
Hetzner Regions
| Code | Location |
|---|---|
| FSN1 | Falkenstein, Germany |
| NBG1 | Nuremberg, Germany |
| HEL1 | Helsinki, Finland |
| ASH | Ashburn, USA |
| HIL | Hillsboro, USA |
Troubleshooting
Connecting to remote workers for diagnostics
Talos does not allow opening a dashboard directly to worker nodes. Use talm dashboard
to connect through the control plane:
talm dashboard -f nodes/<control-plane>.yaml -n <worker-node-ip>
Where <control-plane>.yaml is your control plane node config and <worker-node-ip> is
the Kubernetes internal IP of the remote worker.
Nodes not joining cluster
- Check the VNC console via the Hetzner Cloud Console or:
  hcloud server request-console <server-name>
- Common errors:
  - “unknown keys found during decoding”: Check the Talos config format. nodeLabels goes under machine, nodeIP goes under machine.kubelet
  - “kubelet image is not valid”: Kubernetes version mismatch. Use a kubelet version compatible with your Talos version
  - “failed to load config”: Machine config syntax error
Nodes have wrong Internal IP
Ensure machine.kubelet.nodeIP.validSubnets is set to your vSwitch subnet:
machine:
kubelet:
nodeIP:
validSubnets:
- 10.100.0.0/24
Scale-up not triggered
- Check autoscaler logs for errors
- Verify RBAC permissions (leases access required)
- Check if pods are actually pending:
kubectl get pods --field-selector=status.phase=Pending
Registry rate limiting (403 errors)
Add registry mirrors to Talos config:
machine:
registries:
mirrors:
docker.io:
endpoints:
- https://mirror.gcr.io
registry.k8s.io:
endpoints:
- https://registry.k8s.io
Scale-down not working
The autoscaler caches node information for up to 30 minutes. Wait or restart autoscaler:
kubectl -n cozy-cluster-autoscaler-hetzner rollout restart \
deployment cluster-autoscaler-hetzner-hetzner-cluster-autoscaler