Talos Upgrade Process

Overview

This process upgrades Talos nodes sequentially, selecting each node by its index in the upgrade sequence. Only one node is upgraded at a time, which keeps the cluster stable during the rollout.

Upgrade Sequence

The upgrade sequence is automatically derived from your node configuration:

Control Plane Nodes (upgraded first for quorum safety)
- ctrl-00 (index 0)
- ctrl-01 (index 1)
- ctrl-02 (index 2)

Worker Nodes (upgraded after the control plane)
- work-00 (index 3)
- work-01 (index 4)
- work-02 (index 5)
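
To make the index-to-node mapping concrete, here is a minimal shell sketch that mirrors the list above. The `UPGRADE_SEQUENCE` variable and the `node_for_index` function are hypothetical names used only for illustration; the real sequence is derived by the OpenTofu configuration, not by this script.

```shell
# Hypothetical mirror of the sequence above: control plane first, then workers.
UPGRADE_SEQUENCE="ctrl-00 ctrl-01 ctrl-02 work-00 work-01 work-02"

# Print the node name at the given zero-based index.
# Returns 1 if the index is outside the sequence.
node_for_index() {
  want="$1"
  i=0
  for node in $UPGRADE_SEQUENCE; do
    if [ "$i" = "$want" ]; then
      echo "$node"
      return 0
    fi
    i=$((i + 1))
  done
  echo "no node at index $want" >&2
  return 1
}
```

For example, `node_for_index 3` prints `work-00`, the first worker, which is upgraded only after all three control plane nodes.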

Upgrade Process

Configure Version

Set the target version in main.tf, in the update_version field of the talos_image block. The update_version value only takes effect for nodes that have update = true set in tofu/nodes.auto.tfvars.

talos_image = {
  version         = "<see https://github.com/siderolabs/talos/releases>"
  update_version  = "<see https://github.com/siderolabs/talos/releases>" # renovate: github-releases=siderolabs/talos
  schematic_path  = "talos/image/schematic.yaml.tftpl"
}

Note: If you use GitOps automation, commit and push this change. If you apply manually, run the following commands locally from the CLI.

Start Upgrade

tofu apply -var 'upgrade_control={enabled=true,index=0}'

Check Progress

tofu output upgrade_info

Continue to Next Node

Use the exact command shown in the upgrade_info output, or increment the index yourself. Repeat this step for each remaining node in the sequence:

tofu apply -var 'upgrade_control={enabled=true,index=1}'

Finish Upgrade

tofu apply -var 'upgrade_control={enabled=false,index=-1}'
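
Taken together, the start, continue, and finish steps form a single pass over the sequence. The sketch below only prints the commands for such a pass and does not run them. The function name `print_upgrade_plan` and its node-count argument are hypothetical, and the default of 6 assumes the six-node sequence listed earlier.

```shell
# Print (do not execute) the tofu commands for one full upgrade pass.
# $1: number of nodes in the sequence (assumed default: 6).
print_upgrade_plan() {
  node_count="${1:-6}"
  index=0
  # One apply per node, in index order.
  while [ "$index" -lt "$node_count" ]; do
    echo "tofu apply -var 'upgrade_control={enabled=true,index=$index}'"
    index=$((index + 1))
  done
  # Final apply that switches upgrade mode off again.
  echo "tofu apply -var 'upgrade_control={enabled=false,index=-1}'"
}
```

Printing the plan instead of executing it is deliberate: each apply should be followed by a manual check that the node is Ready before the next command is run.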

Finalize Versions

After all nodes have been upgraded, update your cluster configuration in main.tf:

# Update the base `talos_image` version to match `update_version`
talos_image = {
  version         = "<see https://github.com/siderolabs/talos/releases>"  # Updated base image version
  update_version  = "<see https://github.com/siderolabs/talos/releases>"
  schematic_path  = "talos/image/schematic.yaml.tftpl"
}

# Update the cluster Talos version
cluster = {
  name               = "talos"
  talos_version      = "<see https://github.com/siderolabs/talos/releases>"
  proxmox_cluster    = "kube"
  kubernetes_version = "<see https://github.com/kubernetes/kubernetes/releases>"
}

Detailed Steps

Monitor Node Upgrade

# Check Talos version
talosctl version --nodes 10.25.150.11

# Check Kubernetes node status
kubectl get nodes ctrl-00

# Wait for node to be Ready
kubectl wait --for=condition=Ready node/ctrl-00 --timeout=300s
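
The checks above can be wrapped into one helper per node. This is a sketch, not part of the repository: `check_node` is a hypothetical function name, and the pairing of ctrl-00 with 10.25.150.11 is assumed from the examples in this section.

```shell
# Hypothetical helper: wait for a node to be Ready, then show its Talos version.
# $1: Kubernetes node name, $2: node IP, $3: optional timeout (default 300s).
check_node() {
  node_name="$1"
  node_ip="$2"
  timeout="${3:-300s}"

  # Block until Kubernetes reports the node as Ready, or fail on timeout.
  if ! kubectl wait --for=condition=Ready "node/$node_name" --timeout="$timeout"; then
    echo "node $node_name did not become Ready within $timeout" >&2
    return 1
  fi

  # Print the Talos version reported for this node so it can be compared
  # against the expected update_version by hand.
  talosctl version --nodes "$node_ip"
}
```

Usage, under the assumption stated above: `check_node ctrl-00 10.25.150.11`. Only move on to the next index once this returns successfully and the reported version matches the target.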

Update Base Version

After completing all upgrades, update the version references in main.tf to the new release, as described under Finalize Versions.

Emergency Rollback

If issues occur during an upgrade, disable upgrade mode immediately:

# This will stop the upgrade and prevent unintended changes
tofu apply -var 'upgrade_control={enabled=false,index=-1}'

Monitoring Commands

# Check all node versions
talosctl version --nodes 10.25.150.11,10.25.150.12,10.25.150.13,10.25.150.21,10.25.150.22,10.25.150.23

# Check cluster health
kubectl get nodes -o wide

# Check upgrade progress
tofu output upgrade_info

Notes

- Never skip nodes in the sequence; always upgrade sequentially.
- Wait for each node to be fully Ready before proceeding.
- Control plane nodes are upgraded first to maintain quorum.
- The upgrade system prevents multiple nodes from upgrading simultaneously.
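
If you script the upgrade yourself, the rule against skipping nodes can be enforced with a small guard. The function below is a hypothetical sketch, not something the upgrade system provides: it succeeds only when the next index directly follows the last one that was upgraded.

```shell
# Hypothetical guard for wrapper scripts.
# $1: index of the last upgraded node (-1 if none yet), $2: index about to be applied.
# Succeeds only if $2 is exactly one step after $1.
is_next_in_sequence() {
  last_index="$1"
  next_index="$2"
  [ "$next_index" -eq $((last_index + 1)) ]
}
```

For example, `is_next_in_sequence 2 3` succeeds, while `is_next_in_sequence 1 3` fails because index 2 would be skipped.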
