Kubernetes Provisioning with OpenTofu
Deployment Process
Before you begin deployment, ensure your SSH key is loaded:
eval $(ssh-agent) && ssh-add ~/.ssh/id_rsa
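As a quick sanity check before running OpenTofu, the sketch below tests whether the agent actually holds a key. It relies on `ssh-add -l` exiting non-zero when no identities are loaded or no agent is reachable. The `ssh_key_loaded` helper and the `SSH_ADD` override are illustrative names, not part of this repository.

```shell
# Command used to query the agent; overridable for testing (hypothetical knob).
SSH_ADD="${SSH_ADD:-ssh-add}"

# Succeeds only when the running agent lists at least one identity.
ssh_key_loaded() {
  "$SSH_ADD" -l >/dev/null 2>&1
}

if ssh_key_loaded; then
  echo "SSH key loaded"
else
  echo "No SSH key loaded; run ssh-add first"
fi
```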
During deployment, OpenTofu performs the following steps:
- OpenTofu reads configurations
- Downloads Talos images
- Creates Proxmox VMs
- Applies node configs
- Bootstraps first control plane
- Generates kubeconfig
- Verifies cluster health
Maintenance Tasks
Version Upgrades
- Update versions in main.tf or related tfvars files. Note that Talos versions can be specified in multiple places:
  - For the Talos image factory (e.g., module "talos" { talos_image = { version = "vX.Y.Z" } })
  - For the machine configurations and cluster secrets (e.g., module "talos" { cluster = { talos_version = "vX.Y.Z" } })
  - Kubernetes version (e.g., module "talos" { cluster = { kubernetes_version = "vA.B.C" } })
  Example snippet from main.tf (actual structure can vary based on module inputs):
    module "talos" {
      # ...
      versions = {
        talos = "<see https://github.com/siderolabs/talos/releases>" # Target Talos version
        kubernetes = "<see https://github.com/kubernetes/kubernetes/releases>" # Target Kubernetes version
      }
      # ...
    }
- Set update = true for affected nodes in tofu/nodes.auto.tfvars if your OpenTofu module supports this flag for triggering upgrades. Otherwise, tofu apply will handle changes to version properties.
- Run:
  tofu apply
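Because versions can be pinned in more than one place, it may help to list every pin before editing. The sketch below is one way to do that with grep. The `find_version_pins` helper name and the attribute names it searches for are assumptions drawn from the examples in this section, so adjust the pattern to your module's inputs.

```shell
# List lines that look like Talos or Kubernetes version pins in *.tf and
# *.tfvars files under the given directory (defaults to the current one).
find_version_pins() {
  dir="${1:-.}"
  grep -rnE --include='*.tf' --include='*.tfvars' \
    '(talos_version|kubernetes_version|version|talos|kubernetes)[[:space:]]*=' "$dir"
}

# Example: find_version_pins tofu
```

Each hit is printed as file:line:content, which makes mismatched Talos versions easy to spot before running tofu apply.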
Node Management
Add/Remove Nodes
- Modify the map in
tofu/nodes.auto.tfvars
- Run
tofu apply
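Removing an entry from the node map will typically make OpenTofu plan to destroy that VM, so it can be worth reviewing a saved plan and applying exactly that plan. The following is a minimal sketch of that workflow. The `plan_then_apply` helper, the `TOFU` override, and the default plan file name are hypothetical.

```shell
# OpenTofu binary; overridable for testing (hypothetical knob).
TOFU="${TOFU:-tofu}"

# Save a plan, print it, and apply that exact plan only after confirmation.
plan_then_apply() {
  plan_file="${1:-nodes.tfplan}"
  "$TOFU" plan -out="$plan_file" || return 1
  "$TOFU" show "$plan_file" || return 1
  printf 'Apply this plan? [y/N] '
  read -r answer
  if [ "$answer" = "y" ]; then
    "$TOFU" apply "$plan_file"
  else
    echo "Aborted; nothing applied."
    return 1
  fi
}

# Example: plan_then_apply nodes.tfplan
```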
Change Resources
- Update node specs in
tofu/nodes.auto.tfvars
- Run
tofu apply
Note: Resource changes can require VM restarts
Initial Setup
Prerequisites
- Proxmox server running 7.4+
- SSH key access configured
- Network DHCP/DNS ready
- Storage pools configured
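The prerequisites above concern the environment. The commands later in this guide also assume a few local CLI tools (tofu, kubectl, talosctl, ssh-add). A small pre-flight check along these lines can confirm they are on PATH. The `missing_tools` helper name is illustrative.

```shell
# Print the name of each listed command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools tofu kubectl talosctl ssh-add)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```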
Configuration
Create config.auto.tfvars with your environment settings. An example file, terraform.tfvars.Example, is provided.
# tofu/config.auto.tfvars example
cluster_name = "talos"
cluster_domain = "kube.pc-tips.se"
# Network settings
# All nodes must be on the same L2 network
network = {
gateway = "10.25.150.1"
vip = "10.25.150.10" # Control plane Virtual IP
cidr_prefix = 24
dns_servers = ["10.25.150.1"]
bridge = "vmbr0"
vlan_id = 150
}
# Proxmox settings
proxmox_cluster = "host3"
# Software versions
versions = {
talos = "v1.10.3"
kubernetes = "1.33.2"
}
# OIDC settings (optional)
oidc = {
issuer_url = "https://sso.pc-tips.se/application/o/kubectl/"
client_id = "kubectl"
}
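One easy mistake in this file is a VIP that falls outside the gateway's subnet. As a rough pre-flight check, the sketch below compares two IPv4 addresses under a given prefix length using shell arithmetic. The helper names are hypothetical, and the check says nothing about L2 reachability or VLAN tagging.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds when both addresses share the same network for the given prefix.
same_subnet() {
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  prefix=$3
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Values taken from the example configuration above.
if same_subnet "10.25.150.1" "10.25.150.10" 24; then
  echo "VIP is inside the gateway's subnet"
else
  echo "VIP is outside the gateway's subnet"
fi
```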
Deployment Steps
Load your SSH key for Proxmox access:
eval $(ssh-agent) && ssh-add ~/.ssh/id_rsa
Initialize your workspace:
tofu init
Review and apply the configuration:
# Review changes
tofu plan
# Deploy cluster
tofu apply
Set up cluster access:
# Write the generated kubeconfig to the default location (overwrites any existing ~/.kube/config)
cat output/kube-config.yaml > ~/.kube/config
# Verify cluster access
kubectl get nodes
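Redirecting into ~/.kube/config replaces whatever was there. If you already manage other clusters, a safer variant is to keep a backup first. The sketch below assumes the output/kube-config.yaml path shown above. The `install_kubeconfig` helper and the `.bak` suffix are illustrative choices.

```shell
# Copy a kubeconfig into place, keeping a .bak copy of any existing file.
install_kubeconfig() {
  src="$1"
  dest="${2:-$HOME/.kube/config}"
  if [ ! -f "$src" ]; then
    echo "kubeconfig not found: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  if [ -f "$dest" ]; then
    cp "$dest" "$dest.bak"
  fi
  cp "$src" "$dest"
  chmod 600 "$dest"   # kubeconfigs hold credentials; keep them private
}

# Example: install_kubeconfig output/kube-config.yaml
```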
Maintenance Operations
Node Operations
Applying Node Updates
To update a node, follow these steps:
Prepare the node for maintenance:
kubectl cordon node-name
kubectl drain node-name --ignore-daemonsets --delete-emptydir-data
Apply updates via OpenTofu:
tofu apply -target='module.talos.proxmox_virtual_environment_vm.this["node-name"]'
Return the node to service:
kubectl uncordon node-name
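The three steps above can be chained so that a failure stops the sequence before the node is returned to service. This is a minimal sketch using the same commands. The `update_node` helper and the `KUBECTL`/`TOFU` overrides are hypothetical names.

```shell
# Binaries; overridable for testing (hypothetical knobs).
KUBECTL="${KUBECTL:-kubectl}"
TOFU="${TOFU:-tofu}"

# Cordon, drain, apply the targeted VM change, then uncordon.
# Stops at the first failing step so a broken node stays cordoned.
update_node() {
  node="$1"
  "$KUBECTL" cordon "$node" || return 1
  "$KUBECTL" drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  "$TOFU" apply -target="module.talos.proxmox_virtual_environment_vm.this[\"$node\"]" || return 1
  "$KUBECTL" uncordon "$node"
}

# Example: update_node node-name
```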
Version Upgrades
To upgrade Kubernetes and Talos versions, update the configuration:
cluster = {
kubernetes_version = "<see https://github.com/kubernetes/kubernetes/releases>" # Target K8s version
talos_version = "<see https://github.com/siderolabs/talos/releases>" # Target Talos version
}
Then plan and apply the changes, limited to the talos module:
# Plan changes
tofu plan -target=module.talos
# Apply updates
tofu apply -target=module.talos
Recovery Operations
State Recovery
If OpenTofu state is lost, follow these steps:
Import existing infrastructure, one VM at a time (the import ID here is the Proxmox node name followed by the VM ID):
tofu import 'module.talos.proxmox_virtual_environment_vm.this["ctrl-00"]' host3/8101
Synchronize the state:
# Refresh state
tofu refresh
# Verify state
tofu plan
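With several VMs, the import step has to be repeated per node. The sketch below loops over name=vmid pairs and reuses the resource address and ID format from the command above. The `import_nodes` helper is hypothetical, and any VM IDs other than 8101 are placeholders you would replace with your own.

```shell
# OpenTofu binary; overridable for testing (hypothetical knob).
TOFU="${TOFU:-tofu}"

# Usage: import_nodes <proxmox-node> <name>=<vmid> [<name>=<vmid> ...]
import_nodes() {
  pve_node="$1"
  shift
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "expected name=vmid, got: $pair" >&2; return 1 ;;
    esac
    name="${pair%%=*}"
    vmid="${pair#*=}"
    "$TOFU" import "module.talos.proxmox_virtual_environment_vm.this[\"$name\"]" \
      "$pve_node/$vmid" || return 1
  done
}

# Example (second VM ID is a placeholder):
# import_nodes host3 ctrl-00=8101 ctrl-01=8102
```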
Node Recovery
To replace a failed node:
Cordon and drain the node so its workloads are rescheduled:
kubectl cordon failed-node
kubectl drain failed-node --ignore-daemonsets --delete-emptydir-data
Rebuild using OpenTofu:
# Mark the VM for recreation on the next apply
tofu taint 'module.talos.proxmox_virtual_environment_vm.this["failed-node"]'
# Recreate node
tofu apply -target='module.talos.proxmox_virtual_environment_vm.this["failed-node"]'
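After the VM is recreated, the node still needs to become Ready and be made schedulable again. The sketch below assumes the rebuilt node rejoins under the same name and therefore keeps its cordoned state. The `wait_for_node` helper and the 10m default timeout are illustrative.

```shell
# kubectl binary; overridable for testing (hypothetical knob).
KUBECTL="${KUBECTL:-kubectl}"

# Block until the node reports Ready, then allow scheduling on it again.
wait_for_node() {
  node="$1"
  timeout="${2:-10m}"
  "$KUBECTL" wait --for=condition=Ready "node/$node" --timeout="$timeout" || return 1
  "$KUBECTL" uncordon "$node"
}

# Example: wait_for_node failed-node
```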
Monitoring and Troubleshooting
Health Checks
To verify cluster health, check the following:
Node status:
kubectl get nodes -o wide
etcd cluster health:
talosctl -n node-ip etcd status
Control plane status:
kubectl get pods -n kube-system
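For scripted checks, the node status output can be reduced to a single number. The sketch below counts nodes whose STATUS column does not start with "Ready", reading `kubectl get nodes --no-headers` output from stdin. The `count_not_ready` helper is a hypothetical name, and cordoned nodes ("Ready,SchedulingDisabled") are treated as ready.

```shell
# Count rows whose STATUS (second column) does not start with "Ready".
# Expects `kubectl get nodes --no-headers` output on stdin.
count_not_ready() {
  awk '$2 !~ /^Ready/ { n++ } END { print n + 0 }'
}

# Example: kubectl get nodes --no-headers | count_not_ready
```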
Common Issues
Node Join Problems
Common causes of node join failures:
- Network connectivity issues
- Machine configuration errors
- Bootstrap process failures
API Server Availability
When the API server is unreachable:
- Verify control plane VIP status
- Check etcd cluster health
- Review API server container logs
Resource Management
Monitor these aspects:
- VM resource utilization
- Storage availability and performance
- Network connectivity and throughput