Kubernetes Cluster Configuration
This guide describes our Kubernetes cluster architecture and how we manage its configuration through GitOps with ArgoCD.
Architecture Overview
Our infrastructure follows a strict GitOps approach with the following principles:
- Base configurations containing common settings and defaults
- Environment-specific overlays for customized deployments
- Progressive deployment through ArgoCD sync waves
- Promotion (graduation) of resources from one environment to the next, when applicable
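As a rough illustration of the base/overlay and sync-wave principles above, the sketch below assumes the overlays are built with Kustomize, which this guide does not state explicitly. The paths, file names, and the `example-app` namespace are hypothetical. The `argocd.argoproj.io/sync-wave` annotation is the standard ArgoCD mechanism for ordering resources within a sync.

```yaml
# File 1 (hypothetical path): overlays/staging/kustomization.yaml
# Pulls in the shared base and layers an environment-specific patch on top.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base              # common settings and defaults
patches:
  - path: replicas.yaml     # hypothetical staging-only override
---
# File 2 (hypothetical): a resource in the base, shown here only to
# illustrate sync waves. It is a separate file, not part of the
# kustomization above.
apiVersion: v1
kind: Namespace
metadata:
  name: example-app
  annotations:
    # Lower waves are applied first; negative values are allowed.
    argocd.argoproj.io/sync-wave: "-1"
```

Resources without the annotation default to wave 0, so the namespace in this sketch would be created before the workloads that live in it.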
Network Architecture
Our networking stack is built on:
- Cilium (v1.17+)
  - CNI for network connectivity
  - Service mesh capabilities
  - Network policies
  - Cluster mesh (future capability)
- Gateway API
  - Modern ingress management
  - Replaces traditional Ingress resources
  - Enhanced traffic routing capabilities
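To make the network policy capability more concrete, here is a minimal sketch of a CiliumNetworkPolicy. The namespace, labels, and port are hypothetical and do not describe any policy actually deployed in this cluster. It allows pods labelled `app: frontend` to reach pods labelled `app: api` on TCP 8080. Once such a policy selects the `api` pods, Cilium denies any other ingress to them.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api   # hypothetical name
  namespace: example-app        # hypothetical namespace
spec:
  # Pods this policy applies to.
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    # Only traffic from frontend pods in the same namespace is allowed.
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"      # port is given as a string
              protocol: TCP
```

A standard `networking.k8s.io/v1` NetworkPolicy would also be enforced by Cilium. The Cilium-specific resource is used here only because it matches the stack described above.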
Gateway Classes and Structure
Gateway Classes:
  external:          # Internet-facing services
    - HTTP/HTTPS routes
    - Load balancing
    - External DNS integration
  internal:          # Cluster-local services
    - Internal DNS resolution
    - Service mesh integration
    - Cross-namespace communication
  tls-passthrough:   # TLS passed through to the backend, which terminates it
    - Secure services
    - Certificate management via cert-manager
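The following sketch shows how a service might be exposed through the external class. It assumes that `external` is the literal GatewayClass name, which this guide does not confirm. The Gateway name, namespaces, hostnames, Secret name, and backend Service are all hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway        # hypothetical
  namespace: gateway-system     # hypothetical
spec:
  gatewayClassName: external    # assumed to match the class listed above
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.example.com"
      tls:
        mode: Terminate         # the gateway terminates TLS
        certificateRefs:
          - name: example-com-tls   # hypothetical Secret, e.g. issued by cert-manager
      allowedRoutes:
        namespaces:
          from: All             # let routes in other namespaces attach
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-app
  namespace: example-app
spec:
  parentRefs:
    - name: external-gateway
      namespace: gateway-system
  hostnames:
    - app.example.com
  rules:
    - backendRefs:
        - name: example-app     # hypothetical backend Service
          port: 8080
```

For the tls-passthrough case, the listener would instead use `protocol: TLS` with `tls.mode: Passthrough`, and the backend would present its own certificate.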
Node Management
Safe Node Drainage and Reboot
To safely reboot a node, follow these steps in order (replace node-name with the name of the node):
- Cordon the node so that no new workloads are scheduled onto it:
    kubectl cordon node-name
- Drain the node to evict its pods. DaemonSet-managed pods are skipped, and pods using emptyDir volumes are evicted even though their emptyDir data is deleted:
    kubectl drain node-name --ignore-daemonsets --delete-emptydir-data
- Reboot using talosctl (replace the IP with your node's IP):
    talosctl reboot --nodes 10.25.150.21
- Uncordon the node once it is back online and reports Ready:
    kubectl uncordon node-name
Node Maintenance Best Practices
- Always drain nodes before maintenance
- Verify that evicted pods have been rescheduled and are running before proceeding
- Monitor node health after maintenance
- Ensure the cluster has enough spare capacity to absorb the redistributed workloads
- Consider the impact on stateful applications
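One way to limit the impact of a drain on stateful applications is a PodDisruptionBudget, which `kubectl drain` respects when evicting pods. The sketch below is illustrative only. The name, namespace, and label are hypothetical, and the right budget depends on the application's replication and quorum requirements.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-db-pdb          # hypothetical
  namespace: example-app        # hypothetical
spec:
  # Allow at most one matching pod to be evicted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: example-db           # hypothetical label on the stateful pods
```

With a budget like this in place, a drain waits and retries rather than evicting a second replica while the first is still unavailable.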