Infrastructure Controllers

This document outlines the core infrastructure controllers deployed in our Kubernetes cluster.

Core Controllers

ArgoCD

  • Purpose: GitOps deployment controller
  • Version: Latest stable (automated by Renovate)
  • Features:
    • Application state synchronization
    • Progressive delivery
    • Resource health monitoring
    • Multi-cluster management

cert-manager

  • Purpose: Certificate management
  • Version: Latest v1.x (automated by Renovate)
  • Features:
    • Let's Encrypt integration
    • Internal PKI support
    • Automated renewal
    • CSI driver integration

External Secrets

  • Purpose: Secrets management
  • Version: Latest stable (automated by Renovate)
  • Features:
    • Bitwarden integration
    • Secrets rotation
    • Secure key distribution
    • Cross-namespace secrets

Longhorn

  • Purpose: Persistent storage
  • Version: 1.8.0+
  • Features:
    • Distributed block storage
    • Volume replication
    • Backup capabilities
    • Storage overprovisioning

Resource Management

Base Resource Configuration

Standard Resource Profile:
requests:
  cpu: 100m
  memory: 256Mi
limits:
  cpu: 1000m
  memory: 512Mi

High-Availability Profile:
requests:
  cpu: 500m
  memory: 512Mi
limits:
  cpu: 2000m
  memory: 1Gi
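
As a minimal sketch of where such a profile ends up, the fragment below places the Standard Resource Profile on a single container in a pod spec. The container name and image are hypothetical placeholders, not values taken from this cluster.

```yaml
# Fragment of a pod spec; name and image are illustrative placeholders.
containers:
  - name: example-controller
    image: example.org/example-controller:1.0.0
    resources:
      # Standard Resource Profile
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 1000m
        memory: 512Mi
```

Because the requests are lower than the limits, pods using either profile fall into the Burstable QoS class.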

Controller-Specific Resources

Controller         CPU Request   Memory Request   CPU Limit   Memory Limit
ArgoCD Server      500m          512Mi            2000m       1Gi
cert-manager       100m          256Mi            500m        512Mi
External Secrets   100m          256Mi            500m        512Mi
Longhorn Manager   250m          512Mi            1000m       1Gi

High Availability

Deployment Strategy

Replication:
  argocd-server: 2
  cert-manager: 2
  external-secrets: 2
  longhorn-manager: 3 # One per node

Pod Disruption Budget:
  minAvailable: 1
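
For illustration, a budget with minAvailable: 1 could be written as the PodDisruptionBudget below. This is a sketch for argocd-server only; the object name and the pod label in the selector are assumptions and should be checked against the labels the deployed chart actually sets.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: argocd-server          # assumed name
  namespace: argocd
spec:
  minAvailable: 1
  selector:
    matchLabels:
      # Assumed pod label; verify with `kubectl -n argocd get pods --show-labels`.
      app.kubernetes.io/name: argocd-server
```

With two replicas, this budget lets a node drain evict one pod at a time while the other keeps serving.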

Node Placement

Topology Spread:
  maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
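
In a Deployment, these settings would sit under the pod template roughly as sketched below. The labelSelector is required for the constraint to count matching pods; the label shown is an assumption for cert-manager and may differ in the deployed chart.

```yaml
# Fragment of a Deployment spec.
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              # Assumed pod label; must match the labels on the pods being spread.
              app.kubernetes.io/name: cert-manager
```

With DoNotSchedule, a replica stays Pending rather than landing on a node that would push the skew above 1, so the replica count should not exceed what the available nodes can hold.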

Monitoring Integration

Metrics Collection

All controllers expose Prometheus metrics:

  • ArgoCD: Application sync status, performance metrics
  • cert-manager: Certificate status, renewal metrics
  • External Secrets: Sync status, error rates
  • Longhorn: Volume health, performance metrics

Alert Rules

Critical Alerts:
- ArgoCD application out of sync > 1h
- Certificate renewal failure
- Secrets sync failure
- Volume degradation

Warning Alerts:
- High resource usage
- Slow sync operations
- Certificate expiring soon
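
Two of these alerts could be expressed as Prometheus rules along the following lines. This sketch assumes the Prometheus Operator (PrometheusRule CRD) is installed and that rules live in a monitoring namespace. The rule names, the 14-day expiry threshold, and the 10-minute hold are illustrative assumptions, not values defined in this document.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: infrastructure-controllers   # assumed name
  namespace: monitoring              # assumed namespace
spec:
  groups:
    - name: infrastructure-controllers
      rules:
        # Critical: application out of sync for more than 1h.
        - alert: ArgoCDAppOutOfSync
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 1h
          labels:
            severity: critical
        # Warning: certificate expires in under 14 days (assumed threshold).
        - alert: CertificateExpiringSoon
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
          for: 10m
          labels:
            severity: warning
```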

Version Management

Update Strategy

  • Automated updates via Renovate
  • Version constraints in Helm values
  • Progressive rollout through environments
  • Regular review of change logs

Current Versions

Controllers:
  argocd: 2.10.x
  cert-manager: 1.17.x
  external-secrets: 0.9.x
  longhorn: 1.8.x

Security Configuration

RBAC Settings

  • Minimal permissions model
  • Namespace isolation
  • Service account restrictions
  • Security context constraints

Network Policies

Ingress Rules:
- Allow metrics collection
- Allow API access from authorized sources
- Allow cross-controller communication
- Deny all other traffic
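
A minimal sketch of the first and last rules for the cert-manager namespace might look like the two policies below. The policy names, the monitoring namespace, and the pod label are assumptions; 9402 is cert-manager's default metrics port.

```yaml
# Deny all ingress to every pod in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # assumed name
  namespace: cert-manager
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow metrics scrapes from the (assumed) monitoring namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics          # assumed name
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: cert-manager   # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9402
```

A default-deny policy also blocks admission webhook traffic, so a further allow rule for API access would be needed before applying this to a namespace that serves webhooks.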

Troubleshooting Guide

Common Issues

  1. Controller Pod Crashes
    • Check resource limits
    • Review recent changes
    • Monitor system resources
  2. Sync Failures
    • Verify Git repository access
    • Check network connectivity
    • Review controller logs
  3. Performance Issues
    • Monitor resource usage
    • Check node capacity
    • Review scaling configuration

Debug Commands

# Check controller status
kubectl -n argocd get pods
kubectl -n cert-manager get pods
kubectl -n external-secrets get pods

# View controller logs
# (the label key depends on the chart; many use app.kubernetes.io/name instead of app)
kubectl -n <namespace> logs -l app=<controller-name>

# Verify RBAC
kubectl auth can-i --as system:serviceaccount:<ns>:<sa> <verb> <resource>