LiteLLM Proxy Configuration

LiteLLM reads its settings from a ConfigMap. The config file enables database logging, Redis caching, and Prometheus metrics. A lightweight Redis Service runs in-cluster for caching and router state; it has no persistence or authentication.

Configuration

# k8s/applications/ai/litellm/configmap.yaml
data:
  proxy_server_config.yaml: |
    litellm_settings:
      callbacks: ["prometheus"]
      cache: true
      cache_params:
        type: redis
        host: redis.litellm.svc.kube.pc-tips.se
        port: 6379
    router_settings:
      redis_host: redis.litellm.svc.kube.pc-tips.se
      redis_port: "6379"
    general_settings:
      store_model_in_db: true
      store_prompts_in_spend_logs: true

The Deployment mounts this file at /app/proxy_server_config.yaml. Probes call /health/readiness and /health/liveliness on port 4001. Prometheus scrapes metrics from /metrics.
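For reference, the wiring described above might look like the following Deployment fragment. This is a sketch, not the actual manifest: the container name, ConfigMap name (litellm-config), and use of subPath are assumptions; the mount path, port, and probe endpoints come from the text above.

```yaml
# Sketch only — container and ConfigMap names are assumptions.
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: litellm
          args: ["--config", "/app/proxy_server_config.yaml"]
          ports:
            - containerPort: 4001
          volumeMounts:
            - name: config
              mountPath: /app/proxy_server_config.yaml
              subPath: proxy_server_config.yaml
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 4001
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4001
      volumes:
        - name: config
          configMap:
            name: litellm-config   # assumed ConfigMap name
```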

Tracking Usage and Spend

Basic Tracking

After making requests, navigate to the Logs section in the LiteLLM UI to view Model, Usage and Cost information.

Per-User Tracking

To track spend and usage for each Open WebUI user, configure both Open WebUI and LiteLLM:

Enable User Info Headers in Open WebUI

Set the following environment variable for Open WebUI to enable user information in request headers:

# k8s/applications/ai/openwebui/webui-statefulset.yaml
env:
  - name: ENABLE_FORWARD_USER_INFO_HEADERS
    value: 'true'

Configure LiteLLM to Parse User Headers

Add the following to your LiteLLM config to specify the request header mapping for user tracking:

# k8s/applications/ai/litellm/configmap.yaml
general_settings:
  user_header_mappings:
    - header_name: X-OpenWebUI-User-Id
      litellm_user_role: internal_user
    - header_name: X-OpenWebUI-User-Email
      litellm_user_role: customer

Custom Spend Tag Headers

You can add custom headers to the request to track spend and usage:

# k8s/applications/ai/litellm/configmap.yaml
litellm_settings:
  extra_spend_tag_headers:
    - "X-OpenWebUI-User-Id"
    - "X-OpenWebUI-User-Email"
    - "X-OpenWebUI-User-Name"

Available Tracking Options

Any of the following headers can be used as the header_name value in user_header_mappings:

  • X-OpenWebUI-User-Id
  • X-OpenWebUI-User-Email
  • X-OpenWebUI-User-Name

The email and name headers may be more readable and easier to attribute at a glance when hosting for a small group of users you know well.

Choose based on your needs, but note that in Open WebUI:

  • Users can modify their own usernames
  • Administrators can modify both usernames and emails of any account

Responses API (OpenAI /responses)

LiteLLM supports the OpenAI-style Responses API, which provides session continuity, stored responses, and advanced streaming features. To enable Responses API features, update the ConfigMap and Deployment values below.

  • Session continuity: add optional_pre_call_checks: ["responses_api_deployment_check"] under router_settings in proxy_server_config.yaml (already present in the default ConfigMap). This ensures follow-up requests using previous_response_id route to the same upstream deployment.

  • Cold storage for prompts: Optional for basic responses API functionality. Set store_prompts_in_cold_storage: true and configure cold_storage_custom_logger and s3_callback_params in the ConfigMap to store request/response text into S3 for auditing, session continuity, and long-term storage. The ConfigMap uses environment variables for S3 credentials — create a litellm-s3-secret Secret with keys bucket_name, region_name, access_key_id, secret_access_key.

  • Response ID security: by default LiteLLM encrypts response IDs so clients cannot fetch others' responses. To allow cross-access (not recommended for multi-tenant deployments), set disable_responses_id_security: true in the ConfigMap's general_settings or set the DISABLE_RESPONSES_ID_SECURITY env var on the Deployment. The Deployment already includes DISABLE_RESPONSES_ID_SECURITY, which defaults to false.

  • Performance optimizations: The config enables Redis caching, Qdrant semantic caching, usage-based routing, and connection pooling for fast responses. For high-throughput deployments, consider increasing database_connection_pool_limit and enabling background_health_checks.
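For the cold-storage step above, the referenced Secret could be created from a manifest like the following sketch. The Secret name and key names come from the text above; all values are placeholders to be replaced with real credentials (or created via kubectl / a secrets operator instead).

```yaml
# Sketch only — all values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: litellm-s3-secret
  namespace: litellm
stringData:
  bucket_name: "..."         # S3 bucket for prompt/response cold storage
  region_name: "..."         # e.g. the bucket's AWS region
  access_key_id: "..."       # credentials with write access to the bucket
  secret_access_key: "..."
```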