LiteLLM Proxy Configuration
LiteLLM reads its settings from a ConfigMap. The file enables database logging, Redis caching, and Prometheus metrics. A lightweight Redis Service runs in-cluster to back the cache and router state; it has no persistence or authentication.
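A minimal in-cluster Redis along these lines would satisfy that description; the image tag, namespace, and object names below are assumptions for illustration, not the actual manifests.
# Hypothetical sketch of the in-cluster Redis; image tag, namespace, and names are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: litellm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine      # no persistence flags, no auth configured
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: litellm
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379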
Configuration
# k8s/applications/ai/litellm/configmap.yaml
data:
  proxy_server_config.yaml: |
    litellm_settings:
      callbacks: ["prometheus"]
      cache: true
      cache_params:
        type: redis
        host: redis.litellm.svc.kube.pc-tips.se
        port: 6379
    router_settings:
      redis_host: redis.litellm.svc.kube.pc-tips.se
      redis_port: "6379"
    general_settings:
      store_model_in_db: true
      store_prompts_in_spend_logs: true
The Deployment mounts this file at /app/proxy_server_config.yaml. Probes call /health/readiness and /health/liveliness on port 4001. Prometheus scrapes metrics from /metrics.
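For reference, the relevant Deployment wiring could look roughly like the sketch below; the container name, ConfigMap name, and annotation-based scrape configuration are assumptions, and only the mount path, port, and probe paths come from the description above.
# Hypothetical sketch of the Deployment wiring; names and annotations are assumptions.
# k8s/applications/ai/litellm/deployment.yaml (illustrative excerpt)
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"    # assumes annotation-based scraping of /metrics
        prometheus.io/port: "4001"
    spec:
      containers:
        - name: litellm
          ports:
            - containerPort: 4001
          volumeMounts:
            - name: config
              mountPath: /app/proxy_server_config.yaml
              subPath: proxy_server_config.yaml
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 4001
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4001
      volumes:
        - name: config
          configMap:
            name: litellm-config       # assumed ConfigMap name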
Tracking Usage and Spend
Basic Tracking
After making requests, navigate to the Logs section in the LiteLLM UI to view Model, Usage and Cost information.
Per-User Tracking
To track spend and usage for each Open WebUI user, configure both Open WebUI and LiteLLM:
Enable User Info Headers in Open WebUI
Set the following environment variable for Open WebUI so that user information is forwarded in request headers:
# k8s/applications/ai/openwebui/webui-statefulset.yaml
env:
  - name: ENABLE_FORWARD_USER_INFO_HEADERS
    value: 'true'
Configure LiteLLM to Parse User Headers
Add the following to your LiteLLM config to specify the request header mapping for user tracking:
# k8s/applications/ai/litellm/configmap.yaml
general_settings:
  user_header_mappings:
    - header_name: X-OpenWebUI-User-Id
      litellm_user_role: internal_user
    - header_name: X-OpenWebUI-User-Email
      litellm_user_role: customer
Custom Spend Tag Headers
You can also have LiteLLM record custom request headers as spend and usage tags:
# k8s/applications/ai/litellm/configmap.yaml
litellm_settings:
  extra_spend_tag_headers:
    - "X-OpenWebUI-User-Id"
    - "X-OpenWebUI-User-Email"
    - "X-OpenWebUI-User-Name"
Available Tracking Options
You can use any of the following headers as the header_name in user_header_mappings:
- X-OpenWebUI-User-Id
- X-OpenWebUI-User-Email
- X-OpenWebUI-User-Name
The email and name headers may offer better readability and easier attribution than the opaque user ID when hosting for a small group of users you know well.
Choose based on your needs, but note that in Open WebUI:
- Users can modify their own usernames
- Administrators can modify both usernames and emails of any account
Responses API (OpenAI /responses)
LiteLLM supports the OpenAI-style Responses API, which provides session continuity, stored responses, and advanced streaming features. To enable them, update the ConfigMap and Deployment as described below.
- Session continuity: add optional_pre_call_checks: ["responses_api_deployment_check"] under router_settings in proxy_server_config.yaml (already present in the default ConfigMap). This ensures follow-up requests that use previous_response_id are routed to the same upstream deployment.
- Cold storage for prompts: optional for basic Responses API functionality. Set store_prompts_in_cold_storage: true and configure cold_storage_custom_logger and s3_callback_params in the ConfigMap to store request/response text in S3 for auditing, session continuity, and long-term storage. The ConfigMap reads S3 credentials from environment variables; create a litellm-s3-secret Secret with the keys bucket_name, region_name, access_key_id, and secret_access_key (see the sketch after this list).
- Response ID security: by default LiteLLM encrypts response IDs so clients cannot fetch each other's responses. To allow cross-access (not recommended for multi-tenant deployments), set disable_responses_id_security: true in the ConfigMap's general_settings or set the DISABLE_RESPONSES_ID_SECURITY env var on the Deployment. The Deployment includes DISABLE_RESPONSES_ID_SECURITY, which defaults to false.
- Performance optimizations: the config enables Redis caching, Qdrant semantic caching, usage-based routing, and connection pooling for fast responses. For high-throughput deployments, consider increasing database_connection_pool_limit and enabling background_health_checks.
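The following is a minimal sketch of how the cold-storage settings and the companion Secret could fit together, assuming a litellm namespace, os.environ/-style credential references, and the parameter and environment variable names shown in the comments; none of it is copied from the live manifests, so adjust names to match your setup.
# Hypothetical sketch, not the actual manifest: parameter names under
# s3_callback_params and the environment variable names are assumptions.
# k8s/applications/ai/litellm/configmap.yaml (illustrative excerpt)
router_settings:
  optional_pre_call_checks: ["responses_api_deployment_check"]
general_settings:
  disable_responses_id_security: false
  store_prompts_in_cold_storage: true
  cold_storage_custom_logger: s3            # assumed logger name
  s3_callback_params:
    s3_bucket_name: os.environ/S3_BUCKET_NAME
    s3_region_name: os.environ/S3_REGION_NAME
    s3_aws_access_key_id: os.environ/S3_ACCESS_KEY_ID
    s3_aws_secret_access_key: os.environ/S3_SECRET_ACCESS_KEY
---
# Companion Secret with the credential keys named above (values are placeholders).
apiVersion: v1
kind: Secret
metadata:
  name: litellm-s3-secret
  namespace: litellm
stringData:
  bucket_name: litellm-cold-storage
  region_name: eu-north-1
  access_key_id: <access-key-id>
  secret_access_key: <secret-access-key>
The Deployment would then need to expose the Secret keys as the environment variables referenced above (for example via env valueFrom: secretKeyRef) so the os.environ/ lookups resolve at startup.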