Making ECHO model-agnostic with LiteLLM

Updated July 11, 2025

A partner asked: “Can you confirm you’re planning to make Dembrane model-agnostic?” Their technical team wanted to move away from OpenAI and Anthropic in favor of sovereign or open-source models. Fair ask when you’re selling to European governments.

At the time we had direct OpenAI and Anthropic API calls scattered throughout the codebase. Switching models meant finding every openai.chat.completions.create() call and rewriting it. Not scalable.

Migrated to LiteLLM. It's a translation layer that gives you a unified interface to 100+ LLM providers. You configure model aliases, and your application code talks to LiteLLM instead of individual providers.

# Before: hardcoded provider
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
)

# After: provider-agnostic through LiteLLM
import litellm

response = litellm.completion(
    model=config.LARGE_LITELLM_MODEL,  # could be anything
    messages=[...],
)

Set up tiered model configs: SMALL_LITELLM, MEDIUM_LITELLM, LARGE_LITELLM, each with its own API key, base URL, and version. Route different tasks to the appropriate model size. Summarization doesn't need GPT-4. Quick text cleanup can use a small model. Heavy analysis gets the big one.
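A minimal sketch of what that tiering can look like, assuming environment-based configuration. The variable names and the complete() helper are illustrative, not the actual ECHO code:

import os

import litellm

# Hypothetical tiered config: each tier carries its own model, key, and base URL.
MODEL_TIERS = {
    "small": {
        "model": os.environ["SMALL_LITELLM_MODEL"],
        "api_key": os.environ["SMALL_LITELLM_API_KEY"],
        "api_base": os.environ.get("SMALL_LITELLM_API_BASE"),
    },
    "large": {
        "model": os.environ["LARGE_LITELLM_MODEL"],
        "api_key": os.environ["LARGE_LITELLM_API_KEY"],
        "api_base": os.environ.get("LARGE_LITELLM_API_BASE"),
    },
    # "medium" follows the same shape.
}

def complete(tier: str, messages: list) -> str:
    # Route the call to whichever model this tier is configured with.
    cfg = MODEL_TIERS[tier]
    response = litellm.completion(
        model=cfg["model"],
        messages=messages,
        api_key=cfg["api_key"],
        api_base=cfg["api_base"],
    )
    return response.choices[0].message.content

# Summarization runs on the small tier; heavy analysis gets the large one.
summary = complete("small", [{"role": "user", "content": "Summarize this transcript: ..."}])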

Shifted from OpenAI directly to Azure OpenAI hosted in the EU. Same models, EU data residency. Partners who want Mistral, Llama, or whatever can configure it themselves. Experimental features are the only exception; everything in production is fully model-agnostic.
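With LiteLLM that shift is a config change, not a code change. Roughly like this, with a placeholder deployment name and endpoint:

import os

import litellm

# Same call site as before; only the model string and endpoint move to Azure.
response = litellm.completion(
    model="azure/my-gpt-4-deployment",  # "azure/<deployment-name>", placeholder
    api_base="https://my-eu-resource.openai.azure.com",  # EU-hosted Azure resource
    api_version="2024-02-01",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    messages=[{"role": "user", "content": "Hello"}],
)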

The config file though. Went from ~140 clean lines to 500+ lines of service orchestration. Every new integration added a block: RunPod Whisper endpoints, diarization services, LightRAG models, embedding models, inference models. Feature flags controlling which configs are required. Conditional assertions. Most-edited, most-cursed file in the repo.

LITELLM_MODEL_CONFIGS = {
    'small': 'SMALL_LITELLM',
    'medium': 'MEDIUM_LITELLM',
    'large': 'LARGE_LITELLM',
    'lightrag': 'LIGHTRAG_LITELLM',
    'lightrag_audio': 'LIGHTRAG_LITELLM_AUDIOMODEL',
    'lightrag_text': 'LIGHTRAG_LITELLM_TEXTSTRUCTUREMODEL',
    'lightrag_embedding': 'LIGHTRAG_LITELLM_EMBEDDING',
    'lightrag_inference': 'LIGHTRAG_LITELLM_INFERENCE',
}
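And the feature-flag pattern that keeps growing the file looks roughly like this; the flag and variable names are invented for illustration:

import os

# Hypothetical flag: each optional integration gates its own required config.
ENABLE_LIGHTRAG = os.environ.get("ENABLE_LIGHTRAG", "false").lower() == "true"

if ENABLE_LIGHTRAG:
    # Only assert the LightRAG settings when the feature is actually on.
    for var in ("LIGHTRAG_LITELLM_API_KEY", "LIGHTRAG_LITELLM_EMBEDDING"):
        assert os.environ.get(var), f"{var} is required when ENABLE_LIGHTRAG is set"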

There’s a comment in the file that just says “this file is messy and needs a refactor.” It’s been there for months.

Abstraction layers are worth the upfront cost. LiteLLM let us survive three provider migrations (OpenAI to Azure to RunPod for transcription, plus various LLM switches) without rewriting application logic. The config complexity is real technical debt, but it’s contained. Lives in one file, not across the entire codebase.

FYI if you’re building AI products for enterprise or government: model-agnosticism isn’t optional. Your customers will ask. Regulators will demand it. Build the abstraction layer early.