Guide | June 4, 2026 | 7 min read

Migrating from OpenRouter to a first-party-grade gateway

OpenRouter wins on catalog breadth; a first-party-grade gateway wins on genuine models, honest token billing, and multimodal support. When each fits, plus the two-line migration.

Let's start with the fair part: OpenRouter is the breadth leader. One key, one base URL, and you reach 300+ models — including the open-source long tail (Llama, Qwen, DeepSeek, Mistral, and dozens of community fine-tunes) that no first-party API exposes. If your job is to evaluate a wide field, route around outages across providers, or reach a niche open-weights model, that catalog is a genuine, hard-to-replicate strength. This post is not a takedown.

It's a when-each-fits guide. There's a different set of jobs: ones where you want the genuine first-party flagship with its native features intact, billing you can audit token by token, image and video on the same key, and a per-model discount that's small enough to be real. That's where teams add a first-party-grade gateway like Brievio alongside OpenRouter, or move the production path to it entirely. The migration itself is a base_url and key swap plus a model-slug rename. Here's the honest decision and the mechanics.

When OpenRouter is the right call

Be clear-eyed about this before you change anything. OpenRouter is the better fit when:

  • You need the open-source long tail. Llama, Qwen, DeepSeek, Mistral, and community fine-tunes aren't on any first-party API. If your roadmap touches open-weights models, you want that catalog.
  • You're running a wide eval. Comparing twenty models across five providers is exactly what a 300-model aggregator is built for. One key, one schema, every model.
  • You want cross-provider failover as a feature. OpenRouter can fall back across upstream providers for a given model. For some availability profiles that breadth of routing is the point.
  • You're price-shopping the cheapest host of a given open model. The marketplace surfaces multiple providers per model and lets you sort by price. That's a real lever.

None of those go away by adding Brievio. Many teams keep OpenRouter for exactly these jobs and point only the first-party production traffic elsewhere. Two base URLs, one codebase.
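One way to keep two base URLs in one codebase is a small routing helper. This is a minimal sketch, not either gateway's API: `pick_base_url` and `FIRST_PARTY_SLUGS` are hypothetical names, and the base URLs and slugs are the ones used in the migration section of this post.

```python
# Base URLs as they appear in the migration section of this post.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
BRIEVIO_BASE_URL = "https://api.brievio.com/v1"

# Assumption: the first-party slugs you have chosen to send to Brievio.
FIRST_PARTY_SLUGS = {
    "claude-opus-4-7",
    "claude-sonnet-4-6",
    "claude-haiku-4-5",
    "gemini-2.5-pro",
    "gemini-2.5-flash",
}


def pick_base_url(model: str) -> str:
    """Return the gateway base URL that should serve this model slug."""
    if model in FIRST_PARTY_SLUGS:
        return BRIEVIO_BASE_URL
    # Everything else (open-weights long tail, eval harness) stays on OpenRouter.
    return OPENROUTER_BASE_URL
```

A call site would build one client per base URL and look up the right one with `pick_base_url(model)`. Any vendor-prefixed slug, for example an open-weights Llama model, falls through to OpenRouter.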

When a first-party-grade gateway fits better

The case for switching (or splitting traffic) is narrower and specific. Reach for it when these matter:

  • You need the genuine first-party model, with native features intact. Not a fine-tune or a near-equivalent — the actual Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5 and Gemini 2.5 Pro / Flash, with native tool use, vision, and prompt caching working as the provider documents them. When a flagship's behavior is your product, "close enough" isn't.
  • You want honest token billing you can audit. Your real cost is rate × tokens, and the token count is the easy number to inflate. Brievio reports genuine first-party token counts and bills them straight. If you've never checked, it's a twenty-line self-test.
  • You want image and video on the same key. Nano Banana and Nano Banana Pro, GPT-Image-2, and Veo 3 sit behind the same credentials and the same OpenAI-shaped client as your chat models — no second provider account, no separate billing surface.
  • You want a discount that's small enough to trust. Chat models run about 15% under official list, image and video roughly 37.5% under, published per model against the official reference rate so you can audit the spread. A modest, explainable discount is a margin on volume — not a subsidy that has to be clawed back somewhere you can't see.
  • You don't want to pay for failed calls. 4xx and 5xx responses aren't billed, so a bad day for an upstream doesn't quietly become a line on your invoice.
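The arithmetic behind auditing a bill is short enough to sketch. Assumptions: `expected_cost` is a hypothetical helper, the per-million-token rates are placeholder numbers rather than real prices, and 0.15 stands in for the roughly 15% chat discount mentioned above.

```python
def expected_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float,
                  discount: float = 0.0) -> float:
    """Cost in dollars: rate x tokens, with rates quoted per million tokens."""
    list_cost = (prompt_tokens * input_rate
                 + completion_tokens * output_rate) / 1_000_000
    return list_cost * (1.0 - discount)


# Placeholder rates (NOT real prices): $3 per 1M input, $15 per 1M output tokens.
list_price = expected_cost(200_000, 50_000, 3.0, 15.0)
discounted = expected_cost(200_000, 50_000, 3.0, 15.0, discount=0.15)
```

Feed it the token counts from the API's usage object and the published per-model rate. If the invoice line is noticeably higher than the result, either the rate or the token count is not what you were told.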

The migration: one base_url, one key

Because both gateways implement the OpenAI Chat Completions API, the switch is the smallest diff in your codebase. You point the SDK at a new host and swap the key. Streaming, function calling, JSON mode, and your request shapes are untouched:

swap_base_url.py
# Migrating off OpenRouter is a two-line diff: swap the base_url and the key.
# Everything else — the OpenAI SDK, your request shapes, streaming, tools —
# stays exactly the same, because both speak the OpenAI Chat Completions API.

from openai import OpenAI

# --- Before: OpenRouter ---
client = OpenAI(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
)

# --- After: Brievio ---
client = OpenAI(
    api_key="sk-brievio-...",
    base_url="https://api.brievio.com/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",                 # see model-name mapping below
    messages=[{"role": "user", "content": "Summarize this contract clause…"}],
)

If you came to OpenRouter from the official OpenAI SDK in the first place, this will feel familiar — it's the same move as our ten-minute migration from OpenAI, just with a different origin. No new SDK, no rewrite of your call sites.
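If the base URL and key live in configuration rather than in code, the swap (and any rollback) becomes a deploy-time change. A minimal sketch, assuming hypothetical environment variable names `GATEWAY_BASE_URL` and `GATEWAY_API_KEY`; neither gateway defines them.

```python
import os
from typing import Mapping

DEFAULT_BASE_URL = "https://api.brievio.com/v1"


def gateway_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read gateway settings from the environment.

    GATEWAY_BASE_URL and GATEWAY_API_KEY are hypothetical names chosen
    for this sketch; use whatever your config system already provides.
    """
    return {
        "base_url": env.get("GATEWAY_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env.get("GATEWAY_API_KEY", ""),
    }


# With the OpenAI SDK installed, the client would be built as:
#   client = OpenAI(**gateway_settings())
```

Pointing `GATEWAY_BASE_URL` back at `https://openrouter.ai/api/v1` (with the matching key and vendor-prefixed slugs) restores the old path without touching call sites.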

The one thing you change: model slugs

Here's the single behavioral difference worth a few minutes of care. OpenRouter namespaces every model as vendor/model (anthropic/claude-sonnet-4.6, google/gemini-2.5-flash) because with 300+ models across many providers, namespacing prevents collisions. Brievio serves a curated first-party set, so the slugs are plain: claude-sonnet-4-6, gemini-2.5-flash. No vendor/ prefix.

model_mapping.py
# The one thing you DO touch: model slugs.
# OpenRouter namespaces every model as "vendor/model" so 300+ models from
# many providers never collide. Brievio serves a curated first-party set,
# so the slugs are plain — no "vendor/" prefix.

MODEL_MAP = {
    # OpenRouter slug                ->  Brievio slug
    "anthropic/claude-opus-4.7":         "claude-opus-4-7",
    "anthropic/claude-sonnet-4.6":       "claude-sonnet-4-6",
    "anthropic/claude-haiku-4.5":        "claude-haiku-4-5",
    "google/gemini-2.5-pro":             "gemini-2.5-pro",
    "google/gemini-2.5-flash":           "gemini-2.5-flash",
    # Image + video live on the SAME key — no separate provider account:
    # "nano-banana", "nano-banana-pro", "gpt-image-2", "veo-3"
}

def to_brievio(slug: str) -> str:
    # Unmapped slugs only lose the "vendor/" prefix; versions are NOT rewritten.
    return MODEL_MAP.get(slug, slug.split("/")[-1])

# Pragmatic shortcut: drop the prefix, and for Claude also dot-to-dash the
# version: "anthropic/claude-sonnet-4.6" -> "claude-sonnet-4-6". Gemini keeps
# its dot ("gemini-2.5-flash"). Confirm each against /models so you fail
# loudly on a typo instead of silently routing wrong.

The rule is mechanical: drop the prefix, and for Claude render the version with dashes instead of a dot (4.6 → 4-6); Gemini slugs keep their dot (gemini-2.5-flash). Keep a small mapping dict rather than string-munging blindly, and validate every slug against the model list so a typo fails loudly at startup instead of silently routing to the wrong place. This is the only find-and-replace the migration actually requires.
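The startup check can be sketched as a set difference. `missing_slugs` and `validate_slugs` are hypothetical helper names, and `AVAILABLE` is a hard-coded stand-in for the gateway's model list; with the OpenAI SDK you would typically build it from `client.models.list()`, assuming the gateway serves that endpoint.

```python
from typing import Iterable


def missing_slugs(wanted: Iterable[str], available: Iterable[str]) -> list[str]:
    """Return the wanted slugs that the gateway does not list, sorted."""
    return sorted(set(wanted) - set(available))


def validate_slugs(wanted: Iterable[str], available: Iterable[str]) -> None:
    """Raise at startup instead of letting a bad slug reach production."""
    missing = missing_slugs(wanted, available)
    if missing:
        raise ValueError(f"unknown model slugs: {', '.join(missing)}")


# Stand-in for the real model list. In practice (assumption):
#   AVAILABLE = {m.id for m in client.models.list()}
AVAILABLE = {"claude-sonnet-4-6", "gemini-2.5-flash"}

validate_slugs(["claude-sonnet-4-6", "gemini-2.5-flash"], AVAILABLE)  # passes
```

A slug that kept its dot, such as `claude-sonnet-4.6`, is reported by `missing_slugs` and makes `validate_slugs` raise, which is the loud failure you want.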

A pragmatic rollout

You don't have to choose all at once. The lowest-risk path treats the two gateways as complements:

  • Run them side by side. Keep OpenRouter pointed at the open-source long tail and your eval harness. Add Brievio as a second client for the first-party flagships in your production path. Two base URLs, one repo.
  • Shadow before you cut over. Mirror a slice of live traffic to Brievio and diff the responses and the usage objects. You're checking that the model is the genuine first-party one and that token counts line up with what you expect — the token self-test is exactly this check, automated.
  • Move the multimodal work first. If you're stitching together a separate image or video provider today, consolidating Nano Banana / GPT-Image-2 / Veo 3 onto one key is often the cleanest early win — fewer accounts, one bill, one auth path.
  • Promote when the diff is boring. Once shadowed traffic matches and the numbers reconcile, flip the production default. Roll back by reverting one base_url if anything surprises you.
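The "diff the usage objects" step in the list above can be sketched as a tolerance check. Assumptions: `usage_drift` is a hypothetical helper, the 2% tolerance is arbitrary, and the dicts carry the `prompt_tokens` and `completion_tokens` fields of a Chat Completions usage object. Only `prompt_tokens` is compared by default, since completion length varies from run to run even on the same model.

```python
def usage_drift(primary: dict, shadow: dict,
                fields: tuple = ("prompt_tokens",),
                tolerance: float = 0.02) -> dict:
    """Return {field: (primary, shadow)} where counts differ by more than tolerance."""
    drift = {}
    for field in fields:
        a = primary.get(field, 0)
        b = shadow.get(field, 0)
        # Relative difference against the larger count; max(..., 1) avoids
        # dividing by zero when both counts are missing or zero.
        relative = abs(a - b) / max(a, b, 1)
        if relative > tolerance:
            drift[field] = (a, b)
    return drift


# Example usage objects for the same prompt sent through both gateways.
same = usage_drift({"prompt_tokens": 100}, {"prompt_tokens": 101})
off = usage_drift({"prompt_tokens": 100}, {"prompt_tokens": 130})
```

An empty result is the boring diff you are waiting for. A non-empty one names the field and both counts, which is enough to start asking which side is miscounting.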

The honest takeaway

OpenRouter and a first-party-grade gateway aren't really competing for the same job. OpenRouter optimizes for reach — the widest catalog and the open-source long tail under one key, which is a real and valuable thing. Brievio optimizes for fidelity on a curated set — the genuine first-party flagships with native features intact, honest auditable token billing, image and video on the same key, and a per-model discount small enough to trust. Both sit behind a single OpenAI-compatible base_url, which is precisely why so many teams run both: breadth where they need breadth, fidelity where it counts.

If you want the genuine first-party models with their native behavior and billing you can verify, the move costs you one base_url, one key, and a model-slug rename. Read the side-by-side on Brievio vs OpenRouter, browse the exact models and slugs on the model list, and run the token self-test against both before you put real traffic anywhere. The decision survives the scrutiny either way — which is the only kind of decision worth shipping.