You've decided a gateway makes sense: more than one provider, one OpenAI-compatible endpoint, one bill. Now the harder question is which one, without ending up on a reseller that re-wraps your prompts, inflates token counts, and sells at 80% under list on capacity that can vanish overnight. Here's the checklist we'd use, covering the five things that actually matter, plus a one-minute script to test the claims instead of trusting them.
1. Authenticity — is it the genuine model?
The model string is the easiest thing to fake. A reseller can serve a smaller model, a fine-tune, or your prompt wrapped in a fixed template behind claude-sonnet-4-6. Verify capabilities a downgrade can't fake: full context window, native tool calls, vision. The model-authenticity post has the probes.
- Does the model hold its full claimed context (needle-in-haystack at 150K+)?
- Do native tools and vision work, or are they faked as text?
- Is the model traceable to a first-party source (Bedrock, Vertex), or unexplained?
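A minimal sketch of the needle-in-haystack probe from the first bullet, assuming you send the resulting prompt through your own client. The helper names (`build_haystack`, `passes_needle_check`) and the passphrase are hypothetical, and the filler count must be scaled up until a local token count reaches the window you want to test.

```python
import random

def build_haystack(needle: str, n_filler: int, position: float, seed: int = 0) -> str:
    """Return filler lines with `needle` inserted at a relative position (0.0 to 1.0)."""
    rng = random.Random(seed)
    lines = [
        f"Note {i}: the archive value for entry {i} is {rng.randint(1000, 9999)}."
        for i in range(n_filler)
    ]
    lines.insert(int(position * len(lines)), needle)
    return "\n".join(lines)

def passes_needle_check(reply: str, secret: str) -> bool:
    """The probe passes only if the model's reply contains the exact secret."""
    return secret in reply

# Hypothetical secret; pick something the model cannot guess.
SECRET = "MAGENTA-4417"
NEEDLE = f"The secret passphrase is {SECRET}."

# n_filler=200 is a small demo size; raise it to approach the claimed window.
prompt = (
    build_haystack(NEEDLE, n_filler=200, position=0.5)
    + "\n\nWhat is the secret passphrase? Answer with the passphrase only."
)
```

Send `prompt` as a single user message and pass the reply text to `passes_needle_check`. Repeating the run with the needle near the start, middle, and end gives a clearer picture than one position, since a silently truncated context tends to lose one end of the prompt.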
2. Billing honesty — does the meter tell the truth?
You pay per token, and the gateway reports the count. Padding it — a hidden injected system prompt, or a fabricated usage object — is the quietest way to overcharge you 5–25×. Test it in 20 lines (the token-inflation post).
- Do reported tokens match your actual text, plus a small fixed overhead?
- Are failed 4xx/5xx calls free, or do you pay for errors?
- Is prompt caching honored — real cache hits at the reduced rate?
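One way to separate "a small fixed overhead" from real padding is to send two prompts of different lengths and fit a line through the reported counts. The sketch below assumes you already have a local token count and the gateway's reported `prompt_tokens` for each prompt; the function names and the 50-token overhead threshold are assumptions, and the 1.6 slope limit borrows the upper end of the ratio band used in the script later in this post.

```python
def fit_meter(local_a, reported_a, local_b, reported_b):
    """Fit reported = slope * local + intercept through two (local, reported) points."""
    slope = (reported_b - reported_a) / (local_b - local_a)
    intercept = reported_a - slope * local_a
    return slope, intercept

def diagnose(slope, intercept, max_slope=1.6, max_overhead=50):
    """Classify a meter; both thresholds are assumptions to tune per model."""
    if slope > max_slope:
        return "per-token inflation"      # every token is billed as several
    if intercept > max_overhead:
        return "large fixed overhead"     # consistent with a hidden system prompt
    return "plausible"

# Hypothetical measurements: a 10-token and a 1000-token prompt.
print(diagnose(*fit_meter(10, 18, 1000, 1008)))    # slope 1.0, intercept 8
print(diagnose(*fit_meter(10, 515, 1000, 1505)))   # slope 1.0, intercept 505
print(diagnose(*fit_meter(10, 50, 1000, 5000)))    # slope 5.0, intercept 0
```

A slope near 1 with a small intercept matches honest metering. A large intercept points to injected text on every call, and a slope well above 1 points to a multiplied usage figure. Because a local tokenizer rarely matches the provider's exactly, treat the slope as a rough signal rather than a precise measurement.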
3. Reliability — will it stay up under load?
A gateway adds a hop; it has to earn that by being more reliable than the upstream, not less. Look for fail-fast behavior (errors you can retry, not 90-second hangs) and routing that moves off a degrading backend.
- Is there a public status page and a published SLA number?
- Cross-provider failover, or one upstream per call?
- Does it fail fast and loud so your retries work, or hang silently?
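Fail-fast behavior only pays off if the caller retries. The helper below is a minimal sketch of retry with exponential backoff, assuming the wrapped call raises on timeouts and on 5xx responses; `call_with_retries` and its defaults are hypothetical, not part of any gateway SDK.

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on any exception wait base_delay * 2**i, then try again.

    `attempts` must be at least 1. The last exception is re-raised if
    every attempt fails, so the failure stays loud.
    """
    last_error = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:  # narrow this to your client's error types
            last_error = exc
            if i < attempts - 1:
                sleep(base_delay * 2 ** i)
    raise last_error
```

Pair this with a short per-request timeout on the client so a hanging upstream turns into an exception quickly. The official `openai` Python client accepts `timeout` and `max_retries` arguments for this; setting `max_retries=0` there avoids stacking its built-in retries on top of your own.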
4. Coverage — one key for the work you actually do?
- Chat, image and video on the same key, or just text?
- OpenAI-compatible across all of it — streaming, tools, vision, JSON mode?
- Native routes where they matter (e.g. Anthropic /v1/messages) kept intact?
5. Price & terms — cheap for a reason, or fairly priced?
Price is where the trap is baited. A modest discount under official list can be funded by margin on volume infrastructure. A gateway 80% under list is more likely reselling gray-market capacity that can disappear: the savings come from somewhere, and it's usually authenticity or stability.
- Priced per model against the official rate (auditable), not a vague blanket discount?
- Pay-as-you-go, or locked behind subscriptions and expiring credits?
- Does the balance expire? Are there minimums?
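The pricing questions above reduce to two small calculations: the discount against the official rate, and the price you actually pay once unused credits expire. This is a sketch with made-up numbers; the function names and every price in it are hypothetical, not quotes from any provider.

```python
def discount_vs_list(gateway_price: float, official_price: float) -> float:
    """Fractional discount against the official rate (0.15 means 15% under list)."""
    return 1 - gateway_price / official_price

def effective_price(gateway_price: float, credits_bought: float, credits_used: float) -> float:
    """Price per unit actually consumed when the unused balance expires."""
    return gateway_price * credits_bought / credits_used

# Hypothetical: official rate 3.00 per million tokens, gateway charges 2.55.
print(round(discount_vs_list(2.55, 3.00), 2))

# Hypothetical: 100 units bought, only 60 used before the credits expire.
print(round(effective_price(2.55, 100, 60), 2))
```

In this example the headline discount is 15%, but using only 60% of an expiring balance raises the effective price to about 4.25, which is above the official rate. That is why the expiry and minimum-purchase questions belong on the same checklist as the per-model price.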
The one-minute due-diligence run
Don't take the marketing page's word — or ours. Run the checks:
# gateway_due_diligence.py
# Run against any OpenAI-compatible gateway before you trust it in production.
# Five checks, well under a cent, about a minute.
import tiktoken
from openai import OpenAI
client = OpenAI(api_key="sk-...", base_url="https://api.brievio.com/v1")
# 1. HONEST BILLING: reported prompt_tokens vs a local tokenizer count.
# cl100k_base is an OpenAI encoding, so for Claude models the local count is
# only a rough estimate; chat formatting also adds a few tokens per message.
msgs = [{"role": "user", "content": "Reply with the single word: ok."}]
r = client.chat.completions.create(model="claude-sonnet-4-6", messages=msgs, max_tokens=5)
local = len(tiktoken.get_encoding("cl100k_base").encode(msgs[0]["content"]))
print("token ratio (want ~1.0-1.6x):", round(r.usage.prompt_tokens / local, 1))
# 2. GENUINE MODEL: a real structured tool_call, not JSON jammed into text.
r = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "weather in Tokyo? use the tool"}],
    tools=[{"type": "function", "function": {"name": "get_weather",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}],
)
# bool() treats both None and an empty list as "no tool call".
print("returns real tool_calls:", bool(r.choices[0].message.tool_calls))
# 3. FAILED CALLS FREE — send a deliberately bad request, then check your usage
# page: a 4xx/5xx should cost nothing.
# 4. CONTEXT — needle-in-haystack at the model's claimed window (see the
# "is your Claude really Claude" post for the snippet).
# 5. TERMS — a public status page + published SLA exist; pricing is per-model
# against the official rate, not a vague "80% off everything".
Where Brievio lands on its own checklist
Being honest about our own scorecard: Brievio routes the genuine first-party models over tier-1 cloud channels with full context and native features intact; bills true token counts and charges nothing on failed calls; prices each model about 15% under official list (image and video deeper), pay-as-you-go, balance that doesn't expire. Where we don't win: going direct still beats us on day-one model access and provider-of-record contracts, and OpenRouter covers a far wider open-source long tail. See the full comparisons and pricing, then run the script above against whichever gateways are on your shortlist.
The whole point of a checklist is that you can apply it to everyone, including the vendor who wrote it. Apply it to us.