Engineering notes and product updates
Notes from the team building Brievio's AI API gateway: genuine first-party models, reliability engineering, honest billing, and SDK plumbing.
- Guide · Jun 4, 2026 · 8 min read
OpenAI-compatible: what actually has to match (and what breaks)
A practical, skeptical guide to OpenAI compatibility in production: chat shape, SSE streaming, tools, vision, JSON mode and embeddings, covering what ports cleanly and what leaks.
- Guide · Jun 4, 2026 · 7 min read
Tool use with Claude and Gemini through one OpenAI-compatible API
Define tools, read tool_calls, run the multi-turn loop, and handle parallel calls — the same OpenAI shape working across Claude and Gemini behind one base_url.
- Guide · Jun 4, 2026 · 7 min read
Streaming Claude, Gemini and GPT with the OpenAI SDK (SSE)
How SSE streaming works through an OpenAI-compatible endpoint: stream=True, deltas, the [DONE] sentinel, include_usage for real token counts, in Python and Node.
- Guide · Jun 4, 2026 · 7 min read
Capping AI API spend — per-request and per-user cost control
Bound every call with max_tokens, track per-user spend from the usage object, and set budget cutoffs — and why honest billing makes the math trustworthy.
- Guide · Jun 4, 2026 · 7 min read
Migrating from OpenRouter to a first-party-grade gateway
OpenRouter wins on catalog breadth; a first-party-grade gateway wins on genuine models, honest token billing and multimodal support. When each fits, plus the one-line migration.
- Review · Jun 4, 2026 · 7 min read
Veo 3 Fast vs Quality vs Lite — which video tier for which job
A practical buyer's guide to Brievio's three Veo 3 tiers — real per-video cost ($0.15/$0.25/$1.20), text-to-video + image-to-video code, and when to use each.
- Trust · Jun 4, 2026 · 6 min read
Too good to be true: where an 80%-under-list AI gateway's capacity comes from
When an AI API gateway is 80% under official list, the honest question is where the capacity comes from. Four uncomfortable answers — a downgraded model, gray-market supply, an inflated meter, or loss-leader lock-in — and what a discount you can actually trust looks like.
- Guide · Jun 4, 2026 · 8 min read
How to choose an OpenAI-compatible AI gateway — a buyer's checklist
A five-dimension checklist for picking an AI API gateway without getting a re-wrapped, token-inflating or flaky reseller: authenticity, billing honesty, reliability, coverage, and price & terms — plus a one-minute due-diligence script to test the claims instead of trusting them.
- Trust · Jun 4, 2026 · 7 min read
Is your "Claude" really Claude? Four tests to spot a re-wrapped or downgraded model proxy
A gateway can return a smaller model, a template proxy, a clipped context window or stripped native features behind the flagship's name. Four runnable tests — context, tool calls, vision, caching — to verify you're getting the genuine first-party model, on any gateway including Brievio.
- Trust · Jun 4, 2026 · 7 min read
Token inflation — how some AI gateways bill you 5–25×, and a 20-line test to catch it
Some AI API gateways report inflated token counts — a hidden injected system prompt or a fabricated usage object — and you pay 5–25× the real cost. How the padding works, a runnable 20-line test for any gateway (including Brievio), and how to read the result.
- Review · May 24, 2026 · 7 min read
Image models shootout — Nano Banana Pro vs Flux 2 Pro vs Seedream V4
Three top 1K image models, 60 prompts, honest verdicts: best text rendering, best photorealism, best illustration, plus per-image cost on Brievio. Pick the right model for each use case.
- Engineering · May 24, 2026 · 9 min read
Engineering a 99.95% SLO for an AI API gateway — failover, watchdogs, and the boring stuff
How we hit 99.95% monthly uptime across 12 upstreams: weighted candidate routing with real-time weight decay, an aggressive 50ms first-byte watchdog, transactional balance reservations, and the operational scaffolding that matters more than the dispatcher.
- Guide · May 24, 2026 · 8 min read
Anthropic prompt caching — cut 90% off your input bill in 30 minutes
The full picture: how cache_control works, OpenAI-style automatic caching, the 4-breakpoint pattern for agent loops, what silently breaks caching, and how to verify your hit rate is non-zero. Includes Brievio cache rates per model.
- Guide · May 23, 2026 · 6 min read
Migrating from OpenAI to Brievio in 10 minutes — Python, Node, LangChain, Vercel AI SDK
Four flavors of OpenAI integration, the one-line change each needs to start running through Brievio, and a smoke test that costs less than a cent.
- Playbook · May 23, 2026 · 8 min read
AI API cost optimization — five techniques that actually cut the bill
Prompt caching, model tiering, output caps, parallelism and retry hygiene, with runnable code for each and realistic per-technique savings ranges. Stacked together, they can cut the bill by 70–80%.
- Guide · May 23, 2026 · 6 min read
Calling Claude with the OpenAI SDK — change one line, keep your codebase
Anthropic's SDK is great, but the ecosystem standardized on OpenAI's. Here's how to call Claude Opus 4.7, Sonnet 4.6 and Haiku 4.5 with the unmodified OpenAI Python and Node SDKs — streaming, tool use, vision included.
- Guide · May 23, 2026 · 7 min read
Veo 3 and Sora API quickstart — text-to-video and image-to-video in five minutes
Your first Veo 3 and Sora video generation calls through an OpenAI-style API: text-to-video, image-to-video with first/last-frame control, file uploads, and production-ready Python and Node examples. No waitlist, no per-provider billing.
- Announcement · May 23, 2026 · 4 min read
Launching Brievio — one OpenAI-compatible API for the genuine first-party models, priced just under official list
Why we built a Stripe-native AI API gateway around reliability and honest billing: the real Claude, Gemini, GPT-Image and Veo on enterprise-grade infrastructure, one auditable bill, ~15% under official list, and a free $2 credit to get you started.