Engineering notes and product updates
Notes from the team building Brievio's AI API gateway: genuine models, reliability engineering, honest billing, and SDK plumbing.
- $Guide//Jun 4, 2026//8 min read
How to choose an OpenAI-compatible AI gateway — a buyer's checklist
A five-dimension checklist for picking an AI API gateway without getting a re-wrapped, token-inflating or flaky reseller: authenticity, billing honesty, reliability, coverage, and price & terms — plus a one-minute due-diligence script to test the claims instead of trusting them.
- $Trust//Jun 4, 2026//7 min read
Is your "Claude" really Claude? Four tests to spot a re-wrapped or downgraded model proxy
A gateway can return a smaller model, a template proxy, a clipped context window or stripped native features behind the flagship's name. Four runnable tests — context, tool calls, vision, caching — to verify you're getting the genuine first-party model, on any gateway including Brievio.
- $Trust//Jun 4, 2026//7 min read
Token inflation — how some AI gateways bill you 5–25×, and a 20-line test to catch it
Some AI API gateways report inflated token counts — a hidden injected system prompt or a fabricated usage object — and you pay 5–25× the real cost. How the padding works, a runnable 20-line test for any gateway (including Brievio), and how to read the result.
- $Review//May 24, 2026//7 min read
Image models shootout — Nano Banana Pro vs Flux 2 Pro vs Seedream V4
Three top 1K image models, 60 prompts, honest verdicts. Best text rendering, best photo realism, best illustration — plus per-image cost on Brievio. Pick the right one for the right use case.
- $Engineering//May 24, 2026//9 min read
Engineering a 99.95% SLO for an AI API gateway — failover, watchdogs, and the boring stuff
How we hit 99.95% monthly uptime across 12 upstreams: weighted candidate routing with real-time weight decay, an aggressive 50ms first-byte watchdog, transactional balance reservations, and the operational scaffolding that matters more than the dispatcher.
- $Guide//May 24, 2026//8 min read
Anthropic prompt caching — cut 90% off your input bill in 30 minutes
The full picture: how cache_control works, OpenAI-style automatic caching, the 4-breakpoint pattern for agent loops, what silently breaks caching, and how to verify your hit rate is non-zero. Includes Brievio cache rates per model.
- $Guide//May 23, 2026//6 min read
Migrating from OpenAI to Brievio in 10 minutes — Python, Node, LangChain, Vercel AI SDK
Four flavors of OpenAI integration, the one-line change each needs to start running through Brievio, and a smoke test that costs less than a cent.
- $Playbook//May 23, 2026//8 min read
AI API cost optimization — five techniques that actually cut the bill
Prompt caching, model tiering, output caps, parallelism, retry hygiene: runnable code for each, with realistic per-technique savings ranges. Stacked, they can cut the bill by 70–80%.
- $Guide//May 23, 2026//6 min read
Calling Claude with the OpenAI SDK — change one line, keep your codebase
Anthropic's SDK is great, but the ecosystem standardized on OpenAI's. Here's how to call Claude Opus 4.7, Sonnet 4.6 and Haiku 4.5 with the unmodified OpenAI Python and Node SDKs — streaming, tool use, vision included.
- $Guide//May 23, 2026//7 min read
Veo 3 and Sora API quickstart — text-to-video and image-to-video in five minutes
Your first Veo 3 and Sora video generation calls through an OpenAI-style API: text-to-video, image-to-video with first/last frame control, file uploads, and production-ready Python and Node examples. No waitlist, no per-provider billing.
- $Announcement//May 23, 2026//4 min read
Launching Brievio — one OpenAI-compatible API for the genuine first-party models, priced just under official list
Why we built a Stripe-native AI API gateway around reliability and honest billing: the real Claude, Gemini, GPT-Image and Veo on enterprise-grade infrastructure, one auditable bill, ~15% under official list, and a $2 free credit to get you started.