You see the ad: "Claude API — 80% off official." It's tempting, and on an easy test prompt it even works. But the model's compute costs what it costs. Anthropic, OpenAI and Google don't hand their flagship inference to resellers at a fifth of list. So when a gateway is 80% under official, the honest question isn't "how are they so efficient" — it's where does the capacity come from. There are only a few answers, and none of them are good for a production workload.
The math that doesn't add up
An official per-token price is roughly compute plus the provider's margin. A reseller buys at, or slightly under, that list price on a volume agreement. Selling the result at 80% under list therefore means selling far below what they pay. Nobody does that at scale, for long, without a catch that you end up paying for somewhere else. A modest discount is a reseller passing along part of its volume savings. A massive one is a tell.
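To make the arithmetic concrete, here is a minimal sketch of a reseller's margin per million tokens. The list price and the 20% wholesale discount are hypothetical figures chosen for illustration, not numbers from any provider. The margin is positive only while the wholesale discount exceeds the retail discount.

```python
def reseller_margin(list_price: float, wholesale_discount: float,
                    retail_discount: float) -> float:
    """Margin per million tokens, in the same currency as list_price."""
    cost = list_price * (1 - wholesale_discount)    # what the reseller pays
    revenue = list_price * (1 - retail_discount)    # what the customer pays
    return revenue - cost


LIST_PRICE = 10.00   # hypothetical list price per million tokens
WHOLESALE = 0.20     # hypothetical volume discount the reseller negotiated

for retail in (0.15, 0.80):
    margin = reseller_margin(LIST_PRICE, WHOLESALE, retail)
    print(f"{retail:.0%} off list: margin {margin:+.2f} per million tokens")
```

Under these assumptions, a 15% retail discount leaves a thin positive margin, while an 80% discount loses money on every token unless the reseller's own cost is more than 80% under list.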
Answer 1 — it isn't the real model
The cheapest way to be 80% cheaper is to not serve the expensive model. A smaller model, a fine-tune, or your prompt wrapped in a template gets returned behind the flagship's name. It passes easy prompts and falls apart on the hard ones. Four tests tell you in a minute whether the model is genuine.
Answer 2 — gray-market capacity
Sometimes the model is real, but the supply isn't legitimate: trial-credit farming, leaked or shared keys, region-arbitrage accounts. It's genuinely cheap right up until the provider notices and shuts it down — and then your production traffic 401s overnight with no warning and no recourse. Cheap capacity that can vanish is not capacity you can build a business on.
Answer 3 — the meter makes it back
A headline 80% discount on the rate means nothing if you're billed for 5× the tokens: 0.2 × 5 puts you right back at list price, and any inflation beyond that costs you more than going direct. A hidden injected system prompt or a padded usage object quietly claws the "discount" back. Test the token counts: the real price is rate × tokens, and the second number is the one that's easy to fake.
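The rate × tokens relationship can be sketched in a few lines. The function names and token counts below are hypothetical. The reference count is assumed to come from a source you trust, such as the same request sent through the official API or an official token counter.

```python
def inflation_ratio(billed_tokens: int, reference_tokens: int) -> float:
    """Gateway-billed tokens divided by an independent reference count."""
    if reference_tokens <= 0:
        raise ValueError("reference_tokens must be positive")
    return billed_tokens / reference_tokens


def effective_discount(rate_discount: float, token_inflation: float) -> float:
    """Real discount versus paying list price for the true token count.

    rate_discount: advertised discount on the per-token rate (0.80 = 80% off).
    token_inflation: billed tokens divided by tokens actually used.
    A negative result means you pay more than list.
    """
    return 1 - (1 - rate_discount) * token_inflation


# Hypothetical numbers: the same request, counted two ways.
inflation = inflation_ratio(billed_tokens=5_000, reference_tokens=1_000)
print(f"billed {inflation:.1f}x the reference count")
print(f"effective discount: {effective_discount(0.80, inflation):+.0%}")
```

Running the same comparison over a handful of real prompts, short and long, is usually enough to see whether the billed counts track the reference counts or drift upward.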
Answer 4 — loss-leader lock-in
Some gateways genuinely subsidize the first few months to acquire you, then the price drifts up, the bonus credits expire, the free tier shrinks — and by then your integration, your keys and your billing live there. The sticker was the cheapest part.
What a discount you can trust looks like
A sustainable discount is small and explainable: a margin on volume infrastructure, not a subsidy or a corner cut. Brievio prices each model about 15% under its official list (image and video run deeper), published per model against the official reference rate so you can audit it; the capacity is the genuine first-party model over tier-1 cloud channels — Claude via AWS Bedrock, Gemini via Google Vertex — traceable, not gray-market. It's the discount that's boring on purpose, because boring is what survives in production. See the pricing and the comparisons.
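Auditing a published price sheet against official reference rates can be sketched as follows. The model names, prices, and the 30% plausibility threshold are all hypothetical assumptions for illustration. Pick a threshold that matches your own judgment of what a volume margin can fund.

```python
def audit_discounts(gateway_prices: dict, official_prices: dict,
                    max_plausible: float = 0.30) -> dict:
    """Return {model: (discount, flagged)} for every model the gateway lists.

    max_plausible is an assumed cutoff, not an industry figure: discounts
    deeper than this are flagged for a closer look.
    """
    report = {}
    for model, gateway_price in gateway_prices.items():
        discount = 1 - gateway_price / official_prices[model]
        report[model] = (discount, discount > max_plausible)
    return report


# Hypothetical prices per million tokens.
official = {"model-a": 10.00, "model-b": 4.00}
gateway = {"model-a": 8.50, "model-b": 0.80}

for model, (discount, flagged) in audit_discounts(gateway, official).items():
    note = "ask where the capacity comes from" if flagged else "plausible"
    print(f"{model}: {discount:.0%} under list ({note})")
```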
If a gateway is 80% under list, you don't need to assume the worst — you just need to ask where the capacity comes from, and run the authenticity and token tests before you put real traffic on it. The good answers survive the questions.