In a companion post we tested whether a gateway bills you honest token counts. This one tests the other promise: whether the model behind the name is actually the model. A reseller can return something that calls itself claude-sonnet-4-6 but is a smaller model, a fine-tune, a prompt wrapped in a fixed template, or the real model with its context window and native features quietly clipped. Same string in the model field, very different thing on the wire.

You don't need anyone's word for it. Four short tests separate the genuine first-party model from a re-wrap. None of them rely on asking the model "what are you" — models are unreliable narrators of their own identity. Probe capabilities instead.

The four ways a model gets faked

The swap. A cheaper or smaller model is served behind the flagship's name. Cheapest to run, hardest to notice on easy prompts.
The template proxy. Your prompt is stuffed into a fixed scaffold before it reaches the model — which changes behavior and pads your token bill with text you never wrote.
The clipped window. It claims 200K context but truncates to a fraction of it, silently dropping the middle of long inputs.
The stripped features. Tool use, vision or prompt caching are dropped or faked, so anything past plain chat degrades.

Test 1 — the context window

Hide a fact deep inside a long document and ask for it back. A genuine 200K-context model retrieves it; a truncated downgrade errors on the input or loses the middle:

test_context_window.py

# test_context_window.py
# A downgraded model behind the name can't actually hold the context it
# claims. Hide a fact deep in a long document and ask for it back.
from openai import OpenAI
client = OpenAI(api_key="sk-brievio-...", base_url="https://api.brievio.com/v1")

needle = "The launch code is HORIZON-7741."
filler = "This sentence is filler. " * 9000   # roughly 45-55K tokens per copy (tokenizer-dependent)
haystack = filler + "\n\n" + needle + "\n\n" + filler   # ~100K total, needle in the middle

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "Answer only from the document."},
        {"role": "user", "content": haystack + "\n\nWhat is the launch code?"},
    ],
    max_tokens=20,
)
print(resp.choices[0].message.content)   # genuine: an answer containing "HORIZON-7741"
# A truncated/downgraded proxy errors on the long input, or silently drops the
# middle and answers "I don't know". Grow the haystack toward the claimed window
# (e.g. 150K+ tokens for a 200K model, by raising the 9000) and watch what breaks.
# Going past the window itself makes even the genuine model reject the request.
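A single probe tells you about one size and one depth. A minimal sketch of a sweep, assuming roughly 5 tokens per filler sentence (tokenizer-dependent, so the sizes are approximate): it builds haystacks of increasing size with the needle at several depths and records which come back correct. `build_haystack` and `sweep` are hypothetical helpers; `client` is any OpenAI-compatible client, such as the one created in test_context_window.py.

```python
# Hypothetical sweep helpers: build haystacks of a target size with the
# needle at a chosen depth, then probe every (size, depth) combination.
NEEDLE = "The launch code is HORIZON-7741."
ANSWER = "HORIZON-7741"
SENTENCE = "This sentence is filler. "
TOKENS_PER_SENTENCE = 5  # assumption: rough estimate, varies by tokenizer


def build_haystack(approx_tokens: int, depth: float) -> str:
    """Return ~approx_tokens of filler with NEEDLE at fraction `depth` (0.0-1.0)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    total = max(1, approx_tokens // TOKENS_PER_SENTENCE)
    before = round(total * depth)
    return SENTENCE * before + "\n\n" + NEEDLE + "\n\n" + SENTENCE * (total - before)


def sweep(client, model, sizes, depths):
    """Probe each (size, depth); map it to True, False, or an 'error: ...' string."""
    results = {}
    for size in sizes:
        for depth in depths:
            doc = build_haystack(size, depth)
            try:
                resp = client.chat.completions.create(
                    model=model,
                    messages=[
                        {"role": "system", "content": "Answer only from the document."},
                        {"role": "user", "content": doc + "\n\nWhat is the launch code?"},
                    ],
                    max_tokens=20,
                )
                text = resp.choices[0].message.content or ""
                results[(size, depth)] = ANSWER in text
            except Exception as exc:  # e.g. the gateway rejecting a long input
                results[(size, depth)] = f"error: {type(exc).__name__}"
    return results
```

For example, `sweep(client, "claude-sonnet-4-6", sizes=[10_000, 50_000, 100_000, 150_000], depths=[0.1, 0.5, 0.9])`. A genuine long-context model should answer correctly throughout; small sizes passing while large sizes fail, especially at middle depths, is consistent with a clipped window.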

Test 2 — native tool calling

Ask for a tool call and inspect tool_calls. The genuine model returns a structured call; a re-wrap that only pretends to support tools leaves tool_calls empty (None in the Python SDK) and dumps a JSON blob into the text:

test_tools.py

# test_tools.py — native tool calling, or a fake?
from openai import OpenAI
client = OpenAI(api_key="sk-brievio-...", base_url="https://api.brievio.com/v1")

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Tokyo? Use the tool."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
msg = resp.choices[0].message
print("tool_calls:", msg.tool_calls)   # genuine: a structured get_weather(city="Tokyo")
# A re-wrapped proxy that doesn't really support tools will return tool_calls=None
# and instead jam a JSON blob into message.content as plain text. That's the tell.
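Reading the printout works once; for repeated runs a small classifier is handier. A minimal sketch, assuming the message object exposes the `tool_calls` and `content` attributes the OpenAI SDK uses, with `call.function.name` and `call.function.arguments` (a JSON string) on each call. `classify_tool_reply` is a hypothetical helper. It also checks that a native call's arguments parse as a JSON object, since a half-working proxy could fill the structured slot with unusable arguments.

```python
import json
import re


def classify_tool_reply(msg, expected_name="get_weather"):
    """Classify a chat message as 'native', 'malformed', 'faked', or 'no_call'.

    native    - tool_calls holds a call to expected_name with JSON-object arguments
    malformed - the call is there, but its arguments don't parse as a JSON object
    faked     - no structured call, but the text contains a JSON blob naming the tool
    no_call   - neither; the model just answered in prose
    """
    for call in getattr(msg, "tool_calls", None) or []:
        if call.function.name == expected_name:
            try:
                args = json.loads(call.function.arguments)
            except (TypeError, json.JSONDecodeError):
                return "malformed"
            return "native" if isinstance(args, dict) else "malformed"
    text = getattr(msg, "content", None) or ""
    # Greedy brace match: from the first "{" to the last "}" in the text.
    for blob in re.findall(r"\{.*\}", text, flags=re.DOTALL):
        if expected_name in blob:
            return "faked"
    return "no_call"
```

Call it as `classify_tool_reply(resp.choices[0].message)`. "no_call" is ambiguous with tool_choice="auto", since a genuine model may occasionally answer in prose; rerun before drawing conclusions.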

Test 3 — vision

Send an image whose contents you already know and ask the model to read it. A text-only downgrade can't — it hallucinates or errors:

test_vision.py

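# test_vision.py — can it actually see?
from openai import OpenAI
client = OpenAI(api_key="sk-brievio-...", base_url="https://api.brievio.com/v1")
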
resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Reply with only the exact text shown in this image."},
            {"type": "image_url", "image_url": {"url": "https://your-host/known-text.png"}},
        ],
    }],
    max_tokens=30,
)
print(resp.choices[0].message.content)   # genuine: the text in the image
# A text-only downgrade can't read it — it hallucinates, errors on the image
# part, or ignores it. Use an image whose contents you already know.
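Two practical wrinkles, sketched under assumptions. First, something on the gateway side has to fetch https://your-host/...; if it can't, the test fails for reasons unrelated to vision. The OpenAI chat format also accepts a base64 data: URL in image_url, which avoids the fetch, assuming the gateway passes data URLs through to the model. Second, models often wrap the answer in quotes or add a period, so compare normalized text rather than exact strings. `image_part_from_bytes` and `matches_known_text` are hypothetical helpers.

```python
import base64
import re


def image_part_from_bytes(data: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style image_url content part carrying the image inline."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def _normalize(s: str) -> str:
    # Lowercase, drop punctuation (quotes, hyphens, periods), collapse whitespace.
    s = re.sub(r"[^\w\s]", "", s.lower())
    return " ".join(s.split())


def matches_known_text(reply: str, expected: str) -> bool:
    """True if the reply contains the expected text, ignoring case and punctuation."""
    return _normalize(expected) in _normalize(reply or "")
```

Read your file with `open("known-text.png", "rb").read()`, put the returned part in place of the URL part, and check the reply with `matches_known_text(reply, "PLUM-3318 QUARTZ")`. The filename and the string are placeholders; a nonsense string like this is a better choice than a common word, because a text-only model cannot guess it.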

Test 4 — caching and the bill

The fourth check is the one from the token-inflation post: send a long prefix twice and confirm cached_tokens is non-zero on the repeat, and that your prompt_tokens match the text you actually sent. A template proxy fails both — it can't cache a prefix it rewrites, and it bills you for the wrapper. Authenticity of the model and honesty of the meter travel together; check them together.
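A minimal sketch of that check, assuming the gateway reports cache hits in usage.prompt_tokens_details.cached_tokens as the OpenAI schema does (a gateway that omits the field entirely is worth noting too). Further assumptions: the prefix must be long enough to be cacheable (Claude models have a minimum cacheable length on the order of a thousand tokens or more, depending on the model), and whether caching is automatic or needs explicit markers depends on the gateway. The bill check uses a rough 4-characters-per-token rule for English text, so its bound is deliberately loose: it is meant to catch multi-x inflation, not small tokenizer differences. All helper names and thresholds are hypothetical.

```python
CHARS_PER_TOKEN = 4.0   # assumption: rough English average, not Claude's tokenizer
INFLATION_LIMIT = 2.0   # assumption: flag bills above 2x the rough estimate


def usage_to_dict(usage) -> dict:
    """Flatten an SDK usage object (or a plain dict) into the fields we check."""
    if not isinstance(usage, dict):
        usage = usage.model_dump()  # openai>=1.x usage objects are pydantic models
    details = usage.get("prompt_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens") or 0,
        "cached_tokens": details.get("cached_tokens") or 0,
    }


def send_twice(client, model, messages):
    """Send the identical request twice; return (first, repeat, sent_chars)."""
    usages = []
    for _ in range(2):
        resp = client.chat.completions.create(model=model, messages=messages, max_tokens=5)
        usages.append(usage_to_dict(resp.usage))
    # Assumes every message content is a plain string.
    sent_chars = sum(len(m["content"]) for m in messages)
    return usages[0], usages[1], sent_chars


def check_cache_and_bill(first: dict, repeat: dict, sent_chars: int) -> dict:
    """Judge two usage dicts for the same long prompt sent twice."""
    estimate = sent_chars / CHARS_PER_TOKEN
    ratio = first["prompt_tokens"] / estimate if estimate else float("inf")
    return {
        "cache_hit_on_repeat": repeat["cached_tokens"] > 0,
        # Identical input should be counted identically; a drifting wrapper may not be.
        "same_count_on_repeat": first["prompt_tokens"] == repeat["prompt_tokens"],
        "billed_vs_estimate": round(ratio, 2),
        "bill_plausible": ratio <= INFLATION_LIMIT,
    }
```

Usage: `check_cache_and_bill(*send_twice(client, "claude-sonnet-4-6", messages))` with a messages list whose shared prefix is several thousand tokens long.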

Putting it together

A genuine model passes all four: it holds its full context, returns real tool calls, reads images, caches prefixes, and bills the tokens you sent. A re-wrap or downgrade breaks at least one — usually the expensive-to-fake ones (long context, vision) first. Run the suite once when you onboard a gateway and again whenever a model's answers quietly get worse; regressions here are how silent downgrades show up.
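To make the rerun mechanical, one option is to save the onboarding results and diff each later run against them. A minimal sketch, assuming each test reports a plain pass/fail boolean under a name you choose; the file name gateway_baseline.json and the helper names are hypothetical.

```python
import json
from pathlib import Path


def save_baseline(results: dict, path: str = "gateway_baseline.json") -> None:
    """Store {test_name: passed} from the onboarding run."""
    Path(path).write_text(json.dumps(results, indent=2, sort_keys=True))


def regressions(baseline: dict, current: dict) -> list:
    """Tests that passed at onboarding but now fail or were not run."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


def check_against_baseline(current: dict, path: str = "gateway_baseline.json") -> list:
    """Load the stored baseline and return the names of regressed tests."""
    baseline = json.loads(Path(path).read_text())
    return regressions(baseline, current)
```

Treating a missing result as a failure is deliberate: a test that silently stops running should surface, not pass by omission.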

The honest baseline

Brievio routes the genuine first-party models over tier-1 cloud channels — Claude via AWS Bedrock, Gemini via Google Vertex — with the full context window, native tool use, vision and prompt caching passed through untouched, and the model you request is the model you get. Run every test above against Brievio and it should pass clean. The model catalog lists each model's real capabilities and context, and the docs show the exact request shapes used here.

"Is it the real model" and "does the meter tell the truth" are the two questions worth asking any AI gateway — including this one. Both take about a minute to answer. Ask them.

Is your "Claude" really Claude? Four tests to spot a re-wrapped or downgraded model proxy

$ ls ./related

Too good to be true: where an 80%-under-list AI gateway's capacity comes from

Token inflation — how some AI gateways bill you 5–25×, and a 20-line test to catch it

OpenAI-compatible: what actually has to match (and what breaks)

Tool use with Claude and Gemini through one OpenAI-compatible API