Chat models want to talk. Your pipeline wants a record. The gap between "here's a friendly paragraph about the ticket" and {"category": "billing", "priority": "high"} is where most LLM integrations quietly break — a stray markdown fence, a trailing comma, a hallucinated key, and the json.loads downstream throws at 3 a.m. This post is about forcing valid, useful JSON out of Claude and Gemini, using the same OpenAI request shape behind one base_url, and the validation layer that makes it production-grade rather than demo-grade.
There are three tools for the job: response_format with json_object, response_format with json_schema, and native tool/function calling. They are not interchangeable, and picking the wrong one is the most common reason a structured-output feature feels flaky. We'll walk through each, when to reach for it, how to design the schema, and how to validate and repair what comes back.
JSON mode: guaranteed parseable, not guaranteed correct
The simplest lever is response_format={"type": "json_object"}. It constrains the model to emit syntactically valid JSON — no prose preamble, no ```json fence, no apology. What it does not do is enforce your shape. You still have to describe the fields in the prompt, and the model can still omit a key, invent one, or put a string where you wanted a boolean.
# response_format=json_object: the model is constrained to emit syntactically
# valid JSON. It does NOT enforce YOUR shape; you still have to describe the
# fields in the prompt. Identical call for Claude and Gemini behind one base_url.
from openai import OpenAI
import json

client = OpenAI(
    api_key="sk-brievio-...",
    base_url="https://api.brievio.com/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or "gemini-2.5-flash", same code
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": (
                "Extract the support ticket fields. Reply ONLY with a JSON object "
                "with keys: category (one of billing|bug|feature|other), "
                "priority (low|medium|high), summary (string), "
                "needs_human (boolean)."
            ),
        },
        {"role": "user", "content": "I was charged twice this month, please refund."},
    ],
)

data = json.loads(resp.choices[0].message.content)  # guaranteed parseable
print(data["category"], data["priority"])  # e.g. billing high

# json_object guarantees: it parses. It does NOT guarantee your keys exist,
# the enums are valid, or types are right. That's what validation is for.

This is the right tool when the shape is simple, when you control the prompt tightly, or when you're going to validate anyway (you are). The mental model: json_object buys you a guarantee that json.loads won't throw. It does not buy you a guarantee that the object means what you think. Treat the difference as the whole ballgame.
JSON Schema mode: constrain the shape, not just the syntax
When you want the field names, types, and enums enforced — not just requested — reach for json_schema. The schema travels with the request, and with strict: true (where the model family supports it) the output is constrained to match. The two fields that make "strict" actually mean something are additionalProperties: false (no surprise keys) and a complete required array (no missing keys).
# response_format=json_schema: the schema is sent to the model and the output
# is constrained to it. Set strict=True for the hard guarantee where supported.
# additionalProperties=False + required keys is what makes "strict" meaningful.
schema = {
    "name": "support_ticket",
    "strict": True,
    "schema": {
        "type": "object",
        "additionalProperties": False,
        "properties": {
            "category": {"type": "string", "enum": ["billing", "bug", "feature", "other"]},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
            "summary": {"type": "string"},
            "needs_human": {"type": "boolean"},
        },
        "required": ["category", "priority", "summary", "needs_human"],
    },
}

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    response_format={"type": "json_schema", "json_schema": schema},
    messages=[
        {"role": "system", "content": "Extract the support ticket fields."},
        {"role": "user", "content": "I was charged twice this month, please refund."},
    ],
)

ticket = json.loads(resp.choices[0].message.content)
# Where strict json_schema is honored, category is one of the four enums, so
# the happy path needs no defensive "if category not in (...)" check.

Here's the honest caveat, and it matters: strict json_schema support varies by model family. Some models honor every constraint including nested additionalProperties: false; others treat the schema as a strong hint rather than a hard grammar, especially on deeply nested objects, unions (anyOf), or recursive structures. Brievio passes your response_format straight through to the genuine first-party model, so what you get is the real model's real behavior, not a watered-down emulation. But that also means the model's native limits are your limits. The practical rule: request the schema, then validate anyway. Never let "strict" talk you out of the validation step.
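Since additionalProperties: false and a complete required array are the two things that give "strict" its teeth, it can help to lint a schema for them before sending it. The sketch below is one way to do that with the standard library only. The function name strict_gaps is hypothetical, and the walk is deliberately simple: it follows properties and array items but does not descend into anyOf or $ref.

```python
def strict_gaps(schema: dict, path: str = "$") -> list[str]:
    """List object nodes that allow extra keys or leave properties out of required."""
    gaps = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            gaps.append(f"{path}: additionalProperties is not false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            gaps.append(f"{path}: not in required: {', '.join(missing)}")
        for name, sub in props.items():
            gaps.extend(strict_gaps(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        gaps.extend(strict_gaps(schema["items"], f"{path}[]"))
    return gaps


# The support-ticket schema from this section: nothing to report.
ticket_schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "feature", "other"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
        "needs_human": {"type": "boolean"},
    },
    "required": ["category", "priority", "summary", "needs_human"],
}

# A made-up looser schema, to show what gets flagged.
loose_schema = {
    "type": "object",
    "properties": {
        "customer": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["customer"],
}

print(strict_gaps(ticket_schema))
for gap in strict_gaps(loose_schema):
    print(gap)
```

A field you left out of required on purpose will show up here too. Treat the output as a review list, not a verdict.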
JSON mode vs. tool calling: when to use which
Tool/function calling also returns structured JSON — the arguments come back as a JSON string keyed to a function name. So which do you use? The distinction is about intent, not formatting:
- Use JSON mode when the JSON is the answer. You're extracting fields, classifying, summarizing into a record, or generating a config object. There is exactly one shape you want back, every time. response_format is the cleaner fit: one output, no function-call ceremony, no tool_choice plumbing.
- Use tool calling when the model is choosing an action. It might call get_weather, or search_db, or answer in prose, and you want the model to decide which, possibly calling several. Function calling is built for dispatch: many candidate shapes, the model picks. Forcing that through a single JSON object is awkward.
- The gray area: single forced tool call as structured output. Setting tool_choice to require one specific function is a time-honored way to get structured output on models that predate json_schema. It still works and is a fine fallback. But if a model supports json_schema, that path is more direct and less to reason about.
If your workload is genuinely about actions and dispatch rather than a fixed record, the mechanics and the cross-model gotchas live in tool use across Claude and Gemini. For everything that's "give me this object," stay with response_format.
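For the gray-area case, a forced single tool call might look like the sketch below. It uses the OpenAI-style tools and tool_choice request fields and reads the arguments string off the returned tool call. The function name record_ticket and both helpers are hypothetical, and whether a given model behind the gateway honors a forced tool_choice is an assumption to verify. The offline demo substitutes a hand-built stand-in for the response message so the parsing step can run without a network call.

```python
import json
from types import SimpleNamespace


def forced_tool_request(name: str, parameters: dict) -> dict:
    """Build the extra request kwargs that force exactly one call to `name`."""
    return {
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": name,
                    "description": "Record the extracted support ticket fields.",
                    "parameters": parameters,
                },
            }
        ],
        # Naming the function here is what turns "may call" into "must call".
        "tool_choice": {"type": "function", "function": {"name": name}},
    }


def parse_tool_arguments(message, name: str) -> dict:
    """Pull the JSON arguments out of the forced call on a chat message."""
    for call in getattr(message, "tool_calls", None) or []:
        if call.function.name == name:
            return json.loads(call.function.arguments)  # arguments is a JSON string
    raise ValueError(f"model did not call {name}")


params = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "feature", "other"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["category", "priority"],
}
extra = forced_tool_request("record_ticket", params)

# Real usage would be roughly:
#   resp = client.chat.completions.create(model=..., messages=..., **extra)
#   ticket = parse_tool_arguments(resp.choices[0].message, "record_ticket")

# Stand-in for resp.choices[0].message, for illustration only.
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(
                name="record_ticket",
                arguments='{"category": "billing", "priority": "high"}',
            )
        )
    ]
)
ticket = parse_tool_arguments(fake_message, "record_ticket")
print(ticket)
```

The arguments still need the same validation as any other model output. A forced tool call changes where the JSON arrives, not how much you can trust it.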
Designing a schema the model can actually hit
A schema is a prompt. The way you shape it changes the hit rate as much as the model choice does. A few rules that pay off across both families:
- Prefer flat over deeply nested. Three levels of nesting with optional objects is where strict mode wobbles. If you can flatten address.city to city, do it, then reshape after validation.
- Use enums for any closed set. "priority": {"enum": ["low","medium","high"]} is far more reliable than a free string you post-process. Enums are the single highest-leverage schema feature.
- Name fields the way a human would. needs_human_review beats nh_flag. The model fills well-named fields more accurately because the name carries the instruction.
- Put a description on ambiguous fields. One line per field inside the schema resolves most "the model guessed wrong" cases without a prompt rewrite.
- Make optionality explicit. If a field can be absent, either leave it out of required or model it as a nullable union; don't expect the model to invent a sentinel. Decide who owns the "missing" case, you or the model.
- Avoid free-form numbers when a bounded type will do. A 1–5 integer rating as an enum of [1,2,3,4,5] outperforms "a number from 1 to 5" in the prompt.
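Put together, those rules might produce something like the following. This is an illustrative sketch: the schema, its field names, and the reshape helper are all hypothetical, chosen to show a flat layout, enums, per-field descriptions, an explicit nullable union, and a bounded integer, with the nesting restored only after validation.

```python
# A flat, enum-heavy schema: every field is top-level and self-describing.
FLAT_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "city": {
            "type": "string",
            "description": "Shipping city, as written by the customer.",
        },
        "postal_code": {
            "type": ["string", "null"],  # nullable union: the model owns "missing"
            "description": "Null if the message gives none.",
        },
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "rating": {
            "type": "integer",
            "enum": [1, 2, 3, 4, 5],  # bounded set instead of "a number from 1 to 5"
            "description": "Customer satisfaction, 5 is best.",
        },
        "needs_human_review": {
            "type": "boolean",
            "description": "True if a person should read this.",
        },
    },
    "required": ["city", "postal_code", "priority", "rating", "needs_human_review"],
}


def reshape(flat: dict) -> dict:
    """Rebuild the nested record the application wants from a validated flat one."""
    return {
        "address": {"city": flat["city"], "postal_code": flat["postal_code"]},
        "priority": flat["priority"],
        "rating": flat["rating"],
        "needs_human_review": flat["needs_human_review"],
    }


validated = {
    "city": "Lisbon",
    "postal_code": None,
    "priority": "medium",
    "rating": 4,
    "needs_human_review": False,
}
print(reshape(validated))
```

The model only ever sees the flat version. The nested shape is your code's concern, which keeps the part the model has to hit as simple as possible.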
Validate and repair: the layer that ships
The single biggest reliability upgrade is treating model output like an untrusted client request: parse it, validate it against your real schema, and on failure retry once with the error fed back. A Pydantic model (or zod, or JSON Schema validation in your language) catches the cases that slip through even strict mode — and the repair turn fixes most of them, because the model is good at correcting a mistake you point at directly.
# Never trust output you didn't validate. Treat the model like an untrusted
# client: parse -> validate against your schema -> retry once with the error
# fed back. This is the layer that turns "usually works" into "ships".
from pydantic import BaseModel, ValidationError
from typing import Literal


class Ticket(BaseModel):
    category: Literal["billing", "bug", "feature", "other"]
    priority: Literal["low", "medium", "high"]
    summary: str
    needs_human: bool


def extract(text: str, model: str, retries: int = 1) -> Ticket:
    messages = [
        {"role": "system", "content": "Extract the support ticket fields as JSON."},
        {"role": "user", "content": text},
    ]
    for attempt in range(retries + 1):
        resp = client.chat.completions.create(
            model=model,
            response_format={"type": "json_object"},
            messages=messages,
        )
        raw = resp.choices[0].message.content
        try:
            return Ticket.model_validate_json(raw)  # parse + validate in one step
        except ValidationError as e:
            if attempt == retries:
                raise
            # Repair turn: show the model exactly what was wrong.
            messages += [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"That failed validation: {e}. Re-emit valid JSON only."},
            ]
    raise RuntimeError("unreachable")

Notice what the repair turn does: it shows the model its own bad output and the exact validation error, then asks for a re-emit. One retry resolves the overwhelming majority of failures; if it still fails, you want to know, so let it raise. Don't loop forever burning tokens: bound the retries, log the raw payload, and alert on the hard failures. A persistent validation failure usually means the schema is asking for something the input can't support, not that the model is broken.
Two production notes. First, set a generous max_tokens: JSON that gets truncated mid-object is invalid JSON, and a too-tight token cap is a leading cause of parse failures on large records. Second, keep temperature low (0 to 0.3) for extraction and classification — you want the same input to yield the same record, and creativity is not a virtue when you're filling a struct.
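Truncation is worth detecting explicitly rather than letting it surface as a confusing parse error. In the OpenAI response shape, a choice that stopped because it hit the token cap carries finish_reason "length". The sketch below checks for that before parsing. The names parse_complete, TruncatedOutput, and EXTRACTION_PARAMS are hypothetical, the 1024 cap is an arbitrary placeholder, and it is an assumption that the gateway reports "length" for every model family. The demo uses hand-built stand-ins for response choices.

```python
import json
from types import SimpleNamespace

# Illustrative request settings; tune max_tokens to your largest expected record.
EXTRACTION_PARAMS = {"max_tokens": 1024, "temperature": 0}


class TruncatedOutput(Exception):
    """The model ran out of tokens before closing the JSON object."""


def parse_complete(choice) -> dict:
    """Parse a choice's content, refusing output that was cut off by the token cap."""
    if choice.finish_reason == "length":
        raise TruncatedOutput("output hit max_tokens; raise the cap before retrying")
    return json.loads(choice.message.content)


# Stand-ins for resp.choices[0], for illustration only.
complete = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content='{"category": "billing", "priority": "high"}'),
)
cut_off = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content='{"category": "billing", "prio'),
)

print(parse_complete(complete))
try:
    parse_complete(cut_off)
except TruncatedOutput as err:
    print(f"truncated: {err}")
```

A truncated response is not a case for the repair turn. Replaying the same request with the same cap will most likely truncate again, so it deserves its own error path.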
One shape, both model families
Every snippet above runs against Claude Sonnet 4.6 and Gemini 2.5 Flash by changing one string — the model field. That's the point of routing structured output through Brievio: the OpenAI-shaped response_format contract is identical, so you can A/B a cheaper model on an extraction task, or fall back across families during an incident, without rewriting your parsing or validation. The request you send is the request the genuine first-party model receives — here's exactly what matches and what to watch for when you depend on OpenAI compatibility.
A practical workflow: prototype with strict json_schema on Sonnet, confirm your validator passes on a held-out set, then try the same schema on Flash. If the cheaper model clears your validation rate, you've cut cost with zero code change — and because Brievio reports honest token counts and bills failed 4xx/5xx calls at zero, your retries and repair turns don't hide a metering surprise. Compare the models on the models page, and the full request/response contract for chat lives in the chat docs.
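The "clears your validation rate" step can be as small as the sketch below: run an extractor over a held-out set and count how often it returns without raising. The helper name validation_rate, the sample texts, and the fake extractor are all hypothetical. In a real comparison you would pass something like lambda t: extract(t, "gemini-2.5-flash"), and probably catch your validator's specific exception rather than Exception.

```python
from typing import Callable, Iterable


def validation_rate(extract_fn: Callable[[str], object], samples: Iterable[str]) -> float:
    """Fraction of samples for which extract_fn returns without raising."""
    total = passed = 0
    for text in samples:
        total += 1
        try:
            extract_fn(text)
            passed += 1
        except Exception:  # broad on purpose for the sketch; narrow it in real code
            pass
    return passed / total if total else 0.0


# Made-up held-out set and a fake extractor that fails on empty input.
held_out = [
    "I was charged twice this month, please refund.",
    "The export button crashes the app.",
    "",
    "Could you add dark mode?",
]


def fake_extract(text: str) -> dict:
    if not text:
        raise ValueError("nothing to extract")
    return {"summary": text}


print(validation_rate(fake_extract, held_out))  # 0.75
```

Run it once per candidate model against the same held-out set, and the comparison reduces to two numbers you can put next to the price difference.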
The takeaway
Reach for json_object when the shape is simple and you own the prompt; reach for json_schema with strict: true, additionalProperties: false, and a full required array when you want the structure enforced; reach for tool calling when the model is choosing an action rather than producing one fixed record. Whichever you pick, design the schema flat and enum-heavy, then always parse-validate-repair — because strict support varies by model family, and the validation layer is the difference between a structured-output feature that demos and one that survives real traffic. The same code, the same contract, the genuine model — across Claude and Gemini, behind one base URL.