Tool use — also called function calling — is what turns a chat model into something that can do things: look up a record, hit an API, run a calculation, query your database. The model doesn't run the code; it tells you which function to call and with what arguments, you run it, and you hand the result back so it can finish the answer. The good news: the OpenAI tools shape is the de-facto standard, and through Brievio the exact same code drives both Claude and Gemini behind one base_url. Change the model string; leave everything else alone.

This post is the practical version: define a tool, read tool_calls, run the multi-turn loop end to end, and handle parallel calls. Every snippet runs against https://api.brievio.com/v1 with the OpenAI Python SDK. Later snippets reuse the client, the tools list and the json import from the first one. I'll flag the few places where behavior genuinely differs between model families so you don't get surprised in production.

The mental model: the loop, not a magic call

Function calling is a conversation, not a one-shot. It always follows the same four beats:

1. You send the user message plus a list of tools the model is allowed to use.
2. The model decides. Either it answers in prose, or it returns one or more tool_calls (each a function name plus a JSON string of arguments) and stops.
3. You run the function in your own code and append the result back into the message list as a tool message.
4. You call the model again with the longer history. It reads the result and either answers or asks for another tool. Repeat until there are no more tool calls.
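To make the four beats concrete, here is roughly what the message list holds after one round trip, written as plain dicts in the Chat Completions format. The call id and the weather values are invented for illustration. In practice the assistant turn comes back from the API rather than being typed by hand.

```python
# Sketch: the messages list after one tool round trip, as plain
# Chat Completions dicts. The id and the values are invented for illustration.
import json

history = [
    # Beat 1: the user's question. The tool list is sent separately (tools=...).
    {"role": "user", "content": "What's the weather in Tokyo?"},
]

# Beat 2: the model proposes a call instead of answering in prose.
history.append({
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_example_1",  # hypothetical; real ids come back from the API
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": json.dumps({"city": "Tokyo"}),  # a JSON *string*
        },
    }],
})

# Beat 3: your code runs the function and replies to that exact id.
history.append({
    "role": "tool",
    "tool_call_id": "call_example_1",
    "content": json.dumps({"city": "Tokyo", "temp": 18, "unit": "celsius"}),
})

# Beat 4: this longer history is what goes out on the next request.
print([m["role"] for m in history])  # ['user', 'assistant', 'tool']
```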

The model never touches your systems. It only ever proposes; your code disposes. That boundary is the whole security story of tool use — treat every argument the model sends as untrusted input and validate it like you would a form field.
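Here is what that validation can look like, as a minimal sketch. It parses the arguments string, then checks it against the same parameters schema you declared for the tool. The checker (validate_args is a hypothetical helper, not part of the OpenAI SDK) covers only required keys, unknown keys, basic scalar types and enum. A full JSON Schema library would go further, and so would checks on ranges and permissions specific to your app.

```python
# Sketch: validate model-supplied arguments against a tool's "parameters"
# schema before running anything. Deliberately tiny: it covers required keys,
# unknown keys, basic scalar types and enum, not full JSON Schema.
import json

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_args(raw_arguments: str, parameters: dict) -> dict:
    """Parse function.arguments (a JSON string) and check it; raise ValueError if bad."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    props = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")

    for name, value in args.items():
        if name not in props:
            raise ValueError(f"unexpected argument: {name}")  # reject extra keys
        spec = props[name]
        declared = spec.get("type")
        expected = _PY_TYPES.get(declared)
        wrong_type = expected is not None and not isinstance(value, expected)
        # bool is a subclass of int in Python, so True must not pass as a number.
        if isinstance(value, bool) and declared in ("integer", "number"):
            wrong_type = True
        if wrong_type:
            raise ValueError(f"{name} should be a {declared}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
    return args

# The get_weather parameters from Step 1, repeated so this runs on its own.
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(validate_args('{"city": "Tokyo", "unit": "celsius"}', WEATHER_PARAMS))
```

Rejecting unknown keys is a deliberate choice here. If a model invents a parameter, you find out immediately instead of silently ignoring it.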

Step 1 — define a tool and read the call

A tool is a JSON Schema wrapped in {"type": "function", ...}. The description fields are not decoration — they are the only thing the model reads to decide when and how to call. Write them like you're writing a docstring for a junior engineer:

define_tool.py

# Define a tool with the standard OpenAI "function" schema, then read
# the model's tool_calls back. Identical shape for Claude and Gemini.
from openai import OpenAI
import json

client = OpenAI(
    api_key="sk-brievio-...",
    base_url="https://api.brievio.com/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "e.g. 'Tokyo'"},
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit.",
                    },
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",   # swap for "gemini-2.5-pro" — same code below
    messages=messages,
    tools=tools,
    tool_choice="auto",          # let the model decide whether to call
)

msg = resp.choices[0].message

# The model didn't answer in prose — it asked you to run a function.
if msg.tool_calls:
    for call in msg.tool_calls:
        print(call.function.name)             # "get_weather"
        print(call.function.arguments)         # '{"city": "Tokyo", "unit": "celsius"}'
        args = json.loads(call.function.arguments)  # always a JSON STRING — parse it
else:
    print(msg.content)           # plain answer, no tool needed

Two things bite people here. First, function.arguments is a JSON string, not a dict, so you always json.loads it. Second, the model may choose not to call a tool at all. In that case tool_calls comes back None (or empty), and content holds a normal answer; `if msg.tool_calls:` treats both the same way. Branch on both outcomes. This is identical whether you set model to claude-sonnet-4-6 or gemini-2.5-pro. Brievio passes the request to the genuine first-party model and returns its native tool calls rather than reshaping or faking them.
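Both checks fit in one small helper. This sketch relies only on the SDK attribute names used above (tool_calls, function.name, function.arguments, content). The name classify is hypothetical. SimpleNamespace objects stand in for real responses so the example runs without an API key.

```python
# Sketch: branch on "prose answer" vs "tool calls", parsing each call's
# JSON-string arguments. `msg` is resp.choices[0].message from the SDK.
import json
from types import SimpleNamespace

def classify(msg):
    """Return ("answer", text) or ("calls", [(call_id, name, args_dict), ...])."""
    if not msg.tool_calls:  # None and [] both mean "no tool requested"
        return "answer", msg.content
    return "calls", [
        (call.id, call.function.name, json.loads(call.function.arguments))
        for call in msg.tool_calls
    ]

# Stand-ins that mimic the SDK's attribute names, so this runs offline.
prose = SimpleNamespace(content="It's sunny.", tool_calls=None)
tooly = SimpleNamespace(content=None, tool_calls=[
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Tokyo"}')),
])
print(classify(prose))
print(classify(tooly))
```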

Step 2 — the multi-turn loop

Now wire up the round trip. The shape that matters: append the assistant message exactly as returned (it carries the call ids), then append one tool message per call, each echoing back its tool_call_id. If you mismatch an id or leave a call unanswered, the next request fails with a 400. Here's the whole loop, working for both providers from a single function:

tool_loop.py

# The multi-turn loop: model asks -> you run the function ->
# you feed the result back -> model writes the final answer.
def run_get_weather(city: str, unit: str = "celsius") -> dict:
    # Your real implementation: an HTTP call, a DB lookup, whatever.
    return {"city": city, "temp": 18, "unit": unit, "sky": "clear"}

TOOL_IMPLS = {"get_weather": run_get_weather}

def answer(question: str, model: str) -> str:
    messages = [{"role": "user", "content": question}]

    while True:
        resp = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = resp.choices[0].message

        # No tool requested -> this is the final answer. Done.
        if not msg.tool_calls:
            return msg.content

        # 1. Append the assistant turn EXACTLY as returned (it carries the
        #    tool_call ids the next messages must reference).
        messages.append(msg)

        # 2. Run each requested function and append one tool message per call,
        #    echoing back the matching tool_call_id.
        for call in msg.tool_calls:
            fn = TOOL_IMPLS[call.function.name]
            args = json.loads(call.function.arguments)
            result = fn(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,          # MUST match the call's id
                "content": json.dumps(result),    # stringify the result
            })
        # 3. Loop: the model now sees the tool output and continues.

# Same function works for both providers behind the one base_url:
print(answer("What's the weather in Tokyo?", "claude-sonnet-4-6"))
print(answer("What's the weather in Tokyo?", "gemini-2.5-pro"))

That while True is the engine of every agent you've ever used. A model can chain tools — call search, read the result, then call get_details on the top hit, then answer — and the loop handles arbitrary depth without special-casing. Add a turn counter as a guardrail so a confused model can't spin forever; 8–10 rounds is a sane ceiling for most apps.
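One way to add that guardrail is to replace while True in the Step 2 loop with a bounded for. This is a sketch, not a drop-in library function. The name answer_capped is hypothetical. The completion function is passed in as create so the loop can be exercised with a fake client. When the cap is hit, the loop raises an error instead of returning a half-finished answer, which keeps runaway cost visible.

```python
# Sketch: the Step 2 loop with a round cap. `create` is the completion
# function (client.chat.completions.create in real use); passing it in keeps
# the sketch testable offline. max_rounds=8 follows the ceiling suggested above.
import json

def answer_capped(question, model, create, tools, impls, max_rounds=8):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_rounds):
        resp = create(model=model, messages=messages, tools=tools, tool_choice="auto")
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final answer
        messages.append(msg)  # assistant turn, exactly as returned
        for call in msg.tool_calls:
            result = impls[call.function.name](**json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    # Still asking for tools after max_rounds: stop spending tokens.
    raise RuntimeError(f"no final answer after {max_rounds} tool rounds")

# Real use (client, tools and run_get_weather as defined earlier):
# answer_capped("What's the weather in Tokyo?", "claude-sonnet-4-6",
#               client.chat.completions.create, tools,
#               {"get_weather": run_get_weather})
```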

One honest caveat on portability: the protocol is identical across Claude and Gemini, but the behavior isn't a clone. Different model families pick different tools, phrase arguments differently, and vary in how eagerly they call versus answer from prior knowledge. The code doesn't change; the judgement does. Test your prompts against each model you plan to ship on rather than assuming one transfers perfectly to the other.
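A cheap way to do that testing is to send the same prompts to each model and record which tools each one asks for on the first turn. This sketch assumes the Step 1 tools list. The name tools_chosen is hypothetical, and the helper inspects only the first response rather than running the full loop.

```python
# Sketch: record which tools each model asks for on the same prompts, so
# differences in judgement show up before production. Only the first turn is
# inspected; `create` is client.chat.completions.create in real use.
def tools_chosen(create, models, prompts, tools):
    """Map (model, prompt) -> list of tool names requested ([] = answered in prose)."""
    report = {}
    for model in models:
        for prompt in prompts:
            resp = create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                tools=tools,
                tool_choice="auto",
            )
            msg = resp.choices[0].message
            report[(model, prompt)] = [c.function.name for c in (msg.tool_calls or [])]
    return report

# Real use (client and tools from Step 1):
# report = tools_chosen(client.chat.completions.create,
#                       ["claude-sonnet-4-6", "gemini-2.5-pro"],
#                       ["What's the weather in Tokyo?", "Is 18C warm?"],
#                       tools)
# for key, names in report.items():
#     print(key, names)
```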

Step 3 — parallel tool calls

When a question needs several independent lookups — three cities' weather, five SKUs' stock — a capable model can return all the calls in a single assistant turn. You run them (concurrently, if the work is I/O-bound) and return one tool message per id before asking again:

parallel_calls.py

# Parallel tool calls: one assistant turn can request several functions at
# once. You run them (concurrently if you like) and return one tool message
# per call id. Whether a model batches calls varies — so always iterate
# over the list rather than assuming exactly one.
messages = [{"role": "user",
             "content": "Compare the weather in Tokyo, Paris and Cairo."}]

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
msg = resp.choices[0].message
messages.append(msg)

# msg.tool_calls may now hold THREE get_weather calls with distinct ids.
from concurrent.futures import ThreadPoolExecutor

def handle(call):
    args = json.loads(call.function.arguments)
    result = TOOL_IMPLS[call.function.name](**args)
    return {
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps(result),
    }

with ThreadPoolExecutor() as pool:
    tool_msgs = list(pool.map(handle, msg.tool_calls or []))

messages.extend(tool_msgs)   # append ALL results before the next request

final = client.chat.completions.create(
    model="claude-sonnet-4-6", messages=messages, tools=tools,
)
print(final.choices[0].message.content)

# Note: if a model returns calls one-at-a-time instead of batched, the loop
# from the previous snippet handles that for free — it just runs more rounds.

Here is where model families differ most, so don't hard-code an assumption. Whether a given model emits parallel calls in one turn, or walks them one at a time across several turns, varies by family and sometimes by request. The fix is simple and already in the code above: iterate over tool_calls and let the loop run more rounds if needed. Code that loops over the returned list is correct in both cases; code that assumes exactly one call is the bug. Likewise, strict-schema enforcement (guaranteed-valid JSON, rejected extra keys) isn't uniform — keep validating arguments server-side regardless of which model produced them.
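One refinement to the Step 3 handler is worth making in production. If a single call raises (an unknown tool name, malformed JSON, a failing backend), handle() propagates the exception and you lose the whole batch. The sketch below, under the hypothetical name run_batch, catches errors per call and returns an error payload for that id instead. Every tool_call_id still gets a reply, and the model can decide how to recover.

```python
# Sketch: run a batch of tool calls concurrently and always return exactly one
# tool message per tool_call_id, in request order, even when a call fails.
import json
from concurrent.futures import ThreadPoolExecutor

def run_batch(tool_calls, impls, max_workers=8):
    def handle(call):
        try:
            fn = impls[call.function.name]  # KeyError if the model invents a tool
            result = fn(**json.loads(call.function.arguments))
        except Exception as exc:
            # Hand the failure back to the model rather than dropping the id,
            # which would make the next request fail.
            result = {"error": f"{type(exc).__name__}: {exc}"}
        return {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}

    # pool.map yields results in input order, so replies line up with the calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle, tool_calls or []))

# Usage with the Step 3 names: messages.extend(run_batch(msg.tool_calls, TOOL_IMPLS))
```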

Why one base_url is the actual win

Without a gateway, supporting Claude and Gemini means two SDKs, two auth schemes, two payload shapes, and two sets of tool-result plumbing — Anthropic's tool_use/tool_result content blocks on one side, Google's function-call parts on the other. Behind Brievio's OpenAI-compatible endpoint, both speak the Chat Completions tools dialect you saw above, so an A/B test between models is a one-line diff and your tool layer is written once. The full request/response contract — including the tool fields — is in the Chat Completions docs, and the live model list with exact ids is on the models page.
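To see what the gateway is translating, here is the same weather tool reshaped for each provider's native API. Treat it as an illustration under assumptions. The field names (input_schema for Anthropic, functionDeclarations for Gemini) reflect the native APIs as I understand them, and the helper names are hypothetical. You need none of this when calling through the OpenAI-compatible endpoint. Tool results also have their own per-provider shapes on top of this.

```python
# Illustration only: roughly how the same tool is declared natively for each
# provider, i.e. the translation the OpenAI-compatible endpoint spares you.
# Field names are assumptions to check against each provider's docs.
def to_anthropic_tool(openai_tool: dict) -> dict:
    fn = openai_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }

def to_gemini_tools(openai_tools: list) -> dict:
    return {"functionDeclarations": [
        {
            "name": t["function"]["name"],
            "description": t["function"].get("description", ""),
            # Gemini accepts a subset of schema features; not every
            # JSON Schema keyword is guaranteed to carry over.
            "parameters": t["function"]["parameters"],
        }
        for t in openai_tools
    ]}

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
print(to_anthropic_tool(weather_tool))
print(to_gemini_tools([weather_tool]))
```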

It's worth saying plainly: the value only holds if the model on the other end is the real one. Tool calling is actually a useful authenticity signal — a genuine flagship reliably produces well-formed tool_calls with sensible arguments on non-trivial schemas, where a cheaper stand-in tends to fumble the JSON or ignore the tool. Brievio serves the genuine first-party models (Claude Sonnet 4.6, Opus 4.7, Gemini 2.5 Pro/Flash and others), honors native tool calling, and reports honest token counts; if you want to confirm that for yourself, see how to check your Claude is really Claude.

A short field checklist

Parse arguments, always. function.arguments is a string; json.loads it and validate before use.
Echo the ids. Append the assistant message verbatim, then one tool message per call with the matching tool_call_id. All of them before the next request.
Loop over the list. Never assume one call per turn — handle zero, one, and many. That single habit makes parallel and sequential models both Just Work.
Cap the rounds. A turn counter prevents an infinite tool-calling spiral and bounds your cost.
Trust nothing. Arguments are model output. Validate types, ranges and permissions exactly as you would user input.
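The "echo the ids" item is easy to check mechanically before each request, so a mismatch is caught before the API rejects it. Here is a minimal sketch, assuming your history is a list of plain dicts. The name check_tool_pairing is hypothetical, and the helper ignores the order of tool replies within a batch.

```python
# Sketch: check "echo the ids" before sending the next request. Works on
# plain dict messages; if you appended SDK message objects, convert them to
# dicts first. Raises ValueError on an orphaned reply or an unanswered call.
def check_tool_pairing(messages: list) -> None:
    pending = set()  # ids requested by the latest assistant turn, not yet answered
    for m in messages:
        if m["role"] == "tool":
            call_id = m["tool_call_id"]
            if call_id not in pending:
                raise ValueError(f"tool reply for unknown id {call_id!r}")
            pending.remove(call_id)
            continue
        # Any non-tool message closes the previous batch, which must be complete.
        if pending:
            raise ValueError(f"unanswered tool calls: {sorted(pending)}")
        if m["role"] == "assistant":
            pending = {c["id"] for c in m.get("tool_calls") or []}
    if pending:
        raise ValueError(f"unanswered tool calls: {sorted(pending)}")
```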

Get those five right and you have a tool-using agent that runs unchanged across Claude and Gemini, with the option to route by cost or capability per request. Failed 4xx/5xx calls on Brievio aren't billed, so requests that get rejected while you iterate on tool definitions cost nothing; successful ones are billed as usual. When you're ready to pick which models to put behind your tools, the gateway-selection guide walks through the tradeoffs that actually matter in production.

Tool use with Claude and Gemini through one OpenAI-compatible API

