$Review//June 4, 2026//7 min read

Claude Opus 4.7 vs Sonnet 4.6 vs Haiku 4.5: which to use when

Compare Claude Opus 4.7, Sonnet 4.6 and Haiku 4.5 on capability, speed and real cost. Concrete use-when guidance plus a tiering pattern to cut spend.

Claude isn't one model; it's a tier list. Opus is the deepest reasoner, Sonnet is the balanced workhorse, Haiku is the fast cheap one. The most common mistake teams make is picking the top of the range for everything "to be safe," then watching the bill climb for work a smaller model would have nailed. The opposite mistake, forcing every hard job through Haiku to save money, quietly costs you in retries, wrong answers, and human cleanup. The right answer is almost never "one model." It's matching the tier to the task.

This post lays out what each Claude tier is actually good at, what the three cost on Brievio, concrete "use X when…" guidance, and a tiering pattern that routes easy work to Haiku and escalates only the hardest jobs to Opus. Every tier on Brievio is the genuine first-party model over AWS Bedrock — full 200K context, native tools, vision, and caching — priced roughly 15% under Anthropic's official list.

The three tiers at a glance

Here is the whole tradeoff in one place — Brievio rate (with Anthropic's official list for reference), per 1M tokens, input / output:

  • Claude Opus 4.7 — $4.25 / $21.25 (official $5 / $25). The deepest reasoning and the strongest agentic behavior: long multi-step plans, gnarly refactors, ambiguous specs, research-grade analysis. The most capable and the most expensive — by design, the one you reach for last.
  • Claude Sonnet 4.6 — $2.55 / $12.75 (official $3 / $15). The balanced production workhorse and an elite coder. For most teams this is the default: strong enough for the large majority of real work, fast enough to feel responsive, priced so you don't flinch at volume.
  • Claude Haiku 4.5 — $0.85 / $4.25 (official $1 / $5). Fast and cheap, built for high-volume jobs: classification, extraction, routing, tagging, short transforms. Five times cheaper than Opus on input — and on narrow tasks, just as correct.

Note the spread. Opus input is 5× Haiku input; Opus output is 5× Haiku output. On a pipeline that runs millions of calls, that multiplier is the difference between a rounding error and a line item your finance team asks about. The skill isn't picking the "best" model — it's knowing which jobs genuinely need the top tier and which don't.
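To see how that multiplier lands on a bill, here is a minimal sketch that prices one workload on all three tiers. The rates are the Brievio figures listed above; the 5M calls a month and the ~400 input / ~50 output tokens per call are made-up illustrative numbers, so swap in your own.

```python
# Brievio rates from the list above, in USD per 1M tokens: (input, output).
RATES = {
    "claude-haiku-4-5": (0.85, 4.25),
    "claude-sonnet-4-6": (2.55, 12.75),
    "claude-opus-4-7": (4.25, 21.25),
}

def workload_cost(model: str, calls: int, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of `calls` requests, each using the given input/output token counts."""
    in_rate, out_rate = RATES[model]
    return calls * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

# Hypothetical pipeline: 5M calls a month, ~400 input and ~50 output tokens per call.
for model in RATES:
    print(f"{model:<18} ${workload_cost(model, 5_000_000, 400, 50):>10,.2f}")
```

Under those assumed volumes it comes to roughly $2.8K on Haiku, $8.3K on Sonnet and $13.8K on Opus. Because Opus's input and output rates are both exactly 5× Haiku's, that ratio holds whatever the input/output mix.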

Use Haiku when…

Haiku is the right call whenever the task is narrow, the output is short, and you're running a lot of them. The decision per call is small; the volume is what matters.

  • Classification and routing — labeling tickets, tagging content, intent detection, spam filtering, sentiment. The answer is one of a handful of options; Haiku gets it right and costs cents per thousand.
  • Structured extraction — pulling fields out of invoices, emails, or logs into JSON against a fixed schema. Pair it with caching for the schema and the per-call cost rounds to nothing.
  • Short transforms at scale — summarizing one paragraph, rewriting a line, normalizing a value, generating a slug. High frequency, low stakes per call.
  • The cheap first pass in a tiered pipeline — triage that decides whether a bigger model even needs to run (more on this below).
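For structured extraction, the part worth writing carefully is the check on Haiku's reply. A minimal sketch, assuming a hypothetical invoice schema (`vendor`, `invoice_number`, `total`, `currency`) and a prompt that asks for bare JSON: anything that doesn't parse or doesn't match the schema comes back as `None`, which the caller can treat as a retry or an escalation to a bigger tier.

```python
import json
from typing import Any, Optional

# Hypothetical invoice schema: field name -> accepted Python type(s).
INVOICE_SCHEMA: dict[str, tuple[type, ...]] = {
    "vendor": (str,),
    "invoice_number": (str,),
    "total": (int, float),
    "currency": (str,),
}

def parse_extraction(
    raw: str, schema: dict[str, tuple[type, ...]] = INVOICE_SCHEMA
) -> Optional[dict[str, Any]]:
    """Parse a model's JSON reply and check it against a fixed schema.

    Returns only the schema fields if every one is present with an accepted type,
    otherwise None so the caller can retry or escalate.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, types in schema.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is None or isinstance(value, bool) or not isinstance(value, types):
            return None
    return {field: data[field] for field in schema}
```

As written, a reply wrapped in a Markdown code fence fails this check; strip fences first if your prompt doesn't rule them out.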

Where Haiku struggles: multi-step reasoning, subtle judgment calls, long-horizon planning, and anything where being subtly wrong is expensive. If you find yourself adding retry logic and validators around Haiku output, that's the signal to move that job up a tier.
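One way to turn that signal into a number rather than a hunch: count validation failures per job type and flag a job once its failure rate crosses a threshold. This is a sketch; the class name, the 5% threshold and the 200-call minimum are illustrative assumptions, not Anthropic or Brievio guidance.

```python
from collections import defaultdict

class TierMonitor:
    """Track how often a cheap tier's output fails validation, per job type.

    The default threshold and minimum sample size are illustrative assumptions.
    """

    def __init__(self, max_failure_rate: float = 0.05, min_calls: int = 200):
        self.max_failure_rate = max_failure_rate
        self.min_calls = min_calls
        self.calls: dict[str, int] = defaultdict(int)
        self.failures: dict[str, int] = defaultdict(int)

    def record(self, job: str, passed_validation: bool) -> None:
        """Log one call's outcome for a job type."""
        self.calls[job] += 1
        if not passed_validation:
            self.failures[job] += 1

    def failure_rate(self, job: str) -> float:
        n = self.calls[job]
        return self.failures[job] / n if n else 0.0

    def should_move_up(self, job: str) -> bool:
        """True once there is enough data and failures exceed the threshold."""
        return self.calls[job] >= self.min_calls and self.failure_rate(job) > self.max_failure_rate
```

The minimum-calls guard keeps a couple of early failures from promoting a job to a pricier tier on noise alone.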

Use Sonnet when… (the default for most teams)

Sonnet is where most production traffic should live. It's an elite coding model, it follows complex instructions reliably, and it's priced so you can run it as your everyday default without rationing. When you're not sure which tier to pick, start here — then tier down to Haiku for the volume work and up to Opus for the few jobs that truly need it.

  • Day-to-day coding — writing features, fixing bugs, generating tests, code review. Sonnet 4.6 is genuinely strong here and rarely the bottleneck.
  • Customer-facing assistants and RAG chatbots — good judgment, coherent long answers, reliable tool use, fast enough for interactive latency.
  • Content and document workflows — drafting, summarizing long documents, transforming structured content where quality matters but you don't need Opus-grade reasoning.
  • Most agent loops — Sonnet handles multi-tool agents well. Reserve Opus for the planning-heavy or highly ambiguous ones.

The honest framing: a large share of teams could run Sonnet for almost everything and be fine. The reason to tier at all is that the extremes — millions of trivial calls, or a handful of brutally hard ones — are where matching the model to the task pays off most.

Use Opus when…

Opus is the top tier for a reason, but it's the one to reach for deliberately, not by default. Use it when the difficulty genuinely justifies the cost — when a wrong or shallow answer is more expensive than the extra tokens.

  • Hard, long-horizon agentic work — multi-step plans that have to hold together over many tool calls, where Sonnet starts to drift or lose the thread.
  • Gnarly refactors and architecture — large cross-file changes, tricky migrations, debugging a problem that spans several systems.
  • Ambiguous specs and deep analysis — research-grade synthesis, nuanced judgment, problems where you'd hand it to your most senior engineer.
  • The escalation target — the model your pipeline falls back to when a cheaper tier flags a case as hard.
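The rule of thumb behind this section, that Opus pays off when a wrong answer costs more than the extra tokens, is an expected-value comparison: per task, token cost plus error rate times the cost of one error. The sketch below makes that explicit. The per-task token costs follow from the Brievio rates for an assumed 20K-input / 2K-output task; the 10% and 4% error rates are hypothetical and should come from your own measurements.

```python
def expected_cost(token_cost: float, error_rate: float, cost_per_error: float) -> float:
    """Expected dollars per task: the API bill plus the expected cost of cleaning up errors."""
    return token_cost + error_rate * cost_per_error

def worth_escalating(cheap: tuple[float, float], strong: tuple[float, float],
                     cost_per_error: float) -> bool:
    """Each tier is (token_cost_per_task, error_rate). True if the stronger tier is cheaper overall."""
    return expected_cost(*strong, cost_per_error) < expected_cost(*cheap, cost_per_error)

def break_even_error_cost(cheap: tuple[float, float], strong: tuple[float, float]) -> float:
    """Cost of a single error above which the stronger tier pays for itself."""
    extra_tokens = strong[0] - cheap[0]
    avoided_errors = cheap[1] - strong[1]
    if avoided_errors <= 0:
        return float("inf")  # the stronger tier avoids no errors, so it never pays off
    return extra_tokens / avoided_errors

# Assumed task: 20K input + 2K output tokens. Error rates are hypothetical.
SONNET_TASK = (20_000 * 2.55 / 1e6 + 2_000 * 12.75 / 1e6, 0.10)  # $0.0765 in tokens
OPUS_TASK = (20_000 * 4.25 / 1e6 + 2_000 * 21.25 / 1e6, 0.04)    # $0.1275 in tokens
```

With those assumptions the break-even is about $0.85 per error: if a wrong answer costs more than that to catch and fix, Opus is the cheaper choice overall; if it costs less, Sonnet is.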

If Opus and Sonnet produce indistinguishable answers on your task, that task didn't need Opus — and you just paid roughly 1.7× the Sonnet rate for nothing. The way to know is to actually compare them on your own prompts, not to assume the expensive one is always better.
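A minimal sketch of that comparison for short-answer tasks (labels, yes/no verdicts, extracted values): run the same prompts through both tiers and measure how often they agree. `ask(model, prompt)` is a hypothetical callable you supply, for example a thin wrapper around the OpenAI-compatible client shown in tiering.py; exact-match agreement after crude normalization is an assumption that only makes sense for short answers, and long-form output needs a rubric or a human grader instead.

```python
from typing import Callable, Iterable

def normalize(answer: str) -> str:
    """Crude normalization for short answers: lowercase, collapse whitespace, drop trailing periods."""
    return " ".join(answer.lower().split()).rstrip(".")

def agreement_rate(
    prompts: Iterable[str],
    ask: Callable[[str, str], str],
    cheaper: str = "claude-sonnet-4-6",
    stronger: str = "claude-opus-4-7",
) -> float:
    """Fraction of prompts on which the cheaper tier's answer matches the stronger tier's."""
    items = list(prompts)
    if not items:
        raise ValueError("need at least one prompt")
    matches = sum(normalize(ask(cheaper, p)) == normalize(ask(stronger, p)) for p in items)
    return matches / len(items)

# `ask` is yours to supply. With the client from tiering.py it could be:
#   def ask(model: str, prompt: str) -> str:
#       resp = client.chat.completions.create(
#           model=model, max_tokens=200,
#           messages=[{"role": "user", "content": prompt}],
#       )
#       return resp.choices[0].message.content or ""
```

A high agreement rate on a representative sample is evidence the task doesn't need Opus; a low one tells you where to look, not which tier is right.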

The pattern: tier down by default, escalate on demand

The highest-leverage move is to stop thinking in one model and start thinking in a ladder. Do the cheap thing first; escalate only when the cheap thing isn't enough. Because every Brievio tier shares the same base_url and the same SDK, switching tiers is a one-line change — only the model string moves.

tiering.py
# A model-tiering pattern: do the cheap thing first, escalate only when needed.
# Same base_url, same SDK — only the model string changes per tier.
from openai import OpenAI

client = OpenAI(
    api_key="sk-brievio-...",
    base_url="https://api.brievio.com/v1",
)

# Brievio rates per 1M tokens (input / output):
#   Haiku 4.5   $0.85 / $4.25    — fast, cheap, high-volume
#   Sonnet 4.6  $2.55 / $12.75   — balanced production workhorse
#   Opus 4.7    $4.25 / $21.25   — deepest reasoning, hardest jobs

def triage(ticket: str) -> str:
    """Haiku decides: can Sonnet handle this, or should it escalate to Opus?"""
    resp = client.chat.completions.create(
        model="claude-haiku-4-5",
        max_tokens=20,
        messages=[
            {"role": "system", "content": "Reply only EASY or HARD."},
            {"role": "user", "content": ticket},
        ],
    )
    # Normalize so "Easy", "EASY." or a stray newline still count as EASY.
    # Anything unrecognized falls through to HARD, i.e. errs toward the stronger model.
    verdict = (resp.choices[0].message.content or "").strip().upper()
    return "EASY" if verdict.startswith("EASY") else "HARD"

def answer(ticket: str) -> str:
    tier = "claude-sonnet-4-6" if triage(ticket) == "EASY" else "claude-opus-4-7"
    resp = client.chat.completions.create(
        model=tier,
        max_tokens=800,
        messages=[{"role": "user", "content": ticket}],
    )
    return resp.choices[0].message.content or ""

# Every ticket pays for one tiny Haiku triage call; most are then answered by
# Sonnet, and Opus only fires on the genuinely hard minority. The average cost
# per ticket lands below an all-Opus pipeline (Sonnet's rates are 60% of Opus's).
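That routing claim is easy to put numbers on. A minimal sketch of the expected cost per ticket for the triage-then-answer flow, using the Brievio rates; the 500-token tickets, 400-token answers, 2-token triage verdict and 80% easy share are assumptions for illustration, and the short system prompt is ignored.

```python
# Brievio rates per 1M tokens (input, output), from the rate card above.
HAIKU, SONNET, OPUS = (0.85, 4.25), (2.55, 12.75), (4.25, 21.25)

def call_cost(rates: tuple[float, float], in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one call at the given (input, output) per-1M-token rates."""
    return (in_tokens * rates[0] + out_tokens * rates[1]) / 1_000_000

def tiered_cost_per_ticket(easy_share: float, ticket_in: int, answer_out: int,
                           triage_out: int = 2) -> float:
    """Expected cost per ticket: every ticket pays for a Haiku triage call,
    then an easy ticket goes to Sonnet and a hard one to Opus."""
    triage = call_cost(HAIKU, ticket_in, triage_out)
    answer = (easy_share * call_cost(SONNET, ticket_in, answer_out)
              + (1 - easy_share) * call_cost(OPUS, ticket_in, answer_out))
    return triage + answer

tiered = tiered_cost_per_ticket(easy_share=0.8, ticket_in=500, answer_out=400)
all_opus = call_cost(OPUS, 500, 400)
print(f"tiered ${tiered:.5f} vs all-Opus ${all_opus:.5f} per ticket")
```

With those assumptions the tiered flow comes to about $0.0077 per ticket against about $0.0106 all-Opus, roughly 28% less. Since Sonnet's rates are 60% of Opus's, routing easy tickets to Sonnet can never save more than about 40%; the bigger multipliers come from jobs Haiku can finish on its own. And if nearly every ticket is hard, the triage call is pure overhead.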

The economics are simple: a triage call on Haiku costs a fraction of a cent. If it routes the easy majority to Sonnet and only the hard minority to Opus, your average cost per task lands well below an all-Opus pipeline, with no quality loss on the cases that actually needed the top tier, provided the triage step reliably flags them. The same logic applies at the other extreme, pure high-volume work, where Haiku does the whole job:

classify.py
# Where Haiku earns its keep: high-volume classification / extraction.
# At $0.85/1M input, a million short docs cost a few hundred dollars, not thousands.
from openai import OpenAI

client = OpenAI(
    api_key="sk-brievio-...",
    base_url="https://api.brievio.com/v1",
)

LABELS = ["bug", "feature_request", "billing", "spam", "other"]

def classify(text: str) -> str:
    resp = client.chat.completions.create(
        model="claude-haiku-4-5",
        max_tokens=10,
        messages=[
            {"role": "system",
             "content": f"Classify into exactly one of: {', '.join(LABELS)}. Reply with the label only."},
            {"role": "user", "content": text},
        ],
    )
    label = (resp.choices[0].message.content or "").strip().lower()
    # Guard against off-list replies; "other" is a safe bucket to review later.
    return label if label in LABELS else "other"

# 1,000,000 inbound messages, ~300 input tokens each, ~3 output tokens each:
#   input:  300M tokens × $0.85 / 1M  = $255
#   output:   3M tokens × $4.25 / 1M  = ~$13
# The same job on Opus costs exactly 5× on both sides (~$1,275 + ~$64) for no
# accuracy gain on a task this narrow. Match the tier to the difficulty.

Two patterns, one idea: match the tier to the difficulty. Volume and easy work go to Haiku, the bulk of production goes to Sonnet, and Opus is reserved for the jobs that earn it. Because failed 4xx/5xx calls are free on Brievio, an escalation retry that errors costs you nothing — the meter only moves on a real completion.
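Put together, the ladder can be sketched as one helper that starts at Haiku and moves up a rung whenever a call errors or its answer fails a check. This is an illustrative sketch, not a Brievio or SDK feature: `call(model, prompt)` and `accept(answer)` are hypothetical callables you supply, and it escalates on any failure, where a production version would probably retry transient 5xx errors on the same tier first.

```python
from typing import Callable, Optional, Sequence

LADDER = ("claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7")

def run_with_escalation(
    prompt: str,
    call: Callable[[str, str], str],
    accept: Callable[[str], bool],
    ladder: Sequence[str] = LADDER,
) -> tuple[str, str]:
    """Try each tier from cheapest to most capable and return (model, answer)
    for the first answer that passes `accept`. An API error or a rejected
    answer moves the request one rung up. Raises RuntimeError if every tier fails."""
    last_error: Optional[Exception] = None
    for model in ladder:
        try:
            answer = call(model, prompt)
        except Exception as exc:  # e.g. a 4xx/5xx surfaced by the SDK
            last_error = exc
            continue
        if accept(answer):
            return model, answer
    raise RuntimeError("no tier produced an acceptable answer") from last_error
```

The `accept` check is where a validator like a schema parse or a label whitelist plugs in; without one, every non-error reply from Haiku would be accepted.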

Quick pick by task

When you just need an answer, start here and adjust after measuring on your own prompts:

  • Classify / tag / route / extract, at volume → Haiku 4.5.
  • Everyday coding, bug fixes, tests, code review → Sonnet 4.6.
  • Customer-facing chatbot / RAG assistant → Sonnet 4.6.
  • Drafting, summarizing, content workflows → Sonnet 4.6.
  • Most multi-tool agents → Sonnet 4.6; escalate the planning-heavy steps to Opus.
  • Hard refactors, ambiguous specs, deep analysis → Opus 4.7.
  • Not sure? → Sonnet 4.6, then tier down to Haiku for volume and up to Opus for the hardest jobs.
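If you want the quick picks in code, a lookup table with a Sonnet default is enough. This is a sketch: the task keys are hypothetical labels, and the `overrides` parameter is where per-task results from your own measurements would go.

```python
from typing import Mapping, Optional

# The quick picks above as a lookup table; the task keys are illustrative labels.
QUICK_PICK = {
    "classify": "claude-haiku-4-5",
    "tag": "claude-haiku-4-5",
    "route": "claude-haiku-4-5",
    "extract": "claude-haiku-4-5",
    "coding": "claude-sonnet-4-6",
    "chatbot": "claude-sonnet-4-6",
    "content": "claude-sonnet-4-6",
    "agent_step": "claude-sonnet-4-6",
    "agent_planning": "claude-opus-4-7",
    "hard_refactor": "claude-opus-4-7",
    "deep_analysis": "claude-opus-4-7",
}
DEFAULT_MODEL = "claude-sonnet-4-6"  # "Not sure? -> Sonnet 4.6"

def pick_model(task: str, overrides: Optional[Mapping[str, str]] = None) -> str:
    """Measured overrides win, then the quick-pick table, then the Sonnet default."""
    if overrides and task in overrides:
        return overrides[task]
    return QUICK_PICK.get(task, DEFAULT_MODEL)
```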

None of this requires committing to one tier up front. Try the same prompt across all three on Brievio, compare the answers and the token counts, and let the results pick the tier. The full rate card is on the pricing page; for the broader strategy of squeezing cost without losing quality, see the cost-optimization playbook and our guide to choosing an AI API gateway. Tiering well is the single biggest lever you have — and it costs nothing but a model string.