Opus 4.7 is smarter, more literal, and quietly more expensive. Those are three different problems.

· 2 min read · Alex

Opus 4.7 is smarter, more literal, and quietly more expensive. Those are three different problems.

Read the original article

Opus 4.7: Smarter, More Literal, and Quietly More Expensive

Main Thesis

Anthropic’s Opus 4.7 release represents three distinct engineering decisions that shipped simultaneously, not one unified upgrade. Treating them as a single story leads to bad migration decisions — either overpaying for work that got cheaper, or downgrading away from a model that genuinely excels at hard tasks.


Key Findings

✅ Real Capability Gains (But Uneven)

  • Measurably stronger on hard tasks: persistence, coding, vision, and knowledge work
  • These gains are not uniform — some areas regressed
  • Web research and terminal use show notable regressions worth routing around
  • The knowledge-work improvements are being buried under backlash noise

🔤 The Literalness Problem

  • Opus 4.7 does exactly what you ask — no more, no less
  • Previous models would infer intent and fill gaps automatically
  • Workflows built around the old model’s ‘guessing’ behaviour will break silently
  • The model is also described as more combative
  • Fix: Write clearer, more explicit prompts — not longer ones

💸 The Hidden Cost Increase

  • Sticker price didn’t change, but effective cost per unit of output increased
  • Three compounding factors:
    1. Tokenizer tax — new tokenizer uses more tokens for equivalent content
    2. Adaptive thinking — model spends more tokens reasoning before responding
    3. Breaking API changes — silent changes that compound billing unexpectedly
  • Author provides a cost estimator prompt to quantify the tokenizer tax on specific usage patterns

🎨 Claude Design & The $42 Afternoon

  • Author tested Claude Design, a tool that converts brand identity into machine-readable agent instructions
  • Real-world session cost $42 in an afternoon
  • The correction loop required reveals a gap between Anthropic’s current capability and where its valuation implies it should be

Practical Takeaways

  1. Run a pre-flight check — use the provided prompt to flag what breaks before migrating
  2. Quantify your tokenizer tax — don’t assume your bill stays flat; measure it against your actual usage
  3. Build a peer review workflow — add a reliability layer now, not after production breaks
  4. Don’t make one migration decision — evaluate capability gains, literalness changes, and cost impacts separately
  5. Route around regressions — web research and terminal tasks may be better served by other models post-upgrade

Bottom Line

Opus 4.7 is the right model for genuinely hard work. It’s the wrong default if your prompts were lazy, your workflows assumed inference, or your budget was calculated on the old tokenizer. Fix the prompts, measure the cost, and migrate selectively.

Infographic

Infographic wide