Opus 4.7 is smarter, more literal, and quietly more expensive. Those are three different problems.

Opus 4.7: Smarter, More Literal, and Quietly More Expensive

Main Thesis

Anthropic’s Opus 4.7 release represents three distinct engineering decisions that shipped simultaneously, not one unified upgrade. Treating them as a single story leads to bad migration decisions — either overpaying for work that got cheaper, or downgrading away from a model that genuinely excels at hard tasks.

Key Findings

✅ Real Capability Gains (But Uneven)

Measurably stronger on hard tasks: persistence, coding, vision, and knowledge work
These gains are not uniform — some areas regressed
Web research and terminal use show notable regressions worth routing around
The knowledge-work improvements are being buried under backlash noise

🔤 The Literalness Problem

Opus 4.7 does exactly what you ask — no more, no less
Previous models would infer intent and fill gaps automatically
Workflows built around the old model’s ‘guessing’ behaviour will break silently
The model is also described as more combative
Fix: Write clearer, more explicit prompts — not longer ones

💸 The Hidden Cost Increase

Sticker price didn’t change, but effective cost per unit of output increased
Three compounding factors:
1. Tokenizer tax — new tokenizer uses more tokens for equivalent content
2. Adaptive thinking — model spends more tokens reasoning before responding
3. Breaking API changes — silent changes that compound billing unexpectedly
Author provides a cost estimator prompt to quantify the tokenizer tax on specific usage patterns

🎨 Claude Design & The $42 Afternoon

Author tested Claude Design, a tool that converts brand identity into machine-readable agent instructions
Real-world session cost $42 in an afternoon
The correction loop required reveals a gap between Anthropic’s current capability and where its valuation implies it should be

Practical Takeaways

Run a pre-flight check — use the provided prompt to flag what breaks before migrating
Quantify your tokenizer tax — don’t assume your bill stays flat; measure it against your actual usage
Build a peer review workflow — add a reliability layer now, not after production breaks
Don’t make one migration decision — evaluate capability gains, literalness changes, and cost impacts separately
Route around regressions — web research and terminal tasks may be better served by other models post-upgrade

Bottom Line

Opus 4.7 is the right model for genuinely hard work. It’s the wrong default if your prompts were lazy, your workflows assumed inference, or your budget was calculated on the old tokenizer. Fix the prompts, measure the cost, and migrate selectively.

Infographic

Infographic wide