AI Agents Weekly: Claude Code Review, AutoHarness, Perplexity Personal Computer, Cloudflare /crawl, Context7 CLI…
Original article: https://nlp.elvissaravia.com/p/ai-agents-weekly-claude-code-review
Processed: March 15, 2026 | Source: Elvis Saravia, AI Newsletter
Note: The full article is paywalled. This summary is based on the free preview and table of contents.
Summary
This week's AI Agents Weekly covers a broad range of agent tooling, infrastructure, and research developments across the industry.
In this issue:
- Claude ships multi-agent Code Review
- AutoHarness makes small agents beat large ones
- Perplexity launches an always-on Personal Computer
- Cloudflare ships a one-call /crawl endpoint
- Context7 CLI brings docs to any agent
- Andrew Ng launches Context Hub
- Cursor Marketplace adds 30+ plugins
- OpenAI shares Skills for Agents SDK
- Google launches Gemini Embedding 2
- Meta ships four MTIA chips in two years
- Codex agent files taxes, catches $20K error
Top Stories
Claude Code Review
Anthropic launched Code Review for Claude Code, an automated system that dispatches multiple AI agents to examine every pull request. Instead of a single pass, parallel agents identify potential issues, verify findings to eliminate false positives, and rank bugs by severity, delivering a consolidated overview comment plus targeted inline annotations.
Key details:
- Multi-agent architecture: The system operates in parallel: agents scan, verify, and prioritize issues independently, producing both a summary comment and inline code annotations for specific problems.
- Scales with complexity: Review depth adjusts based on PR size. Large PRs (over 1,000 lines) received findings 84% of the time, averaging 7.5 issues per PR. Small PRs (under 50 lines) had findings 31% of the time.
- High precision: Less than 1% of flagged issues were marked incorrect by Anthropic engineers, with the system catching production-critical bugs that appeared routine in diffs.
- Pricing and access: Available now as a research preview for Team and Enterprise customers. Reviews average $15 to $25 per PR, billed on token usage, with configurable monthly caps and per-repo controls.
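The scan, verify, and rank stages described above can be illustrated with a minimal sketch. This is not Anthropic's implementation: the scanner functions, the `verify` rule, the severity scale, and the example diff are all hypothetical stand-ins, and simple string checks take the place of LLM-backed agents.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    line: int      # 1-based line number in the diff
    message: str
    severity: int  # hypothetical scale: higher means more severe


# Hypothetical scanner "agents"; real ones would call a model.
def scan_eval(diff: list[str]) -> list[Finding]:
    return [Finding(i, "eval on untrusted input", 3)
            for i, text in enumerate(diff, start=1) if "eval(" in text]


def scan_bare_except(diff: list[str]) -> list[Finding]:
    return [Finding(i, "bare except hides errors", 2)
            for i, text in enumerate(diff, start=1) if text.strip() == "except:"]


def verify(finding: Finding, diff: list[str]) -> bool:
    """Second pass (assumed rule): discard findings that point at comment lines."""
    return not diff[finding.line - 1].lstrip().startswith("#")


def review(diff: list[str], scanners) -> tuple[str, list[Finding]]:
    # Stage 1: run every scanner over the same diff in parallel.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda scan: scan(diff), scanners))
    candidates = [f for batch in batches for f in batch]
    # Stage 2: verify each candidate to filter false positives.
    confirmed = [f for f in candidates if verify(f, diff)]
    # Stage 3: rank by severity, most severe first.
    ranked = sorted(confirmed, key=lambda f: (-f.severity, f.line))
    discarded = len(candidates) - len(ranked)
    summary = f"{len(ranked)} issue(s) confirmed, {discarded} discarded"
    return summary, ranked


example_diff = [
    "# eval(user_input) was removed here",
    "try:",
    "    value = eval(expr)",
    "except:",
    "    pass",
]

if __name__ == "__main__":
    summary, ranked = review(example_diff, [scan_eval, scan_bare_except])
    print(summary)                      # plays the role of the overview comment
    for finding in ranked:              # plays the role of inline annotations
        print(f"line {finding.line}: {finding.message}")
```

The separate verification pass is the design point worth noting: scanners can afford to over-report because a later stage is responsible for precision.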
Blog
AutoHarness: Automated Agent Constraint Synthesis
Researchers introduced AutoHarness, a technique that lets LLMs automatically synthesize protective code harnesses around themselves, preventing illegal actions without human-written constraints. Instead of relying on larger, more expensive models, the approach uses iterative code refinement with environmental feedback to generate custom safeguards, making smaller models outperform bigger unconstrained ones.
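To make "iterative code refinement with environmental feedback" concrete, here is a toy sketch of a propose, test, refine loop. It is an illustration under heavy assumptions, not the paper's method: the game is a made-up Nim variant, and a fixed list of hand-written drafts stands in for the harness code an LLM would propose after seeing each round of feedback.

```python
from typing import Callable, Optional

Harness = Callable[[int, int], bool]  # (stones_left, stones_to_take) -> legal?


def env_is_legal(stones: int, action: int) -> bool:
    """Ground truth from the toy environment: take 1-3 stones, never more than remain."""
    return 1 <= action <= min(3, stones)


# Hypothetical successive drafts; in a real loop each would be generated
# by a model that has seen the previous draft's counterexamples.
CANDIDATE_DRAFTS: list[Harness] = [
    lambda stones, action: action > 0,                      # forgets the upper bound
    lambda stones, action: 1 <= action <= 3,                # forgets the heap size
    lambda stones, action: 1 <= action <= min(3, stones),   # matches the rules
]


def counterexamples(harness: Harness, max_stones: int = 6, max_action: int = 5):
    """Probe the environment and list (stones, action) pairs where the draft is wrong."""
    return [(s, a)
            for s in range(max_stones + 1)
            for a in range(max_action + 1)
            if harness(s, a) != env_is_legal(s, a)]


def synthesize(drafts: list[Harness]) -> tuple[Optional[Harness], list[int]]:
    """Accept the first draft with no counterexamples; record the feedback per round."""
    history = []
    for draft in drafts:
        failures = counterexamples(draft)
        history.append(len(failures))
        if not failures:
            return draft, history
    return None, history


if __name__ == "__main__":
    harness, history = synthesize(CANDIDATE_DRAFTS)
    print("counterexamples per round:", history)
```

The environment acts as the only judge here: a draft is accepted because it survives probing, not because the model that wrote it is trusted.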
Key details:
- Massive illegal action problem: In a recent LLM chess competition, 78% of Gemini-2.5-Flash losses were attributed to illegal moves. AutoHarness eliminates this class of failure entirely by generating harnesses that enforce valid actions across 145 different TextArena games.
- Small beats large: Gemini-2.5-Flash with a synthesized harness exceeded Gemini-2.5-Pro's performance while reducing costs, demonstrating that proper constraints are more valuable than raw model scale for agent environments.
- Zero-shot generalization: The technique extends beyond game-playing to generating full policies in code, eliminating runtime LLM decision-making entirely and achieving higher rewards than GPT-5.2-High on certain benchmarks.
- Practical agent pattern: The core insight applies broadly to any agent deployment: rather than trusting a model to self-constrain, auto-generate a verified harness that makes illegal states unreachable, shifting safety from model behavior to environment design.
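The pattern of making illegal states unreachable can be sketched as a wrapper around a policy. Everything below is a hypothetical illustration: the toy Nim rules, the `unreliable_policy` stand-in for an unconstrained model, and the retry-then-fallback strategy are assumptions, and the harness is hand-written here rather than synthesized as in AutoHarness.

```python
import random
from typing import Callable

Policy = Callable[[int, random.Random], int]


def legal_actions(stones: int) -> list[int]:
    """Toy Nim rules (assumed): take 1-3 stones, never more than remain."""
    return list(range(1, min(3, stones) + 1))


def unreliable_policy(stones: int, rng: random.Random) -> int:
    """Stand-in for an unconstrained model: often proposes an illegal move."""
    return rng.randint(0, 5)


def with_harness(policy: Policy, max_retries: int = 3) -> Policy:
    """Wrap a policy so that only legal actions ever reach the environment."""
    def safe_policy(stones: int, rng: random.Random) -> int:
        allowed = legal_actions(stones)
        for _ in range(max_retries):
            action = policy(stones, rng)
            if action in allowed:
                return action
        return allowed[0]  # deterministic fallback after repeated illegal proposals
    return safe_policy


def play(policy: Policy, stones: int = 10, seed: int = 0) -> int:
    """Run one solo episode and count illegal moves that reached the environment."""
    rng = random.Random(seed)
    illegal = 0
    turns = 0
    while stones > 0 and turns < 100:
        action = policy(stones, rng)
        if action in legal_actions(stones):
            stones -= action
        else:
            illegal += 1  # the environment rejects the move; the turn is wasted
        turns += 1
    return illegal


if __name__ == "__main__":
    print("illegal moves without harness:", play(unreliable_policy))
    print("illegal moves with harness:", play(with_harness(unreliable_policy)))
```

Because the check lives outside the policy, the guarantee holds regardless of how the underlying model behaves, which is the shift from model behavior to environment design that the bullet describes.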
Paper
More This Week (Preview, Paywalled)
- Perplexity Personal Computer: always-on AI computer product
- Cloudflare /crawl: one-call endpoint for web crawling
- Context7 CLI: brings documentation to any agent
- Andrew Ng Context Hub: new platform from the AI pioneer
- Cursor Marketplace: 30+ new plugins for the AI editor
- OpenAI Skills for Agents SDK: new SDK capabilities
- Google Gemini Embedding 2: next-gen embedding model
- Meta MTIA chips: four chips shipped in two years
- Codex filing taxes: agent catches a $20K error
Key Takeaways
- Multi-agent verification is production-ready: Claude Code Review demonstrates that parallel agent pipelines with cross-verification can keep the share of incorrect findings below 1% at scale, making agentic code review commercially viable.
- Constraints beat raw capability: AutoHarness shows that wrapping smaller models with auto-generated behavioral harnesses can outperform larger unconstrained models, suggesting the future of reliable agents is in environment design, not just model scaling.
- Agent infrastructure is maturing rapidly: Cloudflare /crawl, Context7 CLI, OpenAI Skills SDK, and Cursor Marketplace all signal a converging ecosystem of primitives designed specifically for agentic workflows.
- Custom silicon is accelerating: Meta shipping four MTIA chips in two years reflects the industry-wide push to build dedicated inference hardware for AI workloads.