πŸ€– AI Agents Weekly: Claude Code Review, AutoHarness, Perplexity Personal Computer, Cloudflare /crawl, Context7 CLI…

Β· 4 min read Β· Alex

πŸ€– AI Agents Weekly: Claude Code Review, AutoHarness, Perplexity Personal Computer, Cloudflare /crawl, Context7 CLI…

Original article: https://nlp.elvissaravia.com/p/ai-agents-weekly-claude-code-review

Processed: March 15, 2026 | Source: Elvis Saravia β€” AI Newsletter


⚠️ Note: The full article is paywalled. This summary is based on the free preview and table of contents.

Summary

This week’s AI Agents Weekly covers a broad range of agent tooling, infrastructure, and research developments across the industry.

In this issue:

  • Claude ships multi-agent Code Review
  • AutoHarness makes small agents beat large ones
  • Perplexity launches an always-on Personal Computer
  • Cloudflare ships a one-call /crawl endpoint
  • Context7 CLI brings docs to any agent
  • Andrew Ng launches Context Hub
  • Cursor Marketplace adds 30+ plugins
  • OpenAI shares Skills for Agents SDK
  • Google launches Gemini Embedding 2
  • Meta ships four MTIA chips in two years
  • Codex agent files taxes, catches $20K error

Top Stories

Claude Code Review

Anthropic launched Code Review for Claude Code β€” an automated system that dispatches multiple AI agents to examine every pull request. Instead of a single pass, parallel agents identify potential issues, verify findings to eliminate false positives, and rank bugs by severity, delivering a consolidated overview comment plus targeted inline annotations.

Key details:

  • Multi-agent architecture: The system operates in parallel β€” agents scan, verify, and prioritize issues independently, producing both a summary comment and inline code annotations for specific problems.
  • Scales with complexity: Review depth adjusts based on PR size. Large PRs (over 1,000 lines) received findings 84% of the time, averaging 7.5 issues per PR. Small PRs (under 50 lines) had findings 31% of the time.
  • High precision: Less than 1% of flagged issues were marked incorrect by Anthropic engineers, with the system catching production-critical bugs that appeared routine in diffs.
  • Pricing and access: Available now as a research preview for Team and Enterprise customers. Reviews average $15–25 per PR, billed on token usage, with configurable monthly caps and per-repo controls.

πŸ“Ž Blog


AutoHarness: Automated Agent Constraint Synthesis

Researchers introduced AutoHarness, a technique that lets LLMs automatically synthesize protective code harnesses around themselves, preventing illegal actions without human-written constraints. Instead of relying on larger, more expensive models, the approach uses iterative code refinement with environmental feedback to generate custom safeguards β€” making smaller models outperform bigger unconstrained ones.

Key details:

  • Massive illegal action problem: In a recent LLM chess competition, 78% of Gemini-2.5-Flash losses were attributed to illegal moves. AutoHarness eliminates this class of failure entirely by generating harnesses that enforce valid actions across 145 different TextArena games.
  • Small beats large: Gemini-2.5-Flash with a synthesized harness exceeded Gemini-2.5-Pro’s performance while reducing costs β€” demonstrating that proper constraints are more valuable than raw model scale for agent environments.
  • Zero-shot generalization: The technique extends beyond game-playing to generating full policies in code, eliminating runtime LLM decision-making entirely and achieving higher rewards than GPT-5.2-High on certain benchmarks.
  • Practical agent pattern: The core insight applies broadly to any agent deployment β€” rather than trusting a model to self-constrain, auto-generate a verified harness that makes illegal states unreachable, shifting safety from model behavior to environment design.

πŸ“Ž Paper


More This Week (Preview β€” Paywalled)

  • Perplexity Personal Computer β€” always-on AI computer product
  • Cloudflare /crawl β€” one-call endpoint for web crawling
  • Context7 CLI β€” brings documentation to any agent
  • Andrew Ng Context Hub β€” new platform from the AI pioneer
  • Cursor Marketplace β€” 30+ new plugins for the AI editor
  • OpenAI Skills for Agents SDK β€” new SDK capabilities
  • Google Gemini Embedding 2 β€” next-gen embedding model
  • Meta MTIA chips β€” four chips shipped in two years
  • Codex filing taxes β€” agent catches a $20K error

Key Takeaways

  1. Multi-agent verification is production-ready β€” Claude Code Review demonstrates that parallel agent pipelines with cross-verification can achieve less than 1% false positive rates at scale, making agentic code review commercially viable.
  2. Constraints beat raw capability β€” AutoHarness shows that wrapping smaller models with auto-generated behavioral harnesses can outperform larger unconstrained models, suggesting the future of reliable agents is in environment design, not just model scaling.
  3. Agent infrastructure is maturing rapidly β€” Cloudflare /crawl, Context7 CLI, OpenAI Skills SDK, and Cursor Marketplace all signal a converging ecosystem of primitives designed specifically for agentic workflows.
  4. Custom silicon is accelerating β€” Meta shipping four MTIA chips in two years reflects the industry-wide push to build dedicated inference hardware for AI workloads.

Infographics

Landscape Infographic

Portrait Infographic