AI Agents of the Week: Papers You...

· 2 min read · Alex

AI Agents of the Week – LLM Watch (Feb 22, 2026)

Main Thesis

This weekly research roundup from LLM Watch highlights five key areas where AI agent research is rapidly advancing: memory & continual learning, planning under uncertainty, multi-agent collaboration, trust & safety, and practical tooling.


Key Findings

🧠 Memory & Continual Learning

  • IntentCUA introduces intent-level representations that convert raw interaction traces into reusable skills.
  • Achieves a 74.83% task success rate with a Step Efficiency Ratio of 0.91 on desktop automation tasks.
  • Uses a coordinated Planner, Plan-Optimizer, and Critic sharing memory to stabilise long-horizon execution.

🗺️ Planning & Environment Interaction

  • AgentConductor uses reinforcement learning to evolve multi-agent communication topologies dynamically.
  • Delivers up to 14.6% improvement in pass@1 accuracy over baselines for code generation.
  • Density-aware layered DAG construction reduces token costs by 68%, a significant saving for compute-constrained deployments.
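
For readers unfamiliar with the metric, pass@1 is the probability that a single generated solution passes all tests. The sketch below uses the standard unbiased pass@k estimator, which reduces to c/n when k = 1. The per-problem counts are made up for illustration and are not taken from the AgentConductor paper, and the summary does not say whether the 14.6% gain is absolute or relative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, of which c passed all tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems; each entry is (samples, passing samples)."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical outcomes for three problems, NOT figures from the paper.
baseline = [(10, 3), (10, 5), (10, 0)]
candidate = [(10, 4), (10, 6), (10, 1)]

gain = mean_pass_at_1(candidate) - mean_pass_at_1(baseline)
print(f"absolute pass@1 gain: {gain:.3f}")
```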

🤝 Multi-Agent Collaboration & Control

  • AgentConductor shows that adapting topology to task difficulty outperforms fixed communication graphs, cutting graph density by 13% while also improving accuracy.
  • AutoNumerics applies multi-agent orchestration to scientific computing, autonomously designing and verifying PDE solvers across 24 canonical problems.
  • Key insight: the architecture of agent collaboration matters more than individual agent capability.
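
To make "density" concrete, here is a minimal sketch that measures how many of the allowed forward edges a layered agent graph actually uses. The definition (edges present divided by edges possible between earlier and later layers) is an assumption for illustration, and the paper may define density differently. The agent names are hypothetical.

```python
from itertools import combinations


def layered_dag_density(layers: list[list[str]], edges: set[tuple[str, str]]) -> float:
    """Fraction of possible forward edges (earlier layer -> later layer) in use."""
    layer_of = {agent: i for i, layer in enumerate(layers) for agent in layer}
    for src, dst in edges:
        if layer_of[src] >= layer_of[dst]:
            raise ValueError(f"edge {src}->{dst} does not point to a later layer")
    # combinations() keeps list order, so each pair is (earlier, later).
    possible = sum(len(a) * len(b) for a, b in combinations(layers, 2))
    return len(edges) / possible if possible else 0.0


# Hypothetical three-layer team: one planner, two coders, one tester.
layers = [["planner"], ["coder_a", "coder_b"], ["tester"]]

dense = {
    ("planner", "coder_a"), ("planner", "coder_b"), ("planner", "tester"),
    ("coder_a", "tester"), ("coder_b", "tester"),
}
sparse = {("planner", "coder_a"), ("coder_a", "tester")}

print(layered_dag_density(layers, dense), layered_dag_density(layers, sparse))
```

A sparser graph means fewer messages passed between agents, which is one plausible reason lower density goes hand in hand with lower token cost.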

🔒 Trust, Verification & Safety

  • Wink is a production-deployed system for recovering from coding agent misbehaviours.
  • Finds that ~30% of all agent trajectories contain misbehaviours, which fall into three categories: Specification Drift, Reasoning Problems, and Tool Call Failures.
  • Lightweight self-intervention resolves 90% of the misbehaviours that need only a single intervention, and it reduced engineer interventions in live A/B testing.
  • CowCorpus provides a taxonomy of human intervention patterns, enabling models to predict user interventions with a 61.4–63.4% improvement over baselines.
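
As a rough illustration of what self-intervention could look like, the sketch below scans a trajectory, tags each step with one of the three misbehaviour categories named above, and emits a corrective message to feed back to the agent. This is not Wink's implementation: the `Step` fields, the detection rules, and the nudge texts are all hypothetical stand-ins for whatever classifier a real system would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Misbehaviour(Enum):
    SPECIFICATION_DRIFT = "specification drift"
    REASONING_PROBLEM = "reasoning problem"
    TOOL_CALL_FAILURE = "tool call failure"


@dataclass
class Step:
    """One agent step; every field here is a hypothetical signal."""
    action: str
    tool_error: Optional[str] = None
    touches_out_of_scope: bool = False
    repeats_previous: bool = False


# Hypothetical corrective messages injected back into the agent's context.
NUDGES = {
    Misbehaviour.SPECIFICATION_DRIFT: "Re-read the task and revert unrelated changes.",
    Misbehaviour.REASONING_PROBLEM: "You are repeating a step. Try a different approach.",
    Misbehaviour.TOOL_CALL_FAILURE: "The last tool call failed. Check its arguments and retry.",
}


def detect(step: Step) -> Optional[Misbehaviour]:
    """Toy rule-based detector standing in for a real classifier."""
    if step.tool_error:
        return Misbehaviour.TOOL_CALL_FAILURE
    if step.touches_out_of_scope:
        return Misbehaviour.SPECIFICATION_DRIFT
    if step.repeats_previous:
        return Misbehaviour.REASONING_PROBLEM
    return None


def interventions_for(trajectory: list[Step]) -> list[tuple[int, str]]:
    """Return (step index, nudge) for every step flagged as a misbehaviour."""
    return [
        (i, NUDGES[kind])
        for i, step in enumerate(trajectory)
        if (kind := detect(step)) is not None
    ]


trajectory = [
    Step("read failing test"),
    Step("run test suite", tool_error="unknown flag --fast"),
    Step("patch parser"),
]
print(interventions_for(trajectory))
```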

🛠️ Tools & Frameworks in Practice

  • How AI Coding Agents Communicate analyses pull requests across five AI coding agents.
  • Finds that presentation style correlates with reviewer engagement and merge outcomes: PRs from agents that communicate clearly are merged more often.

Practical Takeaways

  • Build for long horizons: Intent-level memory abstraction (IntentCUA) is a viable path to more reliable long-running agents.
  • Dynamic topology > static graphs: Fixed multi-agent communication structures leave significant performance and cost on the table.
  • Expect ~30% misbehaviour rates: Production agent systems need built-in recovery mechanisms, not just prevention.
  • Human-in-the-loop is predictable: Models can now anticipate when humans will intervene, enabling proactive agent self-correction.
  • Agent communication style matters: How an agent explains its work affects real-world outcomes like code review acceptance.
