AI Agents of the Week – LLM Watch (Feb 22, 2026)

Main Thesis

This weekly research roundup from LLM Watch highlights five key areas where AI agent research is rapidly advancing: memory & continual learning, planning under uncertainty, multi-agent collaboration, trust & safety, and practical tooling.

Key Findings

🧠 Memory & Continual Learning

IntentCUA introduces intent-level representations that convert raw interaction traces into reusable skills.
Achieves a 74.83% task success rate with a Step Efficiency Ratio of 0.91 on desktop automation tasks.
Uses a coordinated Planner, Plan-Optimizer, and Critic that share memory to stabilise long-horizon execution.
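The coordination pattern described above can be pictured as one planning round in which three roles read from and write to a single shared memory. What follows is a minimal sketch under stated assumptions, not IntentCUA's implementation: the `Skill` and `SharedMemory` classes, the split-on-"then" planner, the exact-match skill lookup and the duplicate-step critic are all hypothetical stand-ins for what would be model calls in a real system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical intent-level skill: an intent label plus the low-level steps it expands to.
    intent: str
    steps: list[str]


@dataclass
class SharedMemory:
    # Hypothetical memory read and written by all three roles.
    skills: dict[str, Skill] = field(default_factory=dict)
    feedback: list[str] = field(default_factory=list)

    def add_skill(self, skill: Skill) -> None:
        self.skills[skill.intent] = skill


def planner(goal: str, memory: SharedMemory) -> list[str]:
    # Stub planner: split the goal into sub-intents on " then " (memory unused in this stub).
    return [part.strip() for part in goal.split(" then ")]


def plan_optimizer(intents: list[str], memory: SharedMemory) -> list[str]:
    # Expand each sub-intent into a stored skill's steps when one exists;
    # otherwise keep the intent as a single raw step.
    plan: list[str] = []
    for intent in intents:
        skill = memory.skills.get(intent)
        plan.extend(skill.steps if skill else [intent])
    return plan


def critic(plan: list[str], memory: SharedMemory) -> bool:
    # Stub critic: reject empty plans or plans that repeat a step back to back,
    # and record the rejection in shared memory so later rounds can see it.
    ok = bool(plan) and all(a != b for a, b in zip(plan, plan[1:]))
    if not ok:
        memory.feedback.append(f"rejected plan: {plan}")
    return ok


def run(goal: str, memory: SharedMemory) -> list[str] | None:
    # One round: plan, optimise against stored skills, then critique.
    plan = plan_optimizer(planner(goal, memory), memory)
    return plan if critic(plan, memory) else None


memory = SharedMemory()
memory.add_skill(Skill("open settings", ["click Start", "click Settings"]))
print(run("open settings then enable dark mode", memory))
# ['click Start', 'click Settings', 'enable dark mode']
```

The point of the shared memory in this sketch is that a skill stored once is reused by every later plan, and critic feedback persists across rounds rather than being lost with each call.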

🗺️ Planning & Environment Interaction

AgentConductor uses reinforcement learning to evolve multi-agent communication topologies dynamically.
Delivers up to a 14.6% improvement in pass@1 accuracy over baselines on code generation.
Density-aware layered DAG construction reduces token costs by 68%, a major efficiency win for compute-constrained deployments.
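To make "layered DAG" and "density" concrete, the sketch below arranges agents in layers, allows messages only from an earlier layer to a later one (so the graph is acyclic by construction), and measures density as the fraction of allowed forward edges actually kept. This is an illustration under assumptions, not AgentConductor's construction: the density definition, the random edge sampling, and the fixed `tokens_per_message` cost proxy are all hypothetical simplifications.

```python
from __future__ import annotations

import itertools
import random


def layered_dag(layers: list[int], density: float, seed: int = 0) -> list[tuple[int, int]]:
    """Hypothetical layered DAG over agents numbered 0..N-1.

    `layers[k]` is the number of agents in layer k. Candidate edges only run
    from an earlier layer to a later one, so the graph is acyclic. Each
    candidate edge is kept with probability `density`.
    """
    rng = random.Random(seed)
    groups, start = [], 0
    for size in layers:
        groups.append(range(start, start + size))
        start += size
    edges = []
    for i, j in itertools.combinations(range(len(groups)), 2):  # i < j: forward only
        for u in groups[i]:
            for v in groups[j]:
                if rng.random() < density:
                    edges.append((u, v))
    return edges


def max_forward_edges(layers: list[int]) -> int:
    # Every (earlier-layer agent, later-layer agent) pair is a candidate edge.
    return sum(a * b for a, b in itertools.combinations(layers, 2))


def graph_density(layers: list[int], edges: list[tuple[int, int]]) -> float:
    return len(edges) / max_forward_edges(layers)


def estimated_tokens(edges: list[tuple[int, int]], tokens_per_message: int = 500) -> int:
    # Crude cost proxy (assumption): one fixed-size message per edge.
    return len(edges) * tokens_per_message


layers = [1, 3, 1]  # e.g. planner -> three workers -> aggregator
for d in (1.0, 0.4):
    edges = layered_dag(layers, d)
    print(d, round(graph_density(layers, edges), 2), estimated_tokens(edges))
```

Under this proxy, token cost scales linearly with edge count, which is why pruning edges (lowering density) is the lever for cutting cost; a learned policy would decide which edges to keep rather than sampling them at random.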

🤝 Multi-Agent Collaboration & Control

AgentConductor shows that adapting topology to task difficulty outperforms fixed communication graphs, with a 13% reduction in graph density alongside accuracy gains.
AutoNumerics applies multi-agent orchestration to scientific computing, autonomously designing and verifying PDE solvers across 24 canonical problems.
Key insight: the architecture of agent collaboration matters more than individual agent capability.
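One standard way an automated system could "verify" a PDE solver is the method of manufactured solutions: choose a known exact solution, derive the forcing term it implies, run the candidate solver, and check that the error shrinks at the expected rate as the grid is refined. The sketch below applies this to a second-order finite-difference solver for the 1D Poisson problem -u'' = f on [0, 1] with zero boundary values. It illustrates the general verification idea only; AutoNumerics' actual checks are not described here, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def solve_poisson_1d(f, n: int) -> tuple[list[float], list[float]]:
    """Solve -u'' = f on [0, 1] with u(0) = u(1) = 0 at n interior points.

    Uses the 3-point stencil (-u[i-1] + 2u[i] - u[i+1]) / h^2 = f(x[i])
    and the Thomas algorithm for the tridiagonal system.
    """
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    a, b, c = -1.0, 2.0, -1.0  # sub-, main- and super-diagonal entries
    d = [h * h * f(xi) for xi in x]
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c / b, d[0] / b
    for i in range(1, n):  # forward sweep
        m = b - a * cp[i - 1]
        cp[i] = c / m
        dp[i] = (d[i] - a * dp[i - 1]) / m
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = dp[i] - cp[i] * u[i + 1]
    return x, u


def max_error(n: int) -> float:
    # Manufactured solution u(x) = sin(pi x) implies f(x) = pi^2 sin(pi x).
    def exact(x: float) -> float:
        return math.sin(math.pi * x)

    def forcing(x: float) -> float:
        return math.pi ** 2 * math.sin(math.pi * x)

    x, u = solve_poisson_1d(forcing, n)
    return max(abs(ui - exact(xi)) for xi, ui in zip(x, u))


def observed_order(n_coarse: int = 19, n_fine: int = 39) -> float:
    # These n values halve h (1/20 -> 1/40); a second-order scheme
    # should cut the error by about 4x, giving an order near 2.
    return math.log(max_error(n_coarse) / max_error(n_fine), 2)


print(round(observed_order(), 2))  # a value close to 2.0
```

Checking the convergence rate, not just a single error value, is what makes this a verification: a solver with a subtle stencil bug can still produce small errors on one grid but will fail to show the expected order.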

🔒 Trust, Verification & Safety

Wink is a production-deployed system for recovering from coding agent misbehaviours.
Found that ~30% of all agent trajectories contain misbehaviours: Specification Drift, Reasoning Problems, or Tool Call Failures.
Lightweight self-intervention resolves 90% of single-intervention misbehaviours and reduces engineer interventions in live A/B testing.
CowCorpus provides a taxonomy of human intervention patterns, enabling models to predict user interventions with a 61.4–63.4% improvement over baselines.
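The recovery idea can be sketched as a monitor that labels each trajectory step with one of the three misbehaviour categories named above and, when one fires, feeds a short corrective message back to the agent instead of escalating to an engineer. This is a hypothetical illustration, not Wink's implementation: the `Step` record, the rule-based `detect` function, the `NUDGES` texts and `self_intervene` are assumptions, and a production system would more plausibly use a learned or LLM-based classifier.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Misbehaviour(Enum):
    SPECIFICATION_DRIFT = "Specification Drift"
    REASONING_PROBLEM = "Reasoning Problems"
    TOOL_CALL_FAILURE = "Tool Call Failures"


@dataclass
class Step:
    # Hypothetical trajectory record: what the agent did on one turn.
    files_touched: list[str]
    tool_error: str | None = None
    repeated_action: bool = False


def detect(step: Step, allowed_files: set[str]) -> Misbehaviour | None:
    # Toy rules standing in for a real classifier, checked in a fixed priority order.
    if step.tool_error:
        return Misbehaviour.TOOL_CALL_FAILURE
    if any(f not in allowed_files for f in step.files_touched):
        return Misbehaviour.SPECIFICATION_DRIFT
    if step.repeated_action:
        return Misbehaviour.REASONING_PROBLEM
    return None


# Hypothetical corrective messages, one per category.
NUDGES = {
    Misbehaviour.TOOL_CALL_FAILURE: "The last tool call failed; read the error and retry with corrected arguments.",
    Misbehaviour.SPECIFICATION_DRIFT: "You edited files outside the task scope; revert them and return to the request.",
    Misbehaviour.REASONING_PROBLEM: "You are repeating an action; summarise what you know and try another approach.",
}


def self_intervene(trajectory: list[Step], allowed_files: set[str]) -> list[tuple[int, str]]:
    # Return (step index, nudge) pairs to inject into the agent's context.
    nudges = []
    for i, step in enumerate(trajectory):
        label = detect(step, allowed_files)
        if label is not None:
            nudges.append((i, NUDGES[label]))
    return nudges


trajectory = [
    Step(["app.py"]),
    Step(["app.py"], tool_error="exit code 1"),
    Step(["setup.cfg"]),
    Step(["app.py"], repeated_action=True),
]
for index, message in self_intervene(trajectory, {"app.py"}):
    print(index, message)
```

Keeping the intervention as a message to the agent, rather than a hard stop, is what makes it "lightweight": the agent keeps its context and gets a chance to correct itself before a human is involved.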

🛠️ Tools & Frameworks in Practice

The study "How AI Coding Agents Communicate" analyses pull requests authored by five AI coding agents.
Finds that presentation style correlates with reviewer engagement and merge outcomes: agents that communicate clearly tend to get their PRs merged more often.

Practical Takeaways

Build for long horizons: Intent-level memory abstraction (IntentCUA) is a viable path to more reliable long-running agents.
Dynamic topology > static graphs: Fixed multi-agent communication structures leave significant accuracy gains and cost savings on the table.
Expect ~30% misbehaviour rates: Production agent systems need built-in recovery mechanisms, not just prevention.
Human-in-the-loop is predictable: Models can learn to anticipate when humans are likely to intervene, enabling proactive agent self-correction.
Agent communication style matters: How an agent explains its work is linked to real-world outcomes such as code review acceptance.
