Complete Bibliographic Citation

Zhang, Xing, Yanwei Cui, Guanghui Wang, Wei Qiu, Ziyuan Li, Fangwei Han, Yajing Huang, Hengzhi Qiu, Bing Zhu, and Peiyang He (2026). “Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution.” ICLR 2026 Workshop on MALGAI (Multi-Agent Learning: Generalization and Adaptation in Intelligence). arXiv:2603.11445. Available: https://arxiv.org/html/2603.11445


Summary

This paper introduces VMAO (Verified Multi-Agent Orchestration), a system for coordinating specialized LLM-based agents through a verification-driven iterative loop. It provides academic grounding for Multi-Agent-Orchestration patterns, demonstrating how complex queries can be decomposed, executed in parallel, and iteratively improved through verification.

The Core Framework

VMAO runs the Plan-Execute-Verify-Replan cycle and then a final synthesis step, five phases in all:

  • Plan: A QueryPlanner decomposes complex queries into directed acyclic graphs (DAGs) of sub-questions, assigning each to domain-specific agents
  • Execute: A DAGExecutor respects dependencies while maximizing parallel execution (default batch size of 3)
  • Verify: A ResultVerifier evaluates completeness using LLM-based evaluation, producing scores (0–1 scale), identifying missing aspects, and issuing recommendations
  • Replan: An AdaptiveReplanner addresses gaps through retries or new queries, preserving previous results
  • Synthesize: Hierarchical synthesis groups results by agent type before integrating into final answers with source attribution
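The Execute phase can be illustrated with a small scheduler. This is a minimal sketch, not the paper's DAGExecutor: the function name `execute_dag`, the `deps` dictionary layout, and the `run` callback are hypothetical, and threads stand in for whatever concurrency the real system uses. Only the idea of running ready sub-questions in batches of 3 while respecting dependencies comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_dag(deps, run, batch_size=3):
    """Run every node of a DAG, at most `batch_size` nodes at a time.

    deps: dict mapping node id -> set of node ids it depends on.
    run:  callable(node_id, prior_results) -> result; a stand-in for
          dispatching a sub-question to an agent (hypothetical).
    """
    results = {}
    pending = set(deps)
    while pending:
        # A node is ready once all of its dependencies have results.
        ready = sorted(n for n in pending if deps[n].issubset(results))
        if not ready:
            raise ValueError("cycle or missing dependency in plan")
        batch = ready[:batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            # Each worker gets a snapshot of the results produced so far.
            outputs = list(pool.map(lambda n: run(n, dict(results)), batch))
        results.update(zip(batch, outputs))
        pending -= set(batch)
    return results


if __name__ == "__main__":
    plan = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
    print(execute_dag(plan, lambda n, prior: f"{n} saw {sorted(prior)}"))
```

Here "a" and "b" run together in the first batch, "c" runs once both have finished, and "d" runs last. A production executor would likely start new work as soon as any node finishes rather than waiting for a whole batch.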

Agent Architecture and Isolation

The system organizes agents into three functional tiers, illustrating Sub-Agents-Context-Isolation principles:

  • Tier 1 (Data Gathering): RAG, web search, financial, and competitor agents
  • Tier 2 (Analysis): Analysis, reasoning, and raw data processing agents
  • Tier 3 (Output): Document generation and visualization agents

The system deploys 42 unique tools across 8 microservices via Model Context Protocol (MCP). Separating verification from execution is an instance of Dual-Agent-Design: the system relies on independent evaluators rather than self-assessment.

Stopping and Safety Mechanisms

The configurable stop conditions demonstrate Back-Pressure-Mechanisms in practice:

  • Completeness threshold: 80%
  • Confidence with partial coverage: 75% confidence + 50% complete
  • Diminishing returns: <5% improvement per iteration
  • Token budget: 1M tokens maximum
  • Iteration cap: 3 replanning cycles
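The stop conditions above can be sketched as a single check run after each verification pass. This is an illustrative sketch only: the `LoopState` fields, the `stop_reason` function, the order in which conditions are tested, and the reading of "<5% improvement" as an absolute difference in completeness score are all assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    completeness: float       # verifier completeness score, 0-1
    confidence: float         # verifier confidence, 0-1
    prev_completeness: float  # completeness after the previous iteration
    tokens_used: int
    replans_done: int


def stop_reason(s, *, complete=0.80, conf=0.75, partial=0.50,
                min_gain=0.05, token_budget=1_000_000, max_replans=3):
    """Return the name of the first stop condition that fires, or None."""
    if s.completeness >= complete:
        return "complete"
    if s.confidence >= conf and s.completeness >= partial:
        return "confident_partial"
    if s.tokens_used >= token_budget:
        return "token_budget"
    if s.replans_done >= max_replans:
        return "iteration_cap"
    # Assumption: improvement is only meaningful after at least one replan.
    if s.replans_done > 0 and s.completeness - s.prev_completeness < min_gain:
        return "diminishing_returns"
    return None


if __name__ == "__main__":
    state = LoopState(completeness=0.55, confidence=0.80,
                      prev_completeness=0.30, tokens_used=400_000,
                      replans_done=1)
    print(stop_reason(state))  # confident_partial
```

Returning a reason rather than a bare boolean makes it easy to log why a run ended, which matters when budget exhaustion and genuine completion need to be told apart.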

Experimental Results

Tested on 25 expert-curated market research queries across four categories (Performance Analysis, Competitive Intelligence, Financial Investigation, Strategic Assessment):

Method          | Completeness | Source Quality | Avg Tokens
--------------- | ------------ | -------------- | ----------
Single-Agent    | 3.1          | 2.6            | 100K
Static Pipeline | 3.5          | 3.2            | 350K
VMAO            | 4.2          | 4.1            | 850K

Key findings:

  • +35% completeness and +58% source quality over single-agent baseline
  • Largest gains on open-ended Strategic Assessment queries (+53% completeness)
  • 8.5× token cost reflects thoroughness required for complex synthesis tasks
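The headline percentages follow directly from the table rows for Single-Agent and VMAO. The short check below recomputes them; the dictionary layout and the helper name `relative_gain` are just for illustration. The +53% figure for Strategic Assessment queries is a per-category result and cannot be reproduced from the aggregate table.

```python
# Scores and average token counts copied from the results table.
results = {
    "Single-Agent":    {"completeness": 3.1, "source_quality": 2.6, "avg_tokens": 100_000},
    "Static Pipeline": {"completeness": 3.5, "source_quality": 3.2, "avg_tokens": 350_000},
    "VMAO":            {"completeness": 4.2, "source_quality": 4.1, "avg_tokens": 850_000},
}


def relative_gain(new, base):
    """Fractional improvement of `new` over `base`."""
    return (new - base) / base


base, vmao = results["Single-Agent"], results["VMAO"]
completeness_gain = relative_gain(vmao["completeness"], base["completeness"])
source_gain = relative_gain(vmao["source_quality"], base["source_quality"])
token_ratio = vmao["avg_tokens"] / base["avg_tokens"]

print(f"completeness   {completeness_gain:+.0%}")  # +35%
print(f"source quality {source_gain:+.0%}")        # +58%
print(f"token cost     {token_ratio:.1f}x")        # 8.5x
```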

Implementation

  • Orchestration: LangGraph with Strands Agent framework on AWS Bedrock
  • Models: Claude Sonnet 4.5 for execution; Claude Opus 4.5 for independent verification
  • Safety: Tool call limiters (max 10 consecutive same-tool, 50 total), per-execution timeouts (600s), token tracking
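The tool call limits in the Safety item can be sketched as a small guard object. This is a hypothetical illustration: the class name `ToolCallLimiter` and its `allow` method are invented here and are not the LangGraph or Strands API; only the limits of 10 consecutive calls to the same tool and 50 calls in total come from the note above.

```python
class ToolCallLimiter:
    """Reject tool calls once a consecutive-use or total-use limit is hit."""

    def __init__(self, max_consecutive=10, max_total=50):
        self.max_consecutive = max_consecutive
        self.max_total = max_total
        self.total = 0
        self.last_tool = None
        self.streak = 0

    def allow(self, tool_name):
        """Record the call and return True, or return False if it would exceed a limit."""
        streak = self.streak + 1 if tool_name == self.last_tool else 1
        if self.total + 1 > self.max_total or streak > self.max_consecutive:
            return False  # rejected calls are not counted
        self.total += 1
        self.last_tool = tool_name
        self.streak = streak
        return True


if __name__ == "__main__":
    limiter = ToolCallLimiter()
    calls = [limiter.allow("web_search") for _ in range(11)]
    print(calls.count(True), calls[-1])  # 10 False
```

An agent loop would call `allow` before each tool invocation and switch strategy or stop when it returns False. Switching to a different tool resets the consecutive counter but not the total.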

Limitations

  • Modest evaluation set (25 queries only)
  • LLM-based verification cannot independently establish factual accuracy
  • Framework tested only with Claude model family

Sources

  • Zhang, Xing, Yanwei Cui, Guanghui Wang, Wei Qiu, Ziyuan Li, Fangwei Han, Yajing Huang, Hengzhi Qiu, Bing Zhu, and Peiyang He (2026). “Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution.” ICLR 2026 Workshop on MALGAI (Multi-Agent Learning: Generalization and Adaptation in Intelligence). arXiv:2603.11445. Available: https://arxiv.org/html/2603.11445
    • Primary source: full system description, experimental setup, and results
