AI Summary • Published on Jan 15, 2026
Enabling heterogeneous robot teams to reliably execute complex, long-horizon tasks based on high-level natural language instructions remains a significant challenge in embodied AI. While Large Language Models (LLMs) offer promise for instruction parsing and initial planning, they often struggle with extended reasoning and dynamic coordination among multiple robots. Traditional multi-robot planning methods lack the flexibility and scalability needed for intricate tasks in dynamic environments. Existing LLM-based approaches for multi-agent planning frequently falter in handling long-horizon reasoning, complex task dependencies, and robust integration of semantic understanding with formal planning and reactive control. This results in systems with limited autonomy, poor fault tolerance, and rigid collaboration mechanisms that cannot adapt to dynamic team sizes or complex synchronization requirements.
H-AIM (Hierarchical Autonomous Intelligent Multi-Robot Planning) addresses these challenges through a three-stage cascaded architecture that orchestrates LLMs, PDDL-based symbolic planning, and Behavior Trees.

The first stage, the PDDL File Generator (PFG), uses an LLM to parse the high-level instruction, decompose it into atomic sub-tasks, and allocate those sub-tasks to suitable robots based on their capabilities while maximizing parallelism. The PFG then translates the result into formal PDDL problem descriptions.

The second stage, the Hybrid Planner (HP), first applies an LLM for semantic validation and simplification of the PDDL problems, then runs a classical planner, Fast Downward, to generate an optimal action sequence for each sub-task. Crucially, a few-shot-prompted LLM acts as a semantic coordinator that merges these sub-plans into a globally consistent, conflict-free overall plan, resolving temporal and resource conflicts.

The final stage, the Behavior Tree Compiler (BTC), transforms the unified plan into a parallel Behavior Tree designed for high fault tolerance and reactive control. Each robot receives a sub-behavior tree structured with "Precondition-Execution-Validation" logic, including recovery and retry mechanisms for robustness. A shared blackboard mechanism facilitates communication and state synchronization, ensuring dynamic multi-robot coordination.
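To make the PFG's output concrete, here is a minimal sketch of the kind of PDDL problem description it might emit for one atomic sub-task in a household domain. The domain name, object names, and predicates are illustrative assumptions, not taken from the paper:

```pddl
;; Hypothetical PDDL problem for one atomic sub-task ("heat the apple"),
;; as the PFG might generate it. All identifiers are illustrative.
(define (problem heat-apple)
  (:domain household)
  (:objects robot1 - robot
            apple - item
            microwave - appliance
            kitchen - room)
  (:init (at robot1 kitchen)
         (at apple kitchen)
         (at microwave kitchen)
         (free-hands robot1))
  (:goal (heated apple)))
```

A classical planner such as Fast Downward can then search the domain's operators for an action sequence that transforms the `:init` state into one satisfying the `:goal`.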
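The "Precondition-Execution-Validation" pattern with retry and a shared blackboard can be sketched in a few lines of Python. This is a minimal illustration of the control logic described above, not the paper's implementation; all class and function names are hypothetical:

```python
# Sketch of the "Precondition-Execution-Validation" leaf pattern the BTC
# emits per action, with a retry loop and a shared blackboard for state
# synchronization. Names are illustrative, not the paper's API.

class Blackboard:
    """Shared key-value store that robots use to synchronize state."""
    def __init__(self):
        self.data = {}
    def get(self, key, default=None):
        return self.data.get(key, default)
    def set(self, key, value):
        self.data[key] = value

class PEVAction:
    """One leaf action wrapped in Precondition -> Execution -> Validation."""
    def __init__(self, name, precondition, execute, validate, max_retries=2):
        self.name = name
        self.precondition = precondition  # Blackboard -> bool
        self.execute = execute            # Blackboard -> None
        self.validate = validate          # Blackboard -> bool
        self.max_retries = max_retries

    def tick(self, bb):
        if not self.precondition(bb):     # gate: don't act in a bad state
            return "FAILURE"
        for _ in range(self.max_retries + 1):
            self.execute(bb)
            if self.validate(bb):          # post-condition holds -> done
                return "SUCCESS"
        return "FAILURE"                   # retries exhausted

def run_sequence(actions, bb):
    """Sequence node: succeeds only if every child succeeds in order."""
    for action in actions:
        if action.tick(bb) != "SUCCESS":
            return "FAILURE"
    return "SUCCESS"

# Usage: one robot's sub-tree for "pick up the apple, then heat it".
bb = Blackboard()
bb.set("holding", None)

pick = PEVAction(
    "pick_apple",
    precondition=lambda b: b.get("holding") is None,
    execute=lambda b: b.set("holding", "apple"),
    validate=lambda b: b.get("holding") == "apple",
)
heat = PEVAction(
    "heat_apple",
    precondition=lambda b: b.get("holding") == "apple",
    execute=lambda b: b.set("apple_heated", True),
    validate=lambda b: b.get("apple_heated", False),
)
result = run_sequence([pick, heat], bb)
```

Because every action validates its own post-condition and retries on failure, a transient fault (e.g. a failed grasp) is absorbed locally instead of aborting the whole plan, and the blackboard lets other robots' sub-trees gate on shared keys such as `apple_heated`.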
The H-AIM framework was evaluated on the newly introduced MACE-THOR benchmark, which comprises 42 complex household tasks across 8 layouts in the AI2-THOR simulation environment and includes both Parallel-Independent and Temporal-Dependent tasks. H-AIM, instantiated with GPT-4o, significantly outperformed the strongest baseline, LaMMA-P: the aggregate task success rate (SR) improved from 12% to 55%, and the goal condition recall (GCR) from 32% to 72%. On Parallel-Independent tasks, H-AIM achieved an SR of 71% and a GCR of 88%; on Temporal-Dependent tasks, an SR of 38% and a GCR of 62%, demonstrating its effectiveness in complex collaborative scenarios. An ablation study confirmed that each component (PFG, HP, BTC) is critical to overall performance, with the HP particularly vital for Temporal-Dependent tasks and the BTC for execution robustness. The study also found that H-AIM's performance ceiling is bounded by the reasoning capability of the underlying LLM.
H-AIM presents a significant advancement in hierarchical multi-robot planning, offering an end-to-end solution that effectively bridges high-level human instructions with robust low-level robot execution. By synergizing LLM semantic understanding, formal PDDL planning, and reactive Behavior Tree control, the framework achieves notable improvements in task success rates and collaborative robustness for heterogeneous robot teams. The shared blackboard mechanism enables adaptive coordination in dynamic environments. While H-AIM demonstrates promising results in simulation under fully observable conditions, future work aims to extend its capabilities by integrating Visual Language Models for partial observability and developing adaptive re-planning mechanisms to handle dynamic real-world scenarios, further enhancing the autonomy and adaptability of multi-robot systems.