AI Summary • Published on Dec 3, 2025
Large Language Models (LLMs) are evolving towards autonomous agents, but this transition is hindered by the lack of scalable infrastructure for constructing high-quality interaction signals. LLMs trained on static text corpora often act as "System 1" responders, lacking the "System 2" rigor required for complex planning and long-term reasoning, leading to myopic decision-making and an inability to generalize. Current approaches for building interactive environments are prohibitively expensive and offer limited diversity, restricting the behavioral space for agentic learning. Furthermore, agents trained on purely synthetic data often struggle with real-world execution due to a disconnect between "thought" and "action," resulting in issues like hallucinations in tool usage and a lack of robust error recovery. True agentic capability demands training on trajectories that capture the latency, stochasticity, and feedback loops inherent in real-world environments.
To address the challenges of scaling diverse and complex interactive environments for training agentic models, the authors introduce a comprehensive method built around a unified ecosystem comprising three orthogonal components: NexAU, NexA4A, and NexGAP. NexAU (Nex Agent Universe) is a flexible, high-throughput agent framework that abstracts away the complexities of agent features like execution loops, tools, and sub-agents. It supports building intricate agent hierarchies through simple configurations, unifying heterogeneous frameworks under a recursive, fractal execution model. NexA4A (Agent for Agent) is a generative system that automatically synthesizes diverse agent architectures and workflows from natural language specifications. It automates the creation of system prompts, sub-agents, tool selection, and MCP integration, enabling the large-scale creation of varied agent behaviors and interaction topologies. NexGAP (General Agent-data Pipeline) bridges the simulation–reality gap by integrating real-world Model Context Protocol (MCP) tools and information fusion to generate massive-scale, end-to-end trajectories rooted in authentic execution. This pipeline involves generating framework-construction queries, synthesizing tasks of varying difficulty, executing agents in NexAU to produce interaction traces, and normalizing these traces into multiple tool-call formats. NexGAP also incorporates web search augmentation for factual grounding, a supervisor for visual feedback and self-correction, and Quality Assessment Agents to filter out low-quality trajectories. This entire system is used to train Nex-N1, a series of models demonstrating robust generalization across diverse agent frameworks.
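The recursive execution model and the trace-normalization step described above can be pictured with a small Python sketch. Nothing here is the authors' API: the `Agent`, `ToolCall`, `to_json_format`, `to_xml_format`, and `keep_trajectory` names, the two output formats, and the error-prefix quality rule are all hypothetical, and a scripted `plan` stands in for the model's decisions. The only ideas taken from the summary are that sub-agents run the same loop as their parent, that one trace can be rendered into several tool-call formats, and that low-quality trajectories are filtered out.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from xml.sax.saxutils import escape, quoteattr


@dataclass
class ToolCall:
    """One recorded step of a trajectory (hypothetical schema)."""
    name: str
    arguments: Dict[str, object]
    result: str


@dataclass
class Agent:
    """An agent owns tools and sub-agents; sub-agents run the same loop."""
    name: str
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    sub_agents: List["Agent"] = field(default_factory=list)

    def run(self, plan: List[tuple], trace: List[ToolCall]) -> None:
        # `plan` is a scripted stand-in for model output: (target, arguments) pairs.
        for target, arguments in plan:
            child = next((a for a in self.sub_agents if a.name == target), None)
            if child is not None:
                # Recursive step: the sub-agent executes its own nested plan.
                child.run(arguments["plan"], trace)
            else:
                result = self.tools[target](**arguments)
                trace.append(ToolCall(f"{self.name}.{target}", arguments, result))


def to_json_format(call: ToolCall) -> str:
    """Render a call as a JSON function-call record (assumed format)."""
    return json.dumps({"name": call.name, "arguments": call.arguments}, sort_keys=True)


def to_xml_format(call: ToolCall) -> str:
    """Render the same call as an XML-style tag (assumed format)."""
    params = "".join(
        f"<param name={quoteattr(key)}>{escape(str(value))}</param>"
        for key, value in sorted(call.arguments.items())
    )
    return f"<tool_call name={quoteattr(call.name)}>{params}</tool_call>"


def keep_trajectory(trace: List[ToolCall]) -> bool:
    """Toy quality filter: drop empty traces and traces with a failed tool call."""
    return bool(trace) and not any(c.result.startswith("ERROR") for c in trace)


# Usage: a root agent with one tool and one sub-agent.
searcher = Agent("searcher", tools={"lookup": lambda query: f"results for {query}"})
root = Agent("root", tools={"add": lambda a, b: str(a + b)}, sub_agents=[searcher])

trace: List[ToolCall] = []
root.run(
    [("searcher", {"plan": [("lookup", {"query": "MCP"})]}), ("add", {"a": 2, "b": 3})],
    trace,
)

if keep_trajectory(trace):
    for call in trace:
        print(to_json_format(call))
        print(to_xml_format(call))
```

Because every sub-agent exposes the same `run` loop as its parent, hierarchies of any depth reuse one execution path, and the normalizers depend only on the recorded trace, so further output formats could be added without re-running the agents.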
Extensive empirical evaluations demonstrate that Nex-N1 consistently outperforms state-of-the-art open-source models of comparable size and achieves competitive performance against frontier proprietary models on complex agentic tasks. On general agentic benchmarks like τ²-bench and GAIA 2, Nex-N1 shows robust general agentic abilities and effective operation in dynamic, interactive environments. For agentic coding tasks, Nex-N1 exhibits competitive performance on SWE-bench, Terminal-Bench, and BaxBench, even surpassing GPT-5 in tool use on some metrics. Furthermore, in practical human evaluations for end-to-end project development and webpage creation, Nex-N1 wins or ties against major models in a significant percentage of scenarios (e.g., 64.5% against claude-sonnet-4.5 on coding scenarios). The model also demonstrates strong deep-research capabilities, autonomously executing full research pipelines, generating high-quality reports, and producing visualized academic materials like presentations and posters. Nex-N1 also proves robust and practical, showing stable performance when evaluated across multiple agent frameworks such as OpenHands and Claude Code.
The Nex-N1 ecosystem and the models trained within it represent a significant advancement in enabling Large Language Models to evolve into truly autonomous agents. By systematically scaling the diversity and complexity of interactive environments through automated synthesis, this work addresses a critical bottleneck in current agentic AI research. The ability to generate high-quality, realistic, and grounded interaction trajectories provides a robust foundation for training agents that can excel in complex tool-use scenarios, perform long-term planning, and adapt to dynamic real-world feedback. Future work aims to evolve this infrastructure into a large-scale simulation platform for Reinforcement Learning, creating a "gym" where agents can self-improve and master long-horizon reasoning through active exploration. Open-sourcing the Nex ecosystem and model weights is expected to accelerate further community research and development in agentic AI.