All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 11 results for this tag.
Accelerated Online Risk-Averse Policy Evaluation in POMDPs with Theoretical Guarantees and Novel CVaR Bounds
This paper introduces a theoretical framework for accelerating the evaluation of Conditional Value-at-Risk (CVaR) value functions in Partially Observable Markov Decision Processes (POMDPs) with formal performance guarantees. It derives novel CVaR bounds for random variables, enabling faster policy evaluation through action elimination using simplified models.
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
This paper introduces RebuttalAgent, an AI framework that grounds academic rebuttal in Theory of Mind (ToM) to generate strategic and persuasive responses. It proposes a ToM-Strategy-Response (TSR) pipeline, supported by a large-scale synthetic dataset (RebuttalBench) and a specialized evaluation model (Rebuttal-RM), significantly outperforming existing models in automated and human evaluations.
MonoRace: Winning Champion-Level Drone Racing with Robust Monocular AI
MonoRace is an autonomous drone racing system that utilizes a monocular camera and IMU to achieve champion-level performance, notably winning the A2RL 2025 competition. It features robust state estimation combining neural-network-based gate segmentation with a drone model, an offline optimization procedure, and a neural network for guidance and control.
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
The paper introduces a comprehensive method and ecosystem (NexAU, NexA4A, NexGAP) to overcome limitations in scaling interactive environments for training agentic Large Language Models (LLMs). This infrastructure enables the systematic generation of diverse, complex, and realistically grounded interaction trajectories for LLMs.
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
This paper introduces Reward Forcing, a novel framework for efficient streaming video generation that tackles issues like diminished motion dynamics and over-reliance on initial frames. It achieves state-of-the-art performance by combining EMA-Sink for improved long-term context and Rewarded Distribution Matching Distillation (Re-DMD) to enhance motion quality.
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
This paper introduces Stage-Aware Reinforcement (StARe), a novel module that decomposes long-horizon robotic manipulation tasks into semantically meaningful stages, providing dense, interpretable reinforcement signals. Integrated into the Imitation → Preference → Interaction (IPI) fine-tuning pipeline, StARe significantly improves the performance and robustness of Vision-Language-Action (VLA) models on complex manipulation tasks.
Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks
This paper provides a comprehensive tutorial on enhancing Reinforcement Learning (RL) for wireless networks using Large Language Models (LLMs). It proposes a taxonomy for LLM roles in RL (state perceiver, reward designer, decision-maker, generator) and showcases their application in various wireless scenarios to address classical RL's limitations in generalization, interpretability, and sample efficiency.
AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
AdaptVision introduces an efficient VLM paradigm that autonomously determines the minimum number of visual tokens required for each sample by employing a coarse-to-fine visual acquisition strategy, leading to superior performance with significantly reduced computational overhead.
TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning
This paper introduces TempR1, a novel temporal-aware multi-task reinforcement learning framework designed to significantly enhance the temporal understanding capabilities of Multimodal Large Language Models (MLLMs). By integrating diverse temporal tasks and tailored reward functions, TempR1 achieves state-of-the-art performance across various video understanding benchmarks and improves generalization.
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
This paper introduces SpaceTools, a vision-language model trained with Double Interactive Reinforcement Learning (DIRL) to achieve precise spatial reasoning and real-world robot manipulation by effectively coordinating multiple external tools. It demonstrates state-of-the-art performance on various spatial understanding benchmarks.