All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 7 results for this tag.
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
This paper introduces Reward Forcing, a novel framework for efficient streaming video generation that addresses issues such as diminished motion dynamics and over-reliance on initial frames. It achieves state-of-the-art performance by combining EMA-Sink for improved long-term context with Rewarded Distribution Matching Distillation (Re-DMD) to enhance motion quality.
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
The paper introduces a comprehensive method and ecosystem (NexAU, NexA4A, NexGAP) to overcome limitations in scaling interactive environments for training agentic Large Language Models (LLMs). This infrastructure enables the systematic generation of diverse, complex, and realistically grounded interaction trajectories for training LLM agents.
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
This paper introduces Stage-Aware Reinforcement (StARe), a novel module that decomposes long-horizon robotic manipulation tasks into semantically meaningful stages, providing dense, interpretable reinforcement signals. Integrated into the Imitation → Preference → Interaction (IPI) fine-tuning pipeline, StARe significantly improves the performance and robustness of Vision-Language-Action (VLA) models on complex manipulation tasks.
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
This paper introduces SpaceTools, a vision-language model trained with Double Interactive Reinforcement Learning (DIRL) to achieve precise spatial reasoning and real-world robot manipulation by effectively coordinating multiple external tools. It demonstrates state-of-the-art performance on various spatial understanding benchmarks.
TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning
This paper introduces TempR1, a novel temporal-aware multi-task reinforcement learning framework designed to significantly enhance the temporal understanding capabilities of Multimodal Large Language Models (MLLMs). By integrating diverse temporal tasks and tailored reward functions, TempR1 achieves state-of-the-art performance across various video understanding benchmarks and improves generalization.
AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
AdaptVision introduces an efficient VLM paradigm that autonomously determines the minimum number of visual tokens required for each sample by employing a coarse-to-fine visual acquisition strategy, leading to superior performance with significantly reduced computational overhead.
Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks
This paper provides a comprehensive tutorial on enhancing Reinforcement Learning (RL) for wireless networks using Large Language Models (LLMs). It proposes a taxonomy for LLM roles in RL (state perceiver, reward designer, decision-maker, generator) and showcases their application in various wireless scenarios to address classical RL's limitations in generalization, interpretability, and sample efficiency.