All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 11 results for this tag.
Accelerated Online Risk-Averse Policy Evaluation in POMDPs with Theoretical Guarantees and Novel CVaR Bounds
This paper introduces a theoretical framework for accelerating the evaluation of Conditional Value-at-Risk (CVaR) value functions in Partially Observable Markov Decision Processes (POMDPs) with formal performance guarantees. It derives novel CVaR bounds for random variables, enabling faster policy evaluation through action elimination using simplified models.
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
This paper introduces RebuttalAgent, an AI framework that grounds academic rebuttal in Theory of Mind (ToM) to generate strategic and persuasive responses. It proposes a ToM-Strategy-Response (TSR) pipeline, supported by a large-scale synthetic dataset (RebuttalBench) and a specialized evaluation model (Rebuttal-RM), significantly outperforming existing models in automated and human evaluations.
MonoRace: Winning Champion-Level Drone Racing with Robust Monocular AI
MonoRace is an autonomous drone racing system that utilizes a monocular camera and IMU to achieve champion-level performance, notably winning the A2RL 2025 competition. It features robust state estimation combining neural-network-based gate segmentation with a drone model, an offline optimization procedure, and a neural network for guidance and control.
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
The paper introduces a comprehensive method and ecosystem (NexAU, NexA4A, NexGAP) to overcome limitations in scaling interactive environments for training agentic Large Language Models (LLMs). This infrastructure enables the systematic generation of diverse, complex, and realistically grounded interaction trajectories for LLMs.
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
This paper introduces Reward Forcing, a novel framework for efficient streaming video generation that tackles issues like diminished motion dynamics and over-reliance on initial frames. It achieves state-of-the-art performance by combining EMA-Sink for improved long-term context and Rewarded Distribution Matching Distillation (Re-DMD) to enhance motion quality.
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
This paper introduces Stage-Aware Reinforcement (StARe), a novel module that decomposes long-horizon robotic manipulation tasks into semantically meaningful stages, providing dense, interpretable reinforcement signals. Integrated into the Imitation → Preference → Interaction (IPI) fine-tuning pipeline, StARe significantly improves the performance and robustness of Vision-Language-Action (VLA) models on complex manipulation tasks.
Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks
This paper provides a comprehensive tutorial on enhancing Reinforcement Learning (RL) for wireless networks using Large Language Models (LLMs). It proposes a taxonomy for LLM roles in RL (state perceiver, reward designer, decision-maker, generator) and showcases their application in various wireless scenarios to address classical RL's limitations in generalization, interpretability, and sample efficiency.
AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
AdaptVision introduces an efficient VLM paradigm that autonomously determines the minimum number of visual tokens required for each sample by employing a coarse-to-fine visual acquisition strategy, leading to superior performance with significantly reduced computational overhead.
TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning
This paper introduces TempR1, a novel temporal-aware multi-task reinforcement learning framework designed to significantly enhance the temporal understanding capabilities of Multimodal Large Language Models (MLLMs). By integrating diverse temporal tasks and tailored reward functions, TempR1 achieves state-of-the-art performance across various video understanding benchmarks and improves generalization.
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
This paper introduces SpaceTools, a vision-language model trained with Double Interactive Reinforcement Learning (DIRL) to achieve precise spatial reasoning and real-world robot manipulation by effectively coordinating multiple external tools. It demonstrates state-of-the-art performance on various spatial understanding benchmarks.