Articles tagged with: Multimodal Large Language Models

Showing 3 results for this tag.

Advanced·Dec 3, 2025

Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark

Multimodal Large Language Models (MLLMs) often return only a final prediction, with no intermediate reasoning steps or visual evidence. This paper introduces the Visual Reasoning Tracer (VRT) task and associated benchmarks (VRT-Bench, VRT-80k), which require models to localize the intermediate objects along their reasoning paths, with the aim of making model reasoning more interpretable and reliable.

Multimodal Large Language Models

Visual Reasoning

Interpretability

Computer Vision

Advanced·Dec 3, 2025

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

DraCo introduces Draft-as-CoT, an interleaved reasoning paradigm for text-to-image generation that uses both textual and visual content. It addresses limitations of existing methods by generating low-resolution draft images for visual planning and verification, which improves the generation of rare attribute combinations and overall image quality.

Multimodal AI

Text-to-Image

Multimodal Large Language Models

Text-to-Image Generation

Chain-of-Thought

Multimodal LLMs

Machine Learning

Advanced·Dec 2, 2025

TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning

This paper introduces TempR1, a temporal-aware multi-task reinforcement learning framework designed to improve the temporal understanding of Multimodal Large Language Models (MLLMs). By combining diverse temporal tasks with tailored reward functions, TempR1 achieves state-of-the-art performance across various video understanding benchmarks and improves generalization.

Multimodal Large Language Models

Temporal Understanding

Reinforcement Learning

Research Guy

Understand New Research — Instantly

Daily AI-generated explanations of the latest arXiv papers.