AI Summary • Published on Nov 27, 2025
The rapid evolution of generative AI models makes it increasingly difficult to distinguish authentic images from AI-generated ones, posing significant threats to social trust and information integrity. Current explainable image forensic methods rely predominantly on post-hoc rationalizations or high-level visual anomaly detection and lack a verifiable chain of evidence. This reliance on surface-level pattern matching precludes causally grounded explanations and often leads to poor generalization, highlighting the need for methods that provide structured, verifiable forensic analysis.
The authors introduce REVEAL-Bench, a reasoning-enhanced multimodal benchmark for AI-generated image detection. The dataset is structured around a chain-of-evidence (CoE) derived from multiple lightweight expert models, recording step-by-step reasoning traces and evidential justifications. Building on it, they propose REVEAL (Reasoning-enhanced Forensic Evidence Analysis), a two-stage forensic framework. The first stage uses supervised fine-tuning (SFT) to teach the multimodal large language model (MLLM) the canonical CoE structure. The second stage applies R-GRPO (Reasoning-enhanced Group Relative Policy Optimization), an expert-grounded reinforcement learning algorithm with a novel composite reward. The reward jointly optimizes detection accuracy, explanation fidelity, and logical coherence through three terms: answer accuracy (`r_sem`), reasoning quality and structural integrity (`r_think`), and alignment with multi-view visual evidence (`r_view`). Together these push the MLLM to perform logical synthesis over explicit forensic evidence rather than rely on simple visual pattern matching.
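A minimal sketch of how such a composite reward could feed a group-relative policy update, assuming the three terms are scalar scores in [0, 1]; the weights, the simple weighted sum, and the helper names (`RolloutScores`, `composite_reward`, `group_relative_advantages`) are illustrative assumptions, not the paper's implementation:

```python
import statistics
from dataclasses import dataclass

@dataclass
class RolloutScores:
    """Per-rollout component scores, each assumed to lie in [0, 1]."""
    r_sem: float    # answer accuracy: final real/fake verdict vs. ground truth
    r_think: float  # reasoning quality and structural integrity of the CoE trace
    r_view: float   # alignment with multi-view evidence from the expert models

def composite_reward(s: RolloutScores,
                     w_sem: float = 1.0,
                     w_think: float = 0.5,
                     w_view: float = 0.5) -> float:
    """Combine the three terms; the weights and the weighted-sum form
    are hypothetical placeholders for the paper's actual reward rule."""
    return w_sem * s.r_sem + w_think * s.r_think + w_view * s.r_view

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standard GRPO-style advantage: normalize each rollout's reward by the
    mean and standard deviation of its sampled group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Example: two sampled rollouts for the same image, scored then normalized.
group = [composite_reward(RolloutScores(1.0, 0.8, 0.7)),
         composite_reward(RolloutScores(0.0, 0.6, 0.4))]
advantages = group_relative_advantages(group)
```

The group-relative normalization is the defining trait of GRPO-family methods: advantages come from comparing rollouts against their own group rather than a learned value function, which keeps the reward machinery lightweight.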
Extensive experiments show that REVEAL substantially improves detection accuracy, explanation fidelity, and cross-model generalization, setting a new state of the art for explainable image forensics. On the in-domain REVEAL-Bench dataset, REVEAL performs comparably to compact binary classifiers, but it excels in cross-domain generalization on the GenImage dataset, maintaining higher accuracy and stability. The study also finds that larger MLLMs detect synthetic images more reliably, suggesting a scaling law for synthetic image detection. Ablation studies confirm the critical roles of the reasoning dataset and the R-GRPO method. REVEAL is also more robust than the baselines to common post-processing distortions such as Gaussian blur and JPEG compression.
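For reference, a minimal sketch of the kind of perturbations used in such robustness tests, written with Pillow; the blur radius and JPEG quality below are illustrative values, not the paper's evaluation settings:

```python
import io
from PIL import Image, ImageFilter

def gaussian_blur(img: Image.Image, radius: float = 2.0) -> Image.Image:
    """Blur with a Gaussian kernel; the radius is an illustrative choice."""
    return img.filter(ImageFilter.GaussianBlur(radius=radius))

def jpeg_compress(img: Image.Image, quality: int = 50) -> Image.Image:
    """Round-trip the image through an in-memory JPEG encode/decode to
    introduce compression artifacts; quality 50 is an illustrative setting."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()
```

Sweeping the radius and quality parameters over a range and re-running detection yields the usual robustness curves reported for this class of evaluation.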
REVEAL marks a significant advance in explainable AI-generated image detection, moving beyond surface-level pattern matching to a system grounded in verifiable forensic evidence and logical reasoning. REVEAL-Bench, the first dataset with expert-grounded chain-of-evidence annotations, supports the development of more transparent and generalizable forensic detectors. The REVEAL framework, with R-GRPO at its core, offers a robust and adaptable recipe for multimodal models, promising more reliable and interpretable detection of AI-generated content. This work paves the way for future research in reasoning-based image forensics, addressing pressing concerns about misinformation and information integrity in the era of advanced generative AI.