AI Summary • Published on Jan 21, 2026
The paper highlights that while artificial intelligence (AI) has advanced significantly across many research workflows, academic rebuttal remains a complex and underexplored challenge. Current AI approaches to rebuttal rely primarily on supervised fine-tuning, which teaches models to mimic surface-level linguistic patterns but not the strategic, perspective-taking reasoning essential for effective persuasion. Academic rebuttal is framed as a dynamic game of incomplete information, in which authors must persuade reviewers despite severe information asymmetry about the reviewers' knowledge, biases, and concerns. This lack of "Theory of Mind" (ToM) capability in existing models leads to formulaic, shallow responses that fail to strategically address the underlying critiques.
The authors introduce RebuttalAgent, the first framework to integrate Theory of Mind (ToM) into academic rebuttal, built around a novel ToM-Strategy-Response (TSR) pipeline with three stages. 1) The Theory-of-Mind (T) stage performs a hierarchical analysis of the reviewer's macro-level intent (overall stance, attitude, dominant concern, expertise) and the micro-level attributes of each critique (significance, methodology, experimental rigor, presentation), building a multi-dimensional reviewer profile. 2) The Strategy (S) stage uses this profile to formulate an actionable plan for responding to the target comment, aligned with both the macro- and micro-level critiques; generating an explicit strategy forces the model to decide how to respond before deciding what to write. 3) The Response (R) stage produces the final persuasive response by synthesizing the strategic inputs (ToM profile and strategy) with contextual inputs (retrieved relevant manuscript chunks).
To train RebuttalAgent, the authors developed RebuttalBench, a large-scale dataset of over 70K samples synthesized with a critique-and-refine pipeline driven by multiple teacher models. Training proceeds in two stages: supervised fine-tuning (SFT) establishes foundational capabilities, followed by reinforcement learning (RL) optimized by a novel self-reward mechanism that enables scalable self-improvement without an external reward model.
For automated evaluation, they created Rebuttal-RM, a specialized reward model built on Qwen3-8B and trained on over 100K multi-source rebuttal samples, designed to align with human preferences.
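To make the pipeline concrete, here is a minimal Python sketch of how the three TSR stages could be wired together. All names (the RebuttalAgent class, analyze_reviewer, plan_strategy, write_response, the prompt wording, and the toy keyword retriever) are illustrative assumptions rather than the authors' actual implementation; the llm argument stands in for whatever model backs the agent.

```python
# Illustrative sketch of a ToM-Strategy-Response (TSR) pipeline.
# All class/function names, prompts, and the retriever are hypothetical.
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # any text-in/text-out model call


@dataclass
class ReviewerProfile:
    """Multi-dimensional reviewer profile built in the ToM stage."""
    macro: str = ""                              # stance, attitude, dominant concern, expertise
    micro: dict = field(default_factory=dict)    # per-comment critique attributes


@dataclass
class RebuttalAgent:
    llm: LLM
    manuscript_chunks: list[str]

    def analyze_reviewer(self, review: str, comments: list[str]) -> ReviewerProfile:
        # T stage: hierarchical analysis of macro-level intent and
        # micro-level critique attributes for each comment.
        profile = ReviewerProfile()
        profile.macro = self.llm(
            "Summarize the reviewer's overall stance, attitude, dominant concern, "
            f"and likely expertise:\n{review}"
        )
        for c in comments:
            profile.micro[c] = self.llm(
                "Classify this comment along significance, methodology, "
                f"experimental rigor, and presentation:\n{c}"
            )
        return profile

    def plan_strategy(self, profile: ReviewerProfile, comment: str) -> str:
        # S stage: decide *how* to respond before writing anything.
        return self.llm(
            f"Reviewer profile:\n{profile.macro}\n"
            f"Comment analysis:\n{profile.micro.get(comment, '')}\n"
            f"Comment:\n{comment}\n"
            "Propose an actionable response plan aligned with the critique."
        )

    def retrieve(self, comment: str, k: int = 3) -> list[str]:
        # Toy lexical retriever standing in for the paper's chunk retrieval.
        scored = sorted(
            self.manuscript_chunks,
            key=lambda ch: len(set(ch.lower().split()) & set(comment.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def write_response(self, profile: ReviewerProfile, strategy: str, comment: str) -> str:
        # R stage: synthesize strategic and contextual inputs into the reply.
        context = "\n".join(self.retrieve(comment))
        return self.llm(
            f"Reviewer profile:\n{profile.macro}\n"
            f"Strategy:\n{strategy}\n"
            f"Relevant manuscript excerpts:\n{context}\n"
            f"Write a persuasive, constructive rebuttal to:\n{comment}"
        )

    def rebut(self, review: str, comments: list[str]) -> dict[str, str]:
        profile = self.analyze_reviewer(review, comments)
        return {
            c: self.write_response(profile, self.plan_strategy(profile, c), c)
            for c in comments
        }


if __name__ == "__main__":
    # Stub model so the sketch runs end to end without any API.
    def echo(prompt: str) -> str:
        return "[model output for: " + prompt[:40] + "...]"

    agent = RebuttalAgent(
        llm=echo,
        manuscript_chunks=["Section 3 describes the method.", "Table 2 reports ablations."],
    )
    print(agent.rebut("Overall negative; doubts about rigor.", ["The experiments lack baselines."]))
```

In the full system described above, the llm call would presumably be served by the fine-tuned Qwen3-8B policy (SFT on RebuttalBench followed by self-reward RL) rather than a generic callable.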
Extensive experiments demonstrated RebuttalAgent's effectiveness. On automated metrics, it outperformed the base model (Qwen3-8B) by an average of 18.3%, with gains of up to 34.6% in persuasiveness and constructiveness, and it performed comparably to, and often surpassed, advanced proprietary models such as GPT-4.1 and o3 on both automated and human evaluations. In a human evaluation with three experienced annotators on 100 sampled comments, RebuttalAgent achieved the highest average score (9.57) and a 7.36% improvement in persuasiveness over the GPT-4.1 baseline. An ablation study confirmed that all design components (ToM, Strategy, Thinking, SFT, RL) are necessary, with performance dropping when any is removed. The framework also generalized well to other backbone models such as Llama-3.1-8B and Qwen3-4B, yielding significant gains. Finally, Rebuttal-RM, the authors' automated evaluator, achieved high scoring consistency with human preferences (an average of 0.812), outperforming GPT-4.1 in alignment by 9.0%.
The RebuttalAgent framework represents a significant step toward more effective human-AI collaboration in academic peer review by improving the clarity and constructiveness of academic dialogue. It gives researchers, particularly junior scholars, a tool for navigating the complex rebuttal process, offering strategic suggestions and assistance in drafting persuasive responses. The authors emphasize that the tool is intended to inspire and assist, not replace, critical human analysis. They also acknowledge limitations, such as potential reinforcement of biases present in the training data and the model's inability to provide new experimental data. Future work includes continuing to refine the agent and to foster a more open and constructive scientific community. The complete framework, including RebuttalBench and Rebuttal-RM, is intended to be fully reproducible, with code and models planned for public release.