AI Summary • Published on Dec 28, 2025
As human-robot interactions become more prevalent, there is a growing need for robots to exhibit human-like understanding and to provide transparent explanations for their behavior. Theory of Mind (ToM) research in Human-Robot Interaction (HRI) aims to enable robots to infer and adapt to human mental states, while Explainable Artificial Intelligence (XAI) focuses on making AI systems interpretable. Yet current ToM research often lacks systematic evaluation of explanation quality: existing ToM methods rarely assess whether the explanations given to users accurately reflect the robot's actual internal reasoning, risking misleading users and hindering effective human-AI collaboration.
The authors propose treating Theory of Mind (ToM) as a form of Explainable Artificial Intelligence (XAI) and systematically evaluate recent ToM studies in Human-Robot Interaction (HRI) using the eValuation XAI (VXAI) framework. This framework comprises seven desiderata: Parsimony, Plausibility, Coverage, Fidelity, Continuity, Consistency, and Efficiency. Each ToM study was assessed on whether it explicitly addressed these desiderata: human evaluations were taken to satisfy Parsimony and Plausibility, reporting successful versus failed interactions addressed Coverage, examining the model's internal reasoning addressed Fidelity, and involving at least 100 participants addressed Continuity and Consistency. This evaluation aims to identify limitations in current ToM assessment practices regarding explanation fidelity and user-centered perspectives.
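As a rough illustration of how such a coding scheme might be applied, here is a minimal Python sketch. The desiderata names mirror the summary above, but the data structure, field names, the Efficiency criterion, and the example study are hypothetical illustrations, not the authors' actual protocol.

```python
from dataclasses import dataclass

# Hypothetical record of what a reviewed ToM/HRI study reports.
# Field names are illustrative and not taken from the paper.
@dataclass
class StudyReport:
    human_evaluation: bool             # user-centered experiment with believable explanations
    reports_success_failure: bool      # counts of successful vs. failed interactions
    inspects_internal_reasoning: bool  # examines the model's actual decision process
    num_participants: int
    reports_runtime_cost: bool         # assumed proxy for Efficiency (not specified in the summary)

def assess_vxai(study: StudyReport) -> dict[str, bool]:
    """Map a study's reporting onto the seven VXAI desiderata,
    following the coding scheme described above."""
    return {
        "Parsimony": study.human_evaluation,
        "Plausibility": study.human_evaluation,
        "Coverage": study.reports_success_failure,
        "Fidelity": study.inspects_internal_reasoning,
        "Continuity": study.num_participants >= 100,
        "Consistency": study.num_participants >= 100,
        "Efficiency": study.reports_runtime_cost,
    }

# Example: a user study with ~30 participants, no success/failure counts,
# and no inspection of the model's internal reasoning.
example = StudyReport(True, False, False, 30, False)
print(assess_vxai(example))
# {'Parsimony': True, 'Plausibility': True, 'Coverage': False,
#  'Fidelity': False, 'Continuity': False, 'Consistency': False, 'Efficiency': False}
```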
Evaluating state-of-the-art ToM studies in HRI against the VXAI framework revealed several key findings. All reviewed studies satisfied the Parsimony and Plausibility desiderata, indicating that they conducted user-centered experiments and provided explanations perceived as believable. However, only two studies met the Continuity and Consistency criteria, largely because participant numbers fell short of 100. Crucially, none of the studies satisfied the Coverage desideratum, as they did not report the number of successful versus unsuccessful interactions, and none addressed the Fidelity desideratum, meaning they did not examine the internal reasoning process of the robot's model. This highlights a significant gap: current ToM explanations often lack verifiable alignment with the robot's actual internal decision-making, posing a risk of user misunderstanding or misguidance.
The findings imply a critical need to integrate ToM within XAI frameworks to enhance the rigor and comprehensiveness of explanations in HRI. By combining ToM's user-centered perspective with XAI's technical emphasis on fidelity, future research can develop explanations that are both understandable to users and genuinely reflective of the robot's internal reasoning, enabling more systematic evaluations that cover both model fidelity and user comprehension. Future work could explore incorporating techniques such as behavior trees or explainable reinforcement learning (XRL) into ToM-based systems to support adaptive reasoning and further improve explanation fidelity, ultimately fostering more transparent, trustworthy, and effective human-robot collaboration.