AI Summary • Published on Apr 13, 2026
Current Explainable Artificial Intelligence (XAI) methods often produce technical outputs that are not easily understood by non-experts. While Large Language Models (LLMs) show promise in translating these technical outputs into natural language, existing approaches frequently lack mechanisms to guarantee the accuracy, faithfulness, and completeness of the generated explanations. Furthermore, evaluating these LLM-generated narratives often relies on subjective human judgment or post-hoc scoring, leaving no systematic safeguard against flawed or misleading explanations reaching end-users. This gap highlights a critical need for a robust verification stage within LLM-based XAI explanation generation.
The paper proposes a Two-Stage LLM Meta-Verification Framework for generating accessible, reliable natural-language explanations from XAI methods. The framework comprises three components: an Explainer LLM, a Verifier LLM, and an iterative refeed mechanism. The Explainer LLM transforms raw XAI outputs (e.g., feature-importance scores, saliency maps) into human-readable explanations, guided by a prompt template and a Zero-Shot Chain-of-Thought (CoT) prompting strategy that strengthens its reasoning. The Verifier LLM then evaluates each explanation for faithfulness, logical coherence, completeness, and absence of hallucination risk; it uses a structured meta-prompting template to formalize the evaluation and returns a standardized response containing an accept/reject decision, a justification, and an error type. If an explanation is rejected, the refeed mechanism injects the Verifier's feedback into the Explainer's prompt, directing it to revise the explanation until it is accepted or a maximum number of iterations is reached.
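The refeed loop described above can be sketched in a few lines. The toy Explainer/Verifier stand-ins and the verdict fields (`decision`, `justification`, `error_type`) below are illustrative assumptions that mirror the standardized response described in the paper, not its exact prompts or schema; in practice the two callables would wrap real LLM calls.

```python
MAX_ITERS = 3  # assumed cap on refinement rounds (the paper caps iterations)

def refine_explanation(explain, verify, xai_output, max_iters=MAX_ITERS):
    """Iterative refeed loop: the Explainer drafts an explanation,
    the Verifier judges it, and rejected drafts are re-generated with
    the Verifier's justification injected as feedback."""
    feedback = None
    draft = None
    for i in range(1, max_iters + 1):
        draft = explain(xai_output, feedback)   # Explainer LLM call
        verdict = verify(xai_output, draft)     # Verifier LLM call
        if verdict["decision"] == "accept":
            return {"explanation": draft, "iterations": i, "accepted": True}
        feedback = verdict["justification"]     # re-fed into the next prompt
    return {"explanation": draft, "iterations": max_iters, "accepted": False}

# --- toy stand-ins for the two LLMs (illustrative only) ---
def toy_explainer(xai_output, feedback):
    top = max(xai_output, key=xai_output.get)
    note = " (revised)" if feedback else ""
    return f"The model's prediction was driven mainly by '{top}'{note}."

def toy_verifier(xai_output, draft):
    # Structured verdict mirroring the accept/reject + error-type schema
    if "revised" in draft:
        return {"decision": "accept", "justification": "", "error_type": None}
    return {"decision": "reject",
            "justification": "Explanation omits secondary features.",
            "error_type": "completeness"}
```

With these stand-ins, the first draft is rejected for completeness and the revised second draft is accepted, matching the paper's observation that most cases resolve within one or two iterations.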
Experiments across five XAI techniques and datasets, using three families of open-weight LLMs, demonstrated the framework's effectiveness. Meta-verification substantially reduced the number of erroneous explanations reaching end-users, raising accuracy from 59–77.8% (Explainer-only baseline) to 81.8–95.21% after verification. Notably, the combination of `gpt-oss:20b` as Explainer and `qwen3:30b` as Verifier yielded the highest end-to-end verification accuracy (95.21%). Structured meta-prompting proved crucial for the Verifier's reliability, with performance degrading under simplified prompt configurations. The framework also markedly improved linguistic accessibility: LLM-generated explanations achieved higher Flesch-Kincaid Reading Ease scores and lower Grade Level scores than raw XAI outputs. The iterative refeed mechanism proved robust, resolving most erroneous cases within one or two iterations. Analysis of the Entropy Production Rate (EPR) showed a monotonic decline during refinement, indicating that Verifier feedback progressively guided the Explainer toward more stable and coherent reasoning.
The Two-Stage LLM Meta-Verification Framework offers a practical and efficient pathway toward developing more trustworthy and democratized XAI systems by ensuring that natural-language explanations are not only accessible but also faithful and reliable. The findings highlight the critical role of systematic verification in mitigating inherent LLM limitations like hallucinations and reasoning failures when generating XAI explanations. Future work could explore advanced prompt engineering strategies, fine-tuning Explainers and Verifiers on curated datasets, extending the framework to multimodal foundation models for direct interpretation of visual XAI artifacts, and conducting dedicated user studies to evaluate subjective aspects like trust and perceived usefulness.