AI Summary • Published on Apr 16, 2026
Lung cancer remains a leading cause of cancer-related mortality, and accurate diagnosis and subtype classification are crucial for effective treatment. While computed tomography (CT) imaging is essential for detection and staging, it struggles to differentiate benign from malignant lesions and lacks cellular-level detail, which is vital for definitive pathological classification. Conversely, histopathological examination of biopsy tissue, though the gold standard, is invasive and time-consuming. Existing AI systems often rely on a single imaging modality, limiting their diagnostic robustness and interpretability. Deep learning models, while powerful, often act as "black boxes," making it difficult for clinicians to understand their predictions and trust their decisions.
This study developed a dual-modal explainable artificial intelligence (AI) framework that integrates CT radiology and hematoxylin and eosin (H&E) microscopy for lung cancer diagnosis and subtype classification. The framework uses two independent convolutional neural network (CNN) branches: one for CT image analysis (CT-CNN) and another for H&E microscopic image analysis (H&E-CNN). Feature representations from each modality are then combined through a weighted decision-level fusion module. Crucially, clinical metadata, including patient age, sex, and smoking history, is incorporated to dynamically adjust the contribution of each modality during prediction, enhancing diagnostic robustness. The final output is a probability distribution over five classes: four lung cancer subtypes (adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and small cell lung cancer (SCLC)) plus normal tissue. To address the black-box nature of AI, explainable AI (XAI) techniques, including Grad-CAM, Grad-CAM++, Integrated Gradients, Occlusion, Saliency Maps, and SmoothGrad, were implemented to visualize and interpret the model's decisions, highlighting the image regions that contribute most strongly to each prediction.
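The metadata-gated, weighted decision-level fusion described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the gating weights `w_meta`, the metadata encoding, and all numeric values are assumptions, and in the actual framework the gate would be learned jointly with the two CNN branches.

```python
import numpy as np

def fuse_predictions(p_ct, p_he, metadata, w_meta):
    """Weighted decision-level fusion of two modality-specific predictions.

    p_ct, p_he : per-class probabilities from the CT and H&E branches, shape (5,)
    metadata   : clinical feature vector, e.g. [normalized age, sex, smoking]
    w_meta     : parameters of a small gating function (hypothetical values;
                 a real system would learn these end to end)
    Returns fused class probabilities.
    """
    # Sigmoid gate in (0, 1): how much weight the CT branch receives.
    alpha = 1.0 / (1.0 + np.exp(-(metadata @ w_meta)))
    fused = alpha * p_ct + (1.0 - alpha) * p_he   # convex combination
    return fused / fused.sum()                    # renormalize (no-op here)

# Illustrative example with invented numbers.
classes = ["AdenoCA", "SCC", "LCC", "SCLC", "Normal"]
p_ct = np.array([0.50, 0.20, 0.10, 0.10, 0.10])  # CT branch output
p_he = np.array([0.30, 0.40, 0.10, 0.10, 0.10])  # H&E branch output
meta = np.array([0.6, 1.0, 1.0])                 # [age_norm, sex, smoker]
w    = np.array([0.5, -0.2, 0.8])                # hypothetical gate weights
p_fused = fuse_predictions(p_ct, p_he, meta, w)
```

Because the fused vector is a convex combination of two probability distributions, it is itself a valid distribution, and the gate lets patient context (for example, smoking history) shift trust between the radiological and histopathological branches.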
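Among the listed XAI techniques, Grad-CAM has a particularly simple closed form when the network head is global average pooling (GAP) followed by a linear classifier, which is common in CNNs of this kind. The toy sketch below illustrates that special case; the feature maps and class weights are invented for illustration, and a real implementation on the CT-CNN or H&E-CNN would obtain the channel gradients by backpropagation.

```python
import numpy as np

def grad_cam_gap_linear(feature_maps, class_weights):
    """Grad-CAM for a head of the form: score_c = sum_k w_ck * GAP(A_k).

    feature_maps  : (K, H, W) activations of the last conv layer
    class_weights : (K,) linear weights w_ck for the target class c
    For this head, d(score_c)/dA_k = w_ck / (H*W) everywhere, so the pooled
    gradients alpha_k are proportional to w_ck, and the class activation map
    is CAM = ReLU(sum_k alpha_k * A_k).
    """
    K, H, W = feature_maps.shape
    alphas = class_weights / (H * W)                  # pooled gradients
    cam = np.tensordot(alphas, feature_maps, axes=1)  # channel-weighted sum
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence

# Toy activations: channel 0 fires on the left half, channel 1 on the right.
A = np.zeros((2, 4, 4))
A[0, :, :2] = 1.0
A[1, :, 2:] = 1.0
w_c = np.array([1.0, -0.5])   # class c attends to channel 0, suppresses 1
cam = grad_cam_gap_linear(A, w_c)
```

In this toy case the map highlights only the left half of the image, mirroring how, in the study, such maps are compared against expert-annotated tumor regions.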
Quantitative evaluation showed that the dual-modal framework outperformed both single-modality baselines. The CT-only model achieved an accuracy of 0.84 and an AUROC of 0.94, while the H&E-only model achieved an accuracy of 0.85 and an AUROC of 0.95. The dual-modal fusion framework achieved the highest overall performance: an accuracy of 0.87, an AUROC exceeding 0.97, and a macro F1-score of 0.88. DeLong's test confirmed that the dual-modal model's improvement over the single-modality approaches was statistically significant (p < 0.05). Per-class analysis showed high correct classification rates for SCLC (90%) and squamous cell carcinoma (88%), with somewhat lower performance for large cell carcinoma (78%). Among the XAI methods, Grad-CAM++ achieved the highest faithfulness (insertion AUC ≈ 0.83 for H&E, 0.81 for CT) and localization accuracy (IoU ≈ 0.65 for H&E, 0.81 for CT), confirming strong correspondence with expert-annotated tumor regions.
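The two XAI evaluation metrics reported here have precise definitions worth making concrete: insertion AUC reveals pixels in order of decreasing saliency and integrates the model's score over the reveal fraction, while IoU compares a binarized saliency map against an expert tumor mask. A minimal sketch follows, with a toy model standing in for the real classifier (the `score_fn` and the random image are illustrative assumptions):

```python
import numpy as np

def insertion_auc(image, saliency, score_fn, steps=10):
    """Insertion faithfulness: starting from a blank canvas, reveal pixels in
    order of decreasing saliency and integrate the model score over the
    reveal fraction. A more faithful map produces a larger area."""
    order = np.argsort(saliency.ravel())[::-1]     # most salient pixels first
    canvas = np.zeros_like(image)
    scores = [score_fn(canvas)]
    for idx in np.array_split(order, steps):
        canvas.ravel()[idx] = image.ravel()[idx]   # reveal next pixel batch
        scores.append(score_fn(canvas))
    s = np.asarray(scores, dtype=float)
    return float(np.mean((s[:-1] + s[1:]) / 2.0))  # trapezoid rule on [0, 1]

def iou(pred_mask, true_mask):
    """Intersection-over-union between a binarized saliency map and an
    expert-annotated tumor mask."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return inter / union if union else 0.0

# Toy model: score = fraction of total image intensity currently revealed.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
score = lambda x: x.sum() / img.sum()
auc_good = insertion_auc(img, img, score)   # well-aligned saliency
auc_bad = insertion_auc(img, -img, score)   # worst-case ordering

pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt = np.array([[1, 0], [1, 0]], dtype=bool)
overlap = iou(pred, gt)   # 1 overlapping pixel out of 3 in the union
```

With the well-aligned saliency map the score curve rises quickly and the area exceeds 0.5, while the inverted map stays below it, which is the behavior the reported insertion AUC values of roughly 0.8 reflect.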
The proposed dual-modal AI framework offers a promising direction for improving AI-assisted cancer diagnosis by bridging the diagnostic gap between non-invasive radiological screening and definitive pathological assessment. Its high diagnostic accuracy, coupled with the interpretability provided by XAI techniques, enhances both precision and transparency in lung cancer classification. This approach has strong potential for integration into real-world precision oncology and clinical decision support systems, offering clinicians a more robust and understandable tool for patient care. Future work will involve expanding the dataset, refining the multimodal feature fusion strategy, and validating the framework in diverse clinical environments to further assess its robustness and applicability.