AI Summary • Published on Mar 5, 2026
Orofacial clefts (OCs) are common congenital abnormalities, but their accurate prenatal detection through ultrasound remains challenging, especially in early gestation due to subtle anatomical features. This difficulty is exacerbated by a scarcity of experienced specialists and the relative rarity of OCs, leading to significant variations in diagnostic accuracy among radiologists. This creates a two-fold problem: the immediate need for timely and accurate diagnosis, and the long-term challenge of developing clinical expertise in this specialized field, as training opportunities for rare conditions are limited. Current AI solutions for prenatal diagnosis often lack generalizability across different gestational ages or diverse clinical settings, and their potential educational utility for radiologists is largely unexplored.
This study developed an AI-assisted diagnostic system for orofacial clefts (AIOC) with a dual purpose: achieving expert-level diagnostic performance and facilitating radiologists' expertise development. A large-scale, multi-center dataset was constructed, comprising 45,139 fetal ultrasound images from 9,215 fetuses across 22 hospitals, spanning gestational weeks 14 to 28. The AIOC system integrates a YOLOX-based detection branch for identifying key anatomical structures and a Mamba-Inspired Linear Attention (MILA) framework with an LSTM module for classification and modeling structural interrelationships. The system provides case-based diagnoses by analyzing specific views and targeted structures (e.g., cleft lip, alveolar ridge, palate).
The diagnostic performance of AIOC was evaluated through internal validation on the OC-6000 dataset and external validation on OC-GT3000 (18-28 weeks) and OC-Early (14-17 weeks) datasets. A reader study was conducted with senior and junior radiologists to compare AIOC's diagnostic accuracy, both independently and as a copilot for junior radiologists. To assess its educational utility, a four-cycle training-and-exam pilot study involved 24 radiologists and trainees, randomized into traditional training groups and AI-augmented training groups, evaluating learning retention and generalization of diagnostic skills.
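The randomization step in the pilot study can be illustrated with a minimal sketch. The summary does not say how participants were allocated, so the stratification by experience level, the 12/12 split between trainees and junior radiologists, the participant IDs, and the function name `stratified_randomize` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_randomize(participants, seed=0):
    """Split participants evenly into two training arms within each role.

    participants: list of (participant_id, role) tuples.
    Returns a dict mapping arm name to a list of participant IDs.
    Stratifying by role is an assumption, not the study's documented method.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_role = defaultdict(list)
    for pid, role in participants:
        by_role[role].append(pid)

    arms = {"traditional": [], "ai_augmented": []}
    for role in sorted(by_role):
        ids = by_role[role]
        rng.shuffle(ids)
        half = len(ids) // 2
        arms["traditional"].extend(ids[:half])
        arms["ai_augmented"].extend(ids[half:])
    return arms


# Hypothetical roster: the 12/12 composition is invented for the example.
participants = [(f"T{i:02d}", "trainee") for i in range(12)]
participants += [(f"J{i:02d}", "junior") for i in range(12)]
arms = stratified_randomize(participants, seed=42)
```

With this roster, each arm receives 12 participants, 6 from each experience level, which keeps the two arms comparable at baseline.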
The AIOC system demonstrated strong diagnostic performance, achieving an average AUC of 95.57%, F1 score of 94.34%, sensitivity of 93.67%, and specificity of 98.59% on the internal OC-6000 dataset. On the external OC-GT3000 dataset, it maintained robust performance with an AUC of 98.52%, sensitivity of 98.33%, and specificity of 98.99%, closely matching or exceeding the performance of senior radiologists and substantially outperforming junior radiologists. Even in early gestation (14-17 weeks, OC-Early dataset), for which the model had no specific training data, it showed reasonable performance (AUC 93.06%, sensitivity 90.74%, specificity 95.37%).
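For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, F1 score, and AUC are conventionally computed for a binary task. The counts and scores are hypothetical and are not taken from the study; the names `ConfusionCounts` and `rank_auc` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # clefts correctly flagged
    fp: int  # normal cases wrongly flagged
    tn: int  # normal cases correctly cleared
    fn: int  # clefts missed


def sensitivity(c):
    """Fraction of true cleft cases that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Fraction of normal cases that were correctly cleared."""
    return c.tn / (c.tn + c.fp)


def f1_score(c):
    """Harmonic mean of precision and sensitivity."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def rank_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical example values, chosen only to make the arithmetic easy to follow.
counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
auc = rank_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Here sensitivity is 90/100 = 0.90, specificity is 95/100 = 0.95, F1 is 180/195 ≈ 0.923, and the AUC is 8/9 ≈ 0.889 because eight of the nine positive-negative pairs are ranked correctly.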
When used as a medical copilot, AIOC significantly improved junior radiologists' diagnostic performance, raising their sensitivity by 6.18% to 96.09% and achieving an AUC of 97.94%, comparable to senior radiologists. The system also reduced junior radiologists' diagnostic time by 6.62 seconds. In the medical education pilot, AI-augmented training groups consistently outperformed traditional training groups, showing improved diagnostic accuracy and learning retention for both trainees and junior radiologists across fixed and novel cases. Automation bias was low, with overreliance rates below 12%, indicating radiologists maintained critical judgment.
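The summary does not define how the overreliance rate was measured. One common way to quantify automation bias, shown here purely as an assumption, is the share of cases with an incorrect AI suggestion in which the reader's final answer matched that incorrect suggestion. The record fields and the function name `overreliance_rate` are hypothetical.

```python
def overreliance_rate(records):
    """Share of AI-wrong cases where the reader adopted the wrong AI answer.

    records: list of dicts with keys "truth", "ai", and "reader" (0 or 1).
    This definition is an assumption; the study may have used another one.
    """
    ai_wrong = [r for r in records if r["ai"] != r["truth"]]
    if not ai_wrong:
        return 0.0  # no AI errors, so no opportunity to over-rely
    followed = sum(1 for r in ai_wrong if r["reader"] == r["ai"])
    return followed / len(ai_wrong)


# Hypothetical reading log: four AI errors, one of which the reader followed.
records = [
    {"truth": 1, "ai": 0, "reader": 0},  # reader followed the wrong AI call
    {"truth": 1, "ai": 0, "reader": 1},  # reader overrode the AI
    {"truth": 0, "ai": 1, "reader": 0},  # reader overrode the AI
    {"truth": 0, "ai": 1, "reader": 0},  # reader overrode the AI
    {"truth": 1, "ai": 1, "reader": 1},  # AI correct, excluded from the rate
]
rate = overreliance_rate(records)
```

In this toy log the rate is 1/4 = 0.25. Cases where the AI was correct are excluded, since agreeing with a correct suggestion is not evidence of overreliance.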
The AIOC system offers a scalable solution for improving both diagnostic accuracy and specialist training in fetal orofacial cleft detection. Its expert-level diagnostic performance, generalizability across diverse clinical scenarios and gestational ages, and ability to significantly enhance junior radiologists' skills suggest its potential for integration into routine clinical workflows. The structured diagnostic outputs, aligning with clinical decision-making, address a common limitation of AI tools in medical practice. Beyond immediate diagnostic assistance, the AIOC system serves as an effective educational framework, accelerating the development of clinical expertise for rare conditions, particularly valuable in resource-limited settings where training opportunities are scarce. While automation bias remains a concern, the study found low rates of overreliance. Future work includes expanding dataset diversity to cover more gestational ages, ethnicities, and OC subtypes, conducting prospective clinical validation, and further enhancing interpretability to promote trust and adoption in global healthcare.