AI Summary • Published on Jan 29, 2026
Current autonomous driving (AD) systems face significant challenges in scaling efficiently across diverse vehicle configurations and environments. Existing research often focuses on specific vehicles and contexts, which hinders broad deployment across varying vehicle types, sensors, and actuators and, critically, across different traffic regulations, legal requirements, cultural dynamics, and ethical paradigms. While modular, service-oriented software architectures, increasingly incorporating AI in sub-modules such as perception and planning, have improved performance, they raise concerns about safety and explainability. These issues are exacerbated in fully monolithic, AI-only architectures, which preclude modular validation. A major limitation lies in Situation Awareness (SA), particularly in the comprehension of nuanced meanings (Level 2 SA) and the accurate projection of future events (Level 3 SA), both essential for dealing with the complex, emergent behaviors of other road users. Furthermore, a "semantic gap" exists: the intended functionality of AD systems struggles to cover the vast "long-tail distribution" of real-world corner cases and unexpected situations (e.g., construction zones, interactions with emergency vehicles). The current practice of addressing such critical scenarios individually undermines the flexibility that broad deployment requires. The growing interdependencies and circular reasoning within SA components also challenge traditional modular safeguarding methods.
The authors propose a conceptual framework centered on a "service-oriented modular end-to-end (SO-M-E2E)" architecture. This hybrid approach aims to combine the interpretability and debuggability of modular designs with the performance and generalization capabilities of end-to-end data-driven systems. A core component is a two-stage fine-tuning process for scalable adaptation: the first stage applies environment-specific fine-tuning with a country-specific reward model to adapt generic capabilities to local socio-political and environmental requirements; the second stage uses vehicle-specific transfer learning to adapt the system to particular vehicle types, sensors, and actuators, and to validate design decisions.

To achieve human-like dynamic, context-specific flexibility, the paper emphasizes the expanded use of attention mechanisms across the entire information-processing chain, allowing individual modules to be flexibly weighted and interconnected at runtime. The framework also advocates a consistent, continuously updated context representation that integrates internal data with external context sources such as V2X communication (Cooperative Awareness Messages, Collective Perception Messages, Maneuver Coordination Messages) and high-definition maps. These external inputs serve as "virtual sensors" that enrich situational semantics and extend the vehicle's field of view.

An automated scenario generation procedure is proposed to transfer critical situations across different Operational Design Domains (ODDs) and to artificially generate missing sensor data (e.g., synthetic LiDAR and radar for vision-only recordings), thereby creating ODD-specific master datasets. Finally, the methodology underscores the importance of a data-driven, iterative development, verification, and validation process throughout the system's lifecycle to address AI safety concerns and align with regulatory demands such as the EU AI Act.
This iterative process includes training a "P2T" (perception-to-trajectory) world model for general driving behavior, followed by ODD-specific and vehicle-specific fine-tuning.
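As an illustration, the staged adaptation could look like the following minimal sketch, in which tiny linear regressors stand in for the P2T world model and the country-specific reward model is reduced to a shifted training target; all names, dimensions, and data are hypothetical, since the paper prescribes no concrete implementation:

```python
# Minimal sketch of the staged adaptation pipeline (illustrative only):
# a tiny gradient-descent regressor stands in for the P2T world model,
# and the country-specific reward model is reduced to a shifted target.
import numpy as np

def fit_linear(X, y, w_init=None, lr=0.1, steps=200):
    """Stand-in for 'training': plain least-squares gradient descent."""
    w = np.zeros(X.shape[1]) if w_init is None else w_init.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(1)

# Stage 0: train a generic P2T (perception-to-trajectory) core model.
X_generic = rng.normal(size=(200, 4))
y_generic = X_generic @ np.array([1.0, 2.0, 0.0, -1.0])
w_core = fit_linear(X_generic, y_generic)

# Stage 1: environment-specific fine-tuning; the shifted target mimics
# a country-specific reward signal favouring different behaviour.
X_env = rng.normal(size=(50, 4))
y_env = X_env @ np.array([1.0, 2.0, 0.5, -1.0])
w_env = fit_linear(X_env, y_env, w_init=w_core, steps=50)

# Stage 2: vehicle-specific transfer learning on the adapted model,
# mimicking a particular sensor/actuator configuration.
X_veh = rng.normal(size=(30, 4))
y_veh = X_veh @ np.array([1.2, 2.0, 0.5, -1.0])
w_vehicle = fit_linear(X_veh, y_veh, w_init=w_env, steps=50)
```

Starting each stage from the previous stage's weights is what makes the adaptation cheap: only the environment- and vehicle-specific residuals have to be learned, not general driving behavior from scratch.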
The paper's analysis reveals a convergence in autonomous driving (AD) architectures towards hybrid models like Modular End-to-End (M-E2E) and Interpretable End-to-End (I-E2E) systems. These emerging architectures strive to balance the advantages of both modularity (interpretability, safety) and end-to-end learning (performance, generalization). Foundation Models (FMs) are identified as a promising research area for enhancing generalization and situation assessment within AD, particularly through their potential for text-based scene interpretation and decision-making explanation. The discussion highlights that achieving human-like cognitive flexibility necessitates flexible, soft-coded connections and attention mechanisms, moving away from rigid, hard-coded rules. A gap is observed between current AD stack research, which often focuses on vision or LiDAR systems, and real-world commercial systems that integrate a broader array of internal sensors. Furthermore, V2X communication, while extensively researched, has not yet seen widespread adoption in commercial AD systems or in many current research prototypes. The proposed SO-M-E2E architecture is presented as a concrete conceptualization of a self-responsible, interpretable, and debuggable AD system that addresses the limitations of current approaches and integrates a form of "world model" for human-like planning. This framework is argued to enable an efficient and scalable development process, facilitating the adaptation of a core model to various Operational Design Domains (ODDs) and vehicle configurations. The paper also suggests that centralized data pooling and collaborative development efforts across manufacturers and countries can lead to the creation of safer and more efficient autonomous driving systems, potentially utilizing federated learning approaches to protect proprietary data while sharing critical insights.
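The soft-coded, attention-weighted interconnection of modules can be illustrated with a minimal sketch using scaled dot-product attention over module feature vectors; the module names, dimensions, and weight matrices below are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch: scaled dot-product attention used to weight and
# fuse the outputs of individual AD modules at runtime. Module names,
# dimensions, and weight matrices are hypothetical.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_modules(context, module_outputs, W_q, W_k):
    """Score each module against the current context and fuse the
    module outputs with the resulting soft (runtime) weights."""
    q = W_q @ context                              # query from context
    keys = np.stack([W_k @ m for m in module_outputs])
    scores = keys @ q / np.sqrt(len(q))            # one score per module
    weights = softmax(scores)                      # soft-coded connections
    fused = sum(w * m for w, m in zip(weights, module_outputs))
    return fused, weights

rng = np.random.default_rng(0)
d = 8
context = rng.normal(size=d)                       # current scene context
modules = [rng.normal(size=d) for _ in range(3)]   # e.g. camera, LiDAR, V2X
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
fused, weights = attend_modules(context, modules, W_q, W_k)
```

Because the weights are recomputed from the context at every step, the coupling between modules is dynamic rather than hard-coded; in a real system the projection matrices would be learned end-to-end rather than sampled.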
The pursuit of fully autonomous driving carries several key implications. Firstly, AD systems must be designed with inherent flexibility to adapt to a diverse and dynamic global regulatory landscape, including complex requirements such as those of the EU AI Act concerning risk management, data governance, and human oversight. Secondly, ensuring AI safety in these critical applications demands a fundamental shift towards data-driven AI safety assurance methods that evolve beyond traditional functional safety standards; this necessitates robust iterative development and data management processes to handle the vast array of real-world "corner cases." Thirdly, widespread and scalable deployment requires human-like autonomy: the ability of AD systems to adapt their functionalities, from perception to planning, to different Operational Design Domains (ODDs) and varying vehicle sensor and actuator setups. Fourthly, socio-political acceptance is crucial, calling for understandable, publicly transparent development and validation processes, along with interpretable and ethically acceptable vehicle behaviors towards all road users. Finally, the paper highlights significant opportunities for new research in generating, aggregating, and utilizing data for training cognitive orchestrators, in integrating bottom-up and top-down learning approaches, and in leveraging large foundation models for internal planning and reasoning to achieve truly human-like AI in AD. From an economic perspective, centralized data pooling and collaborative development could reduce technical effort, foster innovation, and enable shared core autonomy systems that ensure consistent safety standards across manufacturers and regions.
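The federated learning approach mentioned earlier could, for instance, follow the well-known federated averaging (FedAvg) scheme, sketched here with toy linear models standing in for each manufacturer's fine-tuned system; the number of parties, the data, and all names are hypothetical:

```python
# Hypothetical sketch of federated averaging (FedAvg) across four
# "manufacturers": each party fine-tunes locally on private data and
# only parameter vectors are pooled, never the raw driving data.
import numpy as np

def local_step(w_global, X, y, lr=0.05, steps=20):
    """One client's local training, starting from the shared weights."""
    w = w_global.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(2)
true_w = np.array([0.5, -1.0, 2.0])   # behaviour all parties approximate
w_global = np.zeros(3)

for _round in range(10):              # communication rounds
    updates = []
    for _client in range(4):
        X = rng.normal(size=(40, 3))  # private local data, never shared
        y = X @ true_w
        updates.append(local_step(w_global, X, y))
    w_global = np.mean(updates, axis=0)   # server averages the weights
```

Only model parameters cross organizational boundaries here, which is the property that would let manufacturers pool insights without exposing proprietary recordings; real deployments would add secure aggregation and handle non-identical local data distributions.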