AI Summary • Published on Mar 22, 2026
Highway merging presents a significant and safety-critical challenge for autonomous vehicles due to dynamic interactions and inherent uncertainties. Traditional rule-based control methods often lack the adaptability required for complex merging situations, while fully centralized deep reinforcement learning (DRL) approaches face scalability issues. Decentralized policies that can selectively process relevant information offer a promising alternative. Current multi-agent reinforcement learning (MARL) solutions in the literature sometimes require substantial computational resources, highlighting a need for more efficient and focused decision-making strategies to improve safety and efficiency during highway merges.
The authors propose a novel multi-agent control framework built upon the QMIX architecture, enhanced with a partial attention mechanism for highway merging scenarios. The environment is modeled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). The partial attention mechanism operates on two levels: spatial attention, which limits each ego vehicle's observation to only the vehicle directly ahead and the one on the opposite merging road, and temporal attention, where the neural network learns to focus on the past states of these critical vehicles. Each agent's state representation integrates its own kinematic data with the historical kinematic states of these two neighboring vehicles. These historical sequences undergo layer normalization, are projected into an embedding space, and then processed by separate multi-head attention modules to capture temporal dependencies. A weighted aggregation emphasizes recent historical data, followed by feed-forward networks with residual connections. The enriched state vector is then fed into the QMIX utility network to estimate action values. A comprehensive reward structure is designed to balance individual agent objectives (e.g., desired velocity, fuel efficiency, comfort) with global traffic objectives (e.g., collision avoidance, traffic flow, waiting time minimization, goal completion), ensuring coordinated and efficient decision-making.
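The hybrid reward described above can be illustrated with a minimal sketch. The summary does not give the functional forms or the weights, so everything in this block is an assumption made for illustration: the `RewardWeights` values, the helper names (`individual_reward`, `global_reward`, `hybrid_reward`), the normalizations, and the `mix` factor that blends per-agent and shared terms. It is not the authors' reward function.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical weights; the summary does not report the actual values.
    velocity: float = 1.0
    fuel: float = 0.1
    comfort: float = 0.1
    collision: float = 10.0
    flow: float = 0.5
    waiting: float = 0.2
    goal: float = 5.0


def individual_reward(speed, desired_speed, fuel_rate, jerk, w):
    """Per-agent term: track the desired speed, penalize fuel use and jerk."""
    # Assumed form: normalized absolute deviation from the desired speed.
    velocity_term = -abs(speed - desired_speed) / desired_speed
    return w.velocity * velocity_term - w.fuel * fuel_rate - w.comfort * abs(jerk)


def global_reward(collisions, mean_speed, speed_limit, waiting_time, arrivals, w):
    """Shared term: penalize collisions and waiting, reward flow and arrivals."""
    return (
        -w.collision * collisions
        + w.flow * mean_speed / speed_limit
        - w.waiting * waiting_time
        + w.goal * arrivals
    )


def hybrid_reward(agent_terms, global_term, mix=0.5):
    """Blend each agent's own reward with the shared global reward.

    mix=0 uses only individual terms; mix=1 uses only the global term.
    """
    return [(1.0 - mix) * r + mix * global_term for r in agent_terms]
```

In this sketch, a collision lowers the shared term for every agent at once, which is one simple way to couple individual objectives to a traffic-level outcome; the paper may combine the terms differently.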
Simulations were conducted in the Simulation of Urban Mobility (SUMO) environment, with the neural networks implemented and trained in PyTorch over 1000 episodes of up to 1000 time steps each. Training on an Apple M4 chip took approximately 56 minutes. During training, the proposed model converged within 500 episodes, achieving higher average velocities and significantly fewer collisions, although fuel consumption increased as speeds rose. An ablation study against a "Vanilla QMIX" (VQMIX) variant without the temporal attention layers showed that VQMIX diverged within 300 episodes, underscoring the role of temporal attention in stabilizing learning. In the evaluation phase, the Partial Attention QMIX model was compared with SUMO's Intelligent Driver Model (IDM) and improved on it across several metrics: it achieved higher average rewards and maintained greater average velocities, indicating smoother and more efficient vehicle movement. Fuel consumption was slightly higher, but the number of collisions decreased dramatically, which the authors attribute to tailoring each agent's information state to the relevant dynamics.
This research demonstrates that integrating partial attention mechanisms into a QMIX framework can support safe and efficient multi-agent control in highway merging. The combination of targeted spatial and temporal attention with a hybrid reward signal reduces collisions and raises average vehicle velocity and overall reward compared to the conventional IDM baseline. The method does lead to slightly higher fuel consumption because of the increased speeds, a trade-off the authors accept in exchange for the gains in safety and efficiency. Future work aims to extend the framework to more complex scenarios, including multi-lane highways and mixed-autonomy environments in which autonomous vehicles interact with human-driven vehicles.