AI Summary • Published on Apr 26, 2026
Current hardware-aware Neural Architecture Search (NAS) pipelines often optimize deep neural networks under full-precision assumptions and apply low-precision adaptation only after the search is complete. This creates a mismatch between optimization-time behavior and deployment-time execution on low-precision edge accelerators, and it can cause substantial accuracy degradation once models are deployed. The issue is particularly critical for spaceborne edge AI, where strict limits on power, memory, and processing throughput demand efficient, numerically robust low-precision models for tasks such as on-board Earth Observation data processing.
This work addresses the problem by integrating deployment-aligned low-precision effects directly into the hardware-aware NAS evaluation loop. Instead of converting precision after training, candidate architectures are exposed to FP16 numerical constraints during both fine-tuning and on-device evaluation on an Intel Movidius Myriad X VPU. The NAS process, which uses a population-based evolutionary strategy with a hardware-aware fitness function, can therefore jointly optimize architectural efficiency and numerical robustness. Specifically, FP16-aware projections are injected into the forward pass during fine-tuning to simulate reduced mantissa precision and rounding, so the search favors architectures that are inherently robust to deployment conditions without altering the search space or the evolutionary strategy.
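The core mechanism, projecting forward-pass values through FP16 so training sees deployment-time rounding, can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation; the function names and the choice of which operands to project are assumptions.

```python
import numpy as np

def fp16_project(x: np.ndarray) -> np.ndarray:
    """Round a float32 array through IEEE half precision and back.

    FP16 keeps only a 10-bit mantissa, so this round trip reproduces the
    rounding error each value would incur on an FP16 accelerator
    (out-of-range magnitudes overflow to inf under NumPy's cast).
    """
    return x.astype(np.float16).astype(np.float32)

def linear_forward_fp16(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A linear layer whose forward pass sees FP16-projected operands,
    so the training loss reflects deployment-time numerics."""
    return fp16_project(fp16_project(x) @ fp16_project(w) + fp16_project(b))
```

In a straight-through style, the projection is applied in the forward pass only, while gradients flow as if it were the identity, which keeps the search space and optimizer untouched.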
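The summary does not give the exact form of the hardware-aware fitness function. A common formulation in evolutionary hardware-aware NAS scores each candidate by validation accuracy scaled by a soft penalty on measured on-device latency; the sketch below follows that pattern, and `latency_budget_ms` and `alpha` are illustrative assumptions, not values from the paper.

```python
def hardware_aware_fitness(accuracy: float,
                           latency_ms: float,
                           latency_budget_ms: float = 50.0,  # illustrative budget
                           alpha: float = 0.07) -> float:    # illustrative exponent
    """Accuracy scaled by a soft latency penalty.

    Candidates at or under the latency budget keep their raw accuracy;
    slower candidates are discounted smoothly rather than rejected
    outright, so the evolutionary search can still exploit them.
    """
    over_budget = max(latency_ms / latency_budget_ms, 1.0)
    return accuracy * over_budget ** (-alpha)
```

Because the penalty is multiplicative and smooth, the same evolutionary loop (mutation, selection on fitness) works unchanged whether the accuracy term comes from full-precision or FP16-constrained evaluation.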
The proposed deployment-aligned low-precision training framework demonstrated significant improvements in on-device accuracy compared to post-training precision conversion. On a vessel segmentation task, post-training precision conversion dropped on-device performance from 0.85 to 0.78 mIoU, whereas deployment-aligned low-precision training reached 0.826 mIoU on-device with the same architecture (95,791 parameters), recovering approximately two-thirds of the deployment-induced accuracy gap without increasing model complexity. Qualitative analysis also showed that models optimized with deployment-aligned training preserved vessel morphology more reliably, reducing artifacts like fragmented contours and improving recall for small targets.
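The "two-thirds" figure follows directly from the reported numbers:

```python
full_precision_miou = 0.85   # before deployment
ptq_miou = 0.78              # post-training precision conversion, on-device
aligned_miou = 0.826         # deployment-aligned training, on-device

gap = full_precision_miou - ptq_miou   # ~0.07 mIoU lost at deployment
recovered = aligned_miou - ptq_miou    # ~0.046 mIoU regained
print(recovered / gap)                 # ~0.66, i.e. about two-thirds
```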
This research highlights that for reliable edge deployment, particularly in critical applications like spaceborne AI, architectural efficiency alone is insufficient if numerical behavior is overlooked during optimization. Integrating deployment-consistent precision constraints directly into NAS pipelines is crucial for developing robust and efficient models that perform well in real-world, resource-constrained environments. Future work aims to extend this approach to integer-only and mixed-precision accelerators, and to unify numerical robustness, energy efficiency, and latency within a comprehensive hardware-aware NAS framework.