AI Summary • Published on Sep 27, 2025
Existing automatic car damage detection methods rely primarily on 2D image analysis, which limits both comprehensive understanding and accurate geometric representation of the damage. While 3D reconstruction offers a more detailed perspective, current 3D segmentation techniques often require extensive training, scene-specific optimization, or multi-view consistency, which can be impractical for subtle damage visible only from a single viewpoint. The lack of large-scale labeled 3D datasets poses a further challenge for developing robust 3D-based solutions.
CrashSplat proposes an end-to-end pipeline for 3D vehicle damage segmentation by lifting 2D masks into a 3D reconstruction. The process begins with predicting 2D damage masks using an instance segmentation model, specifically a YOLOv11 network trained on datasets like CarDD and VehiDE. In parallel, Structure from Motion (SfM) via COLMAP is run on multiple vehicle images to compute camera parameters and generate a sparse point cloud, which then initializes a 3D Gaussian Splatting (3D-GS) reconstruction of the vehicle. The core contribution is a learning-free, single-view 3D-GS segmentation method: Gaussians are projected onto the image plane using the computed camera parameters, and those falling within the detected 2D mask are processed with a Z-buffering-inspired algorithm that sorts Gaussians by depth and accumulates their weights. A final statistical refinement, which models the selected Gaussians' depths and opacities as normally distributed, removes outliers and noise to yield a robust, consistent 3D damage mask. The sketches below illustrate each stage of this pipeline.
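As a concrete illustration of the 2D stage, the following minimal sketch runs a YOLOv11 segmentation model through the Ultralytics API. The checkpoint name, confidence threshold, and image path are illustrative placeholders, not the paper's exact training configuration.

```python
from ultralytics import YOLO

# Hypothetical checkpoint: a YOLOv11-l segmentation model assumed to be
# fine-tuned on CarDD/VehiDE; "yolo11l-seg.pt" is the stock pretrained name.
model = YOLO("yolo11l-seg.pt")
results = model.predict("vehicle_view.jpg", conf=0.25)  # illustrative threshold

for r in results:
    if r.masks is not None:
        # r.masks.data: (num_instances, H, W) tensor of binary instance masks
        damage_masks = r.masks.data.cpu().numpy()
```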
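The SfM stage can be scripted with COLMAP's pycolmap bindings along the following lines; the directory layout is assumed for illustration, and the paper does not specify how COLMAP was invoked.

```python
from pathlib import Path
import pycolmap

database_path = "scene/database.db"   # illustrative paths
image_dir = "scene/images"
output_dir = "scene/sparse"
Path(output_dir).mkdir(parents=True, exist_ok=True)

# Standard COLMAP sparse-reconstruction sequence: features, matching, mapping.
pycolmap.extract_features(database_path, image_dir)
pycolmap.match_exhaustive(database_path)
recs = pycolmap.incremental_mapping(database_path, image_dir, output_dir)

# recs[0].points3D provides the sparse cloud that seeds 3D-GS training;
# recs[0].images holds the per-view camera poses used for projection later.
```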
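The lifting step itself might look roughly like the NumPy sketch below. It is a simplified reading of the description above: the projection and mask test follow standard pinhole geometry, while the depth-sorted weight accumulation is collapsed into a single front-to-back pass with an assumed saturation threshold `w_threshold`; the paper's exact per-Gaussian weighting is not reproduced here.

```python
import numpy as np

def lift_mask_to_gaussians(means, opacities, mask, K, R, t, w_threshold=0.9):
    """Select Gaussians whose projections fall inside a 2D damage mask.

    means:     (N, 3) Gaussian centers in world coordinates
    opacities: (N,)   per-Gaussian opacities
    mask:      (H, W) boolean 2D damage mask
    K, R, t:   camera intrinsics and world-to-camera extrinsics from SfM
    """
    cam = means @ R.T + t                      # world -> camera frame
    depths = cam[:, 2]
    uv = cam @ K.T
    uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-8)  # perspective division

    H, W = mask.shape
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    visible = (depths > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    in_mask = np.zeros(len(means), dtype=bool)
    in_mask[visible] = mask[v[visible], u[visible]]

    # Z-buffering-inspired pass: walk candidates front-to-back and keep
    # Gaussians until the accumulated opacity saturates, so Gaussians
    # hidden behind the damaged surface are excluded.
    idx = np.where(in_mask)[0]
    idx = idx[np.argsort(depths[idx])]
    selected, acc = [], 0.0
    for i in idx:
        if acc >= w_threshold:
            break
        selected.append(i)
        acc += opacities[i] * (1.0 - acc)      # alpha-compositing accumulation
    return np.array(selected, dtype=int)
```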
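The statistical refinement can be sketched as a simple k-sigma filter over the selected Gaussians' depths and opacities; the cut-off `k=2.0` is an assumed value, not one reported by the paper.

```python
import numpy as np

def filter_outliers(selected, depths, opacities, k=2.0):
    """Drop selected Gaussians whose depth or opacity lies more than
    k standard deviations from the selection's mean (assumed-normal model)."""
    d, o = depths[selected], opacities[selected]
    keep = (np.abs(d - d.mean()) <= k * d.std()) & \
           (np.abs(o - o.mean()) <= k * o.std())
    return selected[keep]
```

Chaining the two sketches, `filter_outliers(lift_mask_to_gaussians(...), depths, opacities)` would yield the final 3D damage mask indices in this simplified reading.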
Experiments demonstrated that CrashSplat's single-view 3D segmentation is robust and effective, particularly in challenging scenarios where multi-view consistency is difficult to achieve. For 2D instance segmentation, the YOLOv11-l and YOLOv11-x models achieved competitive mAP on the CarDD and VehiDE datasets, with YOLOv11-l preferred for its efficiency. For 3D segmentation, CrashSplat's single-view method performed comparably to, and in some cases outperformed, existing multi-view and training-based 3D segmentation approaches on public benchmarks such as SPIn-NeRF and the "Truck" scene from Tanks and Temples. The method achieved sub-second CPU latency per instance, highlighting its computational efficiency and its avoidance of heavy preprocessing or reliance on large foundation models. Qualitative results showed consistent, visually high-quality 3D masks from multiple viewpoints despite using only a single input view for segmentation.
The CrashSplat solution has significant practical implications for real-world applications such as auto insurance, car repair shops, and online car sales, enabling more accurate and interactive damage assessments. By providing a superior geometric representation of damage, it can improve visualization and analysis workflows. The approach is, however, limited by the quality of the initial 2D instance segmentation network, whose performance suffers under shadows and reflections and is constrained by the scarcity of high-quality, large-scale labeled damage datasets. The quality of the 3D-GS reconstruction also directly affects the final segmentation, especially on challenging surfaces such as glass. Future work includes addressing these dataset limitations, potentially through synthetic data generation; expanding evaluation to more diverse datasets; comparing against a broader range of 3D-GS baselines; and optimizing the Z-buffering algorithm for multi-threaded processing to further reduce runtime.