AI Summary • Published on Jun 11, 2025
Reliable and detailed assessment of vehicle damage is crucial across various sectors, including auto insurance, fleet management, vehicle resale, and autonomous driving systems. While advanced instance segmentation models, such as ALBERT, offer high accuracy in detecting complex car parts and damage, their computational demands often hinder real-time deployment on edge devices or mobile platforms commonly used in insurance and roadside evaluations. Furthermore, existing methods frequently struggle with subtle visual cues like small dents, light scratches, or cracked paint, especially when distinguishing genuine damage from tampering or fake alterations.
This paper introduces SLICK (Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation), a lightweight instance segmentation model designed for fast inference and practical deployment. SLICK is trained through teacher-student distillation: ALBERT serves as the teacher, and SLICK learns to replicate its outputs at a much lower computational cost, running roughly 7X faster (that is, taking about one-seventh of the teacher's inference time). The SLICK framework integrates five components to address the challenges of accurate and fast automotive vision:
1. Selective Part Segmentation: This module uses a high-resolution semantic backbone, guided by structural priors, to precisely segment vehicle parts, even when they are occluded, deformed, or have lost paint.
2. Localization-Aware Attention: Dynamic spatial attention blocks are employed to concentrate computational resources on damaged or altered regions, thereby enhancing the detection of fine-grained damage in cluttered and complex environments.
3. Instance-Sensitive Refinement: An instance-sensitive refinement head utilizes panoptic cues and shape priors to differentiate between overlapping or adjacent parts (e.g., fender versus door) and ensure precise boundary alignment.
4. Cross-Channel Calibration: Through multi-scale channel attention, this component amplifies subtle damage signals, such as scratches and dents, while effectively suppressing noise like reflections and decals.
5. Knowledge Fusion Module: This module integrates diverse domain-specific knowledge from synthetic crash data, geometric priors, and real-world insurance datasets. This fusion enhances the model's generalization capabilities and its ability to handle rare damage scenarios effectively.
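As a rough illustration of the channel-attention idea behind Cross-Channel Calibration (item 4), the Python sketch below applies a single-scale, squeeze-and-excitation-style gate to a toy feature map. This is not the authors' module: the summary does not specify its architecture, the multi-scale aspect is omitted here, and the function names, hand-picked weights, and toy values are all hypothetical.

```python
import math

def global_avg(channel):
    """Squeeze step: mean of all values in one 2D channel."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_gates(feature_map, weights, bias):
    """Excitation step: one gate in (0, 1) per channel.

    weights[i][j] is the (hypothetical, normally learned) influence of
    pooled channel j on the gate of channel i.
    """
    pooled = [global_avg(ch) for ch in feature_map]
    return [
        sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
        for w_row, b in zip(weights, bias)
    ]

def recalibrate(feature_map, gates):
    """Scale every value in channel i by gate i; the shape is unchanged."""
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_map, gates)]

# Toy input: channel 0 carries a faint "scratch" response,
# channel 1 a strong "reflection" response (both invented for illustration).
feature_map = [
    [[0.1, 0.2], [0.3, 0.2]],
    [[0.9, 0.8], [0.9, 1.0]],
]
weights = [[4.0, 0.0], [0.0, -4.0]]  # assumed values, chosen by hand
bias = [0.0, 0.0]

gates = channel_gates(feature_map, weights, bias)
calibrated = recalibrate(feature_map, gates)
```

With these hand-picked weights the gate for channel 0 is larger than the gate for channel 1, so the faint channel is preserved relative to the strong one. In a trained model the weights would be learned rather than set manually.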
SLICK’s training employs a composite distillation loss that minimizes the discrepancy between the teacher and student models across mask alignment, class probabilities, feature maps, and graph relations, ensuring efficient knowledge transfer.
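One way to read this composite loss is as a weighted sum of four terms, L = w_mask * L_mask + w_cls * L_cls + w_feat * L_feat + w_graph * L_graph. The Python sketch below is a minimal illustration under that assumption. The summary does not give the per-term formulas, so the choices here (mean squared error for masks, features, and graph relations; temperature-scaled KL divergence for class probabilities), the weights, the temperature, and the dictionary keys are all hypothetical.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def composite_distillation_loss(student, teacher,
                                weights=(1.0, 1.0, 1.0, 1.0),
                                temperature=2.0):
    """Weighted sum of four teacher-student discrepancy terms (assumed form)."""
    w_mask, w_cls, w_feat, w_graph = weights
    # Mask alignment: student mask probabilities vs. teacher mask probabilities.
    l_mask = mse(student["mask"], teacher["mask"])
    # Class probabilities: KL from softened teacher to softened student.
    l_cls = kl_div(softmax(teacher["logits"], temperature),
                   softmax(student["logits"], temperature))
    # Feature maps: flattened intermediate activations.
    l_feat = mse(student["features"], teacher["features"])
    # Graph relations: flattened pairwise relation scores between parts.
    l_graph = mse(student["relations"], teacher["relations"])
    return w_mask * l_mask + w_cls * l_cls + w_feat * l_feat + w_graph * l_graph

# Toy outputs for one image (values invented for illustration).
teacher_out = {"mask": [1.0, 0.0, 1.0], "logits": [2.0, 0.5, -1.0],
               "features": [0.3, 0.7], "relations": [0.9, 0.1]}
student_out = {"mask": [0.8, 0.1, 0.7], "logits": [1.5, 0.7, -0.5],
               "features": [0.2, 0.6], "relations": [0.7, 0.2]}

loss = composite_distillation_loss(student_out, teacher_out)
```

Every term in this sketch is zero when the student reproduces the teacher exactly, so the total loss measures how far the student is from the teacher across all four signals. The relative weights control which signal dominates training.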
SLICK-V1 was evaluated on a large automotive inspection benchmark with over 1 million training images and a held-out test set of 9,981 high-quality annotated images covering 61 vehicle part and damage classes. The model achieved a Population Accuracy of 89.32%, a Precision of 81.23%, and a Recall of 89.22%. These metrics indicate that SLICK provides precise pixel-level segmentation, limits false positives from visual clutter, and detects most true damage and part regions. SLICK achieved these results while reducing inference latency from 423 ms (teacher model) to 58 ms, a speedup of about 7.3X. The results suggest that SLICK remains effective in complex scenarios involving occlusions, paint variations, and diverse lighting conditions, matching or surpassing the teacher model in critical tasks like dent detection, scrape segmentation, and differentiating real from tampered damage.
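The speedup figure can be checked directly from the two reported latencies. The short Python sketch below does that arithmetic and also derives an F1 score from the reported Precision and Recall. The F1 value is computed here for illustration only and is not a number reported in the source; the function names are hypothetical.

```python
def speedup(teacher_ms, student_ms):
    """How many times faster the student is than the teacher."""
    return teacher_ms / student_ms

def latency_reduction_pct(teacher_ms, student_ms):
    """Percentage of the teacher's latency that the student removes."""
    return 100.0 * (1.0 - student_ms / teacher_ms)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)

# Figures reported in the summary.
teacher_ms, student_ms = 423.0, 58.0
precision_pct, recall_pct = 81.23, 89.22

print(round(speedup(teacher_ms, student_ms), 2))                # 7.29
print(round(latency_reduction_pct(teacher_ms, student_ms), 2))  # 86.29
print(round(f1_score(precision_pct, recall_pct), 2))            # 85.04 (derived, not reported)
```

Note that a 7.29X speedup corresponds to an 86% reduction in latency, which is why describing it as "700% faster" is imprecise.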
SLICK represents a significant advancement in intelligent automotive inspection, offering an accurate, lightweight, and interpretable segmentation system suitable for real-world applications such as insurance claim processing, rental car return inspections, and accident reporting. Its near-teacher performance with real-time execution on edge GPUs and mobile platforms, coupled with robust calibration under varying conditions, makes it highly practical. The authors note three limitations: performance may degrade on out-of-distribution vehicle types or novel camera perspectives; ultra-fine damage remains difficult to detect because of low contrast and sensor noise; and the knowledge components depend on the availability of high-quality structural priors and annotated graphs. Future research aims to address these limitations by exploring multi-view inputs, incorporating 3D structural priors, and further optimizing for low-power devices.