AI Summary • Published on Dec 2, 2025
This research addresses challenges in two areas: improving generative models, particularly Normalizing Flows (NFs), and applying such models to real-world computer vision problems. For Normalizing Flows, the key issues are computational inefficiency, limited expressiveness, training instability caused by complex Jacobian determinant calculations, limited interpretability and architectural flexibility, and difficulty scaling to high-dimensional data. On the application side, the thesis tackles class imbalance, data scarcity, and annotation costs in agricultural quality assessment; the absence of suitable public datasets and the need for effective anonymization in autonomous driving privacy; the high cost and practical limitations of traditional geological fieldwork, compounded by land cover interference and spectral variability in remote sensing; the time-consuming nature of art restoration and the diversity of degradation types it must handle; and traffic signs that are frequently absent or obscured in autonomous vehicle contexts.
The thesis proposes several innovations to enhance the efficiency of Normalizing Flows, including novel invertible 3x3 convolution layers with mathematically derived invertibility conditions, an improved Quad-coupling layer for greater flexibility, and fast parallel algorithms for convolution inversion and backpropagation. It introduces "Inverse-Flow," a multi-scale architecture that utilizes inverse convolution for the forward pass and convolution for sampling, and "Affine-StableSR," an efficient super-resolution model leveraging pre-trained Stable Diffusion weights with affine-coupling layers. For real-world applications, the work develops an automated corn seed quality assessment system using Conditional GANs (BigGAN) for data augmentation and active learning. It proposes a privacy-preserving method for autonomous driving datasets, employing RetinaFace for face detection and Inpaint-Anything for generative inpainting of faces and license plates. An unsupervised machine learning framework combines stacked autoencoders for dimensionality reduction with k-means clustering to generate geological maps from multispectral remote sensing data. For art restoration, the StableSR diffusion model is adapted and fine-tuned to effectively handle various degradation types. Finally, for missing traffic sign detection, fine-tuned YOLOv8 models are used with robust data augmentation and regularization strategies.
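The affine-coupling layers mentioned above can be illustrated with a minimal sketch. This is a generic textbook coupling layer, not the thesis's implementation: the `conditioner` function is a hypothetical stand-in for a learned network, and all names and input values are assumptions chosen for illustration. It shows why such layers are cheap to invert and why their log-determinant is just a sum of log-scales.

```python
import math

def conditioner(x_a):
    """Hypothetical stand-in for a learned network: maps x_a to (log_scale, shift)."""
    log_s = [math.tanh(0.5 * v) for v in x_a]  # bounded log-scale keeps exp() stable
    t = [2.0 * v + 1.0 for v in x_a]
    return log_s, t

def coupling_forward(x):
    """y_a = x_a, y_b = x_b * exp(log_s) + t. Returns (y, log|det J|)."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = conditioner(x_a)
    y_b = [xb * math.exp(ls) + tt for xb, ls, tt in zip(x_b, log_s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    log_det = sum(log_s)
    return x_a + y_b, log_det

def coupling_inverse(y):
    """Exact inverse: x_a passes through unchanged, so log_s and t can be recomputed."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = conditioner(y_a)
    x_b = [(yb - tt) * math.exp(-ls) for yb, ls, tt in zip(y_b, log_s, t)]
    return y_a + x_b

x = [0.3, -1.2, 0.7, 2.0]          # made-up input
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)        # recovers x up to floating-point error
```

Because the first half of the input passes through unchanged, the inverse never has to invert the conditioner itself, which is what keeps both directions of the flow inexpensive.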
The Normalizing Flow contributions yielded significant gains in computational efficiency; for instance, the proposed 3x3 invertible convolutions were roughly twice as fast as Emerging convolutions. Inverse-Flow achieved substantially faster sampling times (e.g., 12.2 on MNIST and 19.7±1.2 for Inv_Conv) and competitive forward pass times with considerably fewer parameters (e.g., 0.6M vs. 5.16M on MNIST). Affine-StableSR showed reduced model complexity and efficient encoding/decoding, with comparable or better validation losses in super-resolution tasks. In applications, the agricultural quality assessment system reached a classification accuracy of 79.24% with CGAN-generated images, rising to 85.24% with active learning-labeled data. For autonomous driving privacy, custom YOLO models (yolov11l10, yolov11l19) recorded higher mAP scores (up to 0.70 mAP50-90 on anonymized data) and lower losses, while UNet (ResNet34) proved robust in segmentation tasks despite anonymization. Geological mapping using stacked autoencoders with Sentinel-2 data achieved the highest spatial resolution and accuracy, with improved Calinski-Harabasz and Davies-Bouldin scores. The fine-tuned StableSR model effectively restored degraded artworks, and for missing traffic signs, the IAMGROOT method achieved a top mAP of 0.90 for detection and a Top-1 accuracy of 0.605 for scene categorization, a 108% improvement over the baseline for categorization.
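For readers unfamiliar with the two clustering scores named above, the following sketch computes both from their standard definitions. It is an illustration only: the toy clusters are made up and are not data from the thesis, and the function names are assumptions. A higher Calinski-Harabasz score and a lower Davies-Bouldin score both indicate tighter, better-separated clusters.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance. Lower is better."""
    cents = [centroid(c) for c in clusters]
    scat = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [
        max((scat[i] + scat[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

def calinski_harabasz(clusters):
    """Ratio of between-cluster to within-cluster dispersion. Higher is better."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    g = centroid(all_points)
    cents = [centroid(c) for c in clusters]
    between = sum(len(c) * math.dist(m, g) ** 2 for c, m in zip(clusters, cents))
    within = sum(math.dist(p, m) ** 2 for c, m in zip(clusters, cents) for p in c)
    return (between / (k - 1)) / (within / (n - k))

# Made-up toy data: two compact, distant clusters vs. two spread-out, close ones.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 6)], [(4, 0), (4, 6)]]

db_tight, db_loose = davies_bouldin(tight), davies_bouldin(loose)
ch_tight, ch_loose = calinski_harabasz(tight), calinski_harabasz(loose)
```

On this toy data the compact clustering scores better on both metrics, which is the sense in which "improved" scores should be read for the geological maps.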
This research significantly advances the theoretical understanding and practical utility of generative models. The innovations in Normalizing Flows lead to more efficient, scalable, and expressive models, broadening their applicability in complex, high-dimensional data processing and real-time generative tasks. Furthermore, the successful application of these generative models to diverse computer vision problems provides practical, automated solutions for critical real-world challenges across various domains, including enhancing food security through automated quality control, protecting public privacy in autonomous driving scenarios, aiding resource exploration through advanced geological mapping, and preserving cultural heritage through art restoration. The frameworks and methodologies developed establish new benchmarks, stimulating further research into areas such as enhanced convolutional inversion models, more robust privacy-preserving techniques, the integration of multi-modal inputs for autonomous scene understanding, and specialized diffusion models for specific degradation types, thereby fostering continuous innovation in machine learning and its practical applications.