AI Summary • Published on Oct 17, 2025
Photorealistic fake images produced by Generative Adversarial Networks (GANs) are becoming increasingly difficult for humans to distinguish from real photographs, posing significant challenges for image forensics and for the authenticity of digital content. While state-of-the-art detection systems exist, those relying on spatial-domain analysis often struggle to generalize across different GAN models and can be misled by superficial cues. This highlights a critical need for robust, broadly applicable detection pipelines that reliably separate authentic images from GAN-generated fakes, especially given the potential for misuse in misinformation and fraud. This study addresses that challenge by investigating the frequency domain as a source of stable and consistent "fingerprints" left by GANs.
The proposed methodology transforms images into the frequency domain using a two-dimensional Discrete Fourier Transform (2D DFT). An `fftshift` operation then centers the low-frequency components, and a logarithmic transformation compresses the wide dynamic range of frequency magnitudes, making subtle patterns more visible. The resulting log-scaled values are normalized to the [0, 1] range, yielding a three-channel frequency representation that retains the original spatial dimensions. This representation is then fed into a ResNet50 deep learning classifier. ResNet50 was selected for its proven ability to learn complex, multi-scale features; it was initialized with ImageNet pre-trained weights and fine-tuned for binary classification (real vs. fake). The dataset comprised 2,500 real human faces from FFHQ, 2,500 real cat images from Kaggle, 2,500 StyleGAN2-generated human faces, and 2,500 StyleGAN2-generated cat images. This balanced dataset of 10,000 images was split into training (70%), validation (15%), and test (15%) sets, with data augmentation applied during training.
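A minimal sketch of the preprocessing step described above, assuming the 2D DFT is applied to each RGB channel independently and that only the magnitude spectrum is kept (the summary does not specify either detail):

```python
# Sketch of the DFT-based preprocessing, under the assumptions stated above.
import numpy as np

def dft_representation(image: np.ndarray) -> np.ndarray:
    """Map an HxWx3 RGB image to a log-scaled, [0, 1]-normalized magnitude
    spectrum of the same spatial size, one channel at a time."""
    channels = []
    for c in range(image.shape[-1]):
        spectrum = np.fft.fft2(image[..., c].astype(np.float64))  # 2D DFT
        spectrum = np.fft.fftshift(spectrum)                      # center low frequencies
        magnitude = np.log1p(np.abs(spectrum))                    # compress dynamic range
        magnitude -= magnitude.min()                              # normalize to [0, 1]
        magnitude /= magnitude.max() + 1e-8
        channels.append(magnitude)
    return np.stack(channels, axis=-1).astype(np.float32)         # HxWx3 input for ResNet50
```

The classifier could then be set up roughly as follows; the two-logit output head and the use of torchvision are assumptions, since the summary only states that ResNet50 starts from ImageNet weights and is fine-tuned for real-vs-fake classification:

```python
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained ResNet50 whose final fully connected layer is replaced
# by a two-class (real vs. fake) head before fine-tuning.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)
```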
The experimental results showed that the DFT-based ResNet50 model significantly outperformed its counterpart trained on raw spatial-domain images. The frequency-domain model achieved an accuracy of 92.82% and an Area Under the Curve (AUC) of 0.95 on the held-out test set, indicating strong separation of real and fake images. The spatial-domain model, in contrast, reached 81.5% accuracy and an AUC of 0.85, a notably higher error rate. Average precision (AP) reflected the same gap, 0.95 for the DFT model versus 0.85 for the spatial model, suggesting better confidence ranking for the frequency-based approach. The DFT model also exhibited a lower final cross-entropy loss (approximately 0.20 versus approximately 0.33 for the spatial model), implying that its probability estimates were both more confident and more often correct. Confusion matrices confirmed that the DFT-based model achieved higher true positive and true negative rates with fewer misclassifications, underscoring the stronger signal available in the frequency domain.
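For reference, metrics of this kind can be computed from the test-set predictions with scikit-learn; the snippet below is an illustrative sketch, and the variable names (`y_true`, `y_score`) are hypothetical rather than taken from the study:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             confusion_matrix, log_loss, roc_auc_score)

def evaluate(y_true, y_score, threshold=0.5):
    """y_true: 0/1 labels (1 = GAN-generated); y_score: predicted probability of 'fake'."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),      # reported: 92.82% (DFT) vs. 81.5% (spatial)
        "auc": roc_auc_score(y_true, y_score),           # reported: 0.95 vs. 0.85
        "ap": average_precision_score(y_true, y_score),  # reported: 0.95 vs. 0.85
        "loss": log_loss(y_true, y_score),               # cross-entropy: ~0.20 vs. ~0.33
        "confusion": confusion_matrix(y_true, y_pred),   # [[TN, FP], [FN, TP]]
    }
```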
The findings confirm that frequency-domain analysis offers a highly discriminative representation for detecting GAN-generated images, showing that even highly realistic StyleGAN2 outputs contain subtle artifacts detectable in the Fourier domain. The work has two main implications. Practically, frequency-domain features are a promising avenue for strengthening deepfake detection for images, with potential applications in social media image verification and the detection of counterfeit digital content. More broadly, it underscores the importance of continuing to study the "fingerprints" embedded by generative models in order to develop more generalized and robust detection methods as these technologies advance. Future research directions include testing cross-model generalization, exploring other frequency representations such as the discrete wavelet transform (DWT) and discrete cosine transform (DCT), designing hybrid models that combine spatial and frequency domains, and improving robustness against common post-processing and adversarial attacks so that forensic detection remains adaptable and effective.