AI Summary • Published on Dec 7, 2025
Copy-move image forgery, in which a region is duplicated and pasted within the same image, poses a growing detection challenge due to increasingly sophisticated manual manipulations and the emergence of deep generative networks (such as GANs) for creating such forgeries. Existing deep learning-based detection methods suffer from two primary limitations. First, their convolutional neural networks (CNNs) lack inherent robustness to common transformations such as rotation and scaling, leading to unreliable similarity measurements between copied and pasted regions. Second, current decoding mechanisms, which often rely on 1-D similarity vectors, fail to adequately exploit spatial information, limiting the accuracy of tampered-region localization. Beyond these limitations, a significant gap exists in benchmarks specifically designed for evaluating detection methods against deep-synthesized copy-move forgeries.
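To make the 1-D limitation concrete, the sketch below shows one common way a decoder collapses each location's pairwise similarities into a fixed-length sorted vector, discarding where in the image the matches actually occur. This is not the paper's code; the shapes, the top-k reduction, and the variable names are illustrative assumptions.

```python
# Minimal sketch of a 1-D similarity-vector reduction (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def pairwise_cosine_similarity(feat):
    """feat: (B, C, H, W) feature map -> (B, H*W, H*W) cosine similarities."""
    B, C, H, W = feat.shape
    f = feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
    f = F.normalize(f, dim=-1)
    return torch.bmm(f, f.transpose(1, 2))       # (B, H*W, H*W)

def similarity_vector_1d(feat, k=16):
    """Typical 1-D reduction: sort each location's similarities and keep the top-k scores.
    The spatial layout of where those matches occur is discarded at this point."""
    sim = pairwise_cosine_similarity(feat)        # (B, N, N)
    topk, _ = sim.topk(k + 1, dim=-1)             # +1 because each location matches itself
    return topk[..., 1:]                          # (B, N, k): a 1-D descriptor per location

# toy usage: a random 32x32 feature grid
feat = torch.randn(1, 256, 32, 32)
vec = similarity_vector_1d(feat)
print(vec.shape)  # torch.Size([1, 1024, 16])
```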
The proposed Multi-directional Similarity Network (MSN) is a two-stream model designed to overcome these limitations in representation and localization for copy-move forgery detection. For enhanced representation, MSN employs a multi-directional CNN architecture that hierarchically encodes the image together with additional views at multiple scales and rotations. This input-space augmentation allows the network to measure similarities between sampled patches more reliably under such transformations. For improved localization, MSN introduces a novel 2-D similarity matrix-based decoder. Unlike previous 1-D similarity vector approaches, this decoder fully exploits spatial information across the entire image by transforming the similarity measurements into 32x32 similarity maps, which are then classified to precisely locate tampered regions. The paper also introduces a new Deep-synthesized Copy-Move Forgery (DCF) database, comprising 21,000 images generated by VAE, style transfer, and GAN-Rewriting, serving as a benchmark for evaluating detection against deep-synthesized forgeries.
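The PyTorch sketch below illustrates the two ideas this summary attributes to MSN: input-space augmentation with extra rotations and scales before encoding, and a decoder that keeps each location's similarities as a full 32x32 map for classification. The toy backbone, module names, channel sizes, and the naive averaging fusion are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch of multi-directional input augmentation and a 2-D similarity-map
# classifier (assumed structure, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms.functional as TF

class ToyEncoder(nn.Module):
    """Stand-in backbone that downsamples a 256x256 image to a 32x32 feature grid."""
    def __init__(self, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, out_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

def directional_views(img, angles=(0, 90, 180, 270), scales=(1.0, 0.5)):
    """Input-space augmentation: rotated/rescaled copies of the image, resized back
    to the original resolution so they can share one encoder."""
    H, W = img.shape[-2:]
    views = []
    for a in angles:
        for s in scales:
            v = TF.rotate(img, a)
            v = F.interpolate(v, scale_factor=s, mode="bilinear", align_corners=False)
            views.append(F.interpolate(v, size=(H, W), mode="bilinear", align_corners=False))
    return views

def self_similarity_maps(feat):
    """feat: (B, C, 32, 32) -> (B, 32*32, 32, 32): one full similarity map per location."""
    B, C, H, W = feat.shape
    f = F.normalize(feat.flatten(2).transpose(1, 2), dim=-1)   # (B, N, C)
    sim = torch.bmm(f, f.transpose(1, 2))                      # (B, N, N)
    return sim.reshape(B, H * W, H, W)

class SimilarityMapClassifier(nn.Module):
    """Treats each location's 32x32 similarity map as a one-channel image and predicts
    a per-location tamper score, preserving the spatial layout of the matches."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, 1),
        )
    def forward(self, sim_maps):
        # sim_maps: (B, H*W, Hs, Ws) -> tamper logits per location, reshaped to (B, 1, H, W)
        B, N, Hs, Ws = sim_maps.shape
        logits = self.head(sim_maps.reshape(B * N, 1, Hs, Ws))
        side = int(N ** 0.5)
        return logits.reshape(B, 1, side, side)

# toy forward pass
img = torch.rand(1, 3, 256, 256)
encoder, classifier = ToyEncoder(), SimilarityMapClassifier()
feats = [encoder(v) for v in directional_views(img)]
feat = torch.stack(feats, 0).mean(0)           # naive fusion of the directional streams
mask_logits = classifier(self_similarity_maps(feat))
print(mask_logits.shape)                        # torch.Size([1, 1, 32, 32])
```

Keeping the full 32x32 map means the classifier can react to the shape and position of the matching region, information that the sorted top-k vector in the earlier sketch throws away.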
Extensive experiments were conducted on the CASIA CMFD, CoMoFoD, and newly introduced DCF datasets. MSN consistently achieved state-of-the-art results, outperforming existing traditional and deep learning-based methods across metrics including F1-score, precision, and recall. On the CASIA CMFD benchmark, MSN demonstrated superior performance in both pixel-level and image-level F1-scores, attributed to its multi-directional architecture and the effective use of spatial information by the 2-D similarity map classifier. On the CoMoFoD dataset, which features 25 categories of attacks, MSN exhibited stable and robust performance under different levels of manipulation. Crucially, evaluations on the DCF database revealed significant performance degradation for almost all compared methods on deep-synthesized data, with traditional approaches being largely ineffective. In contrast, MSN maintained more favorable performance on the DCF-VAE and DCF-Transfer sets, and particularly on the GAN-Rewriting set, showcasing its effectiveness and generalizability to deep-synthesized forgeries. Ablation studies confirmed that both the input-space augmentation and the 2-D similarity map classifier contribute significantly to MSN's detection capabilities.
The Multi-directional Similarity Network (MSN) significantly advances copy-move forgery detection by providing a robust and accurate solution for both traditional and deep-synthesized manipulations. Its ability to handle complex transformations and leverage spatial context for precise localization marks a notable improvement over prior methods. The introduction of the DCF database is a critical contribution, filling a crucial gap in benchmarks for evaluating deep forgery detection and highlighting the vulnerability of current techniques to such advanced tampering. This research underscores the necessity for future studies to focus on texture semantic similarity for deep-synthesized copy-move detection and provides a strong baseline and direction for developing more resilient image forensics tools in the era of deep learning.