AI Summary • Published on Mar 29, 2026
Backdoor attacks pose a significant threat to Federated Learning (FL) models by injecting poisoned data with hidden triggers, leading to manipulated model behavior. Existing defenses primarily focus on in-training or post-training phases, leaving FL systems vulnerable to early-stage contamination from malicious clients. The decentralized nature of FL makes it challenging for a central server to detect these hidden vulnerabilities, especially in critical applications where manipulated outputs can have severe consequences.
The proposed FL-PBM (Pre-Training Backdoor Mitigation for Federated Learning) is a proactive, client-side defense mechanism. It operates in three main stages before local model training commences. First, a benign trigger is temporarily inserted into the local data to establish a baseline for identifying anomalies. Second, Principal Component Analysis (PCA) is applied to extract discriminative features, followed by Gaussian Mixture Model (GMM) clustering to identify potentially malicious data samples based on their distribution in the PCA-transformed space; clusters whose feature distributions deviate from the majority of the client's data are flagged as suspicious. Third, samples deemed highly suspicious are excluded from training entirely, while moderately suspicious samples undergo a targeted adaptive blurring technique. This blurring disrupts potential backdoor triggers by smoothing the fine-grained pixel details where triggers are typically embedded, while preserving the image's overall semantic content. Together, these steps detect and sanitize suspicious data early, minimizing the influence of backdoor triggers on the global model.
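The filtering stages described above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the benign-trigger baseline stage is omitted, and the PCA dimensionality, GMM cluster count, suspicion thresholds (`exclude_frac`, `blur_frac`), and blur strength (`sigma`) are all assumed values chosen for the sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from scipy.ndimage import gaussian_filter

def filter_client_data(images, n_components=10, n_clusters=2,
                       exclude_frac=0.05, blur_frac=0.15,
                       sigma=1.0, seed=0):
    """Sketch of a client-side, pre-training sanitization pass.

    images: float array of shape (N, H, W), values in [0, 1].
    Returns (kept_images, kept_indices). All hyperparameters are
    illustrative assumptions, not values from FL-PBM.
    """
    n = len(images)
    flat = images.reshape(n, -1)

    # Stage 2a: PCA extracts low-dimensional discriminative features.
    k = min(n_components, n, flat.shape[1])
    feats = PCA(n_components=k, random_state=seed).fit_transform(flat)

    # Stage 2b: GMM clustering in the PCA space. Samples that are
    # unlikely under the fitted mixture are treated as suspicious.
    gmm = GaussianMixture(n_components=n_clusters,
                          random_state=seed).fit(feats)
    log_lik = gmm.score_samples(feats)   # per-sample log-likelihood

    # Rank samples from most to least anomalous (low likelihood first).
    order = np.argsort(log_lik)
    n_exclude = int(exclude_frac * n)
    n_blur = int(blur_frac * n)
    excluded = set(order[:n_exclude].tolist())
    to_blur = set(order[n_exclude:n_exclude + n_blur].tolist())

    kept, kept_idx = [], []
    for i, img in enumerate(images):
        if i in excluded:
            continue                     # Stage 3a: drop highly suspicious
        if i in to_blur:
            # Stage 3b: blur moderately suspicious samples, smoothing the
            # fine pixel detail where small triggers hide while keeping
            # coarse semantic content.
            img = gaussian_filter(img, sigma=sigma)
        kept.append(img)
        kept_idx.append(i)
    return np.stack(kept), kept_idx
```

In a full FL round, each client would run this pass on its local dataset before training, so that poisoned samples never contribute gradients to the update sent for aggregation.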
Experimental evaluations were conducted on image-based datasets (GTSRB-10 and BTSC-10) under both IID and non-IID scenarios, against one-to-one and N-to-one backdoor attacks. FL-PBM consistently demonstrated superior performance compared to baseline FedAvg and state-of-the-art defenses like LPSF and RDFL. FL-PBM reduced attack success rates (ASR) to 0%–5% in most experiments, a reduction of up to 95% compared to FedAvg and 30–80% relative to RDFL and LPSF. Simultaneously, it maintained a high clean accuracy rate (CAR), typically between 87% and 97%, even in challenging non-IID settings with high data skew and significant malicious client participation (30%). In contrast, FedAvg remained highly vulnerable with ASR near 90–100%, while LPSF and RDFL showed limitations in fully suppressing backdoors or maintaining high accuracy, especially in non-IID environments.
The findings highlight the critical importance and effectiveness of preemptive, client-side data filtering as a powerful defense mechanism in federated learning security. By addressing backdoor threats at the earliest stage, before data can influence local model training and aggregation, FL-PBM significantly enhances the integrity and trustworthiness of collaborative learning systems without degrading model utility. This approach paves the way for more secure and reliable AI deployments in privacy-sensitive and distributed environments, offering a robust solution to a persistent threat in machine learning. Future work includes integrating GANs for stress-testing and leveraging advanced feature extraction methods like autoencoders.