AI Summary • Published on Mar 25, 2026
Network Traffic Classification (NTC) is essential for network management and security but faces significant hurdles, including limited availability of labeled data, imbalanced datasets, and stringent privacy regulations. Traditional data-driven solutions often struggle with these practical constraints, hindering their scalability and robustness in real-world scenarios. While Network Traffic Generation (NTG) can help address data scarcity by synthesizing realistic traffic, conventional generative methods are often inadequate. They either fail to capture the complex temporal dynamics of modern network traffic or incur prohibitive computational costs, making them impractical for deployment in contemporary networks.
This work proposes a novel lightweight Generative Artificial Intelligence (GenAI) pipeline for Network Traffic Generation (NTG), built on Transformer models, State Space Models (SSMs), and Diffusion Models (DMs). The approach generates compact traffic representations from header fields (specifically Payload Length and Packet Direction) of the first 10 packets in each network flow, rather than resource-intensive raw payload bytes. This design allows advanced GenAI models to be trained and operated within a limited parameter budget (1-2 million parameters), ensuring computational efficiency.
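The compact representation described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the signed-length encoding (positive for upstream, negative for downstream) and the field names are assumptions.

```python
# Hedged sketch: encode a flow as the signed payload lengths of its first
# n packets (sign = packet direction); short flows are zero-padded.
# Field names and the signed-length convention are illustrative assumptions.

def flow_to_vector(packets, n=10, pad=0):
    vec = []
    for pkt in packets[:n]:
        sign = 1 if pkt["direction"] == "up" else -1
        vec.append(sign * pkt["payload_len"])
    vec += [pad] * (n - len(vec))  # pad flows shorter than n packets
    return vec

flow = [
    {"payload_len": 517, "direction": "up"},
    {"payload_len": 1400, "direction": "down"},
    {"payload_len": 36, "direction": "up"},
]
print(flow_to_vector(flow))  # [517, -1400, 36, 0, 0, 0, 0, 0, 0, 0]
```

Such a 10-value vector is orders of magnitude smaller than raw payload bytes, which is what keeps the downstream generative models within a 1-2 million parameter budget.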
The pipeline operates in two main phases: a training phase and a generation phase. In the training phase, real network traffic traces are pre-processed through segmentation into bidirectional flows (biflows) and feature extraction to create an efficient traffic matrix. Depending on the GenAI model, this matrix is then mapped to either a 2D image (for DMs) or a sequence of tokens (for Transformers and SSMs) to learn the underlying traffic distribution. In the generation phase, trained GenAI models synthesize new traffic samples, conditioned by a class token to dictate the target network application or service. These generated outputs are then inverse-mapped back into the original traffic matrix format.
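For the sequence models (Transformers and SSMs), the mapping to tokens and its inverse might look like the sketch below. The vocabulary layout, with class tokens first and quantized signed lengths shifted into a non-negative range, is an illustrative assumption rather than the paper's exact scheme.

```python
# Hedged sketch of the token mapping for sequence models. A class token is
# prepended to condition generation on the target app/service; the inverse
# mapping recovers the original traffic-matrix row.

NUM_CLASSES = 40   # e.g. 40 mobile apps in Mirage-2019
MAX_LEN = 1500     # clip payload lengths to roughly one MTU (assumption)

def to_tokens(class_id, signed_lengths):
    tokens = [class_id]  # conditioning token selects the target class
    for v in signed_lengths:
        v = max(-MAX_LEN, min(MAX_LEN, v))
        tokens.append(NUM_CLASSES + v + MAX_LEN)  # shift into non-negative ids
    return tokens

def from_tokens(tokens):
    class_id, rest = tokens[0], tokens[1:]
    return class_id, [t - NUM_CLASSES - MAX_LEN for t in rest]

toks = to_tokens(7, [517, -1400, 36])
assert from_tokens(toks) == (7, [517, -1400, 36])  # lossless round trip
```

The same traffic matrix is instead reshaped into a 2D image for the diffusion model; only the sequence-model path is sketched here.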
The effectiveness of this lightweight GenAI paradigm was systematically evaluated along four dimensions: fidelity of synthetic traffic, utility for privacy-preserving NTC via synthetic-only training, effectiveness in data augmentation under low-data regimes, and computational efficiency. Experiments were conducted on two public datasets, Mirage-2019 (40 mobile apps) and CESNET-TLS22-80 (80 network services), comparing LLaMA (Transformer), Mamba (SSM), NetDiffus-NR (Diffusion Model), and a Conditional Variational Autoencoder (CVAE) baseline. Downstream NTC performance was assessed using F1-score. Additionally, the study explored post-training quantization for LLaMA to further optimize resource utilization.
The evaluation revealed that lightweight GenAI models, particularly LLaMA (Transformer) and Mamba (SSM), achieved high fidelity in reproducing real traffic patterns. They demonstrated near-zero Jensen-Shannon Divergence (JSD) scores across static and temporal traffic characteristics, including packet count, 1-gram, 2-gram, and Markov transition matrices. Both models also exhibited near-optimal UniqAlign and Leakage scores, indicating they generated diverse and realistic samples without merely memorizing training data. In contrast, NetDiffus-NR (Diffusion Model) consistently showed the lowest fidelity.
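The Jensen-Shannon Divergence used for these fidelity scores compares the value distributions of real and synthetic traffic; a minimal version for 1-gram distributions is sketched below. The binning strategy and base-2 logarithm (which bounds JSD in [0, 1]) are illustrative choices, not necessarily those of the paper.

```python
# Hedged sketch: JSD between the 1-gram (per-value) distributions of real
# and synthetic signed payload lengths. 0 means identical distributions.
from collections import Counter
from math import log2

def distribution(values, bins):
    counts = Counter(values)
    total = sum(counts.values())
    return [counts.get(b, 0) / total for b in bins]

def kl(p, q):
    # Kullback-Leibler divergence in bits; 0*log(0) terms are skipped
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)  # symmetric, bounded in [0, 1]

bins = [-1400, 36, 517]
real = distribution([517, -1400, 36, 517], bins)
synth = distribution([517, -1400, 36, -1400], bins)
print(jsd(real, real), jsd(real, synth))  # 0.0 for identical distributions
```

The 2-gram and Markov-transition-matrix scores reported above apply the same divergence to pairwise and transition statistics rather than single values.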
For Network Traffic Classification (NTC) tasks, classifiers trained exclusively on synthetic data generated by LLaMA and Mamba achieved F1-scores of up to 87.43% on real data, significantly reducing the performance gap with classifiers trained on full real datasets. In low-data regimes, GenAI-driven augmentation substantially improved NTC performance, boosting F1-scores by up to +40% on Mirage-2019 and +10-15% on CESNET-TLS22-80, outperforming traditional statistical (SMOTE) and expert-driven (Fast Retransmit) augmentation methods.
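The augmentation setup in a low-data regime can be sketched as padding each class's few real flows with conditionally generated ones. The `generate` callable below stands in for a trained conditional model such as LLaMA or Mamba; its interface is a hypothetical placeholder.

```python
# Hedged sketch of GenAI-driven augmentation: top up each class to a target
# count with synthetic flows drawn from a class-conditioned generator.
# `generate(cls)` is a stand-in for a trained conditional GenAI model.
import random

def augment(real_by_class, generate, target_per_class):
    """Return a balanced list of (flow_vector, class_id) training pairs."""
    train = []
    for cls, flows in real_by_class.items():
        train += [(f, cls) for f in flows]               # keep all real data
        need = max(0, target_per_class - len(flows))     # fill the gap
        train += [(generate(cls), cls) for _ in range(need)]
    return train

# toy stand-in generator: random signed lengths, ignoring the class label
fake_generate = lambda cls: [random.randint(-1500, 1500) for _ in range(10)]
real = {0: [[517, -1400] + [0] * 8], 1: [[99, -120] + [0] * 8]}
train = augment(real, fake_generate, target_per_class=5)
print(len(train))  # 10 pairs: 2 real + 8 synthetic
```

The same mechanism also addresses class imbalance, since under-represented classes receive proportionally more synthetic samples, which is where the reported gains over SMOTE and Fast Retransmit arise.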
Regarding computational efficiency, LLaMA offered the most favorable trade-off, with competitive training times (36.8 seconds/epoch), low generation latency (31.21 milliseconds/sample), and a modest on-disk footprint (7.9 MB). Post-training quantization for LLaMA further reduced its model size to 3.5 MB, maintaining generation fidelity and classification performance without meaningful degradation. While Mamba showed strong fidelity, its generation latency and on-disk footprint were higher. NetDiffus-NR incurred the highest generation latency, making it less practical for real-time deployment.
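The size reduction from post-training quantization comes from storing weights as low-bit integers plus a scale factor. Real pipelines use framework tooling; the toy symmetric int8 scheme below only illustrates the idea and is not the paper's method.

```python
# Hedged sketch of symmetric int8 post-training quantization: weights are
# stored as integer codes in [-127, 127] plus one float scale, so a float32
# tensor shrinks to roughly a quarter of its size. Illustrative only.

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]            # int8 codes
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.81, -0.35, 0.02, 1.27]
q, s = quantize(w)
w_hat = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))
assert err <= s / 2  # error bounded by half a quantization step
```

The bounded per-weight error is why, as reported above, quantizing LLaMA from 7.9 MB to 3.5 MB preserves generation fidelity and classification performance.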
This research demonstrates that lightweight Generative AI architectures, especially transformer-based models like LLaMA and state-space models like Mamba, are highly effective for synthesizing high-fidelity and privacy-preserving network traffic. Their ability to accurately mimic real traffic characteristics and significantly improve Network Traffic Classification performance, even with limited real data or synthetic-only training, positions them as crucial tools for addressing current challenges in network management and security. The optimal balance of fidelity and computational efficiency, particularly offered by LLaMA with post-training quantization, makes these models suitable for practical deployment in resource-constrained environments. This work opens avenues for future research into adaptive generation strategies, hybrid generative pipelines, and the integration of quantized models into edge-based intrusion detection systems for real-time, privacy-aware traffic analysis and autonomous network defense mechanisms.