AI Summary • Published on Dec 3, 2025
The classification of jet types, particularly the identification of top-quark jets among light-quark and gluon jets, is a critical task in high-energy physics, especially at the Large Hadron Collider (LHC). While state-of-the-art models such as Transformers and Graph Neural Networks (GNNs) achieve high accuracy, they carry a significant computational cost. This paper aims to develop a computationally lightweight and scalable jet-tagging solution that offers competitive performance without such demanding resources.
The authors developed a lightweight and scalable convolutional neural network (CNN) for jet tagging, based on a modified EfficientNet architecture, referred to as EfficientNet-Small (EffNet-S). This approach combined image-based representations of jets with their global kinematic features to improve efficiency and reduce computational cost compared to current complex models. The dataset used consisted of one million top-quark jets and one million background light-quark/gluon jets, generated at 14 TeV center-of-mass energy using Pythia8 and simulated with Delphes. Jets were constructed using the anti-kT algorithm with a radius parameter of 0.8.
Jet constituents were processed into 3-channel images (transverse momentum, mass, energy) with resolutions ranging from 28x28 to 64x64 pixels. Image preprocessing involved centering the hardest constituent, binning other constituents in pseudorapidity and azimuthal angle differences, standardizing pixel values, and applying random flips for augmentation. Additionally, a comprehensive set of global jet features, including jet four-momentum, number of constituents, N-subjettiness variables, and various energy correlation functions (C, D, U, M, N, L series), were computed. These global features were standardized and concatenated with the flattened output of the CNNs.
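The image-construction step described above can be sketched in a few lines: center the grid on the hardest constituent, then bin the remaining constituents by their pseudorapidity and azimuthal-angle differences. This is an illustrative single-channel (transverse-momentum) sketch; the grid half-width, function name, and input format are assumptions, and the paper's actual pipeline uses three channels plus standardization and random flips.

```python
import math

def jet_image(constituents, n_pix=32, half_width=0.8):
    """Bin jet constituents into a single-channel pT image (sketch).

    `constituents` is a list of (pt, eta, phi) tuples. The paper uses
    3 channels (pT, mass, energy); one channel shows the idea.
    """
    # Center the grid on the hardest (highest-pT) constituent.
    _, eta0, phi0 = max(constituents, key=lambda c: c[0])
    img = [[0.0] * n_pix for _ in range(n_pix)]
    bin_w = 2 * half_width / n_pix
    for pt, eta, phi in constituents:
        deta = eta - eta0
        # Wrap the azimuthal difference into (-pi, pi].
        dphi = math.atan2(math.sin(phi - phi0), math.cos(phi - phi0))
        i = int((deta + half_width) / bin_w)
        j = int((dphi + half_width) / bin_w)
        if 0 <= i < n_pix and 0 <= j < n_pix:
            img[i][j] += pt  # sum the pT falling into each pixel
    return img
```

Standardizing the pixel values and applying random flips would then follow as separate preprocessing passes over the resulting arrays.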
The EffNet-S architectures were derived by scaling down the original EfficientNet models using negative values of the scaling parameter (ϕ), adapting them to lower-resolution jet images while preserving the compound-scaling rules. LeNet models of varying input resolutions served as a baseline for comparison. Training was conducted on a single PC, using a data-piping strategy to handle the large dataset, with the ADAM optimizer at its default settings and early stopping based on validation loss. Each model was retrained multiple times to quantify the uncertainty from random weight initialization, and inference speeds were also evaluated.
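The compound-scaling rule behind this down-scaling can be made concrete. In the original EfficientNet recipe, depth, width, and input resolution are scaled together as α^ϕ, β^ϕ, γ^ϕ with α = 1.2, β = 1.1, γ = 1.15; a negative ϕ then shrinks all three at once. Whether the authors reused exactly these α, β, γ values is an assumption here; this is a sketch of the mechanism, not their exact configuration.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Compound-scaling multipliers in the EfficientNet style.

    A negative phi shrinks depth, width, and input resolution
    together, which is the idea behind the EffNet-S variants.
    The default alpha/beta/gamma are the original EfficientNet
    constants; the paper's exact values are an assumption.
    """
    depth = alpha ** phi       # multiplier on the number of layers
    width = beta ** phi        # multiplier on the number of channels
    resolution = gamma ** phi  # multiplier on the input image size
    return depth, width, resolution

# For example, shrinking the 224x224 EfficientNet-B0 baseline toward
# 32x32 jet images corresponds to phi = log(32/224)/log(1.15), i.e.
# roughly -14, with depth and width reduced accordingly.
```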
The study demonstrated that incorporating global jet features significantly enhanced the discriminative power of the CNNs, particularly in background rejection, with a more pronounced effect at higher image resolutions in the LeNet architectures. With image-only inputs, by contrast, LeNet models showed only marginal performance gains as the input image size increased.
The most compact EfficientNet-Small (EffNet-S) model, when processing images alone, substantially outperformed all tested LeNet configurations while having eight times fewer total parameters than the largest LeNet. Its optimal performance was observed with 32x32 images cropped to 28x28. The addition of global features also benefited the EffNet-S models, though the improvement was less dramatic for the larger EffNet-S networks.
When utilizing global features, all of the developed networks surpassed the classic DeepTop CNN in accuracy, and some also improved on its background rejection. Although the ResNeXt-50 model exhibited superior performance overall, the best-performing EffNet-S variant with global features was seven times smaller and ran inference in half the time of ResNeXt-50, highlighting its computational efficiency.
This research highlights several key implications for future jet tagging studies. Firstly, there is a clear need for systematic architectural searches to specifically fine-tune scalable models like EfficientNet for jet classification tasks, potentially yielding even more optimized and efficient solutions. Secondly, the study confirms that augmenting constituent-level image data with global jet features significantly enhances the performance of smaller deep learning models, making them more competitive. Lastly, the ability to represent jets in various data types suggests a promising avenue for developing hybrid ensemble models. Such "mixture of experts" approaches, combining different network architectures to process diverse jet representations (images, four-momenta, global features), hold the potential for achieving superior accuracy and improved signal-to-background discrimination in high-energy physics experiments.
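The hybrid "mixture of experts" idea proposed above reduces, in its simplest form, to combining the signal probabilities from experts that each see a different jet representation. The weighted-average combination rule below is an illustrative assumption, not a method from the paper:

```python
def ensemble_score(expert_scores, weights=None):
    """Combine per-expert signal probabilities into one score (sketch).

    Each entry of `expert_scores` would come from a different expert,
    e.g. an image CNN, a four-momentum network, and a global-feature
    network. Equal weighting is the default; learned or validated
    weights would be a natural refinement.
    """
    if weights is None:
        weights = [1.0 / len(expert_scores)] * len(expert_scores)
    return sum(w * s for w, s in zip(weights, expert_scores))
```

In practice the combination weights (or a small gating network) would themselves be tuned on validation data rather than fixed by hand.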