AI Summary • Published on Apr 16, 2026
Traditional methods for benchmarking high-performance computing (HPC) applications, exemplified by the beNNch framework, suffer from significant limitations in repeatability, replicability, and reproducibility. These issues stem from the dynamic nature of HPC environments, individualized hardware and software configurations, and communication challenges among researchers. The need for extensive expert knowledge for setup, deployment, and execution creates a high barrier to entry and often leads to inconsistent and unreliable performance assessment. This hinders collaborative research, slows down scientific software development, and makes it difficult to compare results over time or across different systems.
The paper proposes CI-beNNch, an automated continuous benchmarking framework inspired by the principles of continuous integration (CI) and continuous delivery (CD). The core of the method is to abstract configuration and deployment details behind a unified, machine-agnostic entry point. It uses hierarchical benchmark configurations, in which parameters can be inherited and overridden to adapt to specific needs while maintaining consistency. The framework is structured into three layers of abstraction: the workflow layer (defining the overall goal, such as benchmarking), the architecture layer (detailing platform- and machine-specific execution steps), and the implementation layer (specifying the precise commands). A central controller manages the entire process, from constructing benchmarking pipelines based on configurations and templates to monitoring execution and gathering results. This approach enables automated environment setup, consistent execution, and centralized storage of configurations and data, ensuring that benchmarks are run identically regardless of the individual researcher or target machine.
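The inheritance-and-override behavior of hierarchical configurations can be illustrated with a small sketch. The merge function and the config keys below (`base_config`, `machine_config`, `resources`, etc.) are hypothetical illustrations of the idea, not CI-beNNch's actual format or API:

```python
# Sketch of hierarchical benchmark configuration with inheritance and
# overriding: a machine-specific config inherits workflow-level defaults
# and redefines only what differs. All names here are illustrative.

def merge_configs(parent: dict, child: dict) -> dict:
    """Return a new config where `child` inherits from `parent`,
    overriding any keys it redefines (nested dicts merge recursively)."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged

# Workflow-level defaults shared by every benchmark run.
base_config = {
    "benchmark": {"model": "microcircuit", "iterations": 3},
    "resources": {"nodes": 1, "tasks_per_node": 4},
}

# Machine-specific overrides: more nodes, extra environment modules,
# everything else inherited unchanged.
machine_config = {
    "resources": {"nodes": 8},
    "environment": {"modules": ["gcc", "openmpi"]},
}

effective = merge_configs(base_config, machine_config)
print(effective["resources"])  # nodes overridden, tasks_per_node inherited
```

Because the merge is recursive, a machine config only has to state its deviations, which keeps machine-specific files short and makes shared defaults easy to change in one place.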
The CI-beNNch framework was implemented using GitLab CI/CD pipelines and Jacamar, deployed across several HPC systems at the Jülich Supercomputing Centre (JSC). The results demonstrated a significant reduction in the setup complexity and expert knowledge required of researchers, enabling automated execution of benchmarks in two simple steps. The system ensures high repeatability and reproducibility by decoupling benchmark configurations from individual researchers and providing a centralized, consistent environment. CI-beNNch successfully supported the development of the NEST simulator, leading to notable performance improvements. For instance, optimizations to the 5g simulation kernel in NEST 3.8 reduced simulation time by 26% for the HPC-Benchmark model, by 60% for the microcircuit model, and by 37% for the multi-area model. Furthermore, using exponential lookup tables for Spike-Timing Dependent Plasticity (STDP) calculations reduced spike-delivery time by about 5% on average. The framework also proved instrumental in identifying and resolving complex system-level performance anomalies, such as NUMA-balancing settings causing warm-up phases on specific machines.
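The lookup-table optimization mentioned above can be sketched in a few lines: instead of calling `exp()` for every delivered spike, the exponential decay is precomputed once for each representable spike-time difference and looked up by index. The time constant, resolution, and update shape below are illustrative assumptions, not NEST's actual STDP implementation:

```python
# Sketch of replacing repeated exp() evaluations in STDP updates with a
# precomputed exponential lookup table. TAU_MS, RESOLUTION_MS, and
# MAX_DELTA_MS are assumed values for illustration only.

import math

TAU_MS = 20.0         # STDP decay time constant (assumed)
RESOLUTION_MS = 0.1   # simulation time step (assumed)
MAX_DELTA_MS = 100.0  # largest spike-time difference tabled (assumed)

# Precompute exp(-dt / tau) once for every grid-aligned time difference.
_N = int(MAX_DELTA_MS / RESOLUTION_MS) + 1
EXP_TABLE = [math.exp(-i * RESOLUTION_MS / TAU_MS) for i in range(_N)]

def stdp_trace(delta_ms: float) -> float:
    """Return exp(-delta/tau) via table lookup instead of math.exp().

    Spike times lie on the simulation grid, so the index is exact for
    grid-aligned differences; beyond the table the trace is treated as
    fully decayed.
    """
    index = round(delta_ms / RESOLUTION_MS)
    if index >= _N:
        return 0.0
    return EXP_TABLE[index]

# At grid points the table agrees with the direct computation:
assert abs(stdp_trace(5.0) - math.exp(-5.0 / TAU_MS)) < 1e-12
```

The trade-off is a small, fixed memory cost per table against removing a transcendental function call from the per-spike hot path, which is consistent with the modest (~5%) but broad reduction in spike-delivery time reported above.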
The CI-beNNch framework represents a substantial advancement in the systematic and continuous benchmarking of HPC applications. By improving usability, repeatability, reproducibility, and replicability, it lowers the barrier for researchers, particularly those with limited HPC expertise, to rigorously test and evaluate scientific software. The approach fosters a division of labor, allowing experts to focus on specific domains (infrastructure, software, models) while less experienced researchers can still contribute effectively. Centralized storage of configurations and benchmark data, combined with metadata tracking, enhances long-term reliability and interpretability of results. Beyond performance assessment, CI-beNNch simplifies the integration of new features into production code, accelerates development cycles, and supports the transition of mathematical models from local environments to large-scale HPC systems. This ultimately contributes to more reliable scientific findings and a more efficient research software engineering ecosystem.