AI Summary • Published on Mar 12, 2026
The one-dimensional (1D) flux power spectrum of the Lyman-alpha (Lyα) forest is a crucial probe for understanding small-scale structures in the high-redshift Universe, providing unique insights into dark matter, the intergalactic medium, and fundamental physics. However, interpreting these measurements requires detailed modeling of complex, degenerate phenomena like nonlinear structure formation, gas dynamics, and astrophysical feedback. Traditional inference methods, often relying on simulation grids or emulators, suffer from limitations such as assumptions of Gaussian likelihoods, difficulty in capturing complex degeneracies, and dependence on single simulation codes or models. An efficient and accurate inference framework capable of marginalizing over astrophysical uncertainties is needed as cosmological datasets grow in size and precision.
This study performs the first full simulation-based inference (SBI) on the Lyα forest 1D flux power spectrum, P1D(k). It uses the CAMELS suite of cosmological hydrodynamic simulations, specifically those run with the IllustrisTNG and SIMBA galaxy formation models. These simulations vary two cosmological parameters (Ωm and σ8) and four astrophysical parameters related to supernova and AGN feedback. The researchers train a normalizing flow, using Neural Posterior Estimation (NPE) with Masked Autoregressive Flows (MAF), to directly approximate the posterior distribution of these parameters. A key methodological choice is to combine the P1D(k) measurements from eight different redshifts into a single input vector for the neural network. To address a systematic mean flux mismatch between the IllustrisTNG and SIMBA simulations, the SIMBA mean flux is rescaled to match IllustrisTNG. Because the CAMELS simulation volumes are small and cosmic variance dominates, the astrophysical parameters were found to be unconstrained, so the analysis primarily focuses on the cosmological parameters. Three maximum-wavenumber (kmax) scale cuts were tested for the P1D(k) measurements: 1.5, 2.0, and 3.0 h Mpc−1.
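Two of the preprocessing steps described above can be sketched in a few lines of Python. The summary does not say how the SIMBA mean flux is rescaled; the sketch assumes the common approach of multiplying the optical depth τ by a constant A chosen so that the mean of exp(−Aτ) matches a target mean flux, and it solves for A by bisection. It also shows one plausible way to apply a kmax cut and concatenate P1D(k) from several redshifts into a single input vector. The function names, the dictionary layout, and the toy numbers are illustrative assumptions, not the authors' code.

```python
import math


def mean_flux(tau, scale=1.0):
    """Mean transmitted flux <exp(-scale * tau)> over a list of optical depths."""
    return sum(math.exp(-scale * t) for t in tau) / len(tau)


def rescale_to_mean_flux(tau, target, tol=1e-10, max_iter=200):
    """Return A such that <exp(-A * tau)> matches `target` (assumed method)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target mean flux must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Mean flux decreases monotonically with A, so grow `hi` until
    # the interval [lo, hi] brackets the target.
    while mean_flux(tau, hi) > target:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("target mean flux cannot be reached for these pixels")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean_flux(tau, mid) > target:
            lo = mid  # too little absorption: increase A
        else:
            hi = mid  # too much absorption: decrease A
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def build_feature_vector(k, p1d_by_redshift, kmax):
    """Concatenate P1D(k <= kmax) over redshifts, in ascending redshift order."""
    keep = [i for i, ki in enumerate(k) if ki <= kmax]
    features = []
    for z in sorted(p1d_by_redshift):
        features.extend(p1d_by_redshift[z][i] for i in keep)
    return features


# Toy usage with made-up numbers (not simulation data).
tau_pixels = [0.1, 0.5, 1.0, 2.0]
A = rescale_to_mean_flux(tau_pixels, target=0.5)
rescaled_flux = [math.exp(-A * t) for t in tau_pixels]

k_bins = [0.5, 1.0, 2.0, 3.0]  # h / Mpc
toy_p1d = {2.0: [1.0, 2.0, 3.0, 4.0], 2.4: [5.0, 6.0, 7.0, 8.0]}
x = build_feature_vector(k_bins, toy_p1d, kmax=2.0)
```

With eight redshifts, the resulting vector has eight times as many entries as there are k bins below the cut, so a tighter kmax shrinks the network input as well as removing small-scale information.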
When the inference framework was trained and tested self-consistently on a single galaxy formation model (IllustrisTNG or SIMBA), the cosmological parameters (Ωm and σ8) were recovered with excellent accuracy and precision. For IllustrisTNG, the inferred values deviated from the truth by less than 10% in over 75% of cases for Ωm and over 90% of cases for σ8, with approximately 8% and 6% precision, respectively. SIMBA results were slightly less accurate but still unbiased. The astrophysical parameters, however, remained unconstrained across all configurations because cosmic variance dominates over feedback effects in the small simulation volumes. A significant challenge arose in the cross-generalization tests (training on IllustrisTNG and testing on SIMBA), even with mean flux rescaling. While Ωm remained relatively unbiased, its scatter increased, and σ8 exhibited a ~10% positive bias, with overconfident posteriors. This highlights the inherent cross-generalization problem between different galaxy formation models. To mitigate it, a multi-domain training approach was implemented that combines simulations from both IllustrisTNG and SIMBA. This proved effective: it recovered unbiased values for both Ωm and σ8 with accuracy and precision comparable to the self-consistent single-model training, demonstrating that it can absorb model uncertainties. Additionally, running the inference on the power spectrum of optical depth (τ) instead of flux (F) improved the precision for both Ωm (~5.8%) and σ8 (~4.6%) while keeping the results unbiased.
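The accuracy, precision, and bias figures quoted above can be made concrete with a small sketch. The summary does not define these metrics; the code assumes that accuracy is the fraction of test simulations whose point estimate lies within 10% of the true parameter value, that precision is the mean ratio of posterior standard deviation to posterior mean, and that bias is the mean signed relative error. All function names and numbers are hypothetical.

```python
def fraction_within(truth, estimate, tolerance=0.10):
    """Fraction of test cases with |estimate - truth| / |truth| below `tolerance`."""
    hits = sum(abs(e - t) / abs(t) < tolerance for t, e in zip(truth, estimate))
    return hits / len(truth)


def mean_relative_precision(post_mean, post_std):
    """Mean of sigma / |mu| over test cases (assumed definition of precision)."""
    ratios = [s / abs(m) for m, s in zip(post_mean, post_std)]
    return sum(ratios) / len(ratios)


def mean_relative_bias(truth, estimate):
    """Mean signed relative error; a positive value means overestimation."""
    errors = [(e - t) / t for t, e in zip(truth, estimate)]
    return sum(errors) / len(errors)


# Toy usage with made-up posterior summaries for four test simulations.
sigma8_true = [0.70, 0.80, 0.90, 1.00]
sigma8_mean = [0.72, 0.79, 0.95, 1.15]
sigma8_std = [0.04, 0.05, 0.05, 0.06]

accuracy = fraction_within(sigma8_true, sigma8_mean)
precision = mean_relative_precision(sigma8_mean, sigma8_std)
bias = mean_relative_bias(sigma8_true, sigma8_mean)
```

Under these definitions, a cross-generalization failure like the one described for σ8 would show up as a positive `mean_relative_bias` together with a small `mean_relative_precision`, that is, narrow posteriors centered away from the truth.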
This study represents a significant advance in using simulation-based inference for cosmological parameter estimation from the Lyα forest. The success of the multi-domain training method provides a robust way to deal with inconsistencies between different galaxy formation models, which is crucial for applying these techniques to real observational data. The work paves the way for future efforts to constrain fundamental physics more accurately with the Lyα forest. Moving forward, the framework will be extended to next-generation CAMELS simulations with larger volumes, which may make it possible to constrain the astrophysical parameters and to achieve better statistical precision. Further research will involve applying the framework to simulations of other dark matter physics and to real P1D(k) measurements from surveys and datasets such as DESI and KODIAQ-SQUAD. This will require addressing observational effects such as instrumental noise and contaminants, potentially through advanced domain adaptation techniques, and further exploring the advantages of inference on the optical depth power spectrum.