AI Summary • Published on Feb 17, 2026
Reliable global streamflow forecasting is essential for water resource management and flood preparedness. Traditional physically based hydrological models have three main limitations. They struggle to represent complex processes accurately. They are computationally expensive to run globally. They are also constrained by the quality and resolution of their meteorological forcing data. Data-driven machine learning models have shown promise, but a significant operational gap remains. Models trained on historical reanalysis data perform well on that data, which is closer to "perfect" because it reconstructs past weather with the help of observations. The same models often perform poorly when driven by real-time operational forecasts, whose errors and biases are structured differently. This "reanalysis-to-forecast domain shift" is a critical hurdle for deploying data-driven models reliably in operational flood forecasting systems, regardless of their architectural complexity.
The AIFL (Artificial Intelligence for Floods) model is a deterministic, single-layer Long Short-Term Memory (LSTM) network designed for global daily streamflow forecasting. It operates with a 180-day input window and predicts 10-day output sequences. The model processes dynamic meteorological forcings (surface net solar radiation, surface net thermal radiation, surface pressure, 2-meter air temperature, and total precipitation) and 203 static catchment attributes (physiography, soil, geology, land cover, climatology, and anthropogenic influence) through separate three-layer feedforward embedding networks before integrating them into a 1024-unit LSTM core. The training dataset consists of 18,588 unique basins from the CARAVAN dataset, carefully curated through a deduplication and quality-control procedure based on basin geometry and observed discharge similarity. This ensures a globally consistent and non-redundant training set. A novel two-stage transfer learning strategy is employed to address the reanalysis-to-forecast domain shift. First, the model is pre-trained on 40 years of ERA5-Land reanalysis data (1980–2019) to learn fundamental hydrological processes using a normalized Mean Squared Error (MSE) loss function. Second, all model weights are fine-tuned on operational Integrated Forecasting System (IFS) control forecasts (2016–2019) using a reduced learning rate. This fine-tuning phase allows the model to adapt to the specific error structures and statistical characteristics of real-time operational weather prediction forcings, effectively correcting for forecast-induced biases.
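The 180-day input and 10-day output setup can be made concrete by cutting one basin's aligned daily series into training samples. The sketch below assumes that the last 10 days of each 180-day window form the forecast horizon, so the targets are the discharge on those days. The summary does not confirm this alignment, and the function name `make_windows` is hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[List[float]], List[float]]  # (180 x n_forcings inputs, 10 targets)


def make_windows(
    forcings: Sequence[Sequence[float]],
    discharge: Sequence[float],
    input_days: int = 180,
    output_days: int = 10,
) -> List[Sample]:
    """Slide a fixed-length window over one basin's aligned daily series.

    Assumption (not stated in the summary): the last `output_days` days of each
    input window are the forecast horizon, so targets are discharge on those days.
    """
    if len(forcings) != len(discharge):
        raise ValueError("forcings and discharge must cover the same days")
    if not 0 < output_days <= input_days:
        raise ValueError("the output window must fit inside the input window")
    samples: List[Sample] = []
    for end in range(input_days, len(discharge) + 1):  # `end` is exclusive
        inputs = [list(day) for day in forcings[end - input_days:end]]
        targets = list(discharge[end - output_days:end])
        samples.append((inputs, targets))
    return samples


# Hypothetical 200-day record with five dynamic forcings per day
# (solar radiation, thermal radiation, pressure, 2 m temperature, precipitation).
days = 200
forcings = [[0.0, 0.0, 1013.0, 15.0, float(d % 7)] for d in range(days)]
discharge = [float(d) for d in range(days)]
samples = make_windows(forcings, discharge)
print(len(samples))  # 21 windows: days 0-179, 1-180, ..., 20-199
```

A stride of one day is used here only for simplicity. The actual sampling scheme is not described.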
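The summary states only that duplicates were removed using basin geometry and observed discharge similarity. A minimal sketch of such a filter makes three assumptions. First, each basin is represented as a set of grid-cell ids. Second, geometric overlap is measured as intersection over union (IoU). Third, discharge similarity is the Pearson correlation on common dates. The thresholds `min_iou=0.9` and `min_r=0.95` and all function names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Set


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; a constant series is treated as uncorrelated."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def iou(cells_a: Set[int], cells_b: Set[int]) -> float:
    """Intersection over union of two basins given as sets of grid-cell ids."""
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0


def deduplicate(
    basin_ids: List[str],
    geometry: Dict[str, Set[int]],
    discharge: Dict[str, List[float]],  # assumed aligned on common dates
    min_iou: float = 0.9,  # hypothetical threshold
    min_r: float = 0.95,  # hypothetical threshold
) -> List[str]:
    """Greedily keep a basin unless it overlaps and co-varies with one already kept."""
    kept: List[str] = []
    for basin in basin_ids:
        duplicate = any(
            iou(geometry[basin], geometry[k]) >= min_iou
            and pearson(discharge[basin], discharge[k]) >= min_r
            for k in kept
        )
        if not duplicate:
            kept.append(basin)
    return kept


# Hypothetical example: "a" and "b" are the same gauge supplied by two source datasets.
geometry = {"a": set(range(10)), "b": set(range(10)), "c": set(range(20, 26))}
discharge = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.1, 2.0, 3.1, 3.9, 5.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(deduplicate(["a", "b", "c"], geometry, discharge))  # ['a', 'c']
```

The greedy pass keeps whichever duplicate appears first. A real pipeline would likely rank candidates by record length or data quality before this step.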
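The summary does not give the exact normalization behind the MSE loss. One common choice for multi-basin LSTM training is the NSE* loss of Kratzert et al. (2019). It scales each basin's squared errors by 1 / (s_b + eps)^2, where s_b is the standard deviation of that basin's observed discharge. This keeps large rivers from dominating the gradient. The sketch below assumes that form. The value `eps=0.1` and the function names are illustrative.

```python
from statistics import pstdev
from typing import Dict, List, Sequence, Tuple

# One training example: (basin id, predicted discharge, observed discharge).
Example = Tuple[str, Sequence[float], Sequence[float]]


def basin_stds(train_discharge: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-basin standard deviation of observed discharge over the training period."""
    return {basin: pstdev(q) for basin, q in train_discharge.items()}


def normalized_mse(batch: List[Example], stds: Dict[str, float], eps: float = 0.1) -> float:
    """Squared errors weighted by 1 / (std_b + eps)^2, averaged over all time steps."""
    total, count = 0.0, 0
    for basin, pred, obs in batch:
        weight = 1.0 / (stds[basin] + eps) ** 2
        for p, o in zip(pred, obs):
            total += weight * (p - o) ** 2
            count += 1
    return total / count


# Hypothetical batch: the same 1.0 error costs far more in a low-variability basin.
stds = {"large_river": 9.9, "small_creek": 0.9}
batch = [("large_river", [11.0], [10.0]), ("small_creek", [2.0], [1.0])]
print(normalized_mse(batch, stds))  # (0.01 + 1.0) / 2 = 0.505
```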
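The two-stage strategy can be illustrated with a toy linear rainfall-runoff model trained by gradient descent. The strategy consists of pre-training, followed by fine-tuning all weights at a lower learning rate. In this hypothetical setup, "reanalysis" precipitation is unbiased, while "forecast" precipitation underestimates it by 20%. Fine-tuning on forecast inputs, paired with the same observed discharge, lets the weight absorb that bias. The toy shows only the mechanics of the two stages. It does not reproduce the LSTM, the real forcings, or the reasons a reduced learning rate helps in deep networks.

```python
from typing import List, Tuple

Params = Tuple[float, float]  # (weight, bias) of a toy linear model q = w * p + b


def mse(params: Params, xs: List[float], ys: List[float]) -> float:
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(params: Params, xs: List[float], ys: List[float], lr: float, epochs: int) -> Params:
    """Full-batch gradient descent on the MSE; every parameter is updated."""
    w, b = params
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: observed discharge responds linearly to true precipitation.
reanalysis_p = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
observed_q = [2.0 * p + 1.0 for p in reanalysis_p]
# Forecast precipitation with a systematic 20% underestimate (assumed bias).
forecast_p = [0.8 * p for p in reanalysis_p]

# Stage 1: pre-train on reanalysis forcings.
pretrained = train((0.0, 0.0), reanalysis_p, observed_q, lr=0.05, epochs=2000)
# Stage 2: fine-tune all parameters on forecast forcings with a 5x smaller learning rate.
finetuned = train(pretrained, forecast_p, observed_q, lr=0.01, epochs=2000)

print("pretrained", pretrained, "forecast MSE", mse(pretrained, forecast_p, observed_q))
print("finetuned", finetuned, "forecast MSE", mse(finetuned, forecast_p, observed_q))
```

The pre-trained model converges to w ≈ 2, b ≈ 1. Driven by biased forecasts, it underpredicts discharge. After fine-tuning, w moves toward 2.5, which compensates for the 20% underestimate.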
AIFL was evaluated on an independent temporal test set covering January 2021 to September 2024 across 2,003 gauged basins, where it showed robust predictive skill. It reached a median modified Kling–Gupta Efficiency (KGE′) of 0.66 and a median Nash–Sutcliffe Efficiency (NSE) of 0.53. Both values are above commonly used thresholds for satisfactory performance. It also achieved a high median Pearson correlation (r = 0.81) and near-perfect volume conservation, with a median bias ratio (β) of 1.00. The two-stage training strategy substantially improved mean global skill. Mean KGE′ rose from 0.21 to 0.44, and mean NSE rose from −11.40 to −3.26. These gains came mainly from lifting previously low-performing basins. Because both scores are unbounded below, a few poorly simulated basins pull the mean far below the median, so improvements in those basins move the mean most. For flood event detection, AIFL behaved very conservatively. It achieved a global precision of 1.0 for every return period from 1.5 to 50 years, meaning it raised no false alarms. Recall was 0.54 for frequent events (1.5-year return period) and fell to 0.32 for 50-year floods. AIFL was also benchmarked against Google's global flood model on 1,218 shared stations. It proved competitive, matching or exceeding Google's skill at 42.9% of locations. It performed especially well in smaller catchments, outperforming Google at 55% of such stations. Its performance also stayed stable across all basin sizes, whereas Google's varied more in smaller catchments.
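The scores quoted above can be computed per basin with standard definitions. NSE = 1 - Σ(s - o)^2 / Σ(o - ō)^2. The modified KGE′ of Kling et al. (2012) combines three terms: the correlation r, the bias ratio β = μs/μo, and the variability ratio γ = (σs/μs)/(σo/μo). This sketch assumes gap-free, aligned daily series with a non-zero observed mean. The function name and the example values are illustrative.

```python
from math import sqrt
from statistics import fmean, median, pstdev
from typing import Dict, Sequence


def skill_scores(sim: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Skill of simulated vs observed daily discharge for one basin."""
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("need two aligned series with at least two values")
    mu_s, mu_o = fmean(sim), fmean(obs)
    dev_s = [s - mu_s for s in sim]
    dev_o = [o - mu_o for o in obs]
    r = sum(a * b for a, b in zip(dev_s, dev_o)) / sqrt(
        sum(a * a for a in dev_s) * sum(b * b for b in dev_o)
    )
    beta = mu_s / mu_o  # bias ratio; 1.0 means volume is conserved
    gamma = (pstdev(sim) / mu_s) / (pstdev(obs) / mu_o)  # ratio of coefficients of variation
    kge_mod = 1.0 - sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)
    nse = 1.0 - sum((s - o) ** 2 for s, o in zip(sim, obs)) / sum(d * d for d in dev_o)
    return {"r": r, "beta": beta, "gamma": gamma, "kge_mod": kge_mod, "nse": nse}


obs = [1.0, 2.0, 3.0, 4.0]
print(skill_scores(obs, obs))                     # perfect: all scores 1.0
print(skill_scores([2.0 * q for q in obs], obs))  # doubled volume: beta 2.0, KGE' ~ 0, NSE -5.0

# Hypothetical per-basin NSE values: the score is unbounded below, so one
# failing basin drags the mean far below the median.
per_basin_nse = [0.7, 0.6, 0.5, -40.0]
print(median(per_basin_nse), fmean(per_basin_nse))
```

The doubled-volume case shows how KGE′ separates error sources. Correlation and variability are perfect, so the entire loss comes from β.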
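The summary does not specify how return-period thresholds or flood events were defined. This sketch assumes one simple, widely used approach. It fits a Gumbel distribution to observed annual maximum discharge by the method of moments. It then counts day-by-day exceedances of the resulting threshold. Precision is hits / (hits + false alarms), so a precision of 1.0 means no false alarms. Recall is hits / (hits + misses). The original study may instead match events over time windows. The function names and data are hypothetical.

```python
from math import log, pi, sqrt
from statistics import fmean, stdev
from typing import Dict, Sequence

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used in the Gumbel moment fit


def gumbel_threshold(annual_maxima: Sequence[float], return_period: float) -> float:
    """Discharge with the given return period (years > 1), Gumbel fit by the method of moments."""
    scale = sqrt(6) * stdev(annual_maxima) / pi
    loc = fmean(annual_maxima) - EULER_GAMMA * scale
    return loc - scale * log(-log(1.0 - 1.0 / return_period))


def detection_scores(sim: Sequence[float], obs: Sequence[float], threshold: float) -> Dict[str, float]:
    """Day-by-day contingency table: an alarm is a simulated exceedance of the threshold."""
    hits = sum(1 for s, o in zip(sim, obs) if s >= threshold and o >= threshold)
    false_alarms = sum(1 for s, o in zip(sim, obs) if s >= threshold and o < threshold)
    misses = sum(1 for s, o in zip(sim, obs) if s < threshold and o >= threshold)
    nan = float("nan")  # precision/recall are undefined when their denominator is zero
    precision = hits / (hits + false_alarms) if hits + false_alarms else nan
    recall = hits / (hits + misses) if hits + misses else nan
    return {"hits": hits, "false_alarms": false_alarms, "misses": misses,
            "precision": precision, "recall": recall}


# Hypothetical observed annual maxima (m3/s) for one gauge.
annual_maxima = [100.0, 120.0, 90.0, 150.0, 110.0, 130.0, 95.0, 140.0]
for T in (1.5, 2, 5, 50):
    print(T, round(gumbel_threshold(annual_maxima, T), 1))

obs = [10.0, 50.0, 60.0, 20.0, 70.0]
sim = [12.0, 55.0, 40.0, 25.0, 80.0]
print(detection_scores(sim, obs, threshold=45.0))  # precision 1.0, recall 2/3
```

A model that rarely exceeds thresholds it should not exceed earns perfect precision at the cost of misses. This is the conservative trade-off the results describe.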
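The benchmark percentages are win rates over shared stations. This sketch makes three assumptions: skill is compared through one per-station score such as KGE′, ties count as "matching", and the catchment-area class boundaries are hypothetical. The station ids and scores are invented for illustration.

```python
from typing import Dict, List, Sequence


def win_rate(aifl: Dict[str, float], google: Dict[str, float], stations: Sequence[str]) -> float:
    """Fraction of stations where AIFL's score matches or exceeds Google's."""
    wins = sum(1 for s in stations if aifl[s] >= google[s])
    return wins / len(stations)


def win_rate_by_area(
    aifl: Dict[str, float],
    google: Dict[str, float],
    area_km2: Dict[str, float],
    upper_edges: Sequence[float],  # ascending, hypothetical class boundaries
) -> Dict[str, float]:
    """Win rate per catchment-area class, computed over shared stations only."""
    groups: Dict[str, List[str]] = {}
    for station in sorted(aifl.keys() & google.keys()):
        label = next(
            (f"<= {edge} km2" for edge in upper_edges if area_km2[station] <= edge),
            f"> {upper_edges[-1]} km2",
        )
        groups.setdefault(label, []).append(station)
    return {label: win_rate(aifl, google, ids) for label, ids in groups.items()}


aifl = {"a": 0.7, "b": 0.5, "c": 0.6, "d": 0.2}
google = {"a": 0.6, "b": 0.6, "c": 0.6, "d": 0.4, "e": 0.9}
area = {"a": 500, "b": 800, "c": 5000, "d": 20000, "e": 100}
shared = sorted(aifl.keys() & google.keys())
print(win_rate(aifl, google, shared))  # 0.5: "a" wins, "c" ties
print(win_rate_by_area(aifl, google, area, [1000, 10000]))
```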
The AIFL model establishes a transparent, reproducible, and operationally viable baseline for global streamflow forecasting. Its two-stage training strategy addresses the reanalysis-to-forecast domain shift, yielding robust predictions under real-time operational conditions. Its very high precision in flood event detection, with virtually no false alarms, is critical for building user trust and for making operational early-warning systems effective. This was demonstrated during the January 2024 Storm Henk floods in Belgium, when a significant flood was detected six days in advance. Recall for rare extreme events could still be improved, but the current approach deliberately prioritizes high-confidence alerts. Future work will explore distributional training objectives for uncertainty quantification. It will also test probabilistic ensemble forcings to refine event detection and support risk-based decision-making, and multi-source precipitation products to improve the detection of rare extremes. By combining consistent behavior with competitive skill, AIFL offers a valuable tool for research, diagnostic evaluation, and practical deployment in the global hydrological community.