AI Summary • Published on Dec 2, 2025
Traditional methods for identifying gravitational wave signals, such as matched filtering and convolutional neural networks (CNNs), face significant challenges on real-world detector data. Gravitational wave observations are characterized by non-Gaussian, non-stationary noise, often containing transient artifacts (glitches) that can produce spurious detections. Matched filtering, while optimal under stationary Gaussian noise, struggles with these deviations and requires complex veto procedures. CNNs, which rely on localized convolutional kernels, can be misled by strong transient spikes within their receptive fields, making them less robust to such noise. Furthermore, existing deep learning approaches depend heavily on large-scale simulated training sets, often numbering in the hundreds of thousands of samples. This reliance imposes substantial computational cost for generating training data and can cause domain mismatch when the models are applied to actual interferometer data, for which only a limited number of confirmed gravitational wave events exist (approximately 90 LIGO detections).
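To make the matched-filtering baseline concrete, here is a minimal sketch of the idea, assuming already-whitened strain and a known unit-norm template (variable names are hypothetical and this is not the paper's pipeline). It illustrates why the statistic is optimal only under stationary Gaussian noise: a loud glitch can still correlate strongly with the template.

```python
import numpy as np

def matched_filter_snr(strain: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Slide a (whitened) template against (whitened) strain.

    Under stationary Gaussian noise this cross-correlation statistic is
    optimal; glitches violate that assumption and can produce spurious peaks.
    """
    template = template / np.linalg.norm(template)     # unit-norm template
    snr = np.correlate(strain, template, mode="same")  # sliding inner product
    return np.abs(snr)

# A strong transient spike in `strain` can correlate with the template just
# as well as a real chirp, which is why matched-filter searches need vetoes.
```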
This study proposes using Large Language Models (LLMs) for gravitational wave identification, leveraging their ability to process discrete patch tokens and form global relationships via attention mechanisms. This inductive bias is well suited to data whose discriminative information lies in global morphology rather than local numerical detail, and it inherently suppresses localized transient noise. The dataset was constructed solely from the 90 publicly available LIGO gravitational wave events (observing runs O1, O2, O3a, and O3b), yielding 1728 signal segments; to address class imbalance, positive samples were oversampled. The raw strain data underwent standard preprocessing: an eighth-order Butterworth bandpass filter (20–500 Hz) followed by normalization. Signals were then transformed into 2D time-frequency representations using the Constant-Q Transform (CQT). These time-frequency matrices were discretized by extracting up to 64 frame-wise feature vectors and clustering them with KMeans into integer token sequences, which were concatenated across the detectors (LIGO Hanford, LIGO Livingston, and, when applicable, Virgo). The Meta-Llama-3-8B-Instruct model, an 8-billion-parameter transformer-based LLM, was finetuned on this prepared dataset using Low-Rank Adaptation (LoRA) for parameter-efficient adaptation, with the AdamW optimizer, a binary cross-entropy loss, and a cosine annealing learning rate schedule. A binary classification head was added to the model for the identification task.
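A minimal sketch of the preprocessing-and-tokenization chain described above might look as follows. The library choices (scipy, librosa, scikit-learn) and all parameter values beyond those quoted in the summary (eighth order, 20–500 Hz, up to 64 frames) are assumptions for illustration, not the paper's exact code.

```python
import numpy as np
import librosa
from scipy.signal import butter, sosfiltfilt
from sklearn.cluster import KMeans

FS = 4096  # assumed strain sampling rate in Hz

def preprocess(strain: np.ndarray) -> np.ndarray:
    """Eighth-order 20-500 Hz Butterworth bandpass, then normalization."""
    sos = butter(8, [20, 500], btype="bandpass", fs=FS, output="sos")
    x = sosfiltfilt(sos, strain)                # zero-phase bandpass
    return (x - x.mean()) / (x.std() + 1e-12)   # normalize

def tokenize(strain: np.ndarray, kmeans: KMeans, max_frames: int = 64) -> list[int]:
    """CQT -> frame-wise feature vectors -> KMeans integer tokens."""
    # n_bins/fmin chosen so the top CQT bin stays inside the 20-500 Hz band
    # (assumed values; the paper's CQT settings are not given in the summary).
    cqt = np.abs(librosa.cqt(preprocess(strain), sr=FS, fmin=20, n_bins=56))
    frames = cqt.T[:max_frames]                 # up to 64 frame-wise vectors
    return kmeans.predict(frames).tolist()      # one integer token per frame

# `kmeans` is a codebook fit offline on frame vectors pooled from the training
# segments; the per-detector token sequences are then concatenated.
```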
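The LoRA finetuning setup could be sketched as below, assuming the Hugging Face Transformers and PEFT APIs. Only the base model, optimizer, loss, and scheduler come from the summary; the LoRA rank, learning rate, target modules, and single-logit head are illustrative assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Single-logit sequence-classification head for binary cross-entropy.
# (Llama models also need a pad token configured for batched inputs.)
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    num_labels=1,
    torch_dtype=torch.bfloat16,
)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],  # assumed modules
                  task_type="SEQ_CLS")
model = get_peft_model(model, lora)   # only low-rank adapters are trainable

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)  # assumed lr
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
criterion = torch.nn.BCEWithLogitsLoss()

def train_step(input_ids, attention_mask, labels) -> float:
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    loss = criterion(logits.squeeze(-1), labels.float())
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```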
The finetuned LLM achieved high identification accuracy from a remarkably small dataset. Trained on only the 90 observational LIGO events, without any simulated data, the model reached 97.4% recall for both gravitational wave signals and noise-only segments on the held-out test set, demonstrating stable performance and a low false-alarm rate of approximately 2.6% per class. Training converged within just two epochs and showed no signs of overfitting despite the limited sample size. A key finding was that, unlike traditional neural networks, the LLM did not benefit from pre-finetuning on a large simulated dataset (560,000 samples from the G2Net challenge): accuracy on real observational data remained largely unchanged, suggesting LLMs can extract discriminative patterns directly from real data. Scaling studies indicated that increasing model size (from 0.5 billion to 8 billion parameters across the Qwen2.5, LLaMA3, and DeepSeek families) consistently improved accuracy, with performance converging around the 8-billion-parameter mark. Similarly, increasing the training set size (from 300 to 60,000 simulated samples) also improved accuracy, particularly in the low-data regime, though with diminishing returns at larger scales.
The successful application of Large Language Models (LLMs) to gravitational wave identification highlights their potential as a powerful alternative to traditional neural networks, especially in scenarios characterized by global discriminative patterns, non-Gaussian and non-stationary noise, and limited labeled data. This methodology is not unique to gravitational wave astronomy and holds significant promise for other astronomical domains with similar data characteristics. Examples include radio pulsar and fast radio burst searches, analysis of dynamic spectra from low-frequency interferometers, X-ray timing studies of compact objects, and high-energy transient monitoring. In these fields, astrophysical signals often manifest as coherent, extended patterns, while interference and noise can be transient, non-stationary, and instrument-specific. The global attention mechanisms of LLMs could effectively distinguish these coherent astrophysical structures from localized noise, offering a robust and efficient approach to data analysis and identification where real-world labeled examples are scarce. For instance, in radio astronomy, LLMs could process dynamic spectra to identify dispersed sweeps or repeating pulse trains while suppressing localized radio frequency interference (RFI).