AI Summary • Published on Dec 20, 2025
Decoding brain activity from electroencephalography (EEG) signals presents significant challenges, primarily due to the inherent noise, spatial diffusion, and temporal variability of these signals. These characteristics limit the interpretability of neural representations, particularly for reconstructing visual stimuli. Furthermore, the scarcity of paired EEG-image data restricts the effectiveness of traditional generative models in this domain.
The proposed Brain-Gen framework takes a two-stage approach. First, EEG signals are split into tokenized segments with a sliding window and fed to a Spatio-Temporal encoder, built from multi-layered transformer blocks with multi-head self-attention, which captures both long-range temporal dependencies and spatial relationships among EEG channels. This encoder is trained with a contrastive objective, specifically a margin-based triplet loss with semi-hard negative mining, to learn discriminative features. Second, the extracted EEG feature sequences condition a pre-trained Stable Diffusion-2 model: the EEG latents replace the textual prompt embeddings in the diffusion process via a cross-attention mechanism, guiding the image reconstruction. During this stage, the weights of the EEG encoder are frozen, and only the denoising UNet of the diffusion model is fine-tuned to learn the semantics of the EEG latent space.
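The summary leaves the first stage abstract, so the following is a minimal PyTorch sketch of what sliding-window tokenization, a transformer encoder over EEG tokens, and a margin-based triplet loss with semi-hard negative mining could look like. All names (`sliding_window_tokens`, `SpatioTemporalEncoder`, `semi_hard_triplet_loss`) and hyperparameters (window size, stride, model width) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sliding_window_tokens(eeg, window_size=32, stride=16):
    """Split EEG trials (batch, channels, time) into overlapping
    temporal segments, one token per (channel, window) pair."""
    windows = eeg.unfold(-1, window_size, stride)          # (B, C, n_windows, window_size)
    b, c, n, w = windows.shape
    return windows.reshape(b, c * n, w)                    # (B, n_tokens, window_size)

class SpatioTemporalEncoder(nn.Module):
    """Transformer over tokenized EEG segments; self-attention mixes
    information across both channels (spatial) and windows (temporal)."""
    def __init__(self, window_size=32, d_model=128, n_heads=8, n_layers=4, embed_dim=128):
        super().__init__()
        self.proj = nn.Linear(window_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, embed_dim)

    def tokens(self, tok):                                 # per-token features, reused in stage two
        return self.transformer(self.proj(tok))            # (B, n_tokens, d_model)

    def forward(self, tok):                                # pooled, unit-norm trial embedding
        return F.normalize(self.head(self.tokens(tok).mean(dim=1)), dim=-1)

def semi_hard_triplet_loss(emb, labels, margin=0.2):
    """Margin-based triplet loss with semi-hard negative mining: for each
    anchor-positive pair, prefer a negative farther than the positive but
    still inside the margin; fall back to the closest negative otherwise."""
    dist = torch.cdist(emb, emb)                           # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # (B, B) class-match mask
    losses = []
    for a in range(emb.size(0)):
        for p in torch.nonzero(same[a]).flatten():
            if p == a:
                continue
            d_ap = dist[a, p]
            neg = dist[a][~same[a]]                        # distances to all negatives
            if neg.numel() == 0:                           # no negatives in this batch
                continue
            semi_hard = neg[(neg > d_ap) & (neg < d_ap + margin)]
            d_an = semi_hard.min() if semi_hard.numel() > 0 else neg.min()
            losses.append(F.relu(d_ap - d_an + margin))
    return torch.stack(losses).mean() if losses else emb.sum() * 0.0
```

In this formulation, semi-hard mining selects negatives that already lie beyond the positive distance but still violate the margin, which tends to stabilize training compared with always taking the hardest negative.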
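The second stage can be sketched in a similarly hedged way with Hugging Face `diffusers`: the EEG feature sequence is projected to the UNet's cross-attention width and passed where the text-encoder output would normally go. The checkpoint id, batch shapes, and the reuse of `SpatioTemporalEncoder` from the sketch above are assumptions for illustration, not the paper's exact training code.

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel

model_id = "stabilityai/stable-diffusion-2"                # assumed SD-2 checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

eeg_encoder = SpatioTemporalEncoder()                      # stage-one encoder; weights assumed loaded
eeg_encoder.requires_grad_(False)                          # frozen, as the summary describes
vae.requires_grad_(False)
# project EEG token features (d_model=128 above) to the UNet's cross-attention width
to_cond = torch.nn.Linear(128, unet.config.cross_attention_dim)
optimizer = torch.optim.AdamW(list(unet.parameters()) + list(to_cond.parameters()), lr=1e-5)

# dummy paired batch for illustration; real training iterates over EEG-image pairs
eeg = torch.randn(2, 128, 440)                             # (batch, channels, time), shape assumed
images = torch.rand(2, 3, 256, 256) * 2 - 1                # the VAE expects images in [-1, 1]

with torch.no_grad():
    cond = eeg_encoder.tokens(sliding_window_tokens(eeg))  # (B, n_tokens, d_model) feature sequence
    latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
cond = to_cond(cond)                                       # stands in for the text-prompt embeddings

noise = torch.randn_like(latents)
t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.size(0),))
noisy = scheduler.add_noise(latents, noise, t)
pred = unet(noisy, t, encoder_hidden_states=cond).sample   # EEG latents enter via cross-attention
loss = F.mse_loss(pred, noise)                             # standard noise-prediction objective
loss.backward()
optimizer.step()
```

Because the UNet's cross-attention layers are agnostic to the conditioning sequence length, no architectural change is needed: the conditioning sequence simply contains EEG tokens instead of CLIP text tokens.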
Evaluations on the EEG-CVPR40 and ThoughtViz datasets demonstrate the efficacy of Brain-Gen. It achieved a 6.51% increase in k-means clustering accuracy on ThoughtViz and a 3.53% increase on EEG-CVPR40 over existing baselines. For zero-shot generalization to unseen classes on EEG-CVPR40, the framework showed a notable 11.84% improvement in k-means clustering accuracy and a 15.11% gain in KNN accuracy. For image reconstruction quality on EEG-CVPR40, the method yielded an Inception Score of 25.15 and a Fréchet Inception Distance of 81.07, comparable to other state-of-the-art baselines. The study also observed empirically that increasing the complexity of the EEG encoder by incorporating both spatial and temporal modules leads to higher-quality image generation.
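The summary does not spell out how clustering accuracy is computed; a common protocol is to cluster the learned test embeddings with k-means and score them after optimally matching clusters to classes via Hungarian assignment. The sketch below follows that protocol and assumes integer class labels with one cluster per class (40 for EEG-CVPR40); it is not necessarily the paper's exact evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def kmeans_clustering_accuracy(embeddings, labels, n_clusters):
    """Cluster embeddings with k-means, then pick the cluster-to-class
    mapping that maximizes accuracy via Hungarian matching."""
    pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    counts = np.zeros((n_clusters, n_clusters), dtype=np.int64)  # (cluster, class) co-occurrences
    for p, t in zip(pred, labels):
        counts[p, t] += 1
    rows, cols = linear_sum_assignment(-counts)            # maximize matched counts
    return counts[rows, cols].sum() / len(labels)

# stand-in data: 200 embeddings of width 128, 40 classes as in EEG-CVPR40
emb = np.random.randn(200, 128)
y = np.random.randint(0, 40, size=200)
print(kmeans_clustering_accuracy(emb, y, n_clusters=40))
```

KNN accuracy, by contrast, scores embeddings against labeled neighbors, so it probes local class structure rather than global cluster separability.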
This research marks a significant step toward generalizable semantic interpretation of EEG signals and strengthens the capability to reconstruct visual stimuli from brain activity. By effectively extracting meaningful representations from noisy EEG data, the Brain-Gen framework has the potential to improve the performance and applicability of brain-computer interfaces (BCIs). This work contributes to a deeper understanding of cognitive processes, particularly in generating visual content directly from thoughts, and its robustness to reduced signal duration suggests practical utility in real-world BCI applications.