AI Summary • Published on Feb 24, 2026
Current world models, primarily based on unstructured neural networks, face significant challenges: poor sample efficiency, limited generalization to unseen states, accumulating rollout errors over long horizons, and a lack of explicit geometric meaning in their latent spaces. These limitations restrict their applicability, particularly when training data is limited or when interpretable decision-making is required beyond direct trial-and-error. Unlike biological systems, which exploit environmental symmetries, these models rarely incorporate such geometric priors, hindering their ability to capture underlying world dynamics and scale efficiently.
The authors propose a generalizable world model that integrates Vector Symbolic Architecture (VSA) principles, specifically Fourier Holographic Reduced Representation (FHRR), as geometric priors. The approach encodes states and actions into high-dimensional complex unitary vectors using learnable FHRR encoders, and models environmental transitions as element-wise complex multiplication (binding) in this latent space. The framework rests on a group-theoretic foundation in which action compositions generate an action group. Training enforces a latent group structure over actions, including multi-step composition and invertibility, alongside a robust cleanup mechanism. The objective minimizes a binding loss for transition equivariance and adds invertibility and orthogonality regularizers that preserve the group structure and keep distinct state representations quasi-orthogonal. Cleanup is performed via a nearest-neighbor search in a state codebook, leveraging the high-dimensional separation of VSA representations to correct noisy predictions.
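The core FHRR mechanics described above (unitary phase vectors, binding as element-wise complex multiplication, unbinding via the complex conjugate, and nearest-neighbor cleanup against a state codebook) can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation: the paper learns its encoders, whereas here grid states are built compositionally from two random phase vectors so that the "move right" and "move up" actions are exact bindings; the dimensionality and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512  # latent dimensionality (illustrative; the paper may use a different value)

# Base phase vectors for the two grid axes (the paper learns these; here they are random).
phi_x = rng.uniform(-np.pi, np.pi, D)
phi_y = rng.uniform(-np.pi, np.pi, D)

def encode(x, y):
    """Unitary FHRR state vector for grid cell (x, y): unit-modulus complex entries."""
    return np.exp(1j * (x * phi_x + y * phi_y))

# State codebook for a 10x10 grid, used by the cleanup step.
codebook = {(x, y): encode(x, y) for x in range(10) for y in range(10)}

# Action vectors: binding with them shifts the encoded coordinates by one cell.
right = np.exp(1j * phi_x)
up = np.exp(1j * phi_y)

def bind(a, b):
    """Transition operator: element-wise complex multiplication."""
    return a * b

def unbind(c, a):
    """Inverse transition: conjugate multiplication, exact for unitary vectors."""
    return c * np.conj(a)

def cleanup(v, codebook):
    """Nearest-neighbor search: snap a noisy prediction to the closest codebook state."""
    keys = list(codebook)
    M = np.stack([codebook[k] for k in keys])
    sims = np.real(M @ np.conj(v)) / len(v)  # normalized complex inner product
    return keys[int(np.argmax(sims))]

# Multi-step composition: (2, 3) -> right -> up should land on (3, 4).
s = codebook[(2, 3)]
pred = bind(s, bind(right, up))

# Corrupt the prediction with per-component phase noise, then clean it up.
noisy = pred * np.exp(1j * 0.3 * rng.standard_normal(D))
recovered = cleanup(noisy, codebook)   # (3, 4)

# Invertibility: unbinding the action recovers the previous state exactly.
back = unbind(bind(s, right), right)   # equals s up to numerical precision
```

Because distinct codebook entries are quasi-orthogonal in high dimensions, the noisy prediction's similarity to the correct state (roughly the mean cosine of the phase noise) dominates its similarity to every other state, which is why the nearest-neighbor cleanup reliably denoises rollouts.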
Experiments conducted on a 10x10 discrete grid world demonstrate the proposed model's superior performance over MLP baselines. The VSA-based model achieved 87.5% zero-shot accuracy on unseen state-action pairs, significantly outperforming the MLPs. On long-horizon rollouts over 20 timesteps it was 53.6% more accurate, remaining stable where the MLPs accumulated drift. It was also 4 times more robust to noise, maintaining over 80% accuracy even under heavy Gaussian noise, whereas MLP performance degraded sharply. Visualizations of the latent space confirmed that the FHRR model captured the grid environment's structure, unlike the unstructured MLP latents. The cleanup operation improved FHRR accuracy by 35% in certain zero-shot scenarios, yielding a 3.3x improvement over the MLP baselines.
This work highlights that training for latent group structure yields generalizable, data-efficient, and interpretable world models. By incorporating geometric priors through VSA, the model offers a principled pathway towards structured models suitable for real-world planning and reasoning tasks, addressing key limitations of current unstructured approaches. While currently limited to small discrete environments, the findings suggest strong potential for extending this approach to more complex continuous, stochastic, or partially observable domains. Future work aims to integrate VSA-based world modeling into model-based reinforcement learning and planning, enabling generalizable dynamics models for practical applications.