AI Summary • Published on Dec 3, 2025
Large Language Models (LLMs) have shown broad applicability, including in urban systems and tourist recommendations. However, they frequently suffer from hallucinations and significant limitations in spatial retrieval and reasoning. Traditional route recommendation systems often focus on shortest-path algorithms, failing to capture the multi-dimensional aspects of walking experiences and personalized user preferences. This highlights a critical need for novel approaches that can effectively combine LLMs with accurate, domain-specific, and timely spatial information to generate meaningful and walkable urban itineraries while supporting users in urban discovery.
The authors introduce WalkRAG, a spatial Retrieval-Augmented Generation (RAG) framework with a conversational interface designed to recommend walkable urban itineraries. The framework consists of three primary components:
The Query Understanding and Answer Generation (QUAG) component utilizes an integrated LLM (Llama 3.1 8B) to manage conversational interactions. It classifies user queries to determine whether they involve a new itinerary suggestion based on walkability indicators and user preferences, or requests for general information about points of interest (POIs) along an existing route. Based on the query type, it routes requests to the appropriate spatial or information retrieval component and then augments the LLM's generation capabilities with the retrieved content to produce a final, grounded response.
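The classify-then-route flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword-based `classify_query` stands in for the LLM classifier, and `spatial_component`, `ir_component`, and `build_prompt` are hypothetical names that return placeholder strings.

```python
from typing import Callable, Dict


def classify_query(query: str) -> str:
    """Hypothetical stand-in for the LLM classifier: returns 'spatial' or 'info'."""
    q = query.lower()
    spatial_cues = ("route", "walk from", "itinerary", "how do i get")
    return "spatial" if any(cue in q for cue in spatial_cues) else "info"


def spatial_component(query: str) -> str:
    # Placeholder: a real system would compute and describe a walkable route here.
    return f"[route data for: {query}]"


def ir_component(query: str) -> str:
    # Placeholder: a real system would return retrieved passages here.
    return f"[passages for: {query}]"


# Map each query type to the component that retrieves context for it.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "spatial": spatial_component,
    "info": ir_component,
}


def build_prompt(query: str) -> str:
    """Route the query, then wrap the retrieved context into a grounded prompt."""
    label = classify_query(query)
    context = HANDLERS[label](query)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )


if __name__ == "__main__":
    print(build_prompt("Suggest a walking route from the Louvre to Notre-Dame"))
```

Keeping the routing table separate from the classifier means the classifier can be swapped for an LLM call without touching the retrieval components.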
The Spatial component is responsible for identifying the most walkable route between specified origin and destination points. It integrates various indicators for a high-quality walking experience, such as sidewalk availability, air pollution levels, presence of green areas, and accessibility for individuals with disabilities. It quantifies the overall walkability of each candidate route using a calculated Walkability Score (WS) and can enrich routes with additional POIs based on explicit user preferences or general tourist information. This component relies on Nominatim for geocoding place names into coordinates, GraphHopper for route generation, OpenWeatherMap for air quality data, and the OSMnx library for filtering OpenStreetMap data.
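The summary does not give the formula behind the Walkability Score, so the sketch below assumes one plausible form: a weighted average of indicators that are each normalized to [0, 1], with higher values meaning better walking conditions. The equal weights, the `RouteIndicators` fields, and the function names are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RouteIndicators:
    """Per-route indicators, each assumed to be normalized to [0, 1] (higher is better)."""
    sidewalk: float       # share of the route with sidewalks
    air_quality: float    # 1.0 = clean air, 0.0 = heavily polluted
    green: float          # share of the route near green areas
    accessibility: float  # share of the route accessible to people with disabilities


# Assumed weights; the actual weighting used by WalkRAG is not stated in the summary.
WEIGHTS: Dict[str, float] = {
    "sidewalk": 0.25,
    "air_quality": 0.25,
    "green": 0.25,
    "accessibility": 0.25,
}


def walkability_score(route: RouteIndicators, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the indicators, so the score also lies in [0, 1]."""
    total = sum(weights.values())
    return sum(getattr(route, name) * w for name, w in weights.items()) / total


def most_walkable(routes: Dict[str, RouteIndicators]) -> str:
    """Return the name of the candidate route with the highest score."""
    return max(routes, key=lambda name: walkability_score(routes[name]))


if __name__ == "__main__":
    candidates = {
        "riverside": RouteIndicators(0.9, 0.7, 0.8, 0.6),
        "boulevard": RouteIndicators(1.0, 0.4, 0.2, 0.9),
    }
    print(most_walkable(candidates))
```

Under these assumptions, changing the weights is the natural place to reflect a user's stated preferences, for example raising `accessibility` for a wheelchair user.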
The Information Retrieval (IR) component integrates a neural indexing and search system. It encodes documents from a knowledge base (TREC CAsT 2019 and 2020 collection) into dense vectors offline, storing them in a FAISS vector index. At query time, user queries are similarly encoded, and an approximate nearest-neighbor search algorithm retrieves the top-k most relevant passages. These passages are then returned to the QUAG component to provide a grounded and contextually informed response, reducing hallucinations.
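The index-offline, search-at-query-time pattern can be illustrated with a dependency-free sketch. It makes two deliberate simplifications: a bag-of-words vector stands in for the neural encoder, and an exact brute-force cosine search stands in for the approximate nearest-neighbor search over a FAISS index. The function names and sample passages are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a neural encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(passages: List[str]) -> List[Tuple[str, Counter]]:
    """Offline step: encode every passage once and store the vectors."""
    return [(p, embed(p)) for p in passages]


def search(index: List[Tuple[str, Counter]], query: str, k: int = 2) -> List[str]:
    """Query-time step: encode the query and return the k most similar passages."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


if __name__ == "__main__":
    index = build_index([
        "The Louvre is a museum in Paris",
        "Notre-Dame is a cathedral on the Ile de la Cite",
        "The Eiffel Tower was completed in 1889",
    ])
    print(search(index, "when was the Eiffel Tower completed", k=1))
```

The exact search here scans every passage, which is fine for a handful of documents; an approximate index trades a small amount of recall for speed on large collections.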
WalkRAG was evaluated using a custom dataset of 10 spatial requests and 30 follow-up information queries focusing on the city of Paris, simulating typical user interactions.
For Query Understanding, the QUAG component successfully classified all 40 queries, routing them correctly to either the spatial or IR component.
In Spatial Requests, WalkRAG generated 4 fully correct routes and 6 partially correct ones, with partial correctness defined by minor omissions in navigation steps. Importantly, WalkRAG did not produce hallucinations. In stark contrast, the LLM-ClosedBook (LLM-CB) baseline failed all 10 spatial queries. It consistently exhibited significant hallucinations, including suggesting directions with large jumps (1.7 km to 8.6 km), looping instructions, poor spatial awareness, and recommending POIs far from the actual route (e.g., 3.9 km away). WalkRAG also accurately identified poorly walkable routes, while LLM-CB often provided incorrect assessments or redirected users to distant public transportation options.
For Information Requests, WalkRAG yielded 20 correct, 5 partially correct, and 5 incorrect answers out of 30. The partially correct answers often involved factual inaccuracies or a lack of specificity, such as listing a demolished building as a notable hotel. The LLM-CB baseline produced 12 correct, 11 partially correct, and 7 incorrect answers. The gap in fully correct answers (20 versus 12) points to the benefit of WalkRAG's retrieval mechanism in improving factual accuracy over a standalone LLM.
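To make the comparison concrete, the reported counts can be converted into proportions. The sketch below uses only the figures given above; the `RESULTS` layout and the `share` helper are illustrative.

```python
from typing import Dict

# Answer counts for the 30 information requests, as reported in the summary.
RESULTS: Dict[str, Dict[str, int]] = {
    "WalkRAG": {"correct": 20, "partial": 5, "incorrect": 5},
    "LLM-CB": {"correct": 12, "partial": 11, "incorrect": 7},
}


def share(counts: Dict[str, int], label: str) -> float:
    """Fraction of all answers that fall under the given label."""
    return counts[label] / sum(counts.values())


if __name__ == "__main__":
    for system, counts in RESULTS.items():
        print(f"{system}: {share(counts, 'correct'):.0%} correct")
```

On these counts, about 67% of WalkRAG's answers are fully correct, against 40% for the closed-book baseline.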
The findings indicate that a spatially enhanced Retrieval-Augmented Generation framework like WalkRAG improves factual accuracy and completeness in urban discovery and walkable itinerary recommendations. It addresses key limitations of standalone LLMs, which struggle with spatial reasoning, suffer from hallucinations, and lack domain-specific knowledge in urban contexts. While WalkRAG may encounter challenges when retrieval lacks sufficient context, it outperformed the closed-book LLM on both spatial and information requests in this evaluation. These preliminary results underscore the potential of combining LLMs with spatial and contextual knowledge for urban applications and open promising avenues for future research, including assessing different LLM architectures, enhancing spatial reasoning through richer geographic operations and routing algorithms, and improving how LLMs process structured route data.