AI Summary • Published on Dec 9, 2025
Real-world e-commerce recommender systems face significant challenges: they must deliver relevant items within strict tens-of-milliseconds latency budgets, recommend cold-start products effectively, capture rapidly shifting user intent, and adapt to dynamic context such as seasonality or promotions. Traditional collaborative filtering and feature-driven approaches often underutilize rich content and are slow to reflect fast-changing user interests and external factors. Deploying state-of-the-art sequential models in production at large scale adds further complexity around computational cost and keeping representations fresh.
STARS (Semantic Tokens with Augmented Representations for Recommendation at Scale) is a Transformer-based sequential recommendation framework that integrates several key innovations. It employs dual-memory user embeddings to disentangle long-term user preferences from short-term session intent. For item representation, STARS utilizes semantic item tokens that combine frozen pre-trained text embeddings, learnable delta vectors, and LLM-derived attribute tags, which significantly enhances content-based matching, long-tail item coverage, and cold-start performance. The framework also incorporates context-aware scoring with jointly learned calendar and event offsets, enabling dynamic adaptation to current contextual factors. For efficient and low-latency deployment, STARS adopts a two-stage retrieval pipeline that performs offline embedding generation and online maximum inner-product search with filtering. Training uses a candidate-slice softmax loss with subclass-aware negative sampling, which helps the model make finer distinctions between similar items.
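To make these components concrete, here is a minimal PyTorch sketch of one plausible realization: an item token built from a frozen pre-trained text embedding, a learnable per-item delta vector, and mean-pooled embeddings of LLM-derived attribute tags, scored against a user vector that blends long-term and short-term memories plus a context offset. The module names, the additive composition, and the blending weight alpha are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SemanticItemTokens(nn.Module):
    """Illustrative item representation: frozen pre-trained text embedding
    + learnable per-item delta + pooled LLM attribute-tag embeddings."""

    def __init__(self, pretrained_text_emb: torch.Tensor, num_tags: int):
        super().__init__()
        num_items, dim = pretrained_text_emb.shape
        # Frozen content embedding from a pre-trained text encoder.
        self.text_emb = nn.Embedding.from_pretrained(pretrained_text_emb, freeze=True)
        # Small learnable correction on top of the frozen embedding.
        self.delta = nn.Embedding(num_items, dim)
        nn.init.normal_(self.delta.weight, std=0.01)
        # Embeddings for LLM-derived attribute tags (index 0 = padding).
        self.tag_emb = nn.Embedding(num_tags, dim, padding_idx=0)

    def forward(self, item_ids: torch.Tensor, tag_ids: torch.Tensor) -> torch.Tensor:
        # tag_ids: (batch, max_tags) with 0 for padding; mean-pool the real tags.
        tags = self.tag_emb(tag_ids)                       # (batch, max_tags, dim)
        mask = (tag_ids > 0).unsqueeze(-1).float()
        tag_pool = (tags * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.text_emb(item_ids) + self.delta(item_ids) + tag_pool


def context_aware_scores(long_term, short_term, context_offset, item_vecs, alpha=0.5):
    """Blend dual-memory user vectors, add a learned calendar/event offset,
    then score candidates with the inner product used by MIPS retrieval."""
    user = alpha * long_term + (1.0 - alpha) * short_term + context_offset
    return user @ item_vecs.T                              # (batch, num_candidates)


# Tiny usage example with random data (shapes only; not real embeddings).
items = SemanticItemTokens(torch.randn(1000, 64), num_tags=500)
vecs = items(torch.tensor([3, 7]), torch.tensor([[1, 2, 0], [4, 0, 0]]))
scores = context_aware_scores(torch.randn(2, 64), torch.randn(2, 64),
                              torch.randn(2, 64), vecs)
```

A design along these lines leaves the frozen text component untouched during training, so a brand-new item still receives a meaningful representation from its description and tags alone, which is consistent with the cold-start and long-tail behavior described above.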
In comprehensive offline evaluations on production-scale e-commerce data, STARS delivered substantial improvements over the existing LambdaMART system, achieving a relative increase of more than 75% in Hit@5 (from 0.395 to approximately 0.691–0.693). An extensive online A/B test across 6 million visits on a large e-commerce platform showed statistically significant lifts in user engagement: +0.8% in Total Orders, +2.0% in Add-to-Cart actions on the Home page, and +0.5% in Visits per User. Gains were most pronounced with larger candidate sets, reaching up to a +265% relative lift in Hit@5. Ablation studies confirmed that the LLM-derived semantic features and the dual-memory user embedding structure were crucial contributors to these improvements.
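For context, Hit@K measures how often the held-out next item appears among the top-K recommendations, averaged over evaluation cases. A minimal sketch (the function name and inputs are illustrative, not from the paper):

```python
def hit_at_k(ranked_item_ids, target_item_id, k=5):
    """1.0 if the held-out target is among the top-k recommendations, else 0.0.
    The mean of this value over all evaluation cases is Hit@k."""
    return 1.0 if target_item_id in ranked_item_ids[:k] else 0.0


# Relative lift as reported: (0.691 - 0.395) / 0.395 ≈ 0.75, i.e. roughly +75%.
```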
The successful deployment and evaluation of STARS highlight that combining deep learning techniques, semantic enrichment from Large Language Models, multi-intent user modeling, and a carefully designed, latency-conscious system architecture can yield significant advancements in real-world recommendation quality. STARS effectively addresses prevalent challenges such as cold-start items, the dynamic nature of user intent, and the need for scalable, efficient serving. This work demonstrates a practical pathway to achieve state-of-the-art recommendation performance in demanding e-commerce environments without compromising on speed or scalability, paving the way for more intelligent and responsive personalized discovery experiences.