AI Summary • Published on Dec 9, 2025
Real-world e-commerce recommender systems face significant challenges: they must deliver relevant items within strict tens-of-milliseconds latency budgets, recommend cold-start products effectively, capture rapidly shifting user intent, and adapt to dynamic context such as seasonality or promotions. Traditional collaborative filtering and feature-driven approaches often underutilize rich content and are slow to reflect fast-changing user interests and external factors. Deploying state-of-the-art sequential models in production at large scale adds further complexity around computational cost and keeping representations fresh.
STARS (Semantic Tokens with Augmented Representations for Recommendation at Scale) is a Transformer-based sequential recommendation framework that integrates several key innovations. It employs dual-memory user embeddings to disentangle long-term user preferences from short-term session intent. For item representation, STARS utilizes semantic item tokens that combine frozen pre-trained text embeddings, learnable delta vectors, and LLM-derived attribute tags, which significantly enhances content-based matching, long-tail item coverage, and cold-start performance. The framework also incorporates context-aware scoring with jointly learned calendar and event offsets, enabling dynamic adaptation to current contextual factors. For efficient and low-latency deployment, STARS adopts a two-stage retrieval pipeline that performs offline embedding generation and online maximum inner-product search with filtering. Training uses a candidate-slice softmax loss with subclass-aware negative sampling, which helps the model make finer distinctions between similar items.
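To make these components concrete, here is a minimal PyTorch sketch of one plausible realization: an item token built from a frozen pre-trained text embedding, a learnable per-item delta vector, and mean-pooled embeddings of LLM-derived attribute tags, scored against a user vector that blends long-term and short-term memories plus a context offset. The module names, the additive composition, and the blending weight alpha are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SemanticItemTokens(nn.Module):
    """Illustrative item representation: frozen pre-trained text embedding
    + learnable per-item delta + pooled LLM attribute-tag embeddings."""

    def __init__(self, pretrained_text_emb: torch.Tensor, num_tags: int):
        super().__init__()
        num_items, dim = pretrained_text_emb.shape
        # Frozen content embedding from a pre-trained text encoder.
        self.text_emb = nn.Embedding.from_pretrained(pretrained_text_emb, freeze=True)
        # Small learnable correction on top of the frozen embedding.
        self.delta = nn.Embedding(num_items, dim)
        nn.init.normal_(self.delta.weight, std=0.01)
        # Embeddings for LLM-derived attribute tags (index 0 = padding).
        self.tag_emb = nn.Embedding(num_tags, dim, padding_idx=0)

    def forward(self, item_ids: torch.Tensor, tag_ids: torch.Tensor) -> torch.Tensor:
        # tag_ids: (batch, max_tags) with 0 for padding; mean-pool the real tags.
        tags = self.tag_emb(tag_ids)                       # (batch, max_tags, dim)
        mask = (tag_ids > 0).unsqueeze(-1).float()
        tag_pool = (tags * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.text_emb(item_ids) + self.delta(item_ids) + tag_pool


def context_aware_scores(long_term, short_term, context_offset, item_vecs, alpha=0.5):
    """Blend dual-memory user vectors, add a learned calendar/event offset,
    then score candidates with the inner product used by MIPS retrieval."""
    user = alpha * long_term + (1.0 - alpha) * short_term + context_offset
    return user @ item_vecs.T                              # (batch, num_candidates)


# Tiny usage example with random data (shapes only; not real embeddings).
items = SemanticItemTokens(torch.randn(1000, 64), num_tags=500)
vecs = items(torch.tensor([3, 7]), torch.tensor([[1, 2, 0], [4, 0, 0]]))
scores = context_aware_scores(torch.randn(2, 64), torch.randn(2, 64),
                              torch.randn(2, 64), vecs)
```

A design along these lines leaves the frozen text component untouched during training, so a brand-new item still receives a meaningful representation from its description and tags alone, which is consistent with the cold-start and long-tail behavior described above.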
In comprehensive offline evaluations on production-scale e-commerce data, STARS delivered substantial improvements over the existing LambdaMART system, achieving a relative increase of more than 75% in Hit@5 (from 0.395 to approximately 0.691–0.693). An extensive online A/B test across 6 million visits on a large e-commerce platform showed statistically significant lifts in user engagement: +0.8% in Total Orders, +2.0% in Add-to-Cart actions on the Home page, and +0.5% in Visits per User. Gains were most pronounced with larger candidate sets, reaching up to a +265% relative lift in Hit@5. Ablation studies confirmed that the LLM-derived semantic features and the dual-memory user embedding structure were crucial contributors to these improvements.
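For context, Hit@K measures how often the held-out next item appears among the top-K recommendations, averaged over evaluation cases. A minimal sketch (the function name and inputs are illustrative, not from the paper):

```python
def hit_at_k(ranked_item_ids, target_item_id, k=5):
    """1.0 if the held-out target is among the top-k recommendations, else 0.0.
    The mean of this value over all evaluation cases is Hit@k."""
    return 1.0 if target_item_id in ranked_item_ids[:k] else 0.0


# Relative lift as reported: (0.691 - 0.395) / 0.395 ≈ 0.75, i.e. roughly +75%.
```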
The successful deployment and evaluation of STARS highlight that combining deep learning techniques, semantic enrichment from Large Language Models, multi-intent user modeling, and a carefully designed, latency-conscious system architecture can yield significant advancements in real-world recommendation quality. STARS effectively addresses prevalent challenges such as cold-start items, the dynamic nature of user intent, and the need for scalable, efficient serving. This work demonstrates a practical pathway to achieve state-of-the-art recommendation performance in demanding e-commerce environments without compromising on speed or scalability, paving the way for more intelligent and responsive personalized discovery experiences.