Articles tagged with: Scheduling

Showing 1 result for this tag.

Advanced · Dec 2, 2025

AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving

AugServe is an efficient inference serving framework for augmented large language models. To address the challenges of serving them, it combines a two-stage adaptive request scheduling strategy with a dynamic token-level batching mechanism. Together these significantly reduce queuing latency and raise effective throughput, which improves the user experience in web applications.

Inference Serving

Scheduling

Large Language Models

Research Guy

Research Guy

Understand New Research — Instantly

Daily AI-generated explanations of the latest arXiv papers.
