Articles tagged with: distributed-systems

Showing 1 result for this tag.

Advanced·Apr 15, 2026

Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving

The paper introduces a formal resource‑allocation framework for serving large transformer‑based models with pipeline parallelism, proposing greedy placement, cache allocation, and load‑balancing algorithms that drastically cut inference latency.

distributed-systems

model-serving

load-balancing

Research Guy

Understand New Research — Instantly

Daily AI-generated explanations of the latest arXiv papers.
