All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 2 results for this tag.
OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
This paper introduces OD-MoE, a distributed Mixture-of-Experts (MoE) inference framework designed for memory-constrained edge devices. It enables fully on-demand expert loading without a cache, achieving high decoding speeds and significantly reducing GPU memory requirements while maintaining full model precision.
DeepSeek-V3 Technical Report
DeepSeek-V3 is a powerful 671B-parameter Mixture-of-Experts language model that achieves state-of-the-art performance among open-source models and competes with leading closed-source models, using efficient architectures and novel training strategies while keeping training costs remarkably low.