Articles tagged with: Speech Recognition

Showing 2 results for this tag.

Advanced·Apr 20, 2026

UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction

This paper introduces UAF (Unified Audio Front-end LLM), a novel large language model that unifies key audio front-end tasks, such as voice activity detection, speaker recognition, and automatic speech recognition, into a single end-to-end generative framework. By jointly modeling semantic content and interaction-level control signals, UAF aims to overcome the limitations of traditional cascaded pipelines and improve full-duplex speech interaction.

Speech Recognition

Full-Duplex Communication

Large Language Models

Advanced·Jan 13, 2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

SLAM-LLM is an open-source deep learning framework for training customized Multimodal Large Language Models (MLLMs), with a focus on speech, language, audio, and music processing. It provides modular configuration, detailed training and inference recipes, and high-performance checkpoints for mainstream tasks, with the goal of accelerating research on audio-language models.

Multimodal LLM

Audio Processing

Speech Recognition

Research Guy

Understand New Research — Instantly

Daily AI-generated explanations of the latest arXiv papers.