All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 22 results for this tag.
LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence
The paper introduces LexGenius, a comprehensive, expert-level Chinese legal benchmark designed to systematically evaluate the legal general intelligence of Large Language Models (LLMs). It utilizes a multi-dimensional framework and a large dataset of carefully curated legal questions to reveal significant gaps between LLMs and human legal professionals, particularly in areas requiring soft legal intelligence and nuanced judgment.
Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning
This paper introduces Semantic Soft Bootstrapping (SSB), a novel self-distillation technique for enhancing long-context reasoning in large language models without relying on reinforcement learning. SSB uses the same base LLM as both teacher and student, leveraging semantically rich contexts to generate robust explanations and distilling logit-level supervision to improve reasoning capabilities efficiently.
Towards an AI Fluid Scientist: LLM-Powered Scientific Discovery in Experimental Fluid Mechanics
This paper introduces an AI Fluid Scientist framework that automates the entire experimental fluid mechanics workflow, from hypothesis generation to manuscript preparation, using a multi-agent LLM system and a computer-controlled water tunnel. It demonstrates the framework's ability to reproduce benchmarks, discover new phenomena, and generate robust scientific findings with minimal human intervention.
Multi-LLM Collaboration for Medication Recommendation
This paper introduces a Multi-LLM Collaboration approach, guided by an "LLM Chemistry" framework, to enhance the reliability and trustworthiness of medication recommendations from clinical vignettes. The method aims to create effective, stable, and calibrated LLM ensembles by explicitly modeling interaction dynamics.
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
This paper introduces Arbitrage, a novel step-level speculative generation framework designed to enhance the efficiency of Large Language Models (LLMs) in reasoning tasks. It dynamically routes between a fast draft model and a more capable target model based on the expected quality advantage, significantly reducing computational waste and inference latency while maintaining accuracy.
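The routing idea described above can be sketched in a few lines. This is only an illustrative toy, not the paper's actual system: the draft and target models, the confidence-based advantage proxy, and the threshold below are all stand-in assumptions.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    text: str
    confidence: float  # draft model's self-estimated quality, in [0, 1]

def draft_generate(prompt: str) -> StepResult:
    # Stand-in for a cheap draft model that reports low confidence
    # on steps it finds hard.
    hard = "integral" in prompt
    return StepResult("draft answer", 0.3 if hard else 0.9)

def target_generate(prompt: str) -> str:
    # Stand-in for the slower, more capable target model.
    return "target answer"

def route_step(prompt: str, advantage_threshold: float = 0.4) -> str:
    """Keep the draft step unless the expected quality advantage of the
    target model (proxied here by 1 - draft confidence) exceeds the
    threshold, in which case escalate to the target model."""
    draft = draft_generate(prompt)
    expected_advantage = 1.0 - draft.confidence
    if expected_advantage > advantage_threshold:
        return target_generate(prompt)
    return draft.text

easy = route_step("add 2 and 3")            # draft kept
hard = route_step("evaluate the integral")  # escalated to target
```

The key design point is that routing happens per reasoning step, so the expensive model is only invoked where it is expected to add quality.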
Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case
This paper investigates the zero-shot multilingual capabilities of open-source Large Language Models (LLMs) for extracting comorbidity information from Italian Electronic Health Records (EHRs). The study reveals that these LLMs struggle to generalize across various diseases and do not perform as well as traditional pattern matching or human annotations in this specific healthcare use case.
Spatially-Enhanced Retrieval-Augmented Generation for Walkability and Urban Discovery
This paper introduces WalkRAG, a spatial Retrieval-Augmented Generation (RAG) framework that leverages Large Language Models (LLMs) to recommend personalized and walkable urban itineraries. It addresses known LLM limitations in spatial reasoning and factual accuracy by integrating spatial and contextual urban knowledge for enhanced route generation and point-of-interest information retrieval.
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
The paper introduces a comprehensive method and ecosystem (NexAU, NexA4A, NexGAP) to overcome limitations in scaling interactive environments for training agentic Large Language Models (LLMs). This infrastructure enables the systematic generation of diverse, complex, and realistically grounded interaction trajectories for LLMs.
AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving
AugServe is an efficient inference framework that addresses the challenges of serving augmented large language models by introducing a two-stage adaptive request scheduling strategy and a dynamic token-level batching mechanism. It significantly reduces queuing latency and enhances effective throughput, leading to improved user experience in web applications.
Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs
This paper introduces a training-free method for detecting policy violations in Large Language Models, framing the task as out-of-distribution detection in activation space. The approach applies activation-space whitening and uses the Euclidean norm of the whitened representation as a compliance score, outperforming existing guardrails and fine-tuned models while remaining highly interpretable and efficient.
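The whitening-plus-norm scoring idea can be illustrated with synthetic data. This is a minimal sketch of the general technique, not the paper's implementation: the dimensionality, the Gaussian "compliant" activations, and the shifted out-of-distribution sample are all assumed for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical hidden-state dimensionality

# Synthetic "compliant" activations drawn from one Gaussian.
mean_true = rng.normal(size=d)
A = rng.normal(size=(d, d))
cov_true = A @ A.T / d + np.eye(d)  # well-conditioned covariance
compliant = rng.multivariate_normal(mean_true, cov_true, size=500)

# Fit the whitening transform on compliant activations only.
mu = compliant.mean(axis=0)
cov = np.cov(compliant, rowvar=False) + 1e-6 * np.eye(d)
w, V = np.linalg.eigh(cov)
W = V @ np.diag(w ** -0.5) @ V.T  # inverse matrix square root

def compliance_score(x):
    """Euclidean norm in whitened space (a Mahalanobis distance)."""
    return np.linalg.norm(W @ (x - mu))

# In-distribution activations score low; a strongly shifted
# (out-of-distribution) activation scores much higher.
in_score = np.mean([compliance_score(x) for x in compliant])
out_score = compliance_score(mu + 10.0)  # large shift in every dimension
```

Because the score is just a distance in a whitened space, it needs no gradient updates or labeled violations, which is what makes the approach training-free and easy to interpret.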