All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 3 results for this tag.
LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence
The paper introduces LexGenius, a comprehensive, expert-level Chinese legal benchmark designed to systematically evaluate the legal general intelligence of Large Language Models (LLMs). Using a multi-dimensional evaluation framework and a large set of carefully curated legal questions, it reveals significant gaps between LLMs and human legal professionals, particularly in areas requiring soft legal intelligence and nuanced judgment.
Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol
This paper introduces a new benchmark for evaluating Large Language Model (LLM) agents on planning and execution tasks in industrial automation. It uses the Blocksworld problem across five complexity categories and integrates the Model Context Protocol (MCP) as a standardized tool interface, enabling systematic comparison of diverse LLM agent architectures.
"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding
This paper introduces UnSLU-BENCH, the first benchmark for machine unlearning in spoken language understanding (SLU), evaluating eight unlearning techniques across various datasets and models. It also proposes the Global Unlearning Metric (GUM) to comprehensively assess the efficacy, utility, and efficiency of unlearning requests, particularly privacy-driven requests to remove speaker-specific data.