All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 2 results for this tag.
From Accuracy to Readiness: Metrics and Benchmarks for Human-AI Decision-Making
This paper introduces a measurement framework for evaluating human-AI decision-making, shifting the focus from model accuracy alone to the readiness of human-AI teams for safe and effective collaboration. It proposes a taxonomy of metrics and connects them to the Understand–Control–Improve lifecycle, assessing calibration, error recovery, and governance in real-world deployments.
Your Reasoning Benchmark May Not Test Reasoning: Revealing Perception Bottleneck in Abstract Reasoning Benchmarks
This paper challenges the common interpretation of AI models' performance on abstract reasoning benchmarks such as ARC, hypothesizing that visual perception limitations, rather than reasoning deficiencies, are the primary bottleneck. It introduces a two-stage pipeline that separates perception from reasoning, showing that most model failures stem from perception errors and that addressing them yields significant performance improvements.