All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 1 result for this tag.
AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
This paper introduces AgentDS, a new benchmark and competition designed to evaluate the performance of AI agents and human-AI collaboration in domain-specific data science tasks across six diverse industries. The findings indicate that while current AI agents struggle with domain-specific reasoning, the most effective solutions emerge from human-AI collaboration, highlighting the enduring value of human expertise.