All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 1 result for this tag.
Pervasive Annotation Errors Break Text-to-SQL Benchmarks and Leaderboards
This paper exposes widespread annotation errors in two leading text-to-SQL benchmarks, BIRD and Spider 2.0-Snow, and demonstrates how these errors severely distort model performance evaluations and leaderboard rankings. It also introduces SAR-Agent and SAPAR, AI-powered tools designed to detect and correct such errors, and advocates for higher-quality benchmark development.