All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 3 results for this tag.
Pareto Optimal Benchmarking of AI Models on ARM Cortex Processors for Sustainable Embedded Systems
This paper introduces a practical framework for benchmarking and optimizing AI models on ARM Cortex processors in embedded systems. It focuses on balancing energy efficiency, accuracy, and resource utilization, demonstrating how optimal processor and model selections depend on an application's inference cycle time.
Medical Imaging AI Competitions Lack Fairness
This paper systematically investigates fairness in medical imaging AI benchmarking competitions, revealing significant biases in dataset composition and critical flaws in data accessibility, licensing, and documentation. The findings highlight a disconnect between leaderboard success and clinically meaningful AI, and call for improved transparency and reusability standards.
Evaluating Long-Context Reasoning in LLM-Based WebAgents
This paper introduces a benchmark for evaluating the long-context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieving and applying information from extended interaction histories. It observes a dramatic performance degradation as context length increases and proposes an implicit RAG approach that yields modest improvements.