All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 6 results for this tag.
AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
This paper introduces AgentDS, a new benchmark and competition designed to evaluate AI agents and human-AI collaboration on domain-specific data science tasks across six industries. The findings indicate that while current AI agents struggle with domain-specific reasoning, the most effective solutions emerge from human-AI collaboration, highlighting the enduring value of human expertise.
From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents
This paper introduces the Hierarchical Autonomy Evolution (HAE) framework, a novel approach to categorizing security vulnerabilities in AI agents as they evolve from individual cognitive entities into collective societies. It details a taxonomy of threats across three levels of autonomy, highlighting critical research gaps and guiding the development of robust, multilayered defense architectures for trustworthy AI agent systems.
AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior
AgentGuardian is a novel security framework that enhances AI agent safety by automatically learning context-aware access control policies from benign execution traces. It enforces these policies at the tool level and validates execution flow integrity, effectively detecting malicious inputs and mitigating hallucination-driven errors.
Towards AI Agents Supported Research Problem Formulation
This vision paper explores how artificial intelligence (AI) agents can support Software Engineering (SE) researchers in formulating research problems, aiming to bridge the gap between academic contributions and industrial needs. It proposes integrating AI agents into the Lean Research Inception (LRI) framework to improve problem definition and assessment.
Strategic Self-Improvement for Competitive Agents in AI Labour Markets
This paper introduces a novel framework for understanding the strategic behavior and market impact of AI agents in labor markets, incorporating real-world economic forces such as adverse selection, moral hazard, and reputation dynamics. Through simulations, it demonstrates how LLM agents with enhanced reasoning capabilities can strategically self-improve, adapt to market changes, and reproduce classic macroeconomic phenomena, while also revealing potential AI-driven economic trends.
Decoding the Configuration of AI Coding Agents: Insights from Claude Code Projects
This paper empirically studies 328 configuration files from public Claude Code projects to understand how developers configure AI coding agents. The findings highlight the importance of defining various software engineering concerns, particularly architectural specifications, within these configuration files to guide agent behavior and improve effectiveness.