AI Summary • Published on Jan 14, 2026
The increasing autonomy and widespread integration of AI agents across various domains introduce significant security risks. Unlike traditional software, AI agents can dynamically select and chain tools, making them vulnerable to misuse if their decision-making process is compromised. Existing security solutions primarily focus on static, text-level guardrails that filter harmful inputs and outputs. However, these approaches cannot address the dynamic execution of AI agents: they neither control how tools are invoked nor analyze the complex, multi-step decision-making and execution lifecycle of an agent. This gap necessitates more advanced security mechanisms that govern AI agent behavior at a deeper, execution-flow level, preventing malicious exploitation of legitimate tool capabilities and mitigating errors arising from AI hallucinations.
AgentGuardian is an end-to-end security framework designed to govern AI agent behavior through context-aware access control policies. It operates in three main stages: a Monitoring Tool, a Policy Generator, and a Policy Enforcer. During a controlled staging phase, the Monitoring Tool collects detailed execution traces of the AI agent, including tool invocations, inputs, and outputs, focusing on LLM-related activities. The Policy Generator then transforms these benign traces into formal access control policies. This involves constructing a Control Flow Graph (CFG) that represents all legitimate execution paths and sequences of tool calls. In parallel, it infers fine-grained input policies by grouping semantically related inputs and attributes. This generalization process uses embeddings to combine textual inputs and numerical attributes into a unified feature space, clusters them, and then derives compact policy rules, including regex patterns for textual inputs and value ranges for numeric attributes. The framework thus provides comprehensive access control at the tool level, combining input validation, attribute-based validation, and workflow constraints via CFGs. The Policy Enforcer continuously applies these learned policies in real time during agent operation. Integrated as a lightweight component into existing AI agent architectures, it validates each tool invocation against the learned policies, allowing legitimate actions while raising alerts or terminating execution upon policy violations, and can communicate with the orchestrating LLM to halt malicious activity.
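To make these stages concrete, the sketch below shows how tool-level input policies and a CFG of allowed tool sequences could be combined at invocation time. It is a minimal illustration under assumed representations: the `ToolPolicy` and `PolicyEnforcer` classes, the `derive_policy` helper, and the example tools (`search_docs`, `open_file`) are hypothetical simplifications (no embedding or clustering step), not AgentGuardian's actual implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Learned access-control rule for one tool: a regex over textual input
    plus min/max ranges over numeric attributes."""
    input_pattern: re.Pattern
    attribute_ranges: dict = field(default_factory=dict)  # attr name -> (min, max)

    def allows(self, text_input: str, attributes: dict) -> bool:
        if not self.input_pattern.fullmatch(text_input):
            return False
        for name, (low, high) in self.attribute_ranges.items():
            value = attributes.get(name)
            if value is None or not (low <= value <= high):
                return False
        return True


def derive_policy(benign_inputs: list[str], benign_attrs: list[dict]) -> ToolPolicy:
    """Crude stand-in for the generalization step: build a character-class regex
    from characters seen in benign inputs and min/max ranges from observed
    numeric attributes (the embedding-and-clustering step is omitted here)."""
    chars = "".join(sorted({c for s in benign_inputs for c in s}))
    max_len = max(len(s) for s in benign_inputs)
    pattern = re.compile("[%s]{1,%d}" % (re.escape(chars), max_len))
    ranges: dict = {}
    for attrs in benign_attrs:
        for key, value in attrs.items():
            low, high = ranges.get(key, (value, value))
            ranges[key] = (min(low, value), max(high, value))
    return ToolPolicy(pattern, ranges)


class PolicyEnforcer:
    """Validates every tool invocation against its input policy and the CFG
    of legitimate tool-call sequences learned during staging."""

    def __init__(self, policies: dict, cfg: dict, start: str = "__start__"):
        self.policies = policies  # tool name -> ToolPolicy
        self.cfg = cfg            # tool name -> set of tools allowed to follow it
        self.current = start      # current position in the control-flow graph

    def check(self, tool: str, text_input: str, attributes: dict) -> bool:
        # Workflow constraint: the transition must exist in the learned CFG.
        if tool not in self.cfg.get(self.current, set()):
            return False
        # Tool-level constraint: input and attributes must satisfy the policy.
        policy = self.policies.get(tool)
        if policy is None or not policy.allows(text_input, attributes):
            return False
        self.current = tool
        return True


# Hypothetical two-tool workflow: "search_docs" may be followed by "open_file".
policies = {
    "search_docs": derive_policy(["vpn setup guide", "reset password"],
                                 [{"top_k": 3}, {"top_k": 5}]),
    "open_file": ToolPolicy(re.compile(r"[\w/\-]+\.(md|txt)")),
}
cfg = {"__start__": {"search_docs"}, "search_docs": {"open_file"}, "open_file": set()}

enforcer = PolicyEnforcer(policies, cfg)
print(enforcer.check("search_docs", "vpn setup guide", {"top_k": 4}))  # True
print(enforcer.check("open_file", "../../etc/passwd", {}))            # False: path rejected
```

In the described framework, policy derivation would operate on embeddings of inputs and attributes and cluster them before generalizing; a real enforcer would also need to handle branching and repeated tool calls in the CFG.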
AgentGuardian was evaluated on two real-world multi-agent applications, a Knowledge Assistant and an IT Support application, using a test set of 80 benign and 20 policy-violation scenarios. The framework demonstrated high effectiveness in detecting malicious or misleading inputs, achieving an overall False Acceptance Rate (FAR) of 0.10, meaning it detected 18 of the 20 policy violations. The False Rejection Rate (FRR) was low, showing minimal disruption to legitimate agent behavior. Some false rejections were attributed to unusually long inputs or excessive processing durations, suggesting the learned policies were overly strict for such cases. Importantly, no false rejections were caused by incorrect input processing. The evaluation also introduced the Benign Execution Failures Rate (BEFR), which quantifies hallucination-induced errors. AgentGuardian recorded a total BEFR of 7.5%, with these failures originating from the underlying LLM producing non-existent file names or irrelevant tool suggestions. In all of these cases, AgentGuardian correctly flagged the anomalous behavior as a policy violation. Furthermore, experiments on policy quality revealed that increasing the number of benign samples used in the staging phase led to increasingly restrictive and specific policies, with regex patterns becoming more constrained and aligned with legitimate input patterns.
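For clarity, the sketch below spells out how the three reported rates relate to the counts in the test set. The exact formulas and denominators, especially for BEFR, are assumptions consistent with the numbers in this summary, not definitions taken from the paper.

```python
# Hypothetical metric helpers; the denominators are assumptions consistent with
# the reported figures (20 policy-violation and 80 benign scenarios).
def far(accepted_violations: int, total_violations: int) -> float:
    """False Acceptance Rate: violating scenarios the enforcer let through."""
    return accepted_violations / total_violations

def frr(rejected_benign: int, total_benign: int) -> float:
    """False Rejection Rate: benign scenarios wrongly blocked."""
    return rejected_benign / total_benign

def befr(hallucination_failures: int, total_benign: int) -> float:
    """Benign Execution Failures Rate: benign runs that fail due to LLM hallucinations."""
    return hallucination_failures / total_benign

print(far(2, 20))   # 0.10 -> 18 of 20 violations detected
print(befr(6, 80))  # 0.075 -> the reported 7.5%, assuming BEFR is taken over the 80 benign runs
```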
AgentGuardian offers a significant advancement in securing AI agents by providing a robust, context-aware access control framework that learns and adapts to agent behavior. Its ability to enforce policies at both the tool and workflow levels, coupled with automated policy generation, addresses a critical gap in existing AI agent security solutions. By mitigating risks from malicious inputs and reducing hallucination-driven errors, AgentGuardian enhances system integrity and prevents misuse of AI agent capabilities. The framework's lightweight integration ensures it can be deployed with minimal development effort, making it practical for diverse AI agent applications. While automatic policy generation still faces challenges in exhaustively covering all benign variations and balancing restrictiveness with utility, AgentGuardian demonstrates that high-quality, learned policies can serve as an effective governance layer, stabilizing and regulating AI agent behavior beyond just security enforcement.